Heroic

Heroic is an open-source time-series DBMS built at Spotify.

Data Model

Key/Value

Heroic uses a key/value data model in which each key consists of a “unique set of tags and resource identifiers” that corresponds to a single series. Tags are the fields that are indexed and retained in the database, and each tag is stored together with its corresponding time series. According to the GitHub documentation, tags can therefore be used in complex queries for both filtering and aggregation. A resource identifier, by contrast, is data that is not indexed, although it is still stored with its corresponding time series. Its purpose is to let data that changes constantly be stored and accessed along with the time series without becoming part of the index. The GitHub documentation gives the example of a hostname field: if the hostname changes often, it should be kept as a resource identifier rather than a tag. As such, resource identifiers are used in queries through aggregations rather than through indexed filtering.
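The distinction between tags and resource identifiers can be illustrated with a minimal sketch. This is not Heroic's actual API: the names Series, filter_by_tags, and sum_by, and the sample data, are hypothetical and chosen only for illustration. Filtering consults only the tags, while an aggregation may group by either kind of field.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Series:
    """One time series (hypothetical model, not Heroic's real classes)."""
    key: str
    tags: dict        # indexed: usable in filters and aggregations
    resource: dict    # not indexed: only stored alongside the data
    points: list = field(default_factory=list)  # (timestamp, value) pairs


def filter_by_tags(series_list, **wanted):
    """Select series whose tags match every requested tag value."""
    return [s for s in series_list
            if all(s.tags.get(k) == v for k, v in wanted.items())]


def sum_by(series_list, field_name):
    """Sum point values, grouped by a tag or a resource identifier."""
    totals = defaultdict(float)
    for s in series_list:
        group = s.tags.get(field_name, s.resource.get(field_name))
        totals[group] += sum(value for _, value in s.points)
    return dict(totals)


series = [
    Series("cpu-usage", {"role": "database", "site": "lon"},
           {"hostname": "db-17"}, [(1000, 0.5), (1060, 0.25)]),
    Series("cpu-usage", {"role": "database", "site": "sto"},
           {"hostname": "db-42"}, [(1000, 1.0)]),
    Series("cpu-usage", {"role": "web", "site": "lon"},
           {"hostname": "web-03"}, [(1000, 2.0)]),
]

# Filter on an indexed tag, then aggregate by a resource identifier.
databases = filter_by_tags(series, role="database")
by_host = sum_by(databases, "hostname")
```

Here hostname never takes part in filtering, so a host rename would not require any change to the tag index; it only changes the group labels produced by the aggregation.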

Indexes

Inverted Index (Full Text)

Heroic uses Elasticsearch to index all of its data, so its indexing structure mirrors that of Elasticsearch: an inverted index. An inverted index is built by scanning every document for the unique words it contains and recording, for each unique word, all of the documents in which that word appears. A search then looks up the query terms in the index instead of scanning the documents themselves. This also enables more contextual searches (i.e., searches that return the matching documents as well) and results in faster queries overall.
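The idea behind an inverted index can be shown with a short, generic sketch. It does not reflect how Elasticsearch is implemented internally; the function names and sample documents are assumptions made for illustration, and tokenization is reduced to splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each unique term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, *terms):
    """Return the ids of documents that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical documents; each one stands in for indexed series metadata.
documents = {
    1: "role database site lon",
    2: "role database site sto",
    3: "role web site lon",
}
index = build_inverted_index(documents)
matches = search(index, "database", "lon")
```

A query touches only the posting sets of its own terms, which is why lookups stay fast as the number of documents grows.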

Storage Architecture

Disk-oriented

Because Heroic uses Cassandra as its primary storage backend, we assume that Heroic’s storage architecture follows Cassandra’s. Cassandra is a disk-oriented database: its data is organized into columns, and those columns are stored on disk. Each column corresponds to a different attribute of the data, and the values in a column are the individual data points stored.

Storage Model

N-ary Storage Model (Row/Record)

As noted above, Cassandra is Heroic’s primary storage mechanism, so Heroic also takes on Cassandra’s storage model, which implies that Heroic uses an n-ary storage model. In an n-ary storage model, all related data is stored in tables with “n” columns, and the n attributes of each record are kept together as a row.
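A rough sketch of a row-oriented (n-ary) layout follows, with n = 3. The Row type and its column names are hypothetical and do not correspond to Heroic's or Cassandra's actual schema; the point is only that all attributes of a record sit together.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One record with n = 3 attributes, kept together as a unit."""
    series_key: str
    timestamp: int
    value: float


# In an n-ary layout, every attribute of a record is stored adjacently.
table = [
    Row("cpu-usage", 1000, 0.5),
    Row("cpu-usage", 1060, 0.25),
    Row("mem-usage", 1000, 3.0),
]

# Fetching a whole record is a single lookup ...
second = table[1]

# ... while reading one attribute still walks every record.
values = [row.value for row in table]
```

This layout favors workloads that read or write whole records at once, at the cost of touching every row when only one attribute is needed.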