Heroic

Heroic is an open-source time-series DBMS built at Spotify.

Indexes

Inverted Index (Full Text)

Heroic uses Elasticsearch to index its data. Its indexing structure therefore mirrors that of Elasticsearch, which is an inverted index. When documents are indexed, an inverted index splits each document into terms and stores every unique term together with all of the documents in which that term appears. A search for a term is then a direct lookup rather than a scan of every document, which enables more contextual searches (i.e., searches that return the matching documents as well) and results in faster queries overall.
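As a rough illustration of the idea (not Elasticsearch's or Heroic's actual implementation), an inverted index can be sketched in a few lines of Python. The function names and the sample documents below are hypothetical, and tokenization is reduced to splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each unique term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, documents, term):
    """Return the matching documents with one dictionary lookup, no full scan."""
    return [documents[doc_id] for doc_id in index.get(term.lower(), [])]


# Hypothetical documents, made up for this example.
documents = {
    1: "cpu usage host a",
    2: "memory usage host b",
    3: "cpu idle host b",
}
index = build_inverted_index(documents)
```

Here `search(index, documents, "cpu")` returns documents 1 and 3 directly from the term's posting list, which is the property that makes lookups fast.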

Stored Procedures

Not Supported

Cassandra, the primary storage backend for Heroic, does not have stored procedures. Instead, logic is placed on the application side: a client or application-level program is written through which users can request to "load and store data" contained in the Cassandra database.

Storage Organization

Log-structured

Likewise, Heroic’s storage organization follows Cassandra’s, which is log-structured: it uses a log-structured merge tree. A log-structured merge (LSM) tree is a key-value structure that performs well for workloads in which large quantities of data are inserted. An LSM tree can consist of multiple component structures that prioritize different storage media. In a two-level LSM tree, for example, one structure holds data in memory and the other holds data on disk, and data can flow between the two structures. The data in an LSM tree is organized into runs, where each run is sorted by key. In Cassandra, one key can map to multiple values that correspond to multiple data rows, so a search of the tree must retrieve all corresponding values.
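The two-level arrangement can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not Cassandra's implementation: the class name `TwoLevelLSM` and the `memtable_limit` parameter are hypothetical, the "disk" level is simulated with in-memory sorted lists, and compaction of runs is omitted.

```python
import bisect


class TwoLevelLSM:
    """Toy LSM tree: an in-memory table plus sorted runs standing in for disk."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # level 0: recent writes held in memory
        self.runs = []       # level 1: sorted runs, newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch memory, which is why inserts are cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Move the in-memory data into a new run sorted by key.
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check memory first, then each run from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


db = TwoLevelLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable is full, so it is flushed into a sorted run
db.put("a", 3)   # a newer value for "a" now lives in memory
```

Because reads consult the newest data first, `db.get("a")` returns the value still in memory rather than the older value in the flushed run.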

Storage Architecture

Disk-oriented

Because Heroic uses Cassandra as its primary form of storage, we assume that Heroic’s storage architecture is modeled on Cassandra’s as well. Cassandra is a disk-oriented database: although its data is organized into columns, the columns themselves are stored on disk. Each column on disk corresponds to a different data feature, and the values that make up a column represent the individual data points stored.

Data Model

Key/Value

Heroic uses a key/value data model, where each key is composed of a “unique set of tags and resource identifiers” that corresponds to a single series. In this context, a tag is data that is indexed and retained within the database, and each tag is stored with its corresponding time series. Tags are thus used in complex queries for both filtering and aggregation, as described in the GitHub documentation. A resource identifier, on the other hand, is data that is not indexed, although it is still stored with its corresponding time series. The purpose of resource identifiers is to ensure that data that changes constantly can still be stored and accessed with its time series. As an example from the GitHub documentation, if the hostname field changes often, it would be kept as a resource identifier rather than a tag. As such, resource identifiers are used in queries based on aggregations.
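The distinction between tags and resource identifiers can be sketched as follows. This is a hypothetical model for illustration only and does not reflect Heroic's actual classes or API: `SeriesStore`, `series_key`, and the sample tag and hostname values are all assumptions. The point it shows is that both parts are stored with the series, but only tags enter the index.

```python
from collections import defaultdict


def series_key(tags, resource):
    """Build a hashable key from the tags and resource identifiers."""
    return (frozenset(tags.items()), frozenset(resource.items()))


class SeriesStore:
    def __init__(self):
        self.data = defaultdict(list)      # series key -> [(timestamp, value)]
        self.tag_index = defaultdict(set)  # (tag name, tag value) -> series keys

    def write(self, tags, resource, timestamp, value):
        key = series_key(tags, resource)
        self.data[key].append((timestamp, value))
        for item in tags.items():
            # Only tags are indexed; resource identifiers are stored, not indexed.
            self.tag_index[item].add(key)

    def find(self, name, value):
        """Return the keys of all series whose tags contain name=value."""
        return self.tag_index.get((name, value), set())


store = SeriesStore()
store.write({"role": "database"}, {"hostname": "host-1"}, 1, 0.5)
store.write({"role": "database"}, {"hostname": "host-2"}, 2, 0.7)
```

Filtering on the tag `role` finds both series, while a lookup on `hostname` finds nothing because it was never indexed; the hostname values remain available inside each stored key.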

Storage Model

N-ary Storage Model (Row/Record)

As discussed above, Cassandra is Heroic’s primary storage mechanism, so Heroic also takes on Cassandra’s storage model, which implies that Heroic has an n-ary storage model. In an n-ary storage model, all related data is stored in tables, where each table has “n” columns and the values of each record are stored together, thus defining the n-ary relationship.