DBDB.io The Encyclopedia of Database Systems · Est. 2017
Database of Databases

Database Entry

Cubrick


Cubrick is a distributed, multidimensional, in-memory DBMS developed for internal use at Facebook. It is designed for low-latency, real-time OLAP analysis over large datasets, and it was built from scratch to support only the features that those real-time analysis use cases require.

Checkpoints[02]


Cubrick delegates persistence to an external disk-based key-value store (e.g., RocksDB): in-memory data is flushed to that store periodically and asynchronously.
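
The periodic, asynchronous flush described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `CheckpointedStore`, the dirty-key tracking, and the use of a plain Python dict as a stand-in for the disk-based key-value store are all hypothetical and are not taken from Cubrick.

```python
import threading


class CheckpointedStore:
    """In-memory store whose changes are copied to a persistent store in the background."""

    def __init__(self, persistent_store, interval_seconds=5.0):
        self._memory = {}
        self._dirty = set()                  # keys changed since the last flush
        self._lock = threading.Lock()
        self._persistent = persistent_store  # stand-in for a disk-based KV store
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = None

    def put(self, key, value):
        # Writers only touch memory; they never wait for the persistent store.
        with self._lock:
            self._memory[key] = value
            self._dirty.add(key)

    def flush(self):
        # Snapshot the dirty keys under the lock, then write outside the lock.
        with self._lock:
            snapshot = {key: self._memory[key] for key in self._dirty}
            self._dirty.clear()
        self._persistent.update(snapshot)
        return len(snapshot)

    def start(self):
        def loop():
            # Event.wait returns False on timeout, True once stop() is called.
            while not self._stop.wait(self._interval):
                self.flush()

        self._thread = threading.Thread(target=loop, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        self.flush()  # final flush so nothing written before stop() is lost


disk = {}  # hypothetical stand-in for the external store
store = CheckpointedStore(disk)
store.put("brick:7", [1, 2, 3])
visible_before_flush = "brick:7" in disk
flushed = store.flush()
```

In this sketch, anything written after the most recent flush exists only in memory until the next flush runs.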

Compression[02]


String fields in Cubrick are dictionary encoded, for both dimensions (i.e., indices) and metrics (i.e., values). Internally, Cubrick processes string fields using their encoded integers and converts them back to strings only when returning results to the user.
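
A minimal sketch of dictionary encoding as described above. The class `DictionaryEncoder` and the sample values are hypothetical; Cubrick's actual dictionary layout is not described in this entry.

```python
class DictionaryEncoder:
    """Maps each distinct string to a small integer id and back."""

    def __init__(self):
        self._ids = {}      # string -> id
        self._strings = []  # id -> string

    def encode(self, value):
        # Assign the next free id the first time a string is seen.
        if value not in self._ids:
            self._ids[value] = len(self._strings)
            self._strings.append(value)
        return self._ids[value]

    def decode(self, code):
        return self._strings[code]


regions = DictionaryEncoder()
original = ["US", "BR", "US", "CA", "BR"]
encoded = [regions.encode(s) for s in original]

# Filtering compares integers; strings reappear only in the final output.
us_code = regions.encode("US")
matches = [i for i, code in enumerate(encoded) if code == us_code]
decoded = [regions.decode(code) for code in encoded]
```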

Cubrick also uses BESS (Bit-Encoded Sparse Structure) encoding to compress the multidimensional index of each cell (i.e., the group of metrics that share the same dimension values).
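
The general idea behind a bit-encoded index can be sketched as packing one fixed-width field per dimension into a single integer. The helper names (`bits_needed`, `bess_pack`, `bess_unpack`) and the per-dimension cardinalities are assumptions made for illustration; the exact bit layout Cubrick uses is not given here.

```python
def bits_needed(cardinality):
    """Bits required to represent the values 0 .. cardinality - 1."""
    return max(1, (cardinality - 1).bit_length())


def bess_pack(coords, cardinalities):
    """Pack one coordinate per dimension into a single integer."""
    packed = 0
    for value, cardinality in zip(coords, cardinalities):
        if not 0 <= value < cardinality:
            raise ValueError("coordinate out of range")
        packed = (packed << bits_needed(cardinality)) | value
    return packed


def bess_unpack(packed, cardinalities):
    """Recover the per-dimension coordinates from a packed integer."""
    coords = []
    # The last dimension sits in the lowest bits, so peel fields off in reverse.
    for cardinality in reversed(cardinalities):
        width = bits_needed(cardinality)
        coords.append(packed & ((1 << width) - 1))
        packed >>= width
    return list(reversed(coords))


cardinalities = [4, 8, 2]   # hypothetical: 2 + 3 + 1 = 6 bits per cell
key = bess_pack([3, 5, 1], cardinalities)
roundtrip = bess_unpack(key, cardinalities)
```

With these cardinalities, a three-dimensional coordinate fits in 6 bits instead of three separate integers.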

Data Model[02]


Cubrick stores data in bricks (i.e., partitions) in a column-wise fashion. Within a brick, each column is a dynamic vector that holds either a metric or the BESS-encoded indices. Cells in a brick are unordered: ingestion only ever appends new cells to the end of the brick.
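
A rough sketch of an append-only, column-wise brick as described above. The `Brick` class, its method names, and the metric names are hypothetical, and Python's `array` module stands in for the dynamic vectors.

```python
from array import array


class Brick:
    """One partition: a growable vector per column, cells kept in arrival order."""

    def __init__(self, metric_names):
        self.bess = array("Q")  # packed dimension indices, one entry per cell
        self.metrics = {name: array("d") for name in metric_names}

    def append(self, bess_key, **metric_values):
        # Collect all values first so a missing metric cannot leave columns uneven.
        row = [metric_values[name] for name in self.metrics]
        self.bess.append(bess_key)
        for column, value in zip(self.metrics.values(), row):
            column.append(value)

    def sum(self, metric, predicate=lambda key: True):
        # Cells are unordered, so a query scans the whole brick.
        total = 0.0
        for key, value in zip(self.bess, self.metrics[metric]):
            if predicate(key):
                total += value
        return total


brick = Brick(["likes", "comments"])
brick.append(59, likes=10, comments=2)
brick.append(12, likes=5, comments=1)
brick.append(59, likes=1, comments=0)  # same coordinates, simply appended again

all_likes = brick.sum("likes")
likes_at_59 = brick.sum("likes", lambda key: key == 59)
```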

Indexes[02]


Cubrick uses Granular Partitioning as its main indexing approach to organize the bricks (i.e., partitions) of a cube (i.e., table). Multidimensional indices are converted to partition ids by a conversion function that maps each predefined multidimensional range to an integer. The mapping from partition id to storage node is maintained by consistent hashing.
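
The two mappings above can be sketched as follows. Everything here is an assumption made for illustration: the row-major conversion function, the fixed range size per dimension, the `HashRing` class, the use of SHA-256, and the node names. The entry does not specify the actual conversion function or hashing scheme.

```python
import bisect
import hashlib


def partition_id(coords, range_sizes, cardinalities):
    """Map a point to the id of the fixed-size multidimensional range containing it."""
    pid = 0
    for value, size, cardinality in zip(coords, range_sizes, cardinalities):
        ranges_in_dim = -(-cardinality // size)  # ceiling division
        pid = pid * ranges_in_dim + value // size
    return pid


class HashRing:
    """Consistent hashing: each node owns many points on a ring of hash values."""

    def __init__(self, nodes, replicas=64):
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def node_for(self, pid):
        # The first ring point after the key's hash owns it, wrapping around at the end.
        idx = bisect.bisect_right(self._points, self._hash(str(pid)))
        return self._ring[idx % len(self._ring)][1]


cardinalities = [8, 4]   # hypothetical dimension sizes
range_sizes = [4, 2]     # each brick covers a 4 x 2 range, giving 4 bricks
pid_a = partition_id((5, 1), range_sizes, cardinalities)
pid_b = partition_id((5, 0), range_sizes, cardinalities)  # same range as pid_a

nodes = ["node-a", "node-b", "node-c"]
ring = HashRing(nodes)
owner = ring.node_for(pid_a)
```

A property worth noting in this sketch is that removing a node reassigns only the partitions that node owned; every other partition keeps its owner.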

Joins[02]


Cubrick assumes the ingested data are denormalized, and it does not support joins.

Logging[02]


Cubrick does not support logging. It is purely in-memory, and persistence is delegated to the disk-based key-value store (e.g., RocksDB).

Parallel Execution[02]


Each query is sent to all nodes, and every node processes it locally against its own data.
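
A small sketch of this broadcast pattern, using threads as stand-ins for nodes. The function names, the row format, and in particular the final merge of partial sums are assumptions; the entry does not describe how Cubrick combines per-node results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_locally(rows, group_by, metric, predicate):
    """What one node does: filter and aggregate only its own rows."""
    partial = Counter()
    for row in rows:
        if predicate(row):
            partial[row[group_by]] += row[metric]
    return partial


def run_everywhere(nodes, group_by, metric, predicate=lambda row: True):
    """Send the same query to every node, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = pool.map(
            lambda rows: run_locally(rows, group_by, metric, predicate), nodes
        )
        merged = Counter()
        for partial in partials:
            merged.update(partial)  # Counter.update adds values per key
    return dict(merged)


# Hypothetical data: each inner list is the data held by one node.
nodes = [
    [{"region": "US", "likes": 3}, {"region": "BR", "likes": 1}],
    [{"region": "US", "likes": 4}],
    [{"region": "CA", "likes": 2}, {"region": "BR", "likes": 5}],
]
totals = run_everywhere(nodes, "region", "likes")
us_only = run_everywhere(nodes, "region", "likes", lambda row: row["region"] == "US")
```

Sums merge cleanly because they are distributive; an aggregate such as an average would need each node to return a sum and a count instead.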

Query Execution[02]


Each step of a query generates its full intermediate results before execution moves on to the next step.

Query Interface[02]


SQL

A subset of SQL is supported, including filtering, aggregation, GROUP BY, ORDER BY, HAVING, and some arithmetic and logical expressions.
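
To show the shape of a query that stays within such a subset, here is a single-table example with no joins. It runs against SQLite purely as a stand-in; the table, columns, and data are hypothetical, and nothing here confirms Cubrick's exact syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (region TEXT, gender TEXT, likes INTEGER, comments INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("US", "F", 10, 2),
        ("US", "M", 4, 1),
        ("BR", "F", 7, 3),
        ("CA", "M", 1, 0),
    ],
)

# Filter, arithmetic and logical expressions, aggregation, GROUP BY, HAVING, ORDER BY.
query = """
    SELECT region, SUM(likes + comments) AS engagement
    FROM events
    WHERE gender = 'F' OR likes > 3
    GROUP BY region
    HAVING SUM(likes) >= 5
    ORDER BY engagement DESC
"""
rows = conn.execute(query).fetchall()
conn.close()
```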

Storage Architecture


Storage Model


Revision #3 Last Updated: