Scuba supports a web-based interface, a SQL interface through the command line, and a custom Thrift-based API for running queries from application code. All queries from the SQL and web interfaces ultimately go through the Thrift interface to reach the database backend.
Scuba is an analytical database and therefore does not need to support logging. It does, however, back up all ingested data on disk for future recovery.
Because its primary workload is analytical, Scuba is considering a shift to a columnar layout in the future.
No table has an associated index. Instead, the leaf nodes (the nodes that store data) record the time range of their data, which lets them skip scanning irrelevant data when a query arrives.
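The time-range check can be sketched as follows. This is a minimal illustration, not Scuba's actual data structures: each leaf keeps only the min/max timestamp of its rows and compares the query's range against that summary before scanning anything.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    min_ts: int   # earliest row timestamp held by this leaf (assumed summary)
    max_ts: int   # latest row timestamp held by this leaf
    rows: list    # (timestamp, value) pairs

    def scan(self, start: int, end: int):
        # Skip the whole leaf if its time range misses the query range.
        if end < self.min_ts or start > self.max_ts:
            return []
        return [(ts, v) for ts, v in self.rows if start <= ts <= end]

leaf = Leaf(min_ts=100, max_ts=200, rows=[(100, "a"), (150, "b"), (200, "c")])
print(leaf.scan(300, 400))  # leaf pruned entirely -> []
print(leaf.scan(120, 180))  # -> [(150, 'b')]
```

Because most queries are over recent time windows, this single summary per leaf prunes a large share of scans despite there being no per-table index.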
Scuba allocates contiguous space in memory for a table (shared memory layout), since the size and contents are known at allocation time. To cope with high data-ingestion rates, Scuba evicts old data: a row is evicted once it ages past a threshold, using a variant of TTL (time-to-live). Where it is necessary to keep old data around, Scuba supports subsampling, retaining a fraction of the old data for analytical purposes.
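The eviction policy above can be sketched as a single pass over the rows. The function name and the `keep_prob` parameter are illustrative assumptions, not Scuba's API: fresh rows are always kept, and expired rows survive only with the subsampling probability.

```python
import random

def evict(rows, now, ttl, keep_prob=0.0, rng=random.random):
    """TTL-style eviction with optional subsampling of expired rows (sketch)."""
    kept = []
    for ts, payload in rows:
        if now - ts <= ttl:
            kept.append((ts, payload))   # within TTL: always retained
        elif rng() < keep_prob:
            kept.append((ts, payload))   # expired: retained only as a sample
    return kept

rows = [(0, "old"), (90, "fresh")]
print(evict(rows, now=100, ttl=30))                 # -> [(90, 'fresh')]
print(evict(rows, now=100, ttl=30, keep_prob=1.0))  # subsample keeps all
```

Subsampling preserves the shape of historical trends at a fraction of the memory cost, which fits an analytical workload where exact old rows rarely matter.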
Scuba follows the relational model with some key differences. It does not support a `CREATE TABLE` statement; a table's schema is inferred from the ingested data. Since the data is partitioned, the schema of the same table can differ across nodes; this difference is reconciled during aggregation.
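A sketch of what per-node inference and reconciliation could look like, with all names illustrative: each node derives column names and types from the rows it happens to hold, so two partitions of one table can disagree, and the aggregator reconciles by taking the union of the partial schemas.

```python
def infer_schema(rows):
    """Derive {column: type-name} from ingested rows (illustrative sketch)."""
    schema = {}
    for row in rows:
        for col, val in row.items():
            schema.setdefault(col, type(val).__name__)
    return schema

# Two partitions of the "same" table ingested different columns.
node_a = infer_schema([{"ts": 1, "user": "alice"}])
node_b = infer_schema([{"ts": 2, "latency_ms": 12}])

# Reconciliation at aggregation time: union of both partial schemas.
merged = {**node_a, **node_b}
print(merged)  # {'ts': 'int', 'user': 'str', 'latency_ms': 'int'}
```

Dropping `CREATE TABLE` removes a deployment step for engineers logging new event types, at the cost of this reconciliation work at query time.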
Scuba partitions data across nodes and, upon receiving a query, aggregates results from all nodes that contain the requested data. The architecture is hierarchical: leaf nodes store the data. A query originates at a single root node and passes down through intermediate aggregator nodes in a top-down fashion. The leaf nodes scan their locally stored data and return results, which are aggregated bottom-up and returned to the root.
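The fan-out/fan-in described above can be sketched as a small tree of objects. The tree shape and the count aggregation are illustrative assumptions; the point is that the same query is forwarded top-down and partial results are combined bottom-up.

```python
class LeafNode:
    def __init__(self, rows):
        self.rows = rows
    def query(self, predicate):
        # Leaves scan only their locally stored rows.
        return sum(1 for r in self.rows if predicate(r))

class Aggregator:
    def __init__(self, children):
        self.children = children
    def query(self, predicate):
        # Fan the query out to children, combine their partial counts.
        return sum(child.query(predicate) for child in self.children)

root = Aggregator([
    Aggregator([LeafNode([1, 5, 9]), LeafNode([2, 8])]),
    Aggregator([LeafNode([7, 3])]),
])
print(root.query(lambda x: x > 4))  # -> 4  (matches 5, 9, 8, 7)
```

Because each level only combines already-reduced partial results, the root never sees raw rows, which keeps the fan-in cheap even with many leaves.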
Scuba uses dictionary compression for strings and variable-length encoding for integers.
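These two encodings can be sketched as follows; this is an illustration under common definitions of the techniques, not Scuba's actual format. The dictionary maps each distinct string to a small integer id, and integers are written LEB128-style: 7 data bits per byte, high bit set when more bytes follow.

```python
def dict_encode(values):
    """Replace repeated strings with small integer ids (dictionary compression)."""
    dictionary, ids = {}, []
    for v in values:
        ids.append(dictionary.setdefault(v, len(dictionary)))
    return dictionary, ids

def varint_encode(n):
    """Variable-length (base-128) encoding of a non-negative integer."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | 0x80 if n else byte)  # high bit = continuation
        if not n:
            return bytes(out)

d, ids = dict_encode(["GET", "POST", "GET", "GET"])
print(d, ids)              # {'GET': 0, 'POST': 1} [0, 1, 0, 0]
print(varint_encode(300))  # b'\xac\x02' (two bytes instead of a fixed width)
```

Both encodings exploit the same skew: log data has few distinct strings and many small integers, so ids and short byte sequences dominate.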