Beringei is an embedded in-memory storage engine for time series data developed by Facebook. It was built to handle the growing scale of Facebook's system health monitoring data, which required both fast ingestion of a constant stream of time series points and fast queries over that data.
The system was originally named Gorilla, and its architecture was described in a VLDB 2015 research paper. The project was open-sourced in 2017 and discontinued in 2018. Facebook's earlier solution for monitoring data was an HBase-backed time series database (TSDB), but its primary issues were slow writes and slow aggregate queries.
Beringei's primary data structure is the TSmap, or Timeseries Map. It consists of a C++ vector of shared pointers to the time series and a case-insensitive map from time series names to those same time series. The shared pointers allow scans to copy the vector quickly, while the map allows constant-time lookup of a specific time series.
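A minimal sketch of such a structure is shown below; the class and member names are illustrative assumptions, not Beringei's actual API. The map provides constant-time lookup by name, while a scan copies the vector of shared pointers and iterates over the copy instead of holding the map for the whole scan.

```cpp
#include <algorithm>
#include <cctype>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Placeholder for a time series entry; Beringei would keep the
// compressed data blocks for the series here.
struct TimeSeries {
  std::string name;
};

// Case-insensitive hash and equality so lookups ignore case.
struct CaseInsensitiveHash {
  size_t operator()(const std::string& s) const {
    std::string lower(s);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return std::hash<std::string>{}(lower);
  }
};
struct CaseInsensitiveEqual {
  bool operator()(const std::string& a, const std::string& b) const {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                        return std::tolower(x) == std::tolower(y);
                      });
  }
};

class TSmapSketch {
 public:
  // Constant-time lookup of a single series by name.
  std::shared_ptr<TimeSeries> get(const std::string& name) const {
    auto it = map_.find(name);
    return it == map_.end() ? nullptr : it->second;
  }

  // Scans copy the vector of shared pointers and iterate over the copy.
  std::vector<std::shared_ptr<TimeSeries>> snapshotForScan() const {
    return series_;
  }

  void put(const std::string& name) {
    auto ts = std::make_shared<TimeSeries>(TimeSeries{name});
    series_.push_back(ts);
    map_.emplace(name, ts);
  }

 private:
  std::vector<std::shared_ptr<TimeSeries>> series_;
  std::unordered_map<std::string, std::shared_ptr<TimeSeries>,
                     CaseInsensitiveHash, CaseInsensitiveEqual>
      map_;
};
```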
Beringei shards monitoring data based on unique string keys, so that each time series can be mapped to a single host. Scaling the database is simple since it has a shared-nothing architecture.
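As an illustration of the key-to-shard idea only (Beringei's actual shard-to-host assignment is handled outside of such a helper; this function is an assumption), a series key can be hashed to one of a fixed number of shards, each of which is owned by a single host:

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical helper: hash the unique series key to a shard id.
// Each shard is then owned by exactly one host.
uint32_t shardForKey(const std::string& key, uint32_t numShards) {
  return static_cast<uint32_t>(std::hash<std::string>{}(key) % numShards);
}
```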
Beringei can handle single-node failures, network cuts, and entire datacenter failures by writing each time series value synchronously to two different hosts in two separate geographic regions. When a fault is detected, all read queries in the affected region are redirected to the other region, so users experience no disruption.
Beringei features a lossless time series compression algorithm that on average compresses each time series by 12x, without the loss of resolution that lossy compression algorithms introduce, and it operates directly on the double-precision floating-point values typical of monitoring time series data.
The algorithm was derived from floating-point compression schemes used in scientific computing, which XOR each value with the previous value to produce a delta encoding. Beringei performs no additional compression across time series beyond this encoding. Each data point consists of a 64-bit timestamp and a 64-bit value, and the timestamp and value streams are compressed separately against their respective previous entries. The crux of the idea is that, once the initial value is stored, each subsequent value is XOR'd with the previous value; the number of significant bits in the difference is then encoded along with a small header that describes where those bits fall.
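The XOR step itself is straightforward. A minimal worked example (assuming C++20, and standing in for Beringei's own bit-level routines) reinterprets two consecutive doubles as 64-bit integers, XORs them, and counts the leading and trailing zero bits to find how many bits actually differ:

```cpp
#include <bit>
#include <cstdint>
#include <cstdio>

int main() {
  // Two consecutive values from a monitoring series (example numbers).
  double prev = 24.0, curr = 24.5;
  uint64_t x = std::bit_cast<uint64_t>(prev) ^ std::bit_cast<uint64_t>(curr);
  int leading = std::countl_zero(x);   // zero bits before the changed bits
  int trailing = std::countr_zero(x);  // zero bits after the changed bits
  int meaningful = 64 - leading - trailing;
  // Prints: leading=16 trailing=47 meaningful=1
  std::printf("leading=%d trailing=%d meaningful=%d\n",
              leading, trailing, meaningful);
}
```

Because similar values share sign, exponent, and high-order mantissa bits, the XOR result is usually mostly zeros, which is what makes the variable-length encodings described below effective.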
Time Stamp Compression
Beringei uses a "delta of deltas" compression for time stamps, as most data points arrive at a fixed interval. Rather than storing each full timestamp, the database stores only the change between successive deltas, which is typically zero. After the initial timestamp is stored, a simple variable-length encoding scheme is applied to the rest, and the overall result is that 96% of time stamps can be compressed to a single bit.
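A sketch of the decision logic is shown below, assuming the bucket ranges given in the VLDB paper; the functions only count bits, whereas a real encoder would also write them to a bit stream.

```cpp
#include <cstdint>
#include <vector>

// Bits spent on one delta-of-deltas value D: control bits plus payload.
int timestampBits(int64_t D) {
  if (D == 0) return 1;                        // '0'
  if (D >= -63 && D <= 64) return 2 + 7;       // '10' + 7-bit value
  if (D >= -255 && D <= 256) return 3 + 9;     // '110' + 9-bit value
  if (D >= -2047 && D <= 2048) return 4 + 12;  // '1110' + 12-bit value
  return 4 + 32;                               // '1111' + 32-bit value
}

// Total bits for a block of timestamps: a full 64-bit header timestamp,
// a 14-bit first delta, then one variable-length entry per point.
int totalTimestampBits(const std::vector<int64_t>& ts) {
  int bits = 64 + 14;
  for (size_t i = 2; i < ts.size(); ++i) {
    int64_t delta = ts[i] - ts[i - 1];
    int64_t prevDelta = ts[i - 1] - ts[i - 2];
    bits += timestampBits(delta - prevDelta);
  }
  return bits;
}
```

With points arriving at a fixed interval, say every 60 seconds, the delta of deltas is almost always zero, which is where the single-bit figure comes from.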
Value Compression
Beringei compresses values in addition to time stamps. It uses the aforementioned XOR floating-point compression with a variable-length encoding scheme to store its values.
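A sketch of the encoding decisions follows; it again counts bits rather than writing them, the control-bit layout follows the description in the VLDB paper, and the struct and function names are assumptions.

```cpp
#include <bit>
#include <cstdint>

// Window of meaningful bits remembered from the previous XOR result.
struct PrevWindow {
  int leading = 65;   // 65 = no previous window yet
  int trailing = 0;
};

// Bits used to encode one value given the previous value's bit pattern.
// Callers would obtain prevBits/currBits via std::bit_cast from doubles.
int valueBits(uint64_t prevBits, uint64_t currBits, PrevWindow& w) {
  uint64_t x = prevBits ^ currBits;
  if (x == 0) return 1;  // '0': value identical to the previous one

  int leading = std::countl_zero(x);
  int trailing = std::countr_zero(x);

  if (leading >= w.leading && trailing >= w.trailing) {
    // '10': the meaningful bits fit inside the previous window, so only
    // the meaningful bits are written, reusing the stored window position.
    return 2 + (64 - w.leading - w.trailing);
  }

  // '11': store 5 bits for the leading-zero count, 6 bits for the length
  // of the meaningful bits, then the meaningful bits themselves.
  w.leading = leading;
  w.trailing = trailing;
  return 2 + 5 + 6 + (64 - leading - trailing);
}
```

When consecutive values are identical, which is common for slowly changing gauges and counters, a single '0' bit suffices; otherwise only the meaningful bits and a short header are stored.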
https://code.fb.com/core-data/beringei-a-high-performance-time-series-storage-engine/
https://github.com/facebookarchive/beringei
http://www.vldb.org/pvldb/vol8/p1816-teller.pdf