SereneDB is a search analytics database designed for real-time online analytical processing (OLAP) workloads. It supports aggregations, search features such as fuzzy matching and relevance scoring, and data ingestion with real-time updates.
The core of SereneDB is its Hybrid Storage Engine, which integrates RocksDB for handling real-time updates, the IResearch search engine for search indexing and columnar storage, and the Velox execution engine for vectorized query processing. It supports the PostgreSQL SQL dialect and logical replication.
SereneDB ensures data persistence through consistent checkpoints and a write-ahead log (WAL). It periodically writes a full snapshot of the data to disk as a checkpoint, and between checkpoints it records every change in the WAL. After a crash the database recovers automatically: it loads the last saved checkpoint and then replays the WAL to re-apply all subsequent changes. For immediate durability, write operations can be configured to wait until the WAL has been fully written to disk.
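The checkpoint-plus-WAL recovery cycle described above can be illustrated with a toy key-value store. This is a minimal sketch, not SereneDB's implementation: the class `TinyStore`, the file names, the JSON encoding, and the `sync_writes` flag are all hypothetical, and torn (partially written) WAL records are not handled.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store: periodic checkpoint plus an append-only WAL."""

    def __init__(self, directory, sync_writes=False):
        self.checkpoint_path = os.path.join(directory, "checkpoint.json")
        self.wal_path = os.path.join(directory, "wal.log")
        self.sync_writes = sync_writes  # hypothetical "wait for disk" option
        self.data = {}
        self._recover()
        self.wal = open(self.wal_path, "a", encoding="utf-8")

    def _recover(self):
        # Step 1: load the last checkpoint, if one exists.
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path, encoding="utf-8") as f:
                self.data = json.load(f)
        # Step 2: re-apply every change logged since that checkpoint.
        if os.path.exists(self.wal_path):
            with open(self.wal_path, encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value):
        # Log the change first, then apply it in memory.
        self.wal.write(json.dumps([key, value]) + "\n")
        self.wal.flush()
        if self.sync_writes:
            os.fsync(self.wal.fileno())  # block until the record is on disk
        self.data[key] = value

    def checkpoint(self):
        # Write the snapshot to a temp file, swap it in, then reset the WAL.
        tmp = self.checkpoint_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.checkpoint_path)
        self.wal.close()
        self.wal = open(self.wal_path, "w", encoding="utf-8")

    def close(self):
        self.wal.close()


def demo():
    with tempfile.TemporaryDirectory() as directory:
        store = TinyStore(directory, sync_writes=True)
        store.put("a", 1)
        store.checkpoint()
        store.put("b", 2)  # exists only in the WAL
        store.close()      # stand-in for a crash
        recovered = TinyStore(directory)
        result = dict(recovered.data)
        recovered.close()
        return result
```

Calling `demo()` rebuilds the state from the checkpoint (which holds `a`) and the WAL (which holds `b`). Because replaying a `put` is idempotent, a crash between swapping in the new checkpoint and resetting the WAL would still recover correctly in this sketch.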
Multi-version Concurrency Control (MVCC)
To handle transactions and real-time updates, SereneDB utilizes Multi-Version Concurrency Control. This is inherited from its use of RocksDB as a core component of the storage engine.
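The general idea behind MVCC can be sketched as a store that keeps every version of a key tagged with a sequence number, so that a reader holding a snapshot sees only versions written at or before that snapshot. This is a simplified illustration under assumed names (`VersionedStore`, `snapshot`, `get`); it is not the RocksDB or SereneDB API, and it omits deletes, garbage collection of old versions, and write conflicts.

```python
import bisect


class VersionedStore:
    """Keeps all versions of each key, ordered by a global sequence number."""

    def __init__(self):
        self.seq = 0
        self.versions = {}  # key -> list of (seq, value), ascending by seq

    def put(self, key, value):
        # Each write gets a new sequence number and never overwrites history.
        self.seq += 1
        self.versions.setdefault(key, []).append((self.seq, value))
        return self.seq

    def snapshot(self):
        # A snapshot is just the sequence number at the time it was taken.
        return self.seq

    def get(self, key, snapshot=None):
        # Return the newest version whose seq is <= the snapshot.
        snap = self.seq if snapshot is None else snapshot
        history = self.versions.get(key, [])
        seqs = [s for s, _ in history]
        i = bisect.bisect_right(seqs, snap)
        return history[i - 1][1] if i else None
```

A reader that took a snapshot before a later write keeps seeing the old value, while new readers see the latest one, so readers and writers do not block each other.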
Inverted Index (Full Text), Log-Structured Merge Tree
The default index structures are the LSM-tree provided by RocksDB for primary data organization and the inverted indexes provided by IResearch for full-text search. SereneDB also supports vector and geospatial indexes for specialized search and analytics use cases.
Hash Join, Sort-Merge Join, Index Nested Loop Join
SereneDB uses the join algorithms provided by the Velox execution engine. In addition, it supports a search index bitset join.
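As a reminder of what the first of these algorithms does, here is a textbook in-memory hash join: build a hash table on one input, then probe it with the other. This is a generic sketch and not Velox's vectorized implementation; the function name `hash_join` and the row-as-dict representation are assumptions made for illustration.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner equi-join of two lists of dict rows."""
    # Build phase: index the (ideally smaller) input by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: look up each probe row and emit the merged matches.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined
```

For example, joining a `users` list with an `orders` list on `user_id` returns one merged row per matching order and drops orders whose user is absent.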
Decomposition Storage Model (Columnar), Hybrid
SereneDB combines two layouts. A RocksDB-based layout handles incremental updates: each table row of N columns is split into a set of N key-value pairs stored in sorted column order. A search columnar layout is used for persistent storage: each column is split into a set of compressed data blocks that serve aggregations.
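The row decomposition can be sketched as follows. The key layout `(table, column, primary_key)` is purely an assumption chosen for illustration; the source does not specify how SereneDB encodes its RocksDB keys, and the function names are hypothetical.

```python
def row_to_kv_pairs(table, primary_key, row):
    """Split one row of N columns into N (key, value) pairs."""
    pairs = [
        ((table, column, primary_key), value)  # hypothetical key layout
        for column, value in row.items()
    ]
    # Sort by key so that pairs come out in column order.
    return sorted(pairs, key=lambda pair: pair[0])


def kv_pairs_to_row(pairs):
    """Reassemble a row from the key-value pairs of a single primary key."""
    return {column: value for (_table, column, _pk), value in pairs}
```

With such a layout, updating a single column touches one key-value pair rather than rewriting the whole row, which suits incremental updates.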
A SereneDB deployment consists of shard groups: groups of identical instances hosted on different underlying servers that serve the same data and replicate continuously, which provides high availability as well as additional query capacity. Larger data sets can be range- or hash-partitioned across multiple shard groups. In a sharded cluster, the values of one or more fields, known as the shard key, determine which shard group hosts a given record.
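The two partitioning schemes can be sketched as routing functions that map a shard key to a shard group index. Both functions are hypothetical illustrations: the choice of SHA-256, the function names, and the boundary convention for ranges are assumptions, not SereneDB's actual routing logic.

```python
import bisect
import hashlib


def hash_shard(shard_key, num_groups):
    """Hash partitioning: a stable hash of the key, modulo the group count."""
    digest = hashlib.sha256(str(shard_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_groups


def range_shard(shard_key, upper_bounds):
    """Range partitioning over sorted, exclusive upper bounds.

    Group i holds keys below upper_bounds[i] (and at or above the previous
    bound); the last group holds everything at or above the final bound.
    """
    return bisect.bisect_right(upper_bounds, shard_key)
```

With `upper_bounds = [100, 200]`, keys below 100 go to group 0, keys from 100 to 199 go to group 1, and keys of 200 or more go to group 2. Hash partitioning spreads keys evenly but scatters adjacent keys, whereas range partitioning keeps adjacent keys together at the cost of possible hot spots.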
https://github.com/serenedb/serenedb
SereneDB GmbH
2024