Crux is an open-source document database that uses Apache Kafka as the primary store for transactions and documents, and RocksDB or LMDB to host the indexes that support rich queries. Decoupling storage from indexing lets Crux scale well and serve a wide variety of use cases. Crux is a bitemporal database: data can be stored and queried along two time axes, valid time and system time (which Crux calls transaction time). Crux does not enforce a schema on the documents it stores. It provides a Datalog query interface for reading data and traversing relationships across all documents, and query results are streamed lazily. Although the main transaction log is immutable, Crux still supports the eviction of both active and historical data.
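To make the two time axes concrete, here is a minimal in-memory sketch of a bitemporal lookup. It is not Crux's implementation or API: the names `Version`, `BitemporalStore`, `put`, and `entity` are hypothetical, times are plain integers, and the rule used (latest valid time not after the query, ties broken by latest transaction time) is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Version:
    entity_id: str
    valid_time: int  # when the fact is true in the modelled world
    tx_time: int     # when the database recorded the fact
    doc: Any


class BitemporalStore:
    """Hypothetical toy store; versions are only ever appended."""

    def __init__(self):
        self._versions = []

    def put(self, entity_id, doc, valid_time, tx_time):
        self._versions.append(Version(entity_id, valid_time, tx_time, doc))

    def entity(self, entity_id, valid_time, tx_time) -> Optional[Any]:
        # Keep only versions that were both valid and already recorded
        # at the requested point on each time axis.
        visible = [
            v for v in self._versions
            if v.entity_id == entity_id
            and v.valid_time <= valid_time
            and v.tx_time <= tx_time
        ]
        if not visible:
            return None
        # Latest valid time wins; a later transaction acts as a correction.
        return max(visible, key=lambda v: (v.valid_time, v.tx_time)).doc


store = BitemporalStore()
store.put("ivan", {"city": "Moscow"}, valid_time=10, tx_time=100)
# A later transaction corrects what was true at valid time 10.
store.put("ivan", {"city": "Kyiv"}, valid_time=10, tx_time=200)

before = store.entity("ivan", valid_time=15, tx_time=150)
after = store.entity("ivan", valid_time=15, tx_time=250)
```

Querying as of transaction time 150 still sees the original document, while querying as of transaction time 250 sees the correction. Both versions remain stored, which is what allows the database to answer "what did we believe then?" as well as "what was true then?".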
Crux hosts its indexes in RocksDB or LMDB. RocksDB can store its data in one of two table formats: block-based table and plain table. The block-based format groups data into blocks, which makes compression easier, but lookups take longer. The plain table format relies on a hash-based index, so it takes more space, but lookups are faster. LMDB organizes its storage as two B+ trees: one holds the pages that contain data, and the other tracks the pages that are freed by deletes so that they can be reused.
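The two-tree arrangement described for LMDB can be illustrated with a toy page allocator. This is only a sketch of the idea, not LMDB's design: plain Python containers stand in for the B+ trees, copy-on-write and transactions are ignored, and every name (`ToyPagedStore`, `data_pages`, `free_pages`) is hypothetical.

```python
class ToyPagedStore:
    """Hypothetical model: one structure for data pages, one for free pages."""

    def __init__(self):
        self.data_pages = {}   # page number -> (key, value); stand-in for the data tree
        self.free_pages = []   # freed page numbers; stand-in for the free-page tree
        self.key_to_page = {}  # key -> page number
        self.next_page = 0     # next never-used page number

    def _allocate(self):
        # Reuse a freed page when one exists, otherwise grow the file.
        if self.free_pages:
            return self.free_pages.pop()
        page = self.next_page
        self.next_page += 1
        return page

    def put(self, key, value):
        page = self.key_to_page.get(key)
        if page is None:
            page = self._allocate()
            self.key_to_page[key] = page
        self.data_pages[page] = (key, value)

    def delete(self, key):
        # The page leaves the data structure and is recorded as free.
        page = self.key_to_page.pop(key)
        del self.data_pages[page]
        self.free_pages.append(page)


pages = ToyPagedStore()
pages.put("a", 1)  # takes page 0
pages.put("b", 2)  # takes page 1
pages.delete("a")  # page 0 becomes free
pages.put("c", 3)  # reuses page 0 instead of growing to page 2
```

Because the freed page is reused, the store still occupies two pages after three inserts and one delete.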
Crux exposes a Datalog query interface, which reads data and explores relationships across documents. The Datalog interface supports most SQL-like join operations and, because Crux also handles graph queries, recursive graph traversals.
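The sketch below shows, in Python rather than Datalog, what a join and a recursive traversal over schemaless documents amount to. It does not use Crux's query syntax or engine. The documents, the `follows` attribute, and both function names are made up for illustration, and the evaluation strategy (a naive fixpoint loop) is an assumption chosen for brevity.

```python
docs = [
    {"id": "ivan", "name": "Ivan", "follows": ["petr"]},
    {"id": "petr", "name": "Petr", "follows": ["anna"]},
    {"id": "anna", "name": "Anna", "follows": []},
    {"id": "olga", "name": "Olga", "follows": ["ivan"]},
]


def follows_edges(documents):
    """Flatten each document's 'follows' list into (source, target) pairs."""
    return {
        (doc["id"], target)
        for doc in documents
        for target in doc.get("follows", [])
    }


def reachable(edges, start):
    """Naive fixpoint evaluation of two Datalog-style rules:

    reachable(Y) :- edge(start, Y).
    reachable(Z) :- reachable(Y), edge(Y, Z).
    """
    found = {dst for src, dst in edges if src == start}
    while True:
        derived = {dst for src, dst in edges if src in found}
        if derived <= found:  # nothing new was derived: fixpoint reached
            return found
        found |= derived


edges = follows_edges(docs)

# Join: match edges starting at "ivan" against documents by id.
followed_by_ivan = sorted(d["name"] for d in docs if ("ivan", d["id"]) in edges)

# Recursive traversal: everyone "olga" follows directly or transitively.
olga_reach = reachable(edges, "olga")
```

The loop stops once an iteration derives no new entity, so it terminates even when the graph contains cycles.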
Crux stores its transaction log and document log in Apache Kafka. The transaction log is immutable, although documents can still be evicted, and because both logs are decoupled from the Crux nodes that index them, Crux scales well. As an alternative to Kafka, Crux can use a local log store that runs inside a standalone Crux node.
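The benefit of keeping the log separate from the nodes can be sketched as follows. This is an illustrative model only: an in-memory list stands in for Kafka, and `TransactionLog`, `Node`, and `catch_up` are hypothetical names, not Crux or Kafka APIs.

```python
class TransactionLog:
    """Hypothetical append-only log shared by every node."""

    def __init__(self):
        self._entries = []

    def append(self, op, doc_id, doc=None):
        self._entries.append((op, doc_id, doc))
        return len(self._entries) - 1  # offset of the new entry

    def read_from(self, offset):
        # Return a copy so readers never mutate the log.
        return self._entries[offset:]


class Node:
    """Hypothetical node that builds a local index by replaying the log."""

    def __init__(self, log):
        self.log = log
        self.index = {}
        self.offset = 0  # position of the next unread log entry

    def catch_up(self):
        for op, doc_id, doc in self.log.read_from(self.offset):
            if op == "put":
                self.index[doc_id] = doc
            elif op == "delete":
                self.index.pop(doc_id, None)
            self.offset += 1


log = TransactionLog()
node_a = Node(log)

log.append("put", "a", {"n": 1})
log.append("put", "b", {"n": 2})
node_a.catch_up()

log.append("delete", "a")
node_b = Node(log)  # a node added later replays the log from the start
node_a.catch_up()
node_b.catch_up()
```

Both nodes end with the same index even though they joined at different times, because each one derives its state purely from the shared log. Adding query capacity is then a matter of starting another node and letting it replay.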