XTDB

Revision #6 from 12/10/2019 3:26 a.m.

Crux (since renamed XTDB) is an open source document database that uses Apache Kafka for the primary storage of transactions and documents, and RocksDB or LMDB to host indexes for rich query support. Decoupling storage from indexing lets Crux scale well and serve a wide variety of use cases. Crux is a bitemporal database: data can be stored and queried along two time axes, valid time and system time. Crux does not enforce a schema on the documents it stores. It supports a Datalog query interface for reading data and traversing relationships across all documents, and queries are executed so that results are streamed lazily. Additionally, even though the main transaction log is immutable, Crux still supports the eviction of active as well as historical data.
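The two time axes can be illustrated with a minimal Python sketch. This is not Crux's API: `BitemporalStore`, `put`, and `entity_at` are hypothetical names chosen for illustration, times are plain integers, and the tie-breaking rule (latest valid time first, then latest write) is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Version:
    valid_time: int    # when the fact holds in the modelled world
    system_time: int   # when the store recorded the fact
    doc: Any


@dataclass
class BitemporalStore:
    """Toy in-memory store (hypothetical, not the Crux API)."""
    versions: dict = field(default_factory=dict)  # entity id -> list of Version
    clock: int = 0                                # increasing system time

    def put(self, eid: str, doc: Any, valid_time: int) -> int:
        """Record a document version and return its system time."""
        self.clock += 1
        self.versions.setdefault(eid, []).append(
            Version(valid_time, self.clock, doc)
        )
        return self.clock

    def entity_at(self, eid: str, valid_time: int,
                  system_time: Optional[int] = None) -> Any:
        """Return the document visible at (valid_time, system_time), or None."""
        as_of = self.clock if system_time is None else system_time
        candidates = [
            v for v in self.versions.get(eid, [])
            if v.valid_time <= valid_time and v.system_time <= as_of
        ]
        if not candidates:
            return None
        # Latest valid time wins; ties go to the most recent write.
        best = max(candidates, key=lambda v: (v.valid_time, v.system_time))
        return best.doc


store = BitemporalStore()
t1 = store.put("alice", {"city": "Leeds"}, valid_time=10)
t2 = store.put("alice", {"city": "York"}, valid_time=20)
t3 = store.put("alice", {"city": "Hull"}, valid_time=10)  # retroactive correction
```

Asking for valid time 15 as of system time `t1` returns the Leeds document, because the correction had not yet been recorded. The same question asked as of the latest system time returns the Hull document.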

History

Crux has been available as a Public Alpha since April 19th 2019. The Public Alpha period will continue until Crux is released as a Generally Available open source software product by JUXT later in 2019.

Indexes

B+Tree, Hash Table

Crux uses RocksDB or LMDB to host its indexes. RocksDB offers two table formats for its indexes: block-based table and plain table. In a block-based table, data is grouped into blocks that are easy to compress, but queries take longer to execute. In a plain table, the data is stored in a hash table, so it takes more space, but queries execute faster. LMDB uses two B+ trees for its index format: one stores the pages that hold data, and the other tracks the pages freed by deletes.

Query Interface

Datalog

Crux's query interface is Datalog. It allows Crux to read data and explore relationships across documents. Datalog supports most SQL-like join operations and, because Crux supports graph queries, it also allows for recursive graph traversals.
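To make the idea of a recursive graph traversal concrete, here is a minimal Python sketch that evaluates a recursive Datalog-style rule over (entity, attribute, value) triples. It does not use Crux's query engine or syntax; the function `reachable` and the sample data are hypothetical.

```python
def reachable(triples, attribute, start):
    """Entities reachable from `start` by following `attribute` edges.

    Mirrors the recursive Datalog rules
        reach(X, Y) :- edge(X, Y).
        reach(X, Z) :- reach(X, Y), edge(Y, Z).
    evaluated by iterating until no new facts are found.
    """
    # Build an adjacency map from the triples that use the chosen attribute.
    edges = {}
    for entity, attr, value in triples:
        if attr == attribute:
            edges.setdefault(entity, set()).add(value)

    found = set()
    frontier = {start}
    while frontier:
        next_frontier = set()
        for node in frontier:
            for target in edges.get(node, ()):
                if target not in found:   # guards against cycles
                    found.add(target)
                    next_frontier.add(target)
        frontier = next_frontier
    return found


triples = [
    ("ivan", "manager", "petr"),
    ("petr", "manager", "olga"),
    ("olga", "manager", "petr"),   # cycle between petr and olga
    ("anna", "manager", "olga"),
    ("ivan", "name", "Ivan"),      # ignored: different attribute
]

chain = reachable(triples, "manager", "ivan")
```

The `found` set doubles as the cycle guard, so the loop terminates even when the graph contains cycles.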

Data Model

Document / XML

The documents in Crux are stored as Extensible Data Notation (EDN) documents. The fields within these documents are treated as triples, each made up of an entity, an attribute, and a value. This data model gives Crux better support for efficient graph queries.
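The mapping from a document to triples can be sketched in a few lines of Python. This is an illustration only: the `id` key, the function name `document_to_triples`, and the choice to expand collection values into one triple per element are assumptions of the sketch, not a description of Crux's indexer.

```python
def document_to_triples(doc, id_key="id"):
    """Flatten a schemaless document into (entity, attribute, value) triples."""
    entity = doc[id_key]
    triples = []
    for attribute, value in doc.items():
        if attribute == id_key:
            continue  # the id names the entity; it is not a field of its own
        # Assumption: a collection value yields one triple per element.
        values = value if isinstance(value, (list, set, tuple)) else [value]
        for item in values:
            triples.append((entity, attribute, item))
    return triples


doc = {"id": "ivan", "name": "Ivan", "follows": ["petr", "olga"]}
triples = document_to_triples(doc)
```

Each `follows` triple is an edge from one entity to another, which is why this layout suits graph queries.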

Storage Organization

Log-structured

Crux uses Apache Kafka to store the transaction and document logs. These logs are semi-immutable: they are treated as immutable, except that data can be evicted. Because the logs are decoupled from the Crux nodes themselves, Crux is very scalable. As an alternative to Kafka, Crux can use a local log store that runs inside a standalone Crux node.
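The decoupling of logs from nodes can be sketched as follows. This is a simplified in-memory model, not Crux's Kafka integration: `SharedLog`, `Node`, and their methods are hypothetical names, and keeping document bodies under a content hash so they can be evicted without rewriting the transaction log is an assumption of the sketch.

```python
import hashlib
import json


class SharedLog:
    """Stands in for the shared transaction and document logs."""

    def __init__(self):
        self.tx_log = []      # append-only: (operation, entity id, content hash)
        self.documents = {}   # content hash -> document body; can be evicted

    def submit_put(self, eid, doc):
        payload = json.dumps(doc, sort_keys=True).encode("utf-8")
        digest = hashlib.sha1(payload).hexdigest()
        self.documents[digest] = doc
        self.tx_log.append(("put", eid, digest))

    def submit_evict(self, eid):
        # Remove the document bodies for this entity; earlier log entries
        # stay in place. (Sketch only: ignores bodies shared across entities.)
        for op, logged_eid, digest in self.tx_log:
            if op == "put" and logged_eid == eid:
                self.documents.pop(digest, None)
        self.tx_log.append(("evict", eid, None))


class Node:
    """A query node that builds its own local index by replaying the log."""

    def __init__(self, log):
        self.log = log
        self.index = {}       # entity id -> latest document
        self.next_offset = 0  # position of the next unread log entry

    def catch_up(self):
        for op, eid, digest in self.log.tx_log[self.next_offset:]:
            if op == "put":
                doc = self.log.documents.get(digest)
                if doc is not None:   # body may already have been evicted
                    self.index[eid] = doc
            elif op == "evict":
                self.index.pop(eid, None)
        self.next_offset = len(self.log.tx_log)


log = SharedLog()
log.submit_put("alice", {"city": "Leeds"})
log.submit_put("bob", {"city": "York"})

node_a = Node(log)
node_a.catch_up()
seen_before_evict = dict(node_a.index)

log.submit_evict("alice")
node_a.catch_up()

node_b = Node(log)   # a node added later replays the same log from the start
node_b.catch_up()
```

Because each node derives its index purely from the shared log, a node added later converges on the same state as one that has been running all along.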
