Datomic

Datomic is a proprietary database management system. It is an operational DBMS, which means it supports updates in real time. Instead of assigning values to named attributes and overwriting them in place, Datomic records every change as an immutable fact, so any previous state of the database can still be accessed at any time; this is what sets Datomic apart. Datomic is also a distributed DBMS, which gives it horizontal read scalability.

Another distinguishing feature of Datomic is that it runs queries in the application server, rather than in the database server as most client-server DBMSs do.

In addition, Datomic builds on existing storage services such as Cassandra, SQL databases, and DynamoDB, which gives users more flexibility in where their data lives.

There are two Datomic products: Datomic Cloud and Datomic On-Prem. Datomic Cloud is built for integration with AWS, whereas Datomic On-Prem (On-Premise) can run on any infrastructure and storage service.

History

In early March 2012, the Relevance team around Rich Hickey (which later merged with Metadata to form Cognitect) released Datomic, which they had been developing since 2010. Their motivation was to move a substantial portion of the power traditionally held by database servers into application servers, so that programmers would have more power when working with data inside the application logic.

Datomic Cloud was released in early 2018 and is built on Amazon's components:

  • DynamoDB, EFS, EBS and S3 as storage services;
  • CloudFormation for deployment;
  • Amazon CloudWatch for logging, monitoring, and metrics.

Since the release of Datomic Cloud, the original Datomic has been referred to as Datomic On-Prem (On-Premise) to distinguish it from the new product.

Cognitect, the company that builds Datomic, was acquired by Nubank in 2020.

Query Execution

Vectorized Model

Datomic uses Datalog as its query language. Datalog is set-oriented rather than record-oriented: instead of processing one tuple at a time, it retrieves and operates on whole sets of tuples at once.
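
To make the set-at-a-time idea concrete, the following Python sketch evaluates a hypothetical two-clause query, [?e :person/name ?n] [?e :person/age ?a], by matching each clause against all facts at once (yielding a whole set of bindings) and then joining the two sets on ?e with a hash join. The facts, attribute names, and the choice of a hash join are illustrative assumptions, not a description of Datomic's actual query engine.

```python
from collections import defaultdict

# Hypothetical facts as (entity, attribute, value) triples.
FACTS = [
    (1, ":person/name", "Ada"),
    (1, ":person/age", 36),
    (2, ":person/name", "Alan"),
    (2, ":person/age", 41),
    (3, ":person/name", "Grace"),  # no age recorded for this entity
]

def match_clause(facts, attribute):
    """Return the whole set of (entity, value) bindings for one clause at once."""
    return {(e, v) for e, a, v in facts if a == attribute}

def hash_join(left, right):
    """Join two binding sets on the shared entity variable ?e."""
    index = defaultdict(list)
    for e, v in right:
        index[e].append(v)
    return {(e, lv, rv) for e, lv in left for rv in index.get(e, [])}

# Roughly: [:find ?e ?n ?a :where [?e :person/name ?n] [?e :person/age ?a]]
names = match_clause(FACTS, ":person/name")
ages = match_clause(FACTS, ":person/age")
result = hash_join(names, ages)
print(sorted(result))  # [(1, 'Ada', 36), (2, 'Alan', 41)]
```

Each step consumes and produces a complete set, so entity 3, which has no age, simply drops out of the join instead of being handled record by record.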

Compression

Naïve (Record-Level)

Index trees contain "segments," arrays of records that are serialized and then compressed with zip.
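
A minimal sketch of building and reading back a segment, assuming JSON as a stand-in serialization format and Python's zlib module (which implements DEFLATE, the compression algorithm used by most zip files). The source does not specify the serialization format Datomic uses, and the helper names encode_segment and decode_segment are hypothetical.

```python
import json
import zlib

def encode_segment(datoms):
    """Hypothetical helper: serialize an array of datoms, then compress the bytes."""
    serialized = json.dumps(datoms).encode("utf-8")
    return zlib.compress(serialized)

def decode_segment(blob):
    """Reverse of encode_segment: decompress, deserialize, restore tuples."""
    return [tuple(d) for d in json.loads(zlib.decompress(blob).decode("utf-8"))]

# A segment holds many similar, sorted datoms, which compress well together.
segment = [(e, ":person/age", 30 + e % 5, 1000, True) for e in range(200)]
blob = encode_segment(segment)
raw_size = len(json.dumps(segment).encode("utf-8"))
print(raw_size, len(blob))
```

Compressing a whole segment rather than individual datoms lets the compressor exploit the repetition between neighboring, sorted records.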

Concurrency Control

Multi-version Concurrency Control (MVCC)

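Datomic keeps the entire history of transactions, which enables multi-version concurrency control: a reader works against an immutable database value as of some point in time, while new transactions only add facts and never overwrite the versions that readers are using.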
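
The sketch below models this with a hypothetical append-only log and an as_of function: a reader pins a transaction time and keeps seeing the same snapshot even after a writer appends new facts. It assumes single-valued (cardinality-one) attributes and only illustrates the multi-version idea; it is not Datomic's implementation.

```python
# Append-only history of (entity, attribute, value, tx, added) datoms.
log = [(1, ":account/balance", 100, 1, True)]

def as_of(history, t):
    """Rebuild the single-valued (cardinality-one) state visible at transaction t."""
    state = {}
    for e, a, v, tx, added in sorted(history, key=lambda d: d[3]):
        if tx > t:
            break  # later transactions are invisible to this reader
        if added:
            state[(e, a)] = v
        elif state.get((e, a)) == v:
            del state[(e, a)]
    return state

# A reader pins its snapshot at t=1 ...
before = as_of(log, 1)

# ... while the writer appends a retraction and a new assertion at t=2.
log.append((1, ":account/balance", 100, 2, False))
log.append((1, ":account/balance", 250, 2, True))

print(as_of(log, 1))  # {(1, ':account/balance'): 100}
print(as_of(log, 2))  # {(1, ':account/balance'): 250}
```

Because old versions are never destroyed, the reader needs no lock: the writer's appends cannot change what time t=1 means.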

Data Model

Key/Value

Datomic stores immutable facts over time as datoms. A datom is a 5-tuple consisting of:

  • Entity ID
  • Attribute
  • Value of Attribute
  • Transaction ID (Time)
  • A boolean indicating whether the datom is an addition (assertion) or a retraction.
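
The five components map naturally onto a small record type. In the Python sketch below, the field names e, a, v, tx, and added are illustrative choices. It shows that a retraction is itself a new datom: the earlier assertion stays in history, and the current values of a (hypothetical) multi-valued attribute are derived by replaying additions and retractions in transaction order.

```python
from typing import Any, NamedTuple

class Datom(NamedTuple):
    e: int        # entity ID
    a: str        # attribute
    v: Any        # value of the attribute
    tx: int       # transaction ID (time)
    added: bool   # True = addition (assertion), False = retraction

# All three datoms remain in history; the retraction does not delete anything.
history = [
    Datom(42, ":user/email", "ada@example.com", 1000, True),
    Datom(42, ":user/email", "ada@work.example", 1001, True),
    Datom(42, ":user/email", "ada@example.com", 1002, False),
]

def current_values(datoms, e, a):
    """Replay additions and retractions for one entity/attribute pair."""
    values = set()
    for d in sorted(datoms, key=lambda d: d.tx):
        if d.e == e and d.a == a:
            if d.added:
                values.add(d.v)
            else:
                values.discard(d.v)
    return values

print(current_values(history, 42, ":user/email"))  # {'ada@work.example'}
```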

Although Datomic does not require a table schema that declares attribute columns in advance, it does require each individual attribute to be defined along with its properties, such as its value type and cardinality. This approach is called a universal schema.
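
A minimal sketch of a universal schema: attributes, not tables, carry the definitions, and any entity may combine any defined attributes. The property names :db/valueType and :db/cardinality follow Datomic's schema vocabulary, but modeling them as Python dicts, the PY_TYPES mapping, and the validate function are hypothetical simplifications.

```python
# Attribute definitions keyed by attribute name (modeled on :db/ident).
SCHEMA = {
    ":person/name": {":db/valueType": ":db.type/string",
                     ":db/cardinality": ":db.cardinality/one"},
    ":person/age": {":db/valueType": ":db.type/long",
                    ":db/cardinality": ":db.cardinality/one"},
    ":person/friend": {":db/valueType": ":db.type/ref",
                       ":db/cardinality": ":db.cardinality/many"},
}

# Hypothetical mapping from value types to Python types (refs are entity IDs).
PY_TYPES = {":db.type/string": str, ":db.type/long": int, ":db.type/ref": int}

def validate(entity_attrs):
    """Check each attribute of an entity map against its schema definition.

    Values of cardinality-many attributes are expected as a list or set.
    """
    errors = []
    for attr, value in entity_attrs.items():
        spec = SCHEMA.get(attr)
        if spec is None:
            errors.append(f"{attr}: attribute not defined")
            continue
        many = spec[":db/cardinality"] == ":db.cardinality/many"
        values = value if many else [value]
        expected = PY_TYPES[spec[":db/valueType"]]
        if not all(isinstance(v, expected) for v in values):
            errors.append(f"{attr}: expected {spec[':db/valueType']}")
    return errors

print(validate({":person/name": "Ada", ":person/age": 36}))  # []
print(validate({":person/age": "36", ":person/height": 170}))
# [':person/age: expected :db.type/long', ':person/height: attribute not defined']
```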

Data in Datomic is stored in "distributed storage services": a cluster of machines in which each machine independently stores a subset (shard) of the data, and data may be stored redundantly across shards. Datomic uses a key-value store as its data model, with a consistent hash function that maps each key (Entity ID) to the location, i.e. the machine, where the corresponding tuple is stored.
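
Consistent hashing places both machines and keys on a hash ring and assigns each key to the first machine clockwise from it, so adding or removing a machine only moves the keys in the affected arcs. The sketch below is a generic textbook ring using MD5 from Python's hashlib, with hypothetical machine names and virtual nodes; it illustrates the technique named above and is not Datomic's own placement code.

```python
import bisect
import hashlib

def _hash(key):
    """Map a string to a point on a 2**32 ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % (2**32)

class HashRing:
    def __init__(self, machines, vnodes=64):
        # Each machine appears at several ring positions (virtual nodes)
        # so that keys spread more evenly across machines.
        self._ring = sorted(
            (_hash(f"{m}#{i}"), m) for m in machines for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def machine_for(self, entity_id):
        """Return the first machine clockwise from the key's hash (wrapping around)."""
        idx = bisect.bisect_right(self._points, _hash(str(entity_id)))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing(["storage-a", "storage-b", "storage-c"])
grown = HashRing(["storage-a", "storage-b", "storage-c", "storage-d"])
moved = [e for e in range(1000) if ring.machine_for(e) != grown.machine_for(e)]
# Only some keys move, and every moved key lands on the new machine.
print(len(moved))
```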

Foreign Keys

Supported

Foreign keys can be defined with attributes of type :db.type/ref, but no foreign key constraints are enforced on them automatically. Users must write their own database functions to impose such constraints.
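
Because a :db.type/ref value is just an entity ID, nothing by itself prevents it from pointing at an entity that does not exist. The following is a minimal sketch of a user-written check, assuming a hypothetical function ensure_ref_exists that would run as part of the transaction (in Datomic this logic would live in a database function). The database is modeled as a plain Python set of existing entity IDs.

```python
class ConstraintViolation(Exception):
    pass

def ensure_ref_exists(existing_entities, tx_data, ref_attrs):
    """Hypothetical check: reject the transaction if any ref points nowhere.

    tx_data is a list of (entity, attribute, value) assertions; an entity
    created in the same transaction also counts as existing.
    """
    known = set(existing_entities) | {e for e, _, _ in tx_data}
    for e, a, v in tx_data:
        if a in ref_attrs and v not in known:
            raise ConstraintViolation(f"{a} on entity {e} refers to missing entity {v}")
    return tx_data

existing = {1, 2}
refs = {":order/customer"}
ok = ensure_ref_exists(existing, [(10, ":order/customer", 1)], refs)
try:
    ensure_ref_exists(existing, [(11, ":order/customer", 99)], refs)
except ConstraintViolation as err:
    print(err)  # :order/customer on entity 11 refers to missing entity 99
```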

Indexes

B+Tree

Datomic indexes are covering indexes: instead of storing references to data, the index entries contain the datoms themselves, so Datomic reads data directly from the index. The index trees are shallow, with at most three levels: the root, directory nodes, and leaf segments.
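
A lookup in such a shallow tree takes at most three hops. The sketch below models the root, directory, and segment levels as nested sorted Python lists searched with bisect; it assumes that each node entry is labeled with the last key of its child, and the node sizes and function names are hypothetical. Because the leaf segments hold the datoms themselves, the search ends with the data rather than a pointer to it.

```python
import bisect

def build_tree(datoms, seg_size=4, dir_size=2):
    """Pack sorted datoms into leaf segments, segments into directories, under one root."""
    datoms = sorted(datoms)
    segments = [datoms[i:i + seg_size] for i in range(0, len(datoms), seg_size)]
    directories = [segments[i:i + dir_size] for i in range(0, len(segments), dir_size)]
    # Each node entry is (last key in child, child).
    return [(d[-1][-1], [(s[-1], s) for s in d]) for d in directories]

def _child_for(entries, key):
    """Pick the first child whose last key is >= key, or None if there is none."""
    lasts = [last for last, _ in entries]
    i = bisect.bisect_left(lasts, key)
    return entries[i][1] if i < len(entries) else None

def seek(root, key):
    """Return the first datom >= key (a tuple prefix), or None."""
    directory = _child_for(root, key)       # level 1: root -> directory
    if directory is None:
        return None
    segment = _child_for(directory, key)    # level 2: directory -> segment
    i = bisect.bisect_left(segment, key)    # level 3: search inside the leaf
    return segment[i]

# EAVT-ordered datoms: (entity, attribute, value, tx).
datoms = [(e, a, f"{a}-{e}", 100 + e) for e in range(1, 6) for a in (":name", ":email")]
root = build_tree(datoms)
print(seek(root, (3,)))  # (3, ':email', ':email-3', 103)
```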

Datomic maintains four index trees with different sort orders so that different kinds of queries can be answered efficiently. As described in the data model, Datomic stores immutable facts as 5-tuples, and four of the five components are used for indexing:

  • E: Entity ID
  • A: Attribute
  • V: Value of Attribute
  • T: Transaction ID (Time)

The four index trees are sorted in EAVT, AEVT, AVET, and VAET order, respectively.
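
The following sketch keeps the same datoms in four sorted Python lists, one per ordering, and answers each access pattern with a binary search (bisect) over a matching key prefix. Real Datomic indexes are the segmented trees described above, so the flat lists and the prefix_scan helper are simplifications; in this sketch VAET holds only reference (:db.type/ref) datoms, used for reverse lookups.

```python
import bisect

# Hypothetical (e, a, v, t) datoms.
DATOMS = [
    (1, ":person/name", "Ada", 100),
    (1, ":person/age", 36, 100),
    (2, ":person/name", "Alan", 101),
    (2, ":person/age", 41, 101),
    (2, ":person/friend", 1, 102),
]
REF_ATTRS = {":person/friend"}

EAVT = sorted(DATOMS)
AEVT = sorted((a, e, v, t) for e, a, v, t in DATOMS)
AVET = sorted((a, v, e, t) for e, a, v, t in DATOMS)
VAET = sorted((v, a, e, t) for e, a, v, t in DATOMS if a in REF_ATTRS)

def prefix_scan(index, prefix):
    """Return all rows of a sorted index whose leading components equal prefix."""
    start = bisect.bisect_left(index, prefix)
    out = []
    for row in index[start:]:
        if row[:len(prefix)] != prefix:
            break
        out.append(row)
    return out

print(prefix_scan(EAVT, (2,)))                 # everything about entity 2
print(prefix_scan(AEVT, (":person/name",)))    # every entity that has a name
print(prefix_scan(AVET, (":person/age", 41)))  # who is 41?
print(prefix_scan(VAET, (1,)))                 # who refers to entity 1?
```

Each ordering turns a different question into a contiguous range of one sorted list, which is why maintaining several sort orders pays off.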

Isolation Levels

Serializable

Only one process, the transactor, is responsible for writing transactions, and it applies them one at a time, so transactions are always serializable.
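
A minimal sketch of why a single writer yields serializability: concurrent clients only enqueue transactions, and one transactor thread dequeues and applies them strictly one at a time, so the final state always matches some serial order. The queue-and-thread design and the class and function names are assumptions for illustration.

```python
import queue
import threading

class Transactor:
    """Single writer: the only code that ever mutates the database state."""

    def __init__(self):
        self.state = {"counter": 0}
        self.applied = []                  # serial order in which transactions ran
        self._inbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            tx_id, fn, done = self._inbox.get()
            fn(self.state)                 # runs alone; no other writer exists
            self.applied.append(tx_id)
            done.set()

    def transact(self, tx_id, fn):
        """Called from any client thread; blocks until the transaction is applied."""
        done = threading.Event()
        self._inbox.put((tx_id, fn, done))
        done.wait()

def increment(state):
    # A read-modify-write that could lose updates if two writers ran it concurrently.
    state["counter"] = state["counter"] + 1

tx = Transactor()
clients = [threading.Thread(target=tx.transact, args=(i, increment)) for i in range(50)]
for c in clients:
    c.start()
for c in clients:
    c.join()
print(tx.state["counter"], len(tx.applied))  # 50 50
```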

Query Compilation

JIT Compilation

Datalog expressions are compiled with the Clojure compiler, which produces Java bytecode that the JVM then typically JIT-compiles.

Query Interface

Datalog

Datomic's query interface is an extension of Datalog. The main difference is that Datalog systems usually have a single global fact database and a single set of rules, whereas a Datomic query can take multiple databases and multiple sets of rules as inputs.
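
In Datomic, the data sources a query uses are passed in as inputs named with a leading $ (for example $hr and $pay below). The Python sketch imitates that idea by joining facts drawn from two hypothetical databases, an HR database and a payroll database, on a shared employee ID; the data, the shared IDs, and the function are illustrative and not Datomic's API.

```python
# Two independent fact databases as (entity, attribute, value) triples.
hr_db = [
    ("emp-1", ":employee/name", "Ada"),
    ("emp-2", ":employee/name", "Alan"),
]
payroll_db = [
    ("emp-1", ":salary/amount", 120_000),
    ("emp-3", ":salary/amount", 90_000),
]

def query_name_and_salary(db1, db2):
    """Roughly: [:find ?name ?amt :in $hr $pay
                 :where [$hr ?e :employee/name ?name]
                        [$pay ?e :salary/amount ?amt]]"""
    names = {e: v for e, a, v in db1 if a == ":employee/name"}
    return {(names[e], v) for e, a, v in db2 if a == ":salary/amount" and e in names}

print(query_name_and_salary(hr_db, payroll_db))  # {('Ada', 120000)}
```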

Storage Architecture

Hybrid

In storage services, data is stored on disk as segments, each of which is an array of datoms. As an application server reads data from storage services, it builds index trees locally in memory, which allows it to run queries against in-memory data.
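
A minimal sketch of this read path, assuming a hypothetical storage object that serves segments by ID: the application server fetches a segment only on first access and keeps it in memory afterwards, so later queries touch only local data. Because segments are immutable, a cached copy never goes stale and the cache needs no invalidation. The sketch caches segments rather than building full index trees, and the class names and fetch counter are illustrative.

```python
class FakeStorage:
    """Stand-in for a storage service that returns segments by ID."""

    def __init__(self, segments):
        self._segments = segments
        self.fetches = 0

    def get(self, segment_id):
        self.fetches += 1
        return self._segments[segment_id]

class Peer:
    def __init__(self, storage):
        self._storage = storage
        self._cache = {}  # segment_id -> datoms; safe because segments never change

    def segment(self, segment_id):
        if segment_id not in self._cache:
            self._cache[segment_id] = self._storage.get(segment_id)
        return self._cache[segment_id]

    def datoms_for(self, segment_id, entity):
        """Answer a lookup from in-memory data once the segment is loaded."""
        return [d for d in self.segment(segment_id) if d[0] == entity]

storage = FakeStorage({"seg-1": [(1, ":name", "Ada"), (2, ":name", "Alan")]})
peer = Peer(storage)
peer.datoms_for("seg-1", 1)
peer.datoms_for("seg-1", 2)
print(storage.fetches)  # 1: the second lookup was served from memory
```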

Storage Model

Custom

Datomic treats storage as a service: it provides the means to access the underlying storage but does not provide the storage itself. The storage service can be swapped by changing the connection string.
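
The sketch below illustrates how a connection string can select the storage backend while application code stays the same. The URI shape datomic:<protocol>://... mirrors examples such as datomic:dev://localhost:4334/my-db, but the BACKENDS table, its descriptions, and the storage_backend function are hypothetical.

```python
# Hypothetical dispatch table from the storage protocol in the URI to a backend.
BACKENDS = {
    "mem": "in-process memory",
    "dev": "local dev storage",
    "sql": "JDBC SQL database",
    "ddb": "Amazon DynamoDB",
}

def storage_backend(uri):
    """Return the backend named by a URI of the form datomic:<protocol>://..."""
    scheme, _, rest = uri.partition(":")
    if scheme != "datomic":
        raise ValueError(f"not a Datomic URI: {uri}")
    protocol = rest.partition("://")[0]
    if protocol not in BACKENDS:
        raise ValueError(f"unknown storage protocol: {protocol}")
    return BACKENDS[protocol]

for uri in ["datomic:mem://my-db",
            "datomic:dev://localhost:4334/my-db",
            "datomic:ddb://us-east-1/my-table/my-db"]:
    print(uri, "->", storage_backend(uri))
```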

Storage Organization

Sorted Files

Datomic stores data in storage services as sorted chunks of datoms.

Stored Procedures

Supported

User-defined functions, called transaction functions in Datomic, can be invoked while a transaction is being processed.
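
A transaction function receives the current database value plus arguments and returns transaction data, which is then applied as part of the transaction; because it runs within the single writer, its read and its write cannot be interleaved with another transaction. The Python sketch below models a hypothetical transfer function over a dict-based database; the function names, data shapes, and failure behavior are assumptions for illustration.

```python
class TxAborted(Exception):
    pass

def transfer(db, src, dst, amount):
    """Hypothetical transaction function: read current balances, emit new tx data."""
    src_bal = db[(src, ":account/balance")]
    dst_bal = db[(dst, ":account/balance")]
    if src_bal < amount:
        raise TxAborted(f"insufficient funds in {src}")
    return [
        (src, ":account/balance", src_bal - amount),
        (dst, ":account/balance", dst_bal + amount),
    ]

def transact(db, tx_fn, *args):
    """Simplified writer step: call the function, then apply its output all at once."""
    tx_data = tx_fn(db, *args)   # if this raises, nothing is applied
    new_db = dict(db)            # the previous database value stays untouched
    for e, a, v in tx_data:
        new_db[(e, a)] = v
    return new_db

db0 = {("acct-1", ":account/balance"): 100, ("acct-2", ":account/balance"): 20}
db1 = transact(db0, transfer, "acct-1", "acct-2", 30)
print(db1[("acct-1", ":account/balance")], db1[("acct-2", ":account/balance")])  # 70 50
```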

System Architecture

Embedded

In a Datomic-based system, an application server, together with the partial data stored on it, is called a peer. Queries run in peers. A peer can read from storage services directly through the peer library, but it can only request writes through the transactor, a single designated process in charge of all writes. The transactor adds new datoms to storage services using ACID transactions and then propagates the writes, since data may be stored redundantly.

Since datoms in storage services are immutable, peers can cache them extensively and query data locally. This also lets programmers work with query results as simple data structures.

Website

https://www.datomic.com/

Tech Docs

https://docs.datomic.com/

Twitter

@cognitect

Developer

Cognitect, Inc.

Country of Origin

US

Start Year

2012

Acquired By

Nubank

Project Type

Commercial

Written in

Clojure

Supported languages

Clojure, Java

Operating Systems

All OS with Java VM, Hosted

Licenses

Proprietary

Wikipedia

https://en.wikipedia.org/wiki/Datomic