Datomic

Datomic is a proprietary database management system. It is an operational DBMS, meaning that it supports updates in real time. Instead of assigning and overwriting values in named attributes, Datomic records every change as an immutable fact over time, so any previous state of the database can still be accessed. Datomic is also a distributed DBMS that provides horizontal read scalability.

There are two Datomic products: Datomic Cloud and Datomic On-Prem. Datomic Cloud is built for AWS integration, while Datomic On-Prem (On-Premise) can be deployed on any infrastructure and storage service.

History

In early March 2012, the Relevance team around Rich Hickey released Datomic, which they had been working on since 2010 (Relevance later merged with Metadata to form Cognitect). Their motivation was to move a substantial portion of the power attributed to database servers into application servers, so that programmers would have more power to work with data inside the application logic.

Datomic Cloud was released in early 2018. It integrates AWS technologies, for example:

  • DynamoDB, EFS, EBS and S3 as storage services;
  • CloudFormation for deployment;
  • AWS CloudWatch for logging, monitoring, and metrics.

Since the release of Datomic Cloud, the original Datomic has been referred to as Datomic On-Prem (On-Premise) to distinguish it from the new product.

Foreign Keys

Supported

Foreign keys can be defined using attributes of type :db.type/ref, but no foreign key constraints are enforced on them automatically. Users need to write their own database functions to impose those constraints.
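Because references are not checked automatically, the integrity check has to be written by the user. The sketch below models that idea in plain Python rather than with Datomic's API: the `check_refs` function, the tuple layout, and the attribute names are hypothetical stand-ins for a user-written database function.

```python
# Conceptual model only: plain tuples stand in for datoms, and
# check_refs stands in for a user-written database function.

def check_refs(existing_entities, ref_attrs, new_datoms):
    """Return the proposed datoms whose ref value points at no known entity."""
    # Entities created by the same transaction count as known.
    known = set(existing_entities) | {e for (e, _a, _v) in new_datoms}
    return [
        (e, a, v)
        for (e, a, v) in new_datoms
        if a in ref_attrs and v not in known
    ]


existing = {1, 2}
ref_attrs = {":order/customer"}  # hypothetical attribute of type ref

ok_tx = [(3, ":order/customer", 1), (3, ":order/total", 40)]
bad_tx = [(4, ":order/customer", 99)]  # entity 99 does not exist

dangling_ok = check_refs(existing, ref_attrs, ok_tx)
dangling_bad = check_refs(existing, ref_attrs, bad_tx)
```

A transaction would be rejected whenever the returned list is non-empty.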

Storage Model

Custom

Datomic treats storage as a service: it provides the means to access the underlying storage but does not provide the storage itself. Storage services can be switched by changing the connection string.

Indexes

B+Tree

Datomic indexes are covering indexes. In other words, the index holds the data itself rather than a reference to it, so Datomic reads data directly from the index. The index trees are shallow, with at most three levels: a root, directories, and leaf segments.

Datomic maintains four index trees with different sort orders so that different kinds of queries can be answered efficiently. As described in the Data Model section, Datomic stores immutable facts as 5-tuples:

  • E: Entity ID
  • A: Attribute
  • V: Value of Attribute
  • T: Transaction ID (Time)
  • a boolean value encoding whether the datom is an addition or retraction.

The four index trees are sorted by EAVT, AEVT, AVET, and VAET order respectively.
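The four sort orders can be illustrated with a small in-memory model. This is a sketch in plain Python, not Datomic's index implementation: the `Datom` tuple, `build_index`, and the sample attributes are hypothetical, and VAET is restricted here to reference-valued datoms so that all values in it are entity IDs and remain comparable.

```python
from collections import namedtuple

# A datom as described above: entity, attribute, value, transaction, added flag.
Datom = namedtuple("Datom", ["e", "a", "v", "t", "added"])

datoms = [
    Datom(2, ":person/name", "Bob", 1001, True),
    Datom(1, ":person/name", "Alice", 1000, True),
    Datom(1, ":person/friend", 2, 1002, True),
]

REF_ATTRS = {":person/friend"}  # hypothetical ref-typed attribute


def build_index(datoms, order):
    """Sort datoms by the components named in `order`, e.g. "AEVT"."""
    fields = [c.lower() for c in order]
    return sorted(datoms, key=lambda d: tuple(getattr(d, f) for f in fields))


eavt = build_index(datoms, "EAVT")  # all facts about one entity are adjacent
aevt = build_index(datoms, "AEVT")  # all values of one attribute are adjacent
avet = build_index(datoms, "AVET")  # lookup by attribute and value
vaet = build_index([d for d in datoms if d.a in REF_ATTRS], "VAET")  # reverse refs

entity_1 = [d for d in eavt if d.e == 1]
```

Each index contains the same facts; only the ordering differs, which determines which lookups become contiguous range scans.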

Storage Architecture

Hybrid

Compression

Naïve (Record-Level)

Index trees contain "segments," arrays of records that are serialized and then compressed with zip.
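The serialize-then-compress step can be sketched as follows. This is only an illustration under stated assumptions: JSON is used as the serialization format and zlib (DEFLATE) as the compressor, neither of which is taken from Datomic, and `pack_segment` and `unpack_segment` are hypothetical names.

```python
import json
import zlib


def pack_segment(datoms):
    """Serialize a list of datom tuples and compress the resulting bytes."""
    raw = json.dumps(datoms).encode("utf-8")
    return zlib.compress(raw)


def unpack_segment(blob):
    """Reverse pack_segment: decompress, parse, and restore tuples."""
    rows = json.loads(zlib.decompress(blob).decode("utf-8"))
    return [tuple(row) for row in rows]


# A segment of similar records compresses well because of the repetition.
segment = [(1, ":person/name", "Alice", 1000 + i, True) for i in range(200)]
blob = pack_segment(segment)
restored = unpack_segment(blob)
raw_size = len(json.dumps(segment).encode("utf-8"))
```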

Data Model

Key/Value

Datomic stores immutable facts as datoms over time. A datom is a 5-tuple:

  • Entity ID
  • Attribute
  • Value of Attribute
  • Transaction ID (Time)
  • a boolean value encoding whether the datom is an addition or retraction.
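Because facts are only ever added or retracted, a past state can be reconstructed by replaying datoms up to a chosen transaction. The following is a minimal sketch in plain Python, assuming integer transaction IDs that increase over time; `as_of` and the sample data are hypothetical and do not use Datomic's API.

```python
def as_of(datoms, t):
    """Return {(entity, attribute): set of values} as seen at transaction t."""
    state = {}
    for e, a, v, tx, added in sorted(datoms, key=lambda d: d[3]):
        if tx > t:
            break  # later transactions are invisible at time t
        values = state.setdefault((e, a), set())
        if added:
            values.add(v)
        else:
            values.discard(v)
    # Drop attributes whose values have all been retracted.
    return {key: vals for key, vals in state.items() if vals}


log = [
    (1, ":person/email", "a@old.example", 1000, True),
    (1, ":person/email", "a@old.example", 1005, False),  # retraction
    (1, ":person/email", "a@new.example", 1005, True),   # addition
]

before = as_of(log, 1000)
after = as_of(log, 1005)
```

The log itself is never modified; the change of address is expressed as one retraction and one addition in the same transaction.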

Although Datomic does not require a table schema that specifies attribute columns in advance, it does require the properties of each individual attribute to be specified. This is called a universal schema.

Data in Datomic is stored in "distributed storage services," a cluster of machines where each machine independently stores a subset (shard) of the data. There may be redundancy across shards. Datomic uses a key-value store as its data model, and a consistent hash function maps the key (Entity ID) to the location, i.e. the machine, where the corresponding shard is stored.
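Consistent hashing in general can be sketched with a hash ring. The code below is a generic illustration of the technique, not the implementation used by Datomic or by any particular storage service; the `HashRing` class, the node names, and the number of virtual points per node are all assumptions.

```python
import bisect
import hashlib


def _point(key):
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, replicas=64):
        # Each node gets several virtual points to even out the distribution.
        self._ring = sorted(
            (_point(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [p for p, _ in self._ring]

    def node_for(self, entity_id):
        """Return the node owning the first ring point at or after the key."""
        idx = bisect.bisect_left(self._points, _point(str(entity_id)))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


ring3 = HashRing(["node-a", "node-b", "node-c"])
ring4 = HashRing(["node-a", "node-b", "node-c", "node-d"])

ids = range(1000)
moved = [i for i in ids if ring3.node_for(i) != ring4.node_for(i)]
```

The property that makes the hash "consistent" is visible when a node is added: only the keys that now belong to the new node change location, and every other key stays where it was.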

Query Interface

Datalog

Datomic's query interface is an extension of Datalog. The main difference is that Datalog systems usually have one global fact database and one set of rules, whereas a Datomic Datalog query can take multiple databases and rule sets as inputs.
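The idea of a query that joins facts from more than one database can be sketched with a toy pattern matcher. This is plain Python, not Datomic's query engine or syntax: `query`, `match`, the `$`-prefixed source names, and the sample facts are hypothetical, and rules are left out entirely.

```python
def is_var(term):
    """Variables are strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, fact, binding):
    """Extend `binding` so that pattern equals fact, or return None."""
    binding = dict(binding)
    for term, value in zip(pattern, fact):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None      # constant does not match
    return binding


def query(find, where, sources):
    """Evaluate conjunctive clauses; each clause names the database it reads."""
    bindings = [{}]
    for src, *pattern in where:
        bindings = [
            b2
            for b in bindings
            for fact in sources[src]
            if (b2 := match(pattern, fact, b)) is not None
        ]
    return {tuple(b[v] for v in find) for b in bindings}


people = [(1, ":person/name", "Alice"), (2, ":person/name", "Bob")]
orders = [(10, ":order/customer", 1), (11, ":order/customer", 1)]

result = query(
    find=["?name", "?order"],
    where=[
        ("$people", "?p", ":person/name", "?name"),
        ("$orders", "?order", ":order/customer", "?p"),
    ],
    sources={"$people": people, "$orders": orders},
)
```

The shared variable `?p` performs the join across the two fact collections.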

Stored Procedures

Supported

Stored procedures are represented as transaction functions in Datomic, which can be invoked while a transaction is being processed.

Concurrency Control

Optimistic Concurrency Control (OCC)

Datomic supports optimistic concurrency control, which is made possible by its built-in compare-and-swap function, :db/cas.
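The compare-and-swap idea behind optimistic concurrency control can be modeled in a few lines. This is a sketch in plain Python, with a dictionary standing in for the database; the `cas` function and `CasConflict` exception are hypothetical and only mimic the semantics described above.

```python
class CasConflict(Exception):
    """Raised when the current value differs from the expected one."""


def cas(db, entity, attr, expected, new):
    """Set (entity, attr) to `new` only if it currently equals `expected`."""
    current = db.get((entity, attr))
    if current != expected:
        raise CasConflict(f"expected {expected!r}, found {current!r}")
    db[(entity, attr)] = new


db = {(1, ":account/balance"): 100}

# Writer A read 100 and commits first.
cas(db, 1, ":account/balance", 100, 80)

# Writer B also read 100; its write is rejected and must be retried
# after re-reading the current value.
try:
    cas(db, 1, ":account/balance", 100, 50)
    b_succeeded = True
except CasConflict:
    b_succeeded = False
```

No lock is held between reading and writing; a conflict is detected only at commit time, which is what makes the scheme optimistic.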

Isolation Levels

Serializable

Only one thread is responsible for writing transactions, so transactions are applied one at a time and are always serializable.
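The effect of a single writer thread can be illustrated with a queue-based model. This is a sketch in plain Python and not Datomic's transactor; the `Transactor` class, its methods, and the use of the log position as a transaction number are assumptions made for the example.

```python
import queue
import threading


class Transactor:
    """Applies submitted transactions one at a time on a single thread."""

    def __init__(self):
        self.log = []  # committed transactions, in commit order
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._inbox.get()
            if item is None:  # shutdown sentinel
                break
            tx_data, done = item
            # Only this thread appends, so the order is a total order.
            self.log.append((len(self.log), tx_data))
            done.set()

    def transact(self, tx_data):
        """Submit a transaction and block until it has been committed."""
        done = threading.Event()
        self._inbox.put((tx_data, done))
        done.wait()

    def close(self):
        self._inbox.put(None)
        self._thread.join()


transactor = Transactor()
clients = [
    threading.Thread(target=transactor.transact, args=(f"tx-from-{i}",))
    for i in range(8)
]
for c in clients:
    c.start()
for c in clients:
    c.join()
transactor.close()
```

However the clients interleave, the log ends up as one sequence of transactions, which is the serial order that serializability requires.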

System Architecture

Embedded

In a Datomic-based system, an application server together with the partial data stored on that server is called a peer. A peer can read from the storage services directly through the peer library, but it can only request writes through a transactor, a single designated process in charge of all writes. The transactor adds new datoms to the storage services using ACID transactions and, because data may be stored redundantly, propagates the writes.

Since datoms stored in the storage services are immutable, peers cache extensively so that they can query data locally. This allows programmers to access query results as simple data structures.

Website

https://www.datomic.com/

Tech Docs

https://docs.datomic.com/

Developer

Cognitect, Inc.

Country of Origin

US

Start Year

2012

Project Type

Commercial

Written in

Clojure

Supported languages

Clojure, Java

Operating Systems

All OS with Java VM, Hosted

Licenses

Proprietary

Wikipedia

https://en.wikipedia.org/wiki/Datomic