Dolt

Dolt is a single-node DBMS that incorporates Git-style versioning as a first-class entity.

Dolt can act as an RDBMS because it provides a MySQL server, using a Go implementation (work in progress). Dolt is also a file format: you can write to Dolt from the command line without a server process running. Dolt behaves like Git, i.e., as a content-addressable local database in which the main objects are tables instead of files. In Dolt, a user creates a repository locally. The repository contains tables that can be read and updated using SQL. As in Git, writes are staged until the user issues a commit. Upon commit, the writes are appended to permanent storage.

Branch/merge semantics are supported, allowing the tables to evolve at a different pace for different users. This enables loose collaboration on data as well as multiple views of the same core data. Both schema conflicts and data conflicts are detected on merge. Data conflicts are cell-based, not line-based.
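To illustrate what "cell-based" conflict detection means, here is a minimal sketch of a three-way merge of a single row, compared column by column. It is not Dolt's implementation: the function name `merge_rows`, the dict-per-row representation, and the choice to keep "our" value provisionally on a conflict are all assumptions made for the example, and all three versions are assumed to share the same schema.

```python
def merge_rows(base, ours, theirs):
    """Three-way merge of one row, compared column by column.

    Each argument is a dict mapping column name -> value for the same
    primary key. Returns (merged_row, conflicting_columns).
    """
    merged, conflicts = {}, []
    for col in base:
        b, o, t = base[col], ours[col], theirs[col]
        if o == t:
            merged[col] = o      # both sides agree (or neither changed)
        elif o == b:
            merged[col] = t      # only "theirs" changed this cell
        elif t == b:
            merged[col] = o      # only "ours" changed this cell
        else:
            merged[col] = o      # both changed it differently: conflict
            conflicts.append(col)
    return merged, conflicts


base = {"id": 1, "name": "Ada", "city": "London"}
ours = {"id": 1, "name": "Ada L.", "city": "London"}
theirs = {"id": 1, "name": "Ada", "city": "Paris"}

# Different cells of the same row were edited, so the merge is clean.
merged, conflicts = merge_rows(base, ours, theirs)

# Both sides edited the same cell differently, so that cell conflicts.
_, name_conflicts = merge_rows(base, ours, {"id": 1, "name": "A. Lovelace", "city": "London"})
```

A line-based merge would flag the first case as a conflict because both sides touched the same row; comparing individual cells lets the two edits combine.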

Remote repositories allow for cooperation among repository instances. Clone, push, and pull semantics are all available.

History

Source control, and particularly Git, enabled a revolution in the way people collaborate when they create software. It did this by providing semantics for building source code in a collaborative and decentralized way. Dolt aims to do the same for data, so that people can build relational databases collaboratively.

Dolt was created by Liquidata, was open-sourced in late August 2019, and is under active development.

Isolation Levels

Not Supported

Dolt does not support transactions. In Dolt's model, concurrent edits take place on separate branches and are merged by explicit user actions.

Logging

Not Supported

Dolt does not support logging but ensures data durability. New data is written once to new table files containing new chunks. The table files are written to disk before the manifest references are updated.

System Architecture

Shared-Nothing

Dolt is not distributed at a system level. Dolt is designed to distribute the same database to multiple locations where it can be worked on in isolation and any edits from one location can be explicitly pulled or pushed to another location. At the checkout level, Dolt databases are shared-nothing.

Foreign Keys

Not Supported

Under active development.

Query Interface

Custom API, SQL, Command-line / Shell

Dolt provides a Git-like command-line interface. It also accepts SQL queries through the MySQL wire protocol and through its command line, and it can import and export data as CSV files.

Stored Procedures

Not Supported

Checkpoints

Not Supported

Dolt does not support mutable database files, hence it does not explicitly take checkpoints. Dolt has a manifest that stores pointers to all currently active table files, which is updated atomically on every mutation of the database.
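The ordering described above (write immutable table files first, then switch the manifest atomically) can be sketched as follows. This is an illustration only, not Dolt's on-disk format: the file name `manifest`, the JSON layout, the use of SHA-1 for file names, and the helper names are all assumptions.

```python
import hashlib
import json
import os
import tempfile


def write_table_file(root, data):
    """Write an immutable table file named after the hash of its contents."""
    name = hashlib.sha1(data).hexdigest()
    path = os.path.join(root, name)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # data is durable before anything points at it
    return name


def update_manifest(root, table_files):
    """Publish a new list of active table files in one atomic step."""
    fd, tmp = tempfile.mkstemp(dir=root)
    with os.fdopen(fd, "w") as f:
        json.dump({"tables": table_files}, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in atomically: readers see either the old
    # manifest or the new one, never a partially written file.
    os.replace(tmp, os.path.join(root, "manifest"))


def read_manifest(root):
    with open(os.path.join(root, "manifest")) as f:
        return json.load(f)["tables"]


root = tempfile.mkdtemp()
first = write_table_file(root, b"chunk data v1")
update_manifest(root, [first])
second = write_table_file(root, b"chunk data v2")
update_manifest(root, [first, second])
```

Because existing files are never modified in place, a crash between the two steps leaves at worst an unreferenced table file, which is why no separate checkpoint is needed in this scheme.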

Data Model

Relational

Dolt emulates MySQL.

Query Execution

Tuple-at-a-Time Model

Dolt supports iterator-based query processing with no intra-query parallelism.
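As a generic illustration of the tuple-at-a-time (iterator) model, the sketch below chains three operators, each pulling one row at a time from its child. The operator names and the sample table are hypothetical and are not taken from Dolt's query engine.

```python
def scan(table):
    """Leaf operator: yield one row (a dict) at a time."""
    for row in table:
        yield row


def select(child, predicate):
    """Pass through only the rows that satisfy the predicate."""
    for row in child:
        if predicate(row):
            yield row


def project(child, columns):
    """Keep only the requested columns of each row."""
    for row in child:
        yield {c: row[c] for c in columns}


people = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Linus", "age": 21},
    {"id": 3, "name": "Grace", "age": 45},
]

# Roughly: SELECT name FROM people WHERE age > 30
plan = project(select(scan(people), lambda r: r["age"] > 30), ["name"])
result = list(plan)
```

Each row flows through the whole operator chain before the next one is read, and everything runs on a single thread, which matches the "no intra-query parallelism" description.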

Storage Architecture

Disk-oriented

Dolt uses disk storage and can currently store datasets of up to 100 GB.

Compression

Dictionary Encoding

Dolt uses Snappy, an open-source compression library that prioritizes speed over compression ratio. All chunks are compressed with Snappy before storage and are decompressed as they are read into the block cache. Data must be decompressed before queries can be processed.
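The compress-on-write, decompress-into-cache pattern can be sketched as below. Note the substitution: Snappy is not part of the Python standard library, so this example uses `zlib` at its fastest level as a stand-in. The `store` and `cache` dictionaries and both function names are hypothetical.

```python
import zlib

store = {}   # stands in for on-disk chunk storage: address -> compressed bytes
cache = {}   # stands in for the block cache: address -> decompressed bytes


def put_chunk(addr, data):
    """Compress a chunk before it is stored."""
    store[addr] = zlib.compress(data, 1)   # level 1 favors speed over size


def get_chunk(addr):
    """Return the decompressed chunk, decompressing only on a cache miss."""
    if addr not in cache:
        cache[addr] = zlib.decompress(store[addr])
    return cache[addr]


put_chunk("a", b"x" * 1000)
chunk = get_chunk("a")
```

Queries only ever see the decompressed bytes returned by `get_chunk`; the compressed form exists only in storage.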

Indexes

B+Tree

Dolt supports a B-tree-like index structure, for table primary keys only.

Views

Not Supported

Concurrency Control

Not Supported

Dolt does not support transactions. Concurrent SQL sessions on the same Dolt checkout see read-committed behavior, with an auto-commit after each executed SQL statement.

Storage Model

N-ary Storage Model (Row/Record)

Dolt stores tables in the N-ary Storage Model with clustered primary keys. The entire dataset is content-addressed as a Merkle tree of component blocks. A Merkle tree is a hash-based data structure that generalizes the hash list: each leaf node is a hash of a block of data, and each non-leaf node is a hash of its children. The boundaries of internal and leaf nodes are chosen by a rolling hash of the block contents.
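The two ideas in this paragraph, content-defined block boundaries and a Merkle tree over the resulting blocks, can be sketched as follows. This is a simplified illustration, not Dolt's algorithm: the window size, the polynomial rolling hash, the boundary mask, and SHA-1 are assumed values, and internal nodes here use a fixed fan-out of two rather than rolling-hash boundaries.

```python
import hashlib

WINDOW = 16                    # bytes covered by the rolling hash (assumed)
BASE = 257
MOD = (1 << 31) - 1
MASK = 0x3F                    # boundary when the low 6 bits are all ones
TOP = pow(BASE, WINDOW, MOD)   # weight of the byte leaving the window


def split_chunks(data):
    """Cut data wherever the rolling hash of the last WINDOW bytes hits MASK."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD                  # bring the new byte in
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % MOD   # drop the oldest byte
        if i + 1 - start >= WINDOW and (h & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def address(data):
    """Content address of a block: the hash of its bytes."""
    return hashlib.sha1(data).digest()


def merkle_root(chunks):
    """Hash the leaves, then hash pairs of child hashes up to a single root."""
    level = [address(c) for c in chunks]
    if not level:
        return address(b"")
    while len(level) > 1:
        level = [address(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
    return level[0]


data = b"".join(f"row-{i:05d}|".encode() for i in range(2000))
edited = data[:10000] + b"row-XXXXX|" + data[10000:]

chunks = split_chunks(data)
root = merkle_root(chunks)
edited_root = merkle_root(split_chunks(edited))
```

Because boundaries depend on the bytes themselves rather than on fixed offsets, an insertion tends to change only the chunks near the edit, so unchanged chunks keep their addresses and can be shared between versions, while the root hash changes to identify the new version.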

Storage Organization

Sorted Files

Dolt stores its dataset as a Merkle tree of component blocks. The content-addressed blocks are stored in write-once table files with a static, binary-searchable index at the end. When the number of table files exceeds a certain threshold, a compaction phase is run. New data is written only once, to new table files containing the new chunks. These table files are flushed to disk before the manifest referencing them is updated.
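A write-once file with a binary-searchable index at the end can be sketched like this. The layout is an assumption made for illustration (chunk bytes, then fixed-width index entries sorted by hash, then a 4-byte entry count) and is not Dolt's actual table file format; the function names are hypothetical.

```python
import hashlib
import struct

ENTRY = 20 + 12   # 20-byte SHA-1, 8-byte offset, 4-byte length


def build_table_file(chunks):
    """Serialize chunks followed by an index of (hash, offset, length) sorted by hash."""
    body, entries = b"", []
    for chunk in chunks:
        entries.append((hashlib.sha1(chunk).digest(), len(body), len(chunk)))
        body += chunk
    entries.sort()
    index = b"".join(h + struct.pack(">QI", off, n) for h, off, n in entries)
    footer = struct.pack(">I", len(entries))   # entry count in the last 4 bytes
    return body + index + footer


def lookup(table, digest):
    """Binary-search the trailing index for a chunk hash; return the chunk or None."""
    (count,) = struct.unpack(">I", table[-4:])
    index_start = len(table) - 4 - count * ENTRY
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        pos = index_start + mid * ENTRY
        key = table[pos:pos + 20]
        if key < digest:
            lo = mid + 1
        elif key > digest:
            hi = mid
        else:
            off, n = struct.unpack(">QI", table[pos + 20:pos + ENTRY])
            return table[off:off + n]
    return None


table = build_table_file([b"alpha", b"beta", b"gamma"])
found = lookup(table, hashlib.sha1(b"beta").digest())
missing = lookup(table, hashlib.sha1(b"delta").digest())
```

Since the file is never modified after it is written, the index can be built once and stays valid for the life of the file; fixed-width entries are what make a binary search over the raw bytes possible.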

Query Compilation

Not Supported

Revision #15 | Updated 12/10/2019 9:43 p.m.

Website

https://www.dolthub.com

Source Code

https://github.com/liquidata-inc/dolt

Developer

Liquidata

Country of Origin

Start Year

2018

Project Type

Open Source

Written in

Go
