Dolt is a single-node, embedded DBMS that treats Git-style versioning as a first-class feature. Like Git, it is a content-addressable local database, but its main objects are tables instead of files. A user creates a database locally; the database contains tables that can be read and updated using SQL. As in Git, writes are staged until the user issues a commit, at which point they are appended to permanent storage. [01]
- Website: https://www.dolthub.com [01]
- Source Code: https://github.com/dolthub/dolt [02]
- Tech Docs: https://www.dolthub.com/docs/ [03]
- Developer: DoltHub Inc (@DoltHub)
- Country of Origin: US
- Start Year: 2018 [14]
- Project Type: Open Source
- Written in: Go
- Derived From: Noms
- Compatible With: MySQL
- License: Apache v2
Dolt supports branch and merge semantics, so tables can evolve at different paces for different users. This allows loose collaboration on data as well as multiple views of the same core data. Merges detect both schema conflicts and data conflicts, and data conflicts are cell-based rather than line-based. Remote repositories let repository instances cooperate, with clone, push, and pull semantics all available.
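To make "cell-based" conflicts concrete, here is a minimal sketch of a three-way merge at cell granularity. It is an illustration of the idea only, not Dolt's merge code: the `merge_cells` function and the dictionary-based table layout are hypothetical, all three versions are assumed to share one schema, and row additions and deletions are ignored.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]    # column name -> cell value
Table = Dict[Any, Row]  # primary key -> row


def merge_cells(base: Table, ours: Table, theirs: Table) -> Tuple[Table, List[Tuple[Any, str]]]:
    """Three-way merge of rows present in all three versions (hypothetical helper)."""
    merged: Table = {}
    conflicts: List[Tuple[Any, str]] = []
    for pk in sorted(base.keys() & ours.keys() & theirs.keys()):
        row: Row = {}
        for col, base_val in base[pk].items():
            our_val, their_val = ours[pk][col], theirs[pk][col]
            if our_val == their_val:
                row[col] = our_val       # both sides agree
            elif our_val == base_val:
                row[col] = their_val     # only their side changed this cell
            elif their_val == base_val:
                row[col] = our_val       # only our side changed this cell
            else:
                conflicts.append((pk, col))  # both changed the same cell differently
                row[col] = base_val
        merged[pk] = row
    return merged, conflicts
```

Under this scheme, two users who edit different columns of the same row merge cleanly; a line-based merge would flag the whole row.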
DoltHub Inc also created DoltHub, a website for hosting Dolt databases, similar to GitHub. [01]
Checkpoints[04]
Dolt does not support mutable database files, so it does not take explicit checkpoints. Instead, a manifest stores pointers to all currently active table files and is updated atomically on every mutation of the database.
Compression[04][05][06]
Dolt uses Snappy, an open-source compression library that prioritizes speed over compression ratio. All chunks are compressed with Snappy before storage and decompressed as they are read into the block cache. Data must be decompressed before queries can process it.
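The compress-on-write, decompress-into-cache flow can be sketched as follows. This is a simplified illustration, not Dolt's storage code: the `ChunkStore` class is hypothetical, `zlib` at its fastest level stands in for Snappy (which is not in the Python standard library), and SHA-256 is an assumed choice of content address.

```python
import hashlib
import zlib
from typing import Dict


class ChunkStore:
    """Hypothetical chunk store: compressed at rest, decompressed in the cache."""

    def __init__(self) -> None:
        self._stored: Dict[bytes, bytes] = {}  # address -> compressed bytes ("disk")
        self._cache: Dict[bytes, bytes] = {}   # address -> decompressed bytes

    def put(self, data: bytes) -> bytes:
        address = hashlib.sha256(data).digest()         # content address (assumed hash)
        self._stored[address] = zlib.compress(data, 1)  # level 1 favors speed over size
        return address

    def get(self, address: bytes) -> bytes:
        if address not in self._cache:
            # Decompress once, on the way into the cache.
            self._cache[address] = zlib.decompress(self._stored[address])
        return self._cache[address]
```

Queries would then operate only on the cached, decompressed bytes.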
Concurrency Control[04][07]
Dolt supports transactions. At commit time, a transaction's changes are combined with concurrent changes using Git-style merge semantics.
Indexes[04][09]
Dolt supports a B-tree-like index structure for table primary keys. Configurable single-column or multi-column secondary indexes of the same structure are also supported.
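As a rough illustration of a multi-column secondary index, the sketch below keeps entries sorted by the indexed column values and maps them back to primary keys. The `SecondaryIndex` class is hypothetical, and a sorted Python list searched with `bisect` stands in for the B-tree-like structure; it shows the ordering idea, not Dolt's on-disk layout.

```python
import bisect
from typing import Any, Dict, List, Tuple


class SecondaryIndex:
    """Hypothetical secondary index: sorted (indexed values, primary key) pairs."""

    def __init__(self, columns: Tuple[str, ...]) -> None:
        self.columns = columns
        self._entries: List[Tuple[tuple, Any]] = []

    def insert(self, pk: Any, row: Dict[str, Any]) -> None:
        key = tuple(row[c] for c in self.columns)
        bisect.insort(self._entries, (key, pk))  # keep entries in sorted order

    def lookup(self, *values: Any) -> List[Any]:
        """Return primary keys of rows whose indexed columns equal `values`."""
        key = tuple(values)
        # (key,) sorts immediately before every (key, pk) entry with the same key.
        i = bisect.bisect_left(self._entries, (key,))
        result = []
        while i < len(self._entries) and self._entries[i][0] == key:
            result.append(self._entries[i][1])
            i += 1
        return result
```

The primary keys returned by `lookup` would then be used to fetch full rows from the clustered primary-key index.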
Isolation Levels[04]
Dolt supports the REPEATABLE_READ isolation level for its transactions. Additionally, clients can connect to different branches of the same database, with each branch being completely isolated from all others. Other isolation levels are in development.
Logging[04]
Dolt does not support logging but ensures data durability. New data is written once to new table files containing new chunks. The table files are written to disk before the manifest references are updated.
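The ordering that provides durability without a log can be sketched like this: flush the new table file first, then swap in a new manifest atomically. This is a minimal sketch under stated assumptions, not Dolt's implementation; the `manifest.json` name, the JSON format, and all function names are hypothetical.

```python
import json
import os
from typing import List

MANIFEST = "manifest.json"  # hypothetical manifest file name


def write_durably(path: str, data: bytes) -> None:
    """Write a file and force its contents to disk before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def read_manifest(root: str) -> List[str]:
    try:
        with open(os.path.join(root, MANIFEST), "rb") as f:
            return json.loads(f.read())
    except FileNotFoundError:
        return []


def commit_table_file(root: str, name: str, chunks_blob: bytes) -> None:
    # Step 1: the new, write-once table file reaches disk first.
    write_durably(os.path.join(root, name), chunks_blob)
    # Step 2: only then is the manifest replaced, via an atomic rename.
    tables = read_manifest(root) + [name]
    tmp_path = os.path.join(root, MANIFEST + ".tmp")
    write_durably(tmp_path, json.dumps(tables).encode())
    os.replace(tmp_path, os.path.join(root, MANIFEST))
```

If a crash occurs between the two steps, the old manifest still points only at complete table files, and the orphaned new file is simply unreferenced.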
Query Interface[11]
Dolt provides a Git-like command-line interface. It accepts SQL queries both over the MySQL wire protocol and through its command line, and it can import and export data as CSV files.
Storage Model[04][12]
Dolt stores tables in the N-ary Storage Model with clustered primary keys. The entire dataset is content-addressed as a Merkle tree of component blocks. A Merkle tree is a hash-based data structure that generalizes the hash list: each leaf node is a hash of a block of data, and each non-leaf node is a hash of its children. The boundaries for internal and leaf nodes are chosen by a rolling hash of the block contents.
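The combination of rolling-hash boundaries and a Merkle tree can be sketched as below. This is an illustrative simplification, not Dolt's format: the window size, divisor, polynomial hash, and SHA-256 are all assumed parameters, and internal nodes use a fixed fan-out here, whereas the text above says Dolt also picks internal-node boundaries with a rolling hash.

```python
import hashlib
from typing import List

WINDOW = 16            # bytes in the rolling window (assumed)
DIVISOR = 64           # a boundary occurs when hash % DIVISOR == 0 (assumed)
BASE = 257
MOD = 1_000_000_007
FANOUT = 4             # fixed fan-out for internal nodes (simplification)


def split_chunks(data: bytes) -> List[bytes]:
    """Split data at content-defined boundaries using a polynomial rolling hash."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the byte leaving the window
        h = (h * BASE + byte) % MOD                 # add the byte entering the window
        if i - start + 1 >= WINDOW and h % DIVISOR == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def merkle_root(chunks: List[bytes]) -> bytes:
    """Hash leaves, then hash groups of child hashes until one root remains."""
    level = [hashlib.sha256(c).digest() for c in chunks] or [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        level = [hashlib.sha256(b"".join(level[i:i + FANOUT])).digest()
                 for i in range(0, len(level), FANOUT)]
    return level[0]
```

Because boundaries depend only on nearby content, appending data leaves earlier chunks and their hashes unchanged, which is what lets two versions of a table share most of their blocks.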
Storage Organization[04]
Dolt stores its dataset as a Merkle tree of component blocks. The content-addressed blocks are stored in write-once table files with a static, binary-searchable index at the end. When the number of table files exceeds a certain threshold, a compaction phase is run. New data is written only once, to new table files containing the new chunks. These table files are flushed to disk before the manifest referencing them is updated.
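A write-once file with a binary-searchable index at the end might look like the following sketch. The byte layout is invented for illustration (chunk bodies, then fixed-width index records sorted by address, then a record count) and does not reproduce Dolt's actual table file format; `build_table_file` and `read_chunk` are hypothetical names, and SHA-256 is an assumed address function.

```python
import hashlib
import struct
from typing import List, Optional

RECORD = struct.Struct(">32sQI")  # address, offset into body, chunk length
FOOTER = struct.Struct(">I")      # number of index records


def build_table_file(chunks: List[bytes]) -> bytes:
    """Lay out chunk bodies, then a sorted fixed-width index, then a footer."""
    body = bytearray()
    records = []
    for chunk in chunks:
        records.append((hashlib.sha256(chunk).digest(), len(body), len(chunk)))
        body += chunk
    records.sort()  # sorting by address makes the index binary-searchable
    index = b"".join(RECORD.pack(*r) for r in records)
    return bytes(body) + index + FOOTER.pack(len(records))


def read_chunk(table: bytes, address: bytes) -> Optional[bytes]:
    """Binary-search the trailing index for `address` and return its chunk."""
    (count,) = FOOTER.unpack_from(table, len(table) - FOOTER.size)
    index_start = len(table) - FOOTER.size - count * RECORD.size
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        addr, offset, length = RECORD.unpack_from(table, index_start + mid * RECORD.size)
        if addr == address:
            return table[offset:offset + length]
        if addr < address:
            lo = mid + 1
        else:
            hi = mid
    return None
```

Since the file is never modified after it is written, the index can be built once and stays valid for the file's lifetime.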
System Architecture[04]
Dolt is not distributed at a system level. Dolt is designed to distribute the same database to multiple locations where it can be worked on in isolation and any edits from one location can be explicitly pulled or pushed to another location.
Citations
- [01] DoltHub (dolthub.com)
- [02] GitHub - dolthub/dolt: Dolt – Git for Data (github.com)
- [03] What is Dolt? | Dolt Docs (dolthub.com)
- [04] https://github.com/liquidata-inc/dolt/issues/238
- [05] GitHub - google/snappy: A fast compressor/decompressor (github.com)
- [06] dolt/go/store/nbs/table_writer.go at 84d9eded517167eb2b1f76073df88e85665eec1d, dolthub/dolt (github.com)
- [07] Transactions in a Database with Branches | DoltHub Blog (dolthub.com)
- [08] Introducing Foreign Keys | DoltHub Blog (dolthub.com)
- [09] Introducing Secondary Indexes | DoltHub Blog (dolthub.com)
- [10] Recent Improvements to Join Planning in Dolt | DoltHub Blog (dolthub.com)
- [11] https://github.com/liquidata-inc/dolt
- [12] Merkle tree - Wikipedia (wikipedia.org)
- [13] Introducing SQL VIEW Support in Dolt | DoltHub Blog (dolthub.com)
- [14] Snap's Timothy Sehn Is Emerging From Retirement to Start Liquidata - Business Insider (businessinsider.com)
- [15] https://github.com/dolthub/dolt/commit/3b741db0e529eea0302eb9575f4233fe61774bf3
- [16] https://github.com/dolthub/dolt/commit/32eca34587f311b74076490299277b72b1c2fe74
- [17] https://github.com/dolthub/dolt/commit/4a09c8ec5366a50b136975286fbf6a7dba556967