HerdDB

HerdDB is a distributed SQL DBMS implemented in Java, optimized for primary key read/update access patterns. It has been designed to be embeddable in any Java Virtual Machine.

Checkpoints

Consistent Blocking

HerdDB runs checkpoints periodically, every 15 minutes by default. At each checkpoint, the ids of the active pages are written to disk together with their current log sequence numbers. At the same time, all dirty pages are discarded and the records they still hold are used, along with new or updated records, to build new pages. Checkpoints can be tuned to be either as fast as possible or as clean as possible. Fast checkpoints block write operations for less time, whereas clean checkpoints optimize memory usage (fewer dirty pages left in memory) and speed up searches (fewer dirty pages left on disk).
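The page rebuild described above can be illustrated with a minimal, single-threaded sketch. Everything here is an assumption made for illustration: the class `PagedTable`, its fields, and the fixed `RECORDS_PER_PAGE` are hypothetical and are not HerdDB's actual classes or page format. Deletes, log sequence numbers, and the fast/clean tuning are left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not HerdDB's actual classes or on-disk layout.
final class PagedTable {
    static final int RECORDS_PER_PAGE = 2; // deliberately tiny for the demo

    private final Map<Long, Map<String, String>> pages = new TreeMap<>(); // pageId -> records
    private final Map<String, Long> keyToPage = new HashMap<>();          // key -> page holding it
    private final Map<String, String> newOrUpdated = new TreeMap<>();     // records not on any page
    private final Set<Long> dirtyPages = new HashSet<>();
    private long nextPageId = 1;

    /** Inserts or updates a record; a record living on a page is detached from it. */
    void put(String key, String value) {
        Long pageId = keyToPage.remove(key);
        if (pageId != null) {
            dirtyPages.add(pageId); // the page now contains a stale copy
        }
        newOrUpdated.put(key, value);
    }

    String get(String key) {
        Long pageId = keyToPage.get(key);
        return pageId != null ? pages.get(pageId).get(key) : newOrUpdated.get(key);
    }

    /** Discards dirty pages, builds new pages, and returns the active page ids. */
    Set<Long> checkpoint() {
        Map<String, String> toWrite = new TreeMap<>(newOrUpdated);
        for (Long pageId : dirtyPages) {
            Map<String, String> discarded = pages.remove(pageId);
            for (Map.Entry<String, String> rec : discarded.entrySet()) {
                // Only records still attached to this page are carried over;
                // detached ones already have a newer version in newOrUpdated.
                if (pageId.equals(keyToPage.get(rec.getKey()))) {
                    toWrite.put(rec.getKey(), rec.getValue());
                }
            }
        }
        Map<String, String> current = null;
        long currentId = 0;
        for (Map.Entry<String, String> rec : toWrite.entrySet()) {
            if (current == null || current.size() == RECORDS_PER_PAGE) {
                currentId = nextPageId++;
                current = new HashMap<>();
                pages.put(currentId, current);
            }
            current.put(rec.getKey(), rec.getValue());
            keyToPage.put(rec.getKey(), currentId);
        }
        dirtyPages.clear();
        newOrUpdated.clear();
        return new TreeSet<>(pages.keySet());
    }
}
```

In this sketch, clean pages are never rewritten: only dirty pages are dropped, and their untouched records are packed into fresh pages together with the new or updated ones.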

Concurrency Control

Deterministic Concurrency Control

Before accessing records, clients acquire read or write locks (pessimistic row-level locking). Every transaction that modifies a record keeps the new data in a local buffer, and the new version of the record is not visible to other transactions until the modifying transaction commits.
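As a rough illustration of the locking scheme above, the following sketch pairs one read/write lock per row with a transaction-local buffer. The names `LockingTable` and `Transaction` are hypothetical, and the use of `ReentrantReadWriteLock` is an assumption for the example, not a statement about HerdDB's implementation. Because these locks are owned by threads, the sketch assumes each transaction runs on a single thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of pessimistic row-level locking; not HerdDB's actual classes.
final class LockingTable {
    private final Map<String, String> committed = new ConcurrentHashMap<>();
    private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantReadWriteLock());
    }

    /** Reads the committed version of a row under its read lock. */
    String read(String key) {
        ReentrantReadWriteLock.ReadLock lock = lockFor(key).readLock();
        lock.lock();
        try {
            return committed.get(key);
        } finally {
            lock.unlock();
        }
    }

    Transaction begin() {
        return new Transaction();
    }

    final class Transaction {
        // Uncommitted versions live here and nowhere else.
        private final Map<String, String> localBuffer = new HashMap<>();

        void update(String key, String value) {
            if (!localBuffer.containsKey(key)) {
                lockFor(key).writeLock().lock(); // held until commit
            }
            localBuffer.put(key, value);
        }

        /** A transaction sees its own writes first, then committed data. */
        String read(String key) {
            return localBuffer.containsKey(key) ? localBuffer.get(key) : LockingTable.this.read(key);
        }

        /** Publishes buffered versions and releases the row locks. */
        void commit() {
            for (Map.Entry<String, String> e : localBuffer.entrySet()) {
                committed.put(e.getKey(), e.getValue());
                lockFor(e.getKey()).writeLock().unlock();
            }
            localBuffer.clear();
        }
    }
}
```

A reader on another thread that calls `read` for a row being updated blocks until `commit` releases the write lock, so it never observes the buffered version. Rollback and deadlock handling are omitted.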

Write-ahead log (WAL) replication is based on Apache BookKeeper.

Query Compilation

Not Supported

Query Execution

Tuple-at-a-Time Model

Query Interface

Custom API, SQL, Command-line / Shell

Storage Architecture

Disk-oriented

Storage Model

N-ary Storage Model (Row/Record)

Internally, HerdDB stores a table as a set of key-value entries, implemented in Java as a very large map of binary data. Each row is translated from its column-based form to key-value format by separating the “primary key” part (one or more columns) from the “value” part (the remaining columns).
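The split of a row into a key part and a value part can be sketched as follows. The class `RowSplitter` and the text encoding (`column=value` joined with `|`) are assumptions chosen for readability; HerdDB's real binary format is not described here and is certainly different.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not HerdDB's actual classes or binary format.
final class RowSplitter {

    /** A key-value entry: both sides are opaque byte arrays. */
    static final class Entry {
        final byte[] key;
        final byte[] value;

        Entry(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Puts primary-key columns into the key bytes and all other columns into the value bytes. */
    static Entry split(Map<String, Object> row, Set<String> primaryKeyColumns) {
        List<String> keyParts = new ArrayList<>();
        List<String> valueParts = new ArrayList<>();
        for (Map.Entry<String, Object> column : row.entrySet()) {
            String encoded = column.getKey() + "=" + column.getValue();
            if (primaryKeyColumns.contains(column.getKey())) {
                keyParts.add(encoded);
            } else {
                valueParts.add(encoded);
            }
        }
        return new Entry(encode(keyParts), encode(valueParts));
    }

    // Toy encoding for the demo only: parts joined with '|' and stored as UTF-8.
    private static byte[] encode(List<String> parts) {
        return String.join("|", parts).getBytes(StandardCharsets.UTF_8);
    }
}
```

With a row `{id=7, name=ada, age=36}` and primary key `id`, the key bytes encode `id=7` and the value bytes encode `name=ada|age=36`, which is the shape a map of binary data needs for primary-key lookups.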

Storage Organization

Log-structured

At any given time, part of the data is held in a memory buffer and the rest is on disk. Transaction logs are the source of truth: the whole database can be recovered from them plus a checkpoint, so no data is lost if the JVM crashes. When a row is stored on disk it is assigned to a “data page”; on its first mutation, it is detached from its data page and that page is marked as “dirty”. At checkpoints, all dirty pages are discarded and the records they still hold are used, along with new or updated records, to build new pages. Records modified, inserted, or deleted within a transaction are never written to disk and do not appear in the main buffer until that transaction is committed, so there is always a consistent snapshot of committed data. Every transaction uses its own local buffer to store temporary data.
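Recovery from a checkpoint plus the transaction log, as described above, can be sketched as a replay of every log entry newer than the checkpoint. The types `Recovery`, `LogEntry`, and `Checkpoint` are hypothetical, and a single `long` stands in for the log sequence number as a simplifying assumption. The log is assumed to contain only committed changes, in LSN order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of checkpoint-plus-log recovery; not HerdDB's actual classes.
final class Recovery {

    /** One committed change in the transaction log; a null value means "delete". */
    static final class LogEntry {
        final long lsn;
        final String key;
        final String value;

        LogEntry(long lsn, String key, String value) {
            this.lsn = lsn;
            this.key = key;
            this.value = value;
        }
    }

    /** Committed data as of a given log sequence number. */
    static final class Checkpoint {
        final long lsn;
        final Map<String, String> data;

        Checkpoint(long lsn, Map<String, String> data) {
            this.lsn = lsn;
            this.data = data;
        }
    }

    /** Rebuilds the current state by replaying entries written after the checkpoint. */
    static Map<String, String> recover(Checkpoint checkpoint, List<LogEntry> log) {
        Map<String, String> state = new HashMap<>(checkpoint.data);
        for (LogEntry entry : log) {
            if (entry.lsn <= checkpoint.lsn) {
                continue; // already reflected in the checkpoint
            }
            if (entry.value == null) {
                state.remove(entry.key);
            } else {
                state.put(entry.key, entry.value);
            }
        }
        return state;
    }
}
```

Because uncommitted changes never reach the log or the pages in this model, replay needs no undo step: applying the newer entries on top of the checkpoint is enough.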

Stored Procedures

Not Supported

System Architecture

Shared-Nothing

WAL replication and distributed configuration are based on Apache ZooKeeper and Apache BookKeeper.

Views

Not Supported

Revision #5 | Updated 09/10/2019 11:50 a.m.

