TrailDB

Revision #13 from 12/10/2018 10:21 p.m.

TrailDB is a portable, easy-to-use C library for querying series of related events. It groups events into time-ordered series and produces an immutable database with a high compression rate.

It is designed to complement existing relational databases and key-value stores, and it targets OLAP workloads such as analyzing usage patterns, predicting user behavior, and detecting anomalies. One key design feature is that a database is immutable once produced. Immutability in turn enables a second key feature, data compression: TrailDB exploits the similarity among events in a time series to achieve high compression. Together, these two features let TrailDB perform well on OLAP workloads.

Data Model

Relational

TrailDB adopts a specific relational data model.

Each database is a collection of trails.

Each trail is identified by a 128-bit user-defined ID (UUID) and an automatically assigned trail ID. Each trail consists of a sequence of events ordered by time.

Each event consists of a 64-bit timestamp and several fields that the user defines in advance.

Each field contains a set of values.
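The hierarchy above (database, trails, events, fields) can be pictured with a minimal Python sketch. All class, attribute, and method names here are hypothetical, chosen only for illustration; this is not TrailDB's API or its on-disk layout, and modeling the automatically assigned trail ID as a list index is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import uuid


@dataclass
class Event:
    timestamp: int            # 64-bit in TrailDB; a plain int in this sketch
    values: Dict[str, str]    # field name -> value recorded for this event


@dataclass
class Trail:
    user_id: uuid.UUID        # the 128-bit user-defined ID
    events: List[Event] = field(default_factory=list)

    def add(self, event: Event) -> None:
        # Keep the events ordered by time, as the data model requires.
        self.events.append(event)
        self.events.sort(key=lambda e: e.timestamp)


@dataclass
class Database:
    fields: List[str]                                  # predefined field names
    trails: List[Trail] = field(default_factory=list)

    def trail_id(self, user_id: uuid.UUID) -> int:
        # Assumption: the automatically assigned trail ID is the list index.
        for i, trail in enumerate(self.trails):
            if trail.user_id == user_id:
                return i
        self.trails.append(Trail(user_id))
        return len(self.trails) - 1
```

In this sketch, events added out of order are re-sorted on insertion, so iterating over a trail always yields them in time order.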

Compression

Delta Encoding, Run-Length Encoding, Prefix Compression

First, the events within a trail are always sorted by time, so TrailDB uses Delta Encoding to compress the 64-bit timestamps: each timestamp is stored as its difference from the previous one.
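As a rough illustration of delta encoding on sorted timestamps, the following Python sketch stores the first timestamp followed by the gaps between consecutive ones. The function names are hypothetical, and this shows the general technique only, not TrailDB's actual encoder.

```python
from typing import List


def delta_encode(timestamps: List[int]) -> List[int]:
    """Store the first timestamp, then the gap to each following one."""
    if not timestamps:
        return []
    deltas = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        deltas.append(cur - prev)  # non-negative when the input is sorted
    return deltas


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original timestamps with a running sum."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out
```

Because events in a trail tend to be close together in time, the gaps are usually small numbers that need far fewer bits than a full 64-bit timestamp.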

Second, events are grouped by UUID, which usually represents a logical entity such as an online shopping customer. The events within a trail therefore tend to be predictable, and TrailDB encodes only what changes from one event to the next. This is similar to, but not exactly the same as, Run-Length Encoding.
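The idea of encoding only the changes within a trail can be sketched in Python as follows. This is a simplified illustration under the assumption that every event carries the same set of fields; the names `encode_changes` and `decode_changes` are hypothetical, and TrailDB's real format differs.

```python
from typing import Dict, List

Row = Dict[str, str]  # field name -> value for one event


def encode_changes(events: List[Row]) -> List[Row]:
    """Keep, for each event, only the fields whose value differs from the previous event."""
    prev: Row = {}
    encoded = []
    for ev in events:
        diff = {k: v for k, v in ev.items() if prev.get(k) != v}
        encoded.append(diff)
        prev = ev
    return encoded


def decode_changes(encoded: List[Row]) -> List[Row]:
    """Replay the changes on top of a running state to recover the full events."""
    state: Row = {}
    out = []
    for diff in encoded:
        state = {**state, **diff}  # new dict each step, so earlier rows stay intact
        out.append(state)
    return out
```

A field that stays constant across a trail, such as a customer's browser, is stored once and then costs nothing for the remaining events.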

Third, Huffman Coding, a kind of Prefix Compression, is used to encode the values, whose distributions are skewed and low in entropy.
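To show why Huffman coding suits skewed value distributions, here is a minimal Python sketch that builds a prefix code from value frequencies. The function name `huffman_codes` is hypothetical, and this is the textbook algorithm, not TrailDB's implementation.

```python
import heapq
from collections import Counter
from typing import Dict, List


def huffman_codes(values: List[str]) -> Dict[str, str]:
    """Map each distinct value to a bit string; frequent values get shorter codes."""
    freq = Counter(values)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct value still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tiebreaker, {value: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # the two lightest subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]
```

For a field where one value dominates, the dominant value receives a one-bit code, so the average cost per value falls well below that of a fixed-width encoding.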

Indexes

Hash Table

This feature was introduced in TrailDB 0.6. [TODO]

Each database is a read-only, immutable file, so its isolation level is effectively equivalent to Serializable.

Logging

Not Supported

Concurrency Control

Not Supported

Because each TrailDB is an immutable file, modifications are not allowed. In addition, a database is produced by a single process, and no reads can be issued before its creation is finalized. Thus, there are no concurrent reads and writes for TrailDB to coordinate.