TrailDB

TrailDB is an easy-to-use, portable C library for storing and querying series of related events. It groups existing related events into a time-series format and produces an immutable database with a high compression ratio.

It is designed to complement existing relational databases and key-value stores, and it targets OLAP workloads such as analyzing usage patterns, predicting user behavior, and detecting anomalies. One key design feature is that a database is immutable once produced. Immutability in turn enables the second key feature, data compression: TrailDB exploits the correlation among related time-series events to achieve a high compression ratio. Together, these two features give TrailDB good performance on OLAP workloads.

Indexes

Inverted Index (Full Text)

This feature was introduced in TrailDB 0.6. It uses a purpose-built inverted index that maps each item to a list of page_ids. An item is uniquely identified by a field and a value of that field.

TrailDB provides the index by mapping each item to the list of page_ids of the pages that contain that item. The index is stored in a file that consists of a HEADER and a FIELD SECTION. To look up an item, TrailDB first reads the HEADER to find the offset at which the item's field begins in the FIELD SECTION. It then locates the item within that field's data and extracts the page_ids that contain it.
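The lookup steps above can be sketched in C. This is a minimal in-memory model under stated assumptions, not TrailDB's on-disk format: the types `field_entry`, `index_item`, and `inverted_index`, the representation of values as numeric IDs, and the linear scan within a field's section are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, not TrailDB's actual file format. */

/* One entry in a field's section: an item (value ID) and its page list. */
typedef struct {
    uint32_t value;      /* value ID within the field */
    size_t first_page;   /* start of this item's list in page_ids[] */
    size_t num_pages;    /* number of page_ids for this item */
} index_item;

/* One HEADER entry per field: where the field's section starts. */
typedef struct {
    size_t section_offset;  /* index of the field's first item in items[] */
    size_t num_items;       /* number of items in the field's section */
} field_entry;

typedef struct {
    const field_entry *header;  /* HEADER: one entry per field */
    size_t num_fields;
    const index_item *items;    /* FIELD SECTION: all fields, concatenated */
    const uint64_t *page_ids;   /* all page lists, concatenated */
} inverted_index;

/* Find the pages containing item (field, value). Returns the number of
   page_ids and points *out at them, or returns 0 with *out == NULL. */
size_t index_lookup(const inverted_index *idx, size_t field, uint32_t value,
                    const uint64_t **out)
{
    *out = NULL;
    if (field >= idx->num_fields)
        return 0;
    /* Step 1: the HEADER gives the start of this field's section. */
    const field_entry *f = &idx->header[field];
    /* Step 2: scan the field's section for the item. */
    for (size_t i = 0; i < f->num_items; i++) {
        const index_item *it = &idx->items[f->section_offset + i];
        if (it->value == value) {
            /* Step 3: return the page_ids that contain the item. */
            *out = &idx->page_ids[it->first_page];
            return it->num_pages;
        }
    }
    return 0;
}
```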

System Architecture

Embedded

Stored Procedures

Not Supported

Because each TrailDB is a read-only, immutable file, it does not support stored procedures.

Compression

Delta Encoding, Run-Length Encoding, Prefix Compression

First, events within a trail are always sorted by time, so TrailDB uses delta encoding to compress the 64-bit timestamps: each timestamp is stored as its difference from the previous one, which is typically small.
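As a minimal sketch, assuming timestamps are plain `uint64_t` values already sorted within a trail, delta encoding and decoding look like this (the function names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Replace each timestamp with its difference from the previous one.
   The first value is kept as-is (its delta from 0). Because events in a
   trail are sorted by time, every delta is non-negative. */
void delta_encode(const uint64_t *ts, uint64_t *out, size_t n)
{
    uint64_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = ts[i] - prev;
        prev = ts[i];
    }
}

/* Invert delta_encode with a running sum. */
void delta_decode(const uint64_t *deltas, uint64_t *out, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += deltas[i];
        out[i] = sum;
    }
}
```

For the timestamps 1500000000, 1500000030, and 1500000045, the deltas are 1500000000, 30, and 15. Every value after the first is small enough to fit in far fewer than 64 bits, which is where the savings come from.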

Second, because events are grouped by UUID, which usually represents a logical entity such as an online shopping customer, consecutive events within a trail tend to be predictable, so TrailDB encodes only the changes in behavior from one event to the next. This is similar to, though not exactly the same as, run-length encoding.
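The idea of encoding only changes can be illustrated with a sketch that, for each event, records which fields differ from the previous event in the same trail. The `event` type with a fixed `NUM_FIELDS`, values as numeric IDs, and the bitmask output are assumptions for illustration, not TrailDB's actual encoding.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_FIELDS 4  /* hypothetical fixed number of fields per event */

typedef struct {
    uint32_t values[NUM_FIELDS];  /* value IDs, one per field */
} event;

/* For each event, set bit f of masks[i] if field f differs from the
   previous event. The first event has every field marked as changed.
   Returns the total number of field values that must be stored. */
size_t changed_fields(const event *events, size_t n, uint32_t *masks)
{
    size_t stored = 0;
    for (size_t i = 0; i < n; i++) {
        masks[i] = 0;
        for (unsigned f = 0; f < NUM_FIELDS; f++) {
            if (i == 0 || events[i].values[f] != events[i - 1].values[f]) {
                masks[i] |= 1u << f;
                stored++;
            }
        }
    }
    return stored;
}
```

For three events with values {1, 2, 3, 4}, {1, 2, 3, 5}, and {1, 2, 3, 5}, only 5 of the 12 field values need to be stored.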

Third, Huffman coding, which produces variable-length prefix codes, is used to encode field values whose distributions are skewed and have low entropy: frequent values receive short codes.
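To show why Huffman coding suits skewed distributions, the sketch below computes Huffman code lengths from value frequencies using the standard greedy merge of the two least-frequent nodes. It is not TrailDB's encoder: the simple quadratic node selection (instead of a priority queue) and the `MAX_SYMBOLS` limit are assumptions made to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SYMBOLS 64  /* hypothetical limit on distinct values */

/* Compute Huffman code lengths in bits for n symbols (1 <= n <= MAX_SYMBOLS). */
void huffman_lengths(const uint64_t *freq, size_t n, unsigned *len)
{
    uint64_t weight[2 * MAX_SYMBOLS];
    int parent[2 * MAX_SYMBOLS];
    int alive[2 * MAX_SYMBOLS];
    size_t nodes = n;

    for (size_t i = 0; i < n; i++) {
        weight[i] = freq[i];
        parent[i] = -1;
        alive[i] = 1;
    }
    /* Repeatedly merge the two lightest live nodes under a new parent. */
    for (size_t merges = 0; merges + 1 < n; merges++) {
        int a = -1, b = -1;  /* a: lightest, b: second lightest */
        for (size_t i = 0; i < nodes; i++) {
            if (!alive[i])
                continue;
            if (a < 0 || weight[i] < weight[a]) {
                b = a;
                a = (int)i;
            } else if (b < 0 || weight[i] < weight[b]) {
                b = (int)i;
            }
        }
        weight[nodes] = weight[a] + weight[b];
        parent[nodes] = -1;
        alive[nodes] = 1;
        parent[a] = parent[b] = (int)nodes;
        alive[a] = alive[b] = 0;
        nodes++;
    }
    /* A symbol's code length is its depth in the tree;
       a lone symbol still needs 1 bit. */
    for (size_t i = 0; i < n; i++) {
        unsigned depth = 0;
        for (int p = parent[i]; p >= 0; p = parent[p])
            depth++;
        len[i] = (n == 1) ? 1 : depth;
    }
}
```

For frequencies 50, 25, 15, and 10 this yields code lengths 1, 2, 3, and 3, an average of 1.75 bits per value instead of the 2 bits a fixed-width code needs for four values.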

Checkpoints

Not Supported

TrailDB does not support checkpoints as each database is immutable once produced.

Data Model

Relational

TrailDB adopts a specialized variant of the relational data model. A traditional relational record consists of a key and a set of attributes. A TrailDB record instead consists of a key, the UUID, and a list of values of a single complex type, the event.

TrailDB calls each such record a trail, which is uniquely identified by its UUID. A trail holds a list of events ordered by timestamp, and each event contains values for a predefined set of fields. These fields play the role of the attributes in the traditional relational data model.
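The trail, event, and field model can be pictured as the C types below. They are an in-memory illustration only: `event_rec`, `trail_rec`, `trail_is_ordered`, and the three-field layout are hypothetical and do not correspond to TrailDB's actual API.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_FIELDS 3  /* hypothetical predefined fields, e.g. page, action, price */

/* One event: a timestamp plus one value per predefined field. */
typedef struct {
    uint64_t timestamp;
    const char *values[NUM_FIELDS];
} event_rec;

/* One trail: all events of a single UUID, ordered by timestamp. */
typedef struct {
    uint8_t uuid[16];          /* 128-bit UUID identifying the trail */
    const event_rec *events;
    size_t num_events;
} trail_rec;

/* Check the invariant that events within a trail are ordered by time. */
int trail_is_ordered(const trail_rec *t)
{
    for (size_t i = 1; i < t->num_events; i++)
        if (t->events[i].timestamp < t->events[i - 1].timestamp)
            return 0;
    return 1;
}
```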

This data model groups the related events of one UUID, for example one online shopping user, together in time order. Consecutive events are therefore predictable, which lets TrailDB's developers apply several compression methods to achieve a high compression ratio and fast extraction.

Views

Not Supported

TrailDB does not support views. However, because each database is an immutable file, users can emulate a "view" by extracting data from existing TrailDBs into a new immutable database.
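Creating such a "view" amounts to reading the events of an existing database, keeping the ones that match, and writing them into a new database. The sketch below shows only the filtering step over an in-memory array; the `view_event` type, its single `action` field, and `extract_view` are hypothetical, and writing the result out as a new TrailDB is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical event with a single field. */
typedef struct {
    uint64_t timestamp;
    const char *action;
} view_event;

/* Copy the events whose action equals `wanted` into out[], preserving
   their time order. Returns the number of events copied. */
size_t extract_view(const view_event *in, size_t n, const char *wanted,
                    view_event *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(in[i].action, wanted) == 0)
            out[k++] = in[i];
    return k;
}
```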

Isolation Levels

Serializable

While a database is being created, only the single process building it can access it. Once produced, the database is a read-only, immutable file: anyone can issue reads against it, but no one can write to it. In this sense, TrailDB is equivalent to the serializable isolation level.

Logging

Not Supported

TrailDB does not support logging, and a database is created by a single process. There is no recovery mechanism if that process crashes during creation, so users must restart the production process from the beginning.

However, TrailDB can merge existing TrailDBs into a new immutable database. This is advisable when there is a very large number of input events: building several smaller databases and merging them limits how much work a crash can lose.
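When two databases both contain events for the same UUID, merging them requires interleaving the two time-ordered trails into one. A minimal sketch of that step is the classic two-way merge below; the `merge_trails` name and the timestamp-only representation are assumptions, and a real merge would also carry each event's field values.

```c
#include <stddef.h>
#include <stdint.h>

/* Merge two timestamp-sorted trails a[] and b[] into out[] (size na + nb),
   keeping the result sorted. On equal timestamps, events from a come first. */
void merge_trails(const uint64_t *a, size_t na, const uint64_t *b, size_t nb,
                  uint64_t *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (b[j] < a[i]) ? b[j++] : a[i++];
    while (i < na)   /* copy whatever remains of a */
        out[k++] = a[i++];
    while (j < nb)   /* copy whatever remains of b */
        out[k++] = b[j++];
}
```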

Concurrency Control

Not Supported

Because each TrailDB is an immutable file, modifications are not allowed. A single process produces a database, and no one can read it until creation is finalized. After that, readers only share immutable data, so TrailDB needs no concurrency control.

Query Interface

Custom API

TrailDB does not support a standard SQL query interface. Instead, it offers APIs in several programming languages: C, Go, Python, R, Haskell, and D. TrailDB also provides a query engine called trck, which uses a domain-specific language to aggregate metrics over the events of each UUID.
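Queries against TrailDB typically iterate over trails and compute a metric per UUID. The sketch below counts how many trails contain a "view" event followed later by a "buy" event, a simple funnel. It operates on hypothetical in-memory arrays rather than the real language bindings or trck, so `action_trail` and `count_funnel` are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical trail: one action string per event, in time order. */
typedef struct {
    const char **actions;
    size_t num_events;
} action_trail;

/* Count trails in which `first` occurs and `then` occurs at a later event. */
size_t count_funnel(const action_trail *trails, size_t n,
                    const char *first, const char *then)
{
    size_t hits = 0;
    for (size_t t = 0; t < n; t++) {
        int seen_first = 0;
        for (size_t e = 0; e < trails[t].num_events; e++) {
            const char *a = trails[t].actions[e];
            if (seen_first && strcmp(a, then) == 0) {
                hits++;
                break;  /* count each trail at most once */
            }
            if (strcmp(a, first) == 0)
                seen_first = 1;
        }
    }
    return hits;
}
```

Because all events of a UUID sit together in one time-ordered trail, each trail can be evaluated in a single pass without joins.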

Foreign Keys

Not Supported

In TrailDB, each database consists of a collection of trails, each identified by a unique UUID. A database contains no multiple tables, and there are no constraints across databases, so foreign keys are not supported.

Website

http://traildb.io/

Source Code

https://github.com/traildb/traildb

Tech Docs

http://traildb.io/docs

Developer

AdRoll Inc.

Country of Origin

US

Start Year

2014

Project Type

Commercial, Open Source

Written in

C

Supported languages

C, D, Go, Haskell, Python, R

Licenses

MIT