RubatoDB

RubatoDB is an academic database project started by Dr. Li-Yan Yuan at the University of Alberta, Canada. It is classified as a NewSQL system: it aims to provide scalable performance similar to NoSQL systems while maintaining the traditional ACID guarantees of relational databases. SQL is the primary language, with JDBC and ODBC interfaces. The system is implemented as a staged architecture, a grid of staged modules connected through explicit queues. It implements a formula protocol for distributed concurrency control as a layer on top of Berkeley DB, providing three levels of consistency guarantees. All table partitions, files and indexes are stored as Berkeley DB files, with the transactional layer of Berkeley DB switched off.

History

The name Rubato comes from the Italian verb "rubare" (to steal). In music, tempo rubato refers to soft and subtle rhythmic changes in a performance. This corresponds to RubatoDB's support for several types of consistency, which leaves the choice among them to the user. RubatoDB was developed as part of the NewSQL movement that started in 2009. Traditional NoSQL systems were highly scalable horizontally but were schema-free and provided only relaxed consistency. NewSQL systems aim to achieve the availability and horizontal scalability of NoSQL systems while preserving the ACID guarantees of transactions and the functionality of a traditional relational database, such as tables and joins.

Isolation Levels

Serializable

This is the highest possible level of isolation. Transactions are completely isolated from each other: there are no phantom reads, no non-repeatable reads and no dirty reads.

Checkpoints

Fuzzy

Fuzzy checkpointing, as supported by Berkeley DB, is available in RubatoDB as well.

Storage Architecture

Disk-oriented

RubatoDB employs a hybrid storage partition model that allows a table to be partitioned both horizontally and vertically, with the partitions stored separately across the network of grid nodes. All disk accesses go through Berkeley DB, because all partitions are stored as Berkeley DB files. The user can specify partitioning schemes to apply manual optimizations based on prior knowledge of the workload. A tree-based schema is used for grid partitioning. Descendant tuples are partitioned according to the ancestor they descend from, i.e. for every row in the parent table there must be a row or a group of rows in the descendant table.
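The ancestor-based placement rule described above can be sketched as follows. This is a minimal illustration, not RubatoDB's actual partitioner: the table names (`customers`, `orders`), the `node_for` hash function and the grid size `NUM_NODES` are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical grid size


def node_for(root_key: str) -> int:
    """Map an ancestor (root) key to a grid node using a stable hash."""
    return zlib.crc32(root_key.encode("utf-8")) % NUM_NODES


def partition(customers, orders):
    """Place each customer row and all orders descended from it on one node."""
    placement = defaultdict(lambda: {"customers": [], "orders": []})
    for cust in customers:
        placement[node_for(cust["id"])]["customers"].append(cust)
    for order in orders:
        # A descendant row is routed by its ancestor's key, not by its own key.
        placement[node_for(order["customer_id"])]["orders"].append(order)
    return dict(placement)


customers = [{"id": "c1"}, {"id": "c2"}, {"id": "c3"}]
orders = [
    {"id": "o1", "customer_id": "c1"},
    {"id": "o2", "customer_id": "c1"},
    {"id": "o3", "customer_id": "c3"},
]
placement = partition(customers, orders)
```

Because descendants follow their ancestor, a query that touches one customer and its orders can be answered from a single node in this sketch.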

Query Execution

Vectorized Model

The SQL engine in RubatoDB is responsible for processing all queries. It is composed of a set of staged grid modules, each consisting of a software module on a node with its own request queue. Threads pull requests one after another from the input queue and invoke the relevant components, such as the parser, query optimizer, query processor and update handler. They then place the results on the output queue, from which they are consumed. This structure supports both parallelism and pipelined execution.
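The queue-connected stages described above might look roughly like the sketch below. It assumes one worker thread per stage and uses toy stand-ins (`parse`, `optimize`, `execute`) for the real components; none of these names come from RubatoDB.

```python
import queue
import threading

STOP = object()  # sentinel that shuts a stage down


def run_stage(handler, inbox, outbox):
    """Pull requests one at a time from inbox, process them, push to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        outbox.put(handler(item))


# Toy stand-ins for the real components (hypothetical).
def parse(sql):
    return sql.strip().rstrip(";").split()


def optimize(tokens):
    return [t.upper() if t.lower() in {"select", "from"} else t for t in tokens]


def execute(plan):
    return " ".join(plan)


def run_pipeline(requests):
    handlers = [parse, optimize, execute]
    # One queue in front of each stage, plus a final output queue.
    queues = [queue.Queue() for _ in range(len(handlers) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(h, queues[i], queues[i + 1]))
        for i, h in enumerate(handlers)
    ]
    for t in threads:
        t.start()
    for r in requests:
        queues[0].put(r)
    queues[0].put(STOP)
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


results = run_pipeline(["select a from t;", " select b from u "])
```

While the last stage works on the first request, the first stage can already parse the second one, which is the pipelining the text refers to.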

Storage Model

Hybrid

RubatoDB employs a hybrid storage model that partitions a table both horizontally and vertically and stores the partitions separately on different grid nodes. Each row of the table is stored on a separate node in the grid, while within a row a range of columns is stored as a pair. Frequently accessed columns are clustered together in the same frequent column group. Columns that are accessed less frequently are placed in static column groups.
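The column grouping described above can be illustrated with a small sketch. The access counts, the `threshold` parameter and the choice to repeat the row key in each group are assumptions made for the example, not documented RubatoDB behavior.

```python
def split_columns(access_counts, threshold):
    """Split column names into a frequent group and a static group."""
    frequent = sorted(c for c, n in access_counts.items() if n >= threshold)
    static = sorted(c for c, n in access_counts.items() if n < threshold)
    return frequent, static


def project(row, columns):
    """Return only the requested columns of a row."""
    return {c: row[c] for c in columns}


# Hypothetical per-column access statistics.
access_counts = {"balance": 950, "status": 720, "address": 12, "notes": 3}
frequent, static = split_columns(access_counts, threshold=100)

row = {"id": 7, "balance": 10.5, "status": "open", "address": "n/a", "notes": ""}
# The row key is kept in both groups so the row can be reassembled (assumption).
frequent_part = {"id": row["id"], **project(row, frequent)}
static_part = {"id": row["id"], **project(row, static)}
```

A query that reads only `balance` and `status` then touches the frequent group alone.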

System Architecture

Shared-Nothing

RubatoDB follows a Staged Event-Driven Architecture (SEDA). Individual tasks are modeled as finite state machines (FSMs), and transitions between FSM states are triggered by events. The architecture can be visualized as a network of nodes acting as staged modules connected by explicitly associated queues. SEDA breaks the execution plan into a series of stages, where each stage corresponds to a subset of the FSM's states. Each stage is an independent entity with its own queue. It pulls tasks from its incoming queue, performs its operations and forwards the results to the appropriate output queues.
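The mapping from FSM states to stages can be sketched as below. The state names, the two stages (`frontend`, `executor`) and the event names are hypothetical, and the scheduler is a single-threaded loop chosen only to keep the example short.

```python
from collections import deque
from enum import Enum, auto


class State(Enum):
    RECEIVED = auto()
    PARSED = auto()
    PLANNED = auto()
    DONE = auto()


# Event-triggered transitions of the task FSM (hypothetical).
TRANSITIONS = {
    (State.RECEIVED, "parse_ok"): State.PARSED,
    (State.PARSED, "plan_ok"): State.PLANNED,
    (State.PLANNED, "exec_ok"): State.DONE,
}
# The event each state emits when its work succeeds.
EVENT_FOR = {
    State.RECEIVED: "parse_ok",
    State.PARSED: "plan_ok",
    State.PLANNED: "exec_ok",
}
# Each stage owns a subset of the FSM states and has its own queue.
STAGES = {
    "frontend": {State.RECEIVED, State.PARSED},
    "executor": {State.PLANNED},
}


def stage_of(state):
    """Return the stage that owns a state, or None for terminal states."""
    for name, states in STAGES.items():
        if state in states:
            return name
    return None


def run(task_ids):
    queues = {name: deque() for name in STAGES}
    for task in task_ids:
        queues["frontend"].append((task, State.RECEIVED))
    finished, trace = [], []
    while any(queues.values()):
        for name, q in queues.items():
            if not q:
                continue
            task, state = q.popleft()
            new_state = TRANSITIONS[(state, EVENT_FOR[state])]
            trace.append((task, name, state, new_state))
            target = stage_of(new_state)
            if target is None:
                finished.append(task)  # terminal state reached
            else:
                queues[target].append((task, new_state))
    return finished, trace


finished, trace = run(["t1", "t2"])
```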

Views

Virtual Views

RubatoDB supports Virtual Views. No physical copy of the base table is taken to create a view, hence the views are purely virtual.

Data Model

Key/Value

RubatoDB is a key-value store at the storage level because of the underlying data model of Berkeley DB. It supports variable-length keys and access through B-tree indexes.

Concurrency Control

Multi-version Concurrency Control (MVCC)

Concurrency control in RubatoDB is provided through two layers:

  1. Transaction Stage - Present on every grid server and responsible for data integrity at the per-server level. It supports operations such as pre_commit, commit and rollback.
  2. FormulaDB - A distributed implementation of the Multi-Version Timestamp Concurrency Control Protocol. It is a thread-free layer on top of all the Berkeley DB nodes that orchestrates transactions and allows the system to provide three levels of consistency.

The three consistency levels are:

  1. ACID - The traditional relational DBMS guarantees: Atomicity, Consistency, Isolation and Durability.
  2. BASE - A model from the NoSQL world with weak consistency semantics. BASE stands for Basically Available, Soft state, Eventual consistency.
  3. BASIC - A middle ground between the SQL and NoSQL worlds, standing between the two extremes. BASIC stands for Basic Availability, Scalability and Instant Consistency.

The BASE and BASIC models differ in which side of the CAP trade-off they favor: BASE favors immediate availability with fast queries, while BASIC favors consistent results at the cost of higher latency.
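To make the multi-version timestamp idea concrete, here is a minimal single-process sketch of textbook multi-version timestamp ordering. It is not the formula protocol and it is not distributed; the class `MVStore`, the exception `Aborted` and the version layout are assumptions made for illustration.

```python
class Aborted(Exception):
    """Raised when a write arrives too late to be ordered correctly."""


class MVStore:
    def __init__(self):
        # key -> list of [write_ts, read_ts, value], kept sorted by write_ts
        self.versions = {}

    def _visible(self, key, ts):
        """Return the version with the largest write_ts <= ts, if any."""
        candidates = [v for v in self.versions.get(key, []) if v[0] <= ts]
        return candidates[-1] if candidates else None

    def read(self, ts, key):
        version = self._visible(key, ts)
        if version is None:
            return None
        version[1] = max(version[1], ts)  # remember the latest reader
        return version[2]

    def write(self, ts, key, value):
        version = self._visible(key, ts)
        if version is not None and version[1] > ts:
            # A younger transaction already read the version this write
            # would have had to precede, so the write must be rejected.
            raise Aborted(f"write at ts={ts} on {key!r} rejected")
        if version is not None and version[0] == ts:
            version[2] = value  # same transaction overwrites its own version
            return
        chain = self.versions.setdefault(key, [])
        chain.append([ts, ts, value])
        chain.sort(key=lambda v: v[0])


store = MVStore()
store.write(1, "x", "a")
store.write(5, "x", "b")
read_at_3 = store.read(3, "x")  # sees the version written at ts=1
read_at_7 = store.read(7, "x")  # sees the version written at ts=5
try:
    store.write(2, "x", "c")    # too late: ts=3 already read version 1
    late_write_aborted = False
except Aborted:
    late_write_aborted = True
store.write(4, "x", "d")        # allowed: no reader after ts=4 saw version 1
read_at_3_again = store.read(3, "x")
```

Readers never block in this scheme because each one is served the version that matches its timestamp; only late writers are aborted.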

Logging

Physical Logging

Each Berkeley DB node supports a write-ahead log (WAL). With a WAL, data pages do not have to be written to disk immediately; instead, transactions are made durable by appending log records to disk. Berkeley DB nodes follow the traditional ARIES algorithm to persist the log on disk and to recover after a crash.
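The write-ahead rule can be sketched with a toy redo-only log. This is far simpler than ARIES (no log sequence numbers, no undo pass, no checkpoints) and it does not use the Berkeley DB API; the class `TinyWAL` and its JSON record format are assumptions made for the example.

```python
import json
import os
import tempfile


class TinyWAL:
    """Toy key-value store whose only durable state is an append-only log."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self._replay()  # recovery: rebuild state from the log
        self.log = open(path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # Write-ahead rule: the log record reaches disk before the change
        # is applied to the in-memory state.
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def close(self):
        self.log.close()


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "wal.log")
    db = TinyWAL(path)
    db.put("a", 1)
    db.put("a", 2)
    db.put("b", 3)
    db.close()  # the in-memory state is discarded, as after a crash

    recovered = TinyWAL(path)  # replaying the log restores the state
    recovered_data = dict(recovered.data)
    recovered.close()
```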

Joins

Not Supported

RubatoDB supports joins only between tables that reside on a single node. Semi-joins and distributed joins are not yet supported.

Storage Organization

Indexed Sequential Access Method (ISAM)

IBM designed the Indexed Sequential Access Method (ISAM) to support both sequential and random access to records. Sequential access is performed by scanning through the records in order. Random access is supported through indexes, where each separate index defines a different ordering of the records. The underlying Berkeley DB layer is based on an ISAM storage organization.
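The two access paths can be illustrated with an in-memory sketch. It is not how Berkeley DB lays out files; the class `IndexedFile` and its method names are hypothetical.

```python
import bisect


class IndexedFile:
    """Records stored once, with any number of sorted indexes over them."""

    def __init__(self, records):
        self.records = list(records)
        self.indexes = {}  # index name -> sorted list of (key, position)

    def add_index(self, name, key_fn):
        self.indexes[name] = sorted(
            (key_fn(record), pos) for pos, record in enumerate(self.records)
        )

    def scan(self):
        """Sequential access: visit every record in storage order."""
        return iter(self.records)

    def lookup(self, name, key):
        """Random access: binary-search an index for all matching records."""
        entries = self.indexes[name]
        i = bisect.bisect_left(entries, (key, -1))
        matches = []
        while i < len(entries) and entries[i][0] == key:
            matches.append(self.records[entries[i][1]])
            i += 1
        return matches

    def ordered(self, name):
        """Return the records in the order defined by one index."""
        return [self.records[pos] for _, pos in self.indexes[name]]


people = IndexedFile([
    {"id": 3, "name": "carol"},
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
])
people.add_index("by_id", lambda r: r["id"])
people.add_index("by_name", lambda r: r["name"])

found = people.lookup("by_id", 2)
names_in_order = [r["name"] for r in people.ordered("by_name")]
ids_in_order = [r["id"] for r in people.ordered("by_id")]
```

Each index gives a different ordering over the same stored records, which matches the description of ISAM indexes above.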

Query Interface

SQL

RubatoDB fully supports SQL99.