RubatoDB is an academic database project started by Dr. Li-Yan Yuan at University of Alberta, Canada. It falls into the category of a NewSQL system. It aims to provide the scalable performance similar to NOSQL systems while maintaining the traditional ACID guarantees present in relational databases. SQL support is provided as the primary language with interfaces such as JDBC and ODBC. It has been implemented through a staged architecture consisting of a grid of staged modules connected through explicit queues. It implements a formula protocol for distributed concurrency control, a layer on top of Berkeley DB providing three levels of consistency guarantees. All table partitions and files along with the indexes are stored as Berkeley DB files where the transactional layer of Berkeley DB is switched off.
RubatoDB has employed a hybrid storage partition model that allows the partitioning of a table both in the horizontal and the vertical dimension and being stored separately over the network of grid nodes. All disk accesses are made through Berkeley DB as all the partitions are stored as Berkeley DB files. The user can specify partitioning schemes to incorporate human optimizations based on precursory knowledge of the workload. A tree based schema is present for Grid Partitioning. Descendant tuples are partitioned according to the ancestor they descended from, ie for every row in the parent table there must be a/a group of rows in the descendant table.
RubatoDB follows a Staged Event-Driven Architecture. The individual tasks are divided into Finite State Machines and the transitions between the states of the FSM are triggered by events. The architecture can be visualized as a network of nodes acting as staged modules connected by queues which are explicitly associated. SE-DA breaks the execution plan into a series of stages where each stage corresponds to a subset of states from the FSM. This is now an independent identity with its own queue. It pulls tasks from the incoming queue, performs the operations and forwards it to the respective output queues.
The SQL engine present in RubatoDB is responsible for processing all queries. It is composed of a set of staged grip modules each comprising of a software module on a node having its own request queue. Threads pull requests one after the other from the input queue and invoke the various components, like parser, query optimizer, query processor, update etc. They then fill the output queue with the results which are then used up. This structure supports both parallelism and pipe lined execution.
Multi-version Concurrency Control (MVCC)
Concurrency Control in RubatoDB is provided through two different layers :-
The BASE and BASIC models differ on choosing one spectrum of the CAP theorem, either providing instant availability with fast queries or providing consistent results with higher latency.
Each Berkeley DB node supports a WAL(Write Ahead LOG). This ensures that all data is not immediately written to disk, rather the transactions are made durable by appending the LOG records on the disk. Berkeley DB nodes follow traditional ARIES algorithm to persist the log on the disk and recover in the case of a crash.
http://webdocs.cs.ualberta.ca/~yuan/databases/rubatodb/rubatodb_dist.4.0.2.tar.gz
https://webdocs.cs.ualberta.ca/~yuan/databases/rubatodb/docs/rubatodb.html
University of Alberta
2014