JaguarDB

Revision #4 from 11/24/2019 3:07 p.m.

Jaguar is a distributed NoSQL DBMS that stores data as a flat array of fixed-length records. Each record has a key and a value, and the value may span many columns. It supports standard SQL commands, fast scale-out, and spatial data types. It is used for data storage, data analytics, and business forecasting in the IoT era.

Storage Model

Custom

Each table is a flat array containing fixed-length records. Each record has a key which can be a composite key, and a value which may include multiple columns.

The array is called a "Sorted Elastic Array" (SEA). It maintains the invariant that at least 30% of the array space is unoccupied. When added elements push the sparse ratio down so that it is no longer above 30%, the array is resized: a new, longer array is created and all existing elements are copied from the old array into it, with enough spacing left between any two adjacent elements.
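The resizing rule can be illustrated with a minimal sketch. It is not Jaguar's implementation: the class name, the use of None for empty slots, the doubling growth factor, and the even re-spreading of elements on every insert are all assumptions made for brevity. Only the 30% sparsity invariant and the copy-with-spacing step come from the description above.

```python
class SortedElasticArray:
    """Toy sorted array with gaps; None marks an unoccupied slot (assumption)."""

    MIN_SPARSE_PERCENT = 30  # invariant from the text: at least 30% of slots stay empty

    def __init__(self, capacity=10):
        self.slots = [None] * capacity
        self.resizes = 0  # counts how often a longer array had to be created

    def keys(self):
        """Occupied slots, which are always kept in sorted order."""
        return [k for k in self.slots if k is not None]

    def sparse_ratio(self):
        return 1 - len(self.keys()) / len(self.slots)

    @staticmethod
    def _spread(items, capacity):
        """Copy sorted items into a fresh array, leaving gaps between neighbors."""
        slots = [None] * capacity
        for i, item in enumerate(items):
            slots[i * capacity // len(items)] = item
        return slots

    def insert(self, key):
        items = sorted(self.keys() + [key])
        capacity = len(self.slots)
        # Integer arithmetic avoids floating-point edge cases at exactly 30%.
        while (capacity - len(items)) * 100 < self.MIN_SPARSE_PERCENT * capacity:
            capacity *= 2  # assumed growth factor; the source does not give one
        if capacity != len(self.slots):
            self.resizes += 1
        self.slots = self._spread(items, capacity)
```

With a capacity of 10, the first seven inserts fit within the invariant. The eighth would leave only 20% of the slots empty, so the sketch doubles the array to 20 slots and re-spreads the keys.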

Storage Organization

Sorted Files

For each table, all the keys are stored in one big "Sorted Elastic Array". The array is cut into multiple blocks and there is a block meta table that has pointers to each block's starting index in the SEA.
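A block meta table of this kind could be used roughly as follows. This is a sketch under stated assumptions: the array is shown without gaps, the block size is fixed, and the meta table stores each block's first key next to its starting index so that a binary search can pick the block. The function names are hypothetical; the source only says that the meta table points to each block's starting index.

```python
from bisect import bisect_right


def build_block_meta(sea, block_size):
    """Return (first_keys, starts): one entry per block of the sorted array."""
    starts = list(range(0, len(sea), block_size))
    first_keys = [sea[s] for s in starts]
    return first_keys, starts


def find(sea, first_keys, starts, block_size, key):
    """Pick the candidate block via the meta table, then scan only that block."""
    block = bisect_right(first_keys, key) - 1
    if block < 0:
        return None  # key is smaller than every stored key
    start = starts[block]
    for i in range(start, min(start + block_size, len(sea))):
        if sea[i] == key:
            return i
    return None
```

The point of the meta table in this sketch is that a lookup touches one small table plus a single block rather than the whole array.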

Logging

Physical Logging

Jaguar servers log client activities and table management history to disk.

System Architecture

Shared-Nothing

Jaguar uses a flat master-master architecture. There are many Jaguar servers and many clients, and any client can connect to any server. The Jaguar servers sync data updates among themselves in real time, and the servers may even be in separate data centers. Storage capacity and performance scale linearly with the number of Jaguar servers in use.

Each client maintains connections to multiple Jaguar servers at the same time. When the client wants to update a record, it computes the hash value of the record's key and sends the request to the server responsible for that hash value. As a result, different servers handle different data records in parallel.
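Client-side routing of this kind can be sketched as below. The hash function (SHA-256), the modulo mapping, and the function names are assumptions chosen for illustration; the source does not say which hash or mapping Jaguar uses, and this simple modulo scheme does not model how Jaguar handles scale-out.

```python
import hashlib


def server_for_key(key, servers):
    """Map a record key to one server deterministically (assumed: SHA-256 mod N)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def group_by_server(keys, servers):
    """Batch keys per target server so requests can be sent in parallel."""
    batches = {server: [] for server in servers}
    for key in keys:
        batches[server_for_key(key, servers)].append(key)
    return batches
```

Because every client computes the same mapping, no coordinator is needed to decide where a record goes.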

Query Interface

Custom API, SQL, Command-line / Shell

Jaguar provides a set of built-in functions that can be used in SQL commands. Standard SQL commands are also supported, including create/drop table and index, load table, insert/delete record, select, join, update, group by, and aggregation.

Jaguar also supports schema changes. When a table is created, 30% extra space is allocated so that users can add new columns later. New columns are added in place if the extra space is large enough to hold them; otherwise the table is dropped and recreated with the new columns.
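The decision described above can be expressed as a small sketch. It assumes that the extra space is reserved per record and is measured as 30% of the combined width of the initial columns; the source states only the 30% figure, so the exact base of the percentage and both function names are hypothetical.

```python
def spare_bytes(column_widths, extra_percent=30):
    """Bytes reserved per record for future columns (assumed: 30% of used width)."""
    return sum(column_widths) * extra_percent // 100


def plan_add_columns(new_widths, spare):
    """Decide how a schema change would be carried out."""
    if sum(new_widths) <= spare:
        return "add_in_place"
    return "drop_and_recreate"
```

For a record with columns of 8, 16, and 16 bytes, this sketch reserves 12 spare bytes, so two new columns of 4 and 8 bytes fit in place while a single 16-byte column forces the table to be recreated.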

Jaguar provides client libraries, including JDBC, and has APIs for Python, PHP, and other languages.

It also supports querying spatial data attributes (e.g. select all circles that have x-coordinate > 5). The provided API includes built-in functions such as Distance(), which computes the distance between two arbitrary geometric shapes.
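To make the spatial query example concrete, here is a minimal in-memory sketch. It does not use Jaguar's API: the Point and Circle classes and both functions are hypothetical stand-ins, the x-coordinate is assumed to be that of the circle's center, and center_distance covers only the center-to-center case rather than the general shape-to-shape semantics of Distance(), which the source does not spell out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Circle:
    center: Point
    radius: float


def circles_with_x_greater_than(circles, threshold):
    """Filter by the center's x-coordinate, mirroring the query example."""
    return [c for c in circles if c.center.x > threshold]


def center_distance(a, b):
    """Euclidean distance between two circle centers (a simplified stand-in)."""
    return math.hypot(a.center.x - b.center.x, a.center.y - b.center.y)
```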

Storage Architecture

Disk-oriented

All data is stored on disk. Jaguar is not an in-memory database. Memory is only used for caching and computation.

Data Model

Key/Value

Jaguar is a key-value store that supports SQL. Each table is a flat array containing fixed-length records. Each record has a key, which can be a composite key, and a value, which may include multiple columns.

The supported data types include standard types such as int, float, and string, as well as range, file, and spatial data types. A spatial value can be a parameterized geometric shape (e.g. Circle(center=Point(1,3), radius=2)).

Foreign Keys

Supported

Join operations can be performed on any column (including key column) in any table.

Joins

Hash Join

Inner join and simple join operations are supported. Join operations can be performed on any column in any table.
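For readers unfamiliar with the technique, a textbook in-memory hash join for an inner equi-join looks roughly like this. It illustrates the general algorithm only; how Jaguar implements its joins internally is not described in the source, and the function and parameter names are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right, left_col, right_col):
    """Inner equi-join of two lists of dict rows using a hash table."""
    # Build phase: index the left rows by their join column.
    buckets = defaultdict(list)
    for row in left:
        buckets[row[left_col]].append(row)
    # Probe phase: look up each right row's join value in the hash table.
    joined = []
    for r in right:
        for l in buckets.get(r[right_col], []):
            joined.append({**l, **r})  # on a name clash, the right row's value wins
    return joined
```

Rows without a match on the other side are dropped, which is what makes this an inner join.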

Checkpoints

Blocking

All data can be backed up to a remote high-capacity storage server. Backups are performed periodically at a user-specified frequency. Each node can also back up data locally, i.e. take a snapshot of the database at a user-defined frequency.

In case of a node crash: users should always keep spare Jaguar servers on hand. Each spare server should have an empty data directory so that, when another node crashes, it can receive data from replicas and start functioning as a regular Jaguar server.

In case of network problems: after a temporary network disconnection, the Jaguar servers automatically resync their data once connections are restored.

Concurrency Control

Optimistic Concurrency Control (OCC)

Jaguar is an AP system in terms of the CAP theorem: when there is a network partition, it favors availability over consistency and offers eventual consistency. It supports parallel reads and parallel writes by multiple users.

Stored Procedures

Supported

Unlike other NoSQL databases, Jaguar requires no data migration when the server cluster is scaled out.

Jaguar also supports importing and syncing tables from other databases such as MySQL or Oracle. The only requirement is that the other database maintain a change-log table and triggers that capture all changes to the data. The Jaguar server monitors the records in the change-log table and automatically synchronizes its own data.
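The change-log approach can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the log entry layout (sequence number, operation, key, value), the operation names, and the function name are hypothetical, and a plain dict stands in for the Jaguar-side table. The source states only that triggers fill a change-log table and that Jaguar monitors it.

```python
def apply_change_log(table, change_log, last_seq):
    """Apply log entries newer than last_seq to table; return the new high-water mark."""
    for seq, op, key, value in sorted(change_log, key=lambda entry: entry[0]):
        if seq <= last_seq:
            continue  # already applied during an earlier polling round
        if op in ("insert", "update"):
            table[key] = value
        elif op == "delete":
            table.pop(key, None)
        last_seq = seq
    return last_seq
```

Tracking the last applied sequence number lets the monitor poll the same log repeatedly without applying any change twice.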