Tarantool integrates a Lua application server with a database management system. The DBMS was originally developed as an in-memory NoSQL DBMS and was later extended with a disk storage engine option. Tarantool's in-memory engine is lock-free. It uses cooperative multitasking to handle thousands of connections simultaneously. There is a fixed number of independent execution threads, and they do not share state. The disk-based storage engine also takes advantage of single-threaded request processing and hence avoids unnecessary synchronization overhead. Tarantool also supports secondary indexes, asynchronous replication, and some SQL operations.
Tarantool uses write-ahead logging (WAL), so checkpoints are necessary to limit the log file size. The documentation refers to checkpoints as snapshots. Users can either force the DBMS to take a snapshot or enable automatic creation of snapshot files. Users can control the number of snapshots stored and the snapshot interval.
During a snapshot, copy-on-write and multi-version concurrency control techniques are used. When the master process changes part of a primary key, the corresponding page splits and the snapshot process obtains an old copy of the page. Hence, taking a snapshot does not need to block.
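The copy-on-write idea can be illustrated with a minimal Python sketch. This is not Tarantool's implementation; the `PagedStore` class and its methods are hypothetical names chosen for illustration. The snapshot keeps references to the pages that existed when it started, and a writer replaces a page with a modified copy instead of changing it in place.

```python
class PagedStore:
    """Toy page store (hypothetical): each page is an immutable tuple."""

    def __init__(self, pages):
        self.pages = [tuple(p) for p in pages]

    def begin_snapshot(self):
        # Copy only the list of page references, not the pages themselves.
        return list(self.pages)

    def write(self, page_no, slot, value):
        # Copy-on-write: build a modified copy and swap the reference.
        page = list(self.pages[page_no])
        page[slot] = value
        self.pages[page_no] = tuple(page)


store = PagedStore([[1, 2], [3, 4]])
snapshot = store.begin_snapshot()
store.write(0, 1, 99)  # the writer does not wait for the snapshot

live_view = [list(p) for p in store.pages]
snapshot_view = [list(p) for p in snapshot]
```

In this sketch the snapshot still sees the old contents of page 0, while the untouched page 1 is shared between the live store and the snapshot.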
Optimistic Concurrency Control (OCC)
Tarantool uses a single thread, called the 'transaction processor thread', to process all transactions of a database instance. Thus the design is lock-free. Transactions occur in fibers on that single thread. A fiber is a set of instructions that may contain yield signals (a yield can be either explicit or implicit, e.g., a system call). The transaction processor thread executes a fiber's instructions until a yield and then schedules a switch to another fiber that is ready to run. This scheme is called cooperative scheduling: unless a running fiber deliberately yields control, it cannot be preempted by other fibers. Thus, a transaction's author is responsible for not writing long-running computations without a yield. There is also a 'network thread' that parses and ships messages, and a write-ahead logging (WAL) thread. While this design limits the number of cores that the DBMS can use, it removes competition for the memory bus and ensures high scalability of memory access and network throughput.
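Cooperative scheduling can be sketched with Python generators standing in for fibers. This is an illustration under stated assumptions, not Tarantool's fiber API: the `fiber` and `run` functions are hypothetical, and every yield here is explicit.

```python
from collections import deque


def fiber(name, steps, log):
    """Hypothetical fiber: does `steps` units of work, yielding after each."""
    for i in range(steps):
        log.append(f"{name}:{i}")  # runs without interruption
        yield                      # explicit yield point


def run(fibers):
    """Round-robin cooperative scheduler on a single thread."""
    ready = deque(fibers)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the fiber's next yield
            ready.append(current)  # still alive: back of the queue
        except StopIteration:
            pass                   # fiber finished


log = []
run([fiber("a", 2, log), fiber("b", 3, log)])
```

A fiber that never reaches a `yield` would keep the scheduler loop from advancing, which is the reason long computations without yields are discouraged.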
When a transaction commits, a yield happens and the changes are written to the WAL. A simple optimistic scheduler is used: the first transaction to commit wins. Any active transaction that has read a value modified by a committed transaction will abort. Moreover, Tarantool's cooperative scheduler implementation ensures that, in the absence of yields, a multi-statement transaction is not preempted and thus will never be aborted.
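The first-committer-wins rule can be sketched as follows. The `Store`, `Txn`, and `Conflict` names are hypothetical, and the sketch makes a simplifying assumption: conflicts are detected when the losing transaction tries to commit, by comparing per-key version counters, which may differ from how Tarantool detects them.

```python
class Conflict(Exception):
    """Raised when a transaction read a key that was committed underneath it."""


class Store:
    def __init__(self):
        self.data = {}
        self.versions = {}  # key -> number of committed writes

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version seen at first read
        self.writes = {}         # buffered writes, applied on commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_versions.setdefault(key, self.store.versions.get(key, 0))
        return self.store.data.get(key)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Abort if any key we read has been committed by someone else.
        for key, seen in self.read_versions.items():
            if self.store.versions.get(key, 0) != seen:
                raise Conflict(key)
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.versions[key] = self.store.versions.get(key, 0) + 1


store = Store()
store.data["x"] = 1
t1, t2 = store.begin(), store.begin()
t1.put("x", t1.get("x") + 1)
t2.put("x", t2.get("x") + 10)
t1.commit()  # first to commit wins
try:
    t2.commit()
    t2_aborted = False
except Conflict:
    t2_aborted = True
```

Here `t2` read `x` before `t1` committed a new value for it, so `t2` is rejected and the stored value reflects only `t1`.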
The basic data unit is a tuple, composed of fields. A tuple corresponds to a 'row' or 'record'. Tuples must have a primary index and can have secondary indexes, which may be non-unique. Fields are similar to regular 'record fields', except that (1) they can be composite structures and (2) they do not need to have names. Any tuple may have an arbitrary number of fields, and the fields may be of different types. Tuples are stored as MsgPack arrays.
A space is a container for tuples, and a space should have a unique identifier and a designated storage engine.
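The relationship between tuples, spaces, and indexes can be sketched in a few lines of Python. The `Space` class below is a hypothetical toy, not Tarantool's API; it assumes the primary key is field 0 and that one other field carries a non-unique secondary index.

```python
from collections import defaultdict


class Space:
    """Toy space (hypothetical): unique primary index on field 0,
    non-unique secondary index on one chosen field."""

    def __init__(self, space_id, secondary_field):
        self.id = space_id
        self.secondary_field = secondary_field
        self.primary = {}                  # primary key -> tuple
        self.secondary = defaultdict(set)  # field value -> primary keys

    def insert(self, tup):
        key = tup[0]
        if key in self.primary:
            raise KeyError(f"duplicate primary key {key!r}")
        self.primary[key] = tup
        self.secondary[tup[self.secondary_field]].add(key)

    def select_secondary(self, value):
        # Several tuples may share the same secondary key.
        return sorted(self.primary[k] for k in self.secondary.get(value, ()))


space = Space(512, secondary_field=1)
space.insert((1, "red"))
space.insert((2, "blue", {"nested": True}))  # composite field
space.insert((3, "red", 3.5, "extra"))       # different field count
reds = space.select_secondary("red")
```

The tuples have different lengths and field types, yet all live in the same space and are reachable through either index.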
B+Tree, Hash Table, BitMap, R-Tree
Tarantool has two storage engines: (1) memtx, the in-memory storage engine, and (2) vinyl, the on-disk storage engine. The in-memory storage engine memtx is the default engine and was the first to be developed.
The memtx engine supports TREE, HASH, RTREE and BITSET indexes.
Vinyl supports only TREE indexes.
Custom API, Stored Procedures, Command-line / Shell
Tarantool incorporates an application server and provides a command-line console. Lua is the native language for writing applications, but languages such as C, C++, and Python are also supported. Tarantool supports triggers in Lua and stored procedures in Lua and C. Starting from 2.0 (currently a beta release), Tarantool supports some basic SQL operations.
The in-memory storage engine memtx is the default engine. The disk-based storage engine Vinyl can be used when data cannot fit in memory, but it lacks some functions and options that are available with memtx.
Vinyl's underlying data structure is the log-structured merge-tree (LSM tree). Vinyl differs from common libraries like RocksDB in that it exploits the fact that the DBMS executes transactions in a dedicated thread. This allows it to remove unnecessary locks, interprocess communication, and other overhead.
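For readers unfamiliar with LSM trees, the following Python sketch shows the general technique only; it does not reflect Vinyl's actual structures. The `TinyLSM` class and its `memtable_limit` parameter are hypothetical. Writes go to an in-memory table, full tables are flushed as sorted runs, and compaction merges runs so that newer values override older ones. Because a single thread is assumed to own the structure, no locks appear.

```python
class TinyLSM:
    """Toy single-threaded LSM tree (hypothetical, for illustration)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # newest first; each run is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush the memtable as an immutable sorted run.
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:  # newest run first
            for k, v in run:
                if k == key:
                    return v
        return None

    def compact(self):
        merged = {}
        for run in reversed(self.runs):  # oldest first, newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())]


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)  # flushes the first run
db.put("a", 3)
db.put("c", 4)  # flushes the second run
value_before = db.get("a")
db.compact()
value_after = db.get("a")
```

Reads return the newest value for a key both before and after compaction; compaction only reduces the number of runs that a read has to inspect.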
Tarantool supports asynchronous replication, either locally or on remote hosts. Tarantool supports both master-replica and master-master configurations.
In a master-replica configuration, replicas can only serve reads. A replica stays synchronized with the master by continuously fetching and applying the write-ahead log (WAL).
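The fetch-and-apply loop can be sketched as below. The `Master` and `Replica` classes are hypothetical, and the log sequence number (`lsn`) used to track the replica's position is an assumption made for this illustration rather than a detail taken from the text above.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.wal = []  # ordered entries: (lsn, key, value)

    def write(self, key, value):
        lsn = len(self.wal) + 1  # assumed: monotonically increasing position
        self.wal.append((lsn, key, value))
        self.data[key] = value


class Replica:
    def __init__(self):
        self.data = {}
        self.applied_lsn = 0

    def fetch_and_apply(self, master):
        # Apply only the entries that have not been applied yet, in log order.
        for lsn, key, value in master.wal[self.applied_lsn:]:
            self.data[key] = value
            self.applied_lsn = lsn


master, replica = Master(), Replica()
master.write("a", 1)
replica.fetch_and_apply(master)
master.write("b", 2)
master.write("a", 3)
lagging_view = dict(replica.data)  # asynchronous: replica is briefly behind
replica.fetch_and_apply(master)
```

Between fetches the replica serves a slightly stale view, which is the trade-off of asynchronous replication.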
In master-master configuration, any node can handle both read and write requests. Tarantool only guarantees that each change on a master is propagated to all nodes and is applied only once. However, changes from different masters can be mixed and applied in a different order on different nodes.
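The consequence of mixed ordering can be seen in a small sketch. The `apply_all` helper and the change format are hypothetical; the sketch assumes that a node simply applies each incoming change once, in arrival order, with no conflict resolution.

```python
def apply_all(changes):
    """Apply each (origin, key, value) change once, in arrival order."""
    state = {}
    for origin, key, value in changes:
        state[key] = value  # the last change applied on this node wins
    return state


from_a = [("A", "x", "from-A")]
from_b = [("B", "x", "from-B")]

# Both nodes receive the same two changes, but in a different order.
node_1 = apply_all(from_a + from_b)
node_2 = apply_all(from_b + from_a)
```

Each change is applied exactly once on both nodes, yet the nodes end up with different values for `x`, so applications that write the same keys on several masters need their own way to avoid or reconcile such conflicts.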
https://github.com/tarantool/tarantool
https://tarantool.io/en/doc/1.10/
Tarantool
2008