Clustrix is a clustered relational DBMS designed to be a a drop-in MySQL replacement. It is ACID compliant and provides scalability and fault tolerance. It is a scale-out database built for high-value, high-transaction OLTP applications on cloud and on premise. It uses automatic data distribution, a sophisticated query planner, and a distributed execution model along with MVCC and PAXOS to provide scalability, concurrency and fault tolerance.
Paul Mikesell after working on EMC Isilon and Sergei Tsarev after working on a time-series database for AOL, came together to work on their common idea of a database that could handle transactions and also scale which led to Clustrix. Both of them started to build the database from scratch and decided to make the system MySQL compatible but add better ways of distributing and replicating data in the cluster. They worked with various SSD manufacturers to profile their devices and eventually started shipping a Clustrix Database Appliance in 2010. With cloud computing in the picture and commodity hardware quality improving, they released the first software version, ClustrixDB 5.0 in 2014 and discontinued the appliance.
The Clustrix company was acquired by MariaDB in 2018. The product was renamed to MariaDB Xpand in 2022.
Every 500 milliseconds, a global periodic process running on one of the nodes(chosen when a new group is created) selects the minimum of commit IDs across all nodes to be the globc value. Then, a global checkpoint transaction with the selected globc value is executed marking all prior commits durable and establishing a point in the transaction timeline to which all nodes are able to revert, while still remaining consistent with respect to each other.
Multi-version Concurrency Control (MVCC) Two-Phase Locking (Deadlock Prevention)
It uses the two phase locking protocol for writer-writer conflict resolution. It has a distributed lock manager to scale write accesses on hot tables, the lock information is spread across all nodes. The database also has a concept of row level locks which locks only a subset of rows(size is configurable at runtime) instead of the whole table. If a transaction uses most of the rows on the table, the query optimizer promotes it to a table lock.
Read Committed Serializable Repeatable Read
The default isolation level is a modified version of ANSI Repeatable Reads, allows a per-transaction consistent snapshot read. The other levels supported are serializable and read committed.
Shared-nothing architecture was chosen so that the database can scale. Each node can perform any read or write. The architecture has three main parts to it:
1. The global transaction manager(GTM): A node in the cluster which coordinates the processing of the transaction in the cluster. On this node, the database engine parses and compiles the SQL statements into query fragments and comes up with an optimal distributed query plan. The fragments are distributed according to the plan and the GTM is responsible for collecting the results and returning it back to the user.
2. The Rebalancer: This is patented technology used by ClustrixDB to distribute data throughout the cluster. The system uses consistent hashing and splices the tables based on multiple indexes into different shards and distributes it across the cluster.
3. The Sierra Database Engine: This handles query planning and execution. It has two parts: the planner comes up with an optimal distributed query plan and the execution engine compiles the query into small executable query fragments and is responsible for executing those queries.