SingleStore is a distributed, cloud-native database that can handle transactional and analytical workloads with a unified engine. It is a modern SQL DBMS and cloud service that supports multiple data models, including structured data, semi-structured data based on JSON, time-series, full text, spatial, and vector data.
SingleStore checkpoints its data by taking snapshots. Snapshots are taken periodically and contain a copy of all in-memory rowstore data. To recover the database after a crash or restart, the latest snapshot is read and deserialized into memory, and the log files are then replayed from the point at which that snapshot was started up to the current time.
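The recovery procedure described above can be sketched in a few lines. This is a minimal illustration, not SingleStore's implementation: the `LogRecord` type, the use of a log sequence number (`lsn`) to mark where the snapshot started, and the convention that a `None` value means a delete are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    lsn: int        # hypothetical log sequence number used to order records
    key: str
    value: object   # None marks a delete (an assumption of this sketch)


def recover(snapshot: dict, snapshot_lsn: int, log: list) -> dict:
    """Load the snapshot, then replay every log record written after it."""
    state = dict(snapshot)  # "deserialize" the snapshot into memory
    for rec in sorted(log, key=lambda r: r.lsn):
        if rec.lsn <= snapshot_lsn:
            continue  # already reflected in the snapshot
        if rec.value is None:
            state.pop(rec.key, None)
        else:
            state[rec.key] = rec.value
    return state


snapshot = {"a": 1, "b": 2}
log = [
    LogRecord(1, "a", 1),
    LogRecord(2, "b", 2),
    LogRecord(3, "b", 20),
    LogRecord(4, "a", None),
    LogRecord(5, "c", 3),
]
recovered = recover(snapshot, snapshot_lsn=2, log=log)
```

Only records 3 to 5 are replayed here, because the first two are already captured by the snapshot.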
Multi-version Concurrency Control (MVCC)
SingleStore uses multi-version concurrency control (MVCC) and lock-free data structures. Read operations are not blocked, and write operations acquire row-level locks. Row locks are acquired as rows are written to and are held until the transaction that acquired them commits or rolls back, using 2-phase locking to ensure serializability. The distributed query optimizer evenly distributes the processing workload to maximize the efficiency of CPU usage, and query plans are compiled to machine code and cached to expedite subsequent executions.
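A toy model may help show how non-blocking reads coexist with row-level write locks. The sketch below is an assumption-laden illustration: the `Row`, `Store`, and `Transaction` classes and the integer commit clock are hypothetical, and it does not model isolation levels or SingleStore's lock-free structures.

```python
import threading


class Row:
    def __init__(self, value):
        self.versions = [(0, value)]   # (commit_ts, value), oldest first
        self.lock = threading.Lock()   # row-level write lock

    def read(self, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts.
        Readers never touch the lock, so they are never blocked."""
        for ts, value in reversed(self.versions):
            if ts <= snapshot_ts:
                return value
        return None


class Store:
    def __init__(self, initial):
        self.rows = {k: Row(v) for k, v in initial.items()}
        self.clock = 0                 # hypothetical global commit counter


class Transaction:
    def __init__(self, store):
        self.store = store
        self.pending = {}              # key -> uncommitted value

    def write(self, key, value):
        # Take the row lock on first write; hold it until commit or rollback.
        if key not in self.pending:
            if not self.store.rows[key].lock.acquire(blocking=False):
                raise RuntimeError(f"row {key!r} is locked by another transaction")
        self.pending[key] = value

    def _finish(self, commit):
        if commit:
            self.store.clock += 1
        for key, value in self.pending.items():
            row = self.store.rows[key]
            if commit:
                row.versions.append((self.store.clock, value))
            row.lock.release()
        self.pending.clear()

    def commit(self):
        self._finish(True)

    def rollback(self):
        self._finish(False)


store = Store({"x": 1})
txn = Transaction(store)
txn.write("x", 2)
seen_during = store.rows["x"].read(store.clock)  # reader sees the old version
txn.commit()
seen_after = store.rows["x"].read(store.clock)   # reader sees the new version
```

A second transaction that tries to write `"x"` while the first still holds the lock raises an error in this sketch; a real system would typically wait or time out instead.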
Relational, Key/Value, Document / XML, Object-Oriented, Multi-Value, Vector
SingleStore is a multi-model database system. Its primary data model is relational. It also supports semi-structured (JSON), vector, time-series, geospatial, key-value and full-text models. It provides a SQL interface and a NoSQL interface called Kai that is largely compatible with the Mongo™ API. Semi-structured data can be accessed through SQL (using the JSON type) or through Kai.
Nested Loop Join, Hash Join, Sort-Merge Join, Broadcast Join
SingleStore supports nested loop join, index nested loop join, merge join and hash join. For distributed join queries, if the two tables are joined on identical shard keys, the join is performed locally on each node; otherwise the data is broadcast or reshuffled to other nodes over the network.
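The choice between a local and a network-bound join can be sketched as a small decision function. The `TableInfo` type, the `choose_join_strategy` function, and in particular the row-count threshold used to pick broadcast over reshuffle are assumptions for illustration; the text above does not say how SingleStore chooses between the two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    name: str
    shard_key: tuple   # column names making up the shard key
    row_count: int


def choose_join_strategy(left, right, join_pairs, broadcast_limit=10_000):
    """join_pairs is a list of (left_column, right_column) equality conditions."""
    left_cols = tuple(l for l, _ in join_pairs)
    right_cols = tuple(r for _, r in join_pairs)
    # Matching rows already live on the same partition: no data movement.
    if left.shard_key == left_cols and right.shard_key == right_cols:
        return "local"
    # Assumed heuristic: copy the smaller table everywhere if it is small.
    if min(left.row_count, right.row_count) <= broadcast_limit:
        return "broadcast"
    # Otherwise repartition rows on the join columns.
    return "reshuffle"


orders = TableInfo("orders", ("customer_id",), 5_000_000)
customers = TableInfo("customers", ("id",), 200_000)
regions = TableInfo("regions", ("region_id",), 50)

local = choose_join_strategy(orders, customers, [("customer_id", "id")])
small = choose_join_strategy(orders, regions, [("region", "name")])
large = choose_join_strategy(orders, customers, [("email", "email")])
```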
Code Generation, JIT Compilation
Instead of a traditional interpreter-based execution model, SingleStore uses a code generation architecture that compiles each SQL query through LLVM to machine code. When the SingleStore server receives a SQL query, it parses the SQL into an AST and extracts the parameters from the query. The AST is then transformed into a SingleStore-specific intermediate representation, the SingleStore Plan Language (MPL; the acronym dates from the MemSQL name). SingleStore then flattens the MPL AST into a more compact format, SingleStore Bytecode (MBC). Plans in MBC format are transformed into LLVM bitcode, from which LLVM generates machine code. This architecture enables many low-level optimizations and avoids much of the unnecessary work of interpreter-based execution. Compiled plans are also cached on disk for future use.
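Extracting parameters is what lets one compiled plan serve many queries that differ only in their literals. The sketch below illustrates that idea only: the regular expression, the `?` placeholder, the `PlanCache` class, and the `compile_plan` stand-in are all hypothetical, and the real MPL, MBC, and LLVM stages are not modeled.

```python
import re

# Matches single-quoted strings and standalone numeric literals.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def parameterize(sql):
    """Replace literals with placeholders; return (query shape, parameters)."""
    params = []

    def repl(match):
        params.append(match.group(0))
        return "?"

    return _LITERAL.sub(repl, sql), params


def compile_plan(shape):
    # Stand-in for the expensive compile step (hypothetical).
    return f"compiled<{shape}>"


class PlanCache:
    def __init__(self):
        self.plans = {}
        self.compilations = 0

    def get_plan(self, sql):
        shape, params = parameterize(sql)
        if shape not in self.plans:       # compile only on first sight
            self.compilations += 1
            self.plans[shape] = compile_plan(shape)
        return self.plans[shape], params


cache = PlanCache()
plan_a, params_a = cache.get_plan("SELECT * FROM t1 WHERE id = 42")
plan_b, params_b = cache.get_plan("SELECT * FROM t1 WHERE id = 7")
```

Both calls map to the same query shape, so the second one reuses the cached plan and only the parameter list differs.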
Tuple-at-a-Time Model, Vectorized Model
SingleStore uses Tuple-at-a-Time Model for rowstore query execution and a Vectorized Model for columnstore query execution. Plans are compiled to machine code (see Query Compilation).
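The difference between the two execution models can be shown on a single filtered sum. This is a plain-Python sketch with hypothetical function names and a made-up batch size; it illustrates the control flow of each model, not SingleStore's compiled operators.

```python
def scan(rows):
    for row in rows:              # tuple-at-a-time: one row per pull
        yield row


def select(child, predicate):
    for row in child:
        if predicate(row):
            yield row


def tuple_at_a_time_sum(rows, threshold):
    """Each operator pulls a single row from its child on every step."""
    return sum(r["price"] for r in select(scan(rows), lambda r: r["qty"] > threshold))


def vectorized_sum(columns, threshold, batch_size=4):
    """Each step handles a batch of column values instead of one row."""
    total = 0
    n = len(columns["qty"])
    for start in range(0, n, batch_size):
        qty = columns["qty"][start:start + batch_size]
        price = columns["price"][start:start + batch_size]
        mask = [q > threshold for q in qty]          # filter a whole batch
        total += sum(p for p, keep in zip(price, mask) if keep)
    return total


columns = {"qty": [1, 5, 3, 8, 2, 9], "price": [10, 20, 30, 40, 50, 60]}
rows = [{"qty": q, "price": p} for q, p in zip(columns["qty"], columns["price"])]

row_result = tuple_at_a_time_sum(rows, threshold=2)
vec_result = vectorized_sum(columns, threshold=2)
```

Both functions return the same total; the vectorized form amortizes per-operator overhead over each batch, which suits column-oriented data.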
Disk-oriented, In-Memory, Hybrid
SingleStore features Universal Storage, an evolution of the columnstore that accommodates transactional workloads traditionally managed by the rowstore. Universal Storage combines rowstore and columnstore to support both Online Transaction Processing (OLTP) and Hybrid Transactional and Analytical Processing (HTAP) workloads at a lower total cost of ownership (TCO). To enhance both parallelism and fault tolerance, databases in SingleStore are divided into partitions, also referred to as shards, that are evenly distributed among the available leaf nodes. Each partition holds a subset of the data based on the SHARD KEY defined in the CREATE TABLE statement.
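Partitioning by shard key can be pictured as hashing the key to a partition number and spreading partitions across leaves. The sketch below assumes CRC32 as the hash and round-robin placement purely for illustration; the hash function and placement policy SingleStore actually uses are not stated above.

```python
import zlib
from collections import Counter


def partition_for(shard_key_value, num_partitions):
    """Map a shard key value to a partition (CRC32 is an assumed stand-in hash)."""
    digest = zlib.crc32(repr(shard_key_value).encode("utf-8"))
    return digest % num_partitions


def assign_partitions(num_partitions, leaves):
    """Spread partitions evenly over leaf nodes (round-robin, an assumption)."""
    return {p: leaves[p % len(leaves)] for p in range(num_partitions)}


leaves = ["leaf-1", "leaf-2", "leaf-3", "leaf-4"]
placement = assign_partitions(8, leaves)
per_leaf = Counter(placement.values())

# Rows with the same shard key value always land on the same partition.
p1 = partition_for(("customer", 1001), 8)
p2 = partition_for(("customer", 1001), 8)
leaf_for_row = placement[p1]
```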
Decomposition Storage Model (Columnar), N-ary Storage Model (Row/Record), Hybrid
In SingleStore, tables are broken into million-row chunks called segments. Row segments in the rowstore are stored in memory. Column segments in the columnstore are stored on disk and in external object storage. Each columnstore partition has an in-memory rowstore segment holding recently updated or inserted data. Columnstore tables are kept sorted by a sort key, and several types of compression are applied to columnstore data, including value encoding, run-length encoding (RLE), and dictionary encoding.
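Two of the encodings named above are easy to demonstrate on a sorted column. The functions below are textbook versions written for this example, not SingleStore's on-disk formats.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encoding: collapse runs of equal values into (value, count)."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def rle_decode(runs):
    return [v for v, count in runs for _ in range(count)]


def dict_encode(values):
    """Dictionary encoding: store each distinct value once, plus integer codes."""
    dictionary = {}
    codes = []
    for v in values:
        codes.append(dictionary.setdefault(v, len(dictionary)))
    return list(dictionary), codes


def dict_decode(dictionary, codes):
    return [dictionary[c] for c in codes]


country = ["DE", "DE", "DE", "FR", "US", "US"]   # sorted, so runs are long
runs = rle_encode(country)
dictionary, codes = dict_encode(country)
```

Keeping the table sorted by its sort key tends to produce long runs of equal values, which is what makes RLE effective.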
SingleStore has a two-tier, clustered architecture. The nodes in the upper tier are aggregators, which are cluster-aware query routers. One special node called the Master Aggregator is responsible for cluster monitoring. The nodes in the lower tier are leaves, which store and process partitions (shards). The aggregator sends extended SQL to leaves to perform distributed query execution.
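The division of labor between the tiers can be sketched as a scatter-gather computation of an average. The function names and the in-process "leaves" dictionary are hypothetical; in the real system the aggregator sends extended SQL to leaves over the network.

```python
def leaf_partial(rows):
    """Work done on a leaf: a partial (sum, count) over its local partition."""
    return sum(rows), len(rows)


def aggregator_avg(leaf_partitions):
    """Work done on an aggregator: fan out, then merge the partial results."""
    partials = [leaf_partial(rows) for rows in leaf_partitions.values()]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


leaf_partitions = {"leaf-1": [1, 2, 3], "leaf-2": [4, 5]}
average = aggregator_avg(leaf_partitions)
```

Shipping partial sums and counts rather than raw rows keeps the data sent back to the aggregator small.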
SingleStore Inc
2011
MemSQL