DBDB.io The Encyclopedia of Database Systems · Est. 2017
Database of Databases

Database Entry

Scylla


ScyllaDB is an open-source distributed wide-column NoSQL database offering high availability, scalability, and fault tolerance while maintaining predictable low latencies and high throughput. ScyllaDB is compatible with both Apache Cassandra (CQL, SSTables) and Amazon DynamoDB interfaces. Written in C++, ScyllaDB uses the highly asynchronous shard-per-core, shared-nothing Seastar framework (http://seastar.io/), where each thread runs on its own CPU core, with its own memory and multi-queue network interface controller. Cross-core communication is carried out by explicit asynchronous message passing.[05]

Source Code
https://github.com/scylladb/scylladb[02]
Twitter
@ScyllaDB
Developer
Country of Origin
IL
Start Year
2014 [36]
Coding Agents
Project Types
Commercial, Open Source
Written in
C++
Supported Languages
C#, C++, Go, Java, JavaScript, PHP, Python, Ruby, Rust
Derived From
Cassandra
Inspired By
Cassandra, DynamoDB
Compatible With
Cassandra, DynamoDB
Operating System
Linux
License
AGPL v3

History[06][05][07]


ScyllaDB was started in 2014 by an Israeli startup (originally named Cloudius Systems) led by Avi Kivity and Dor Laor. The database project was first released as open source in 2015.

Checkpoints[08]


ScyllaDB supports non-blocking checkpoints through per-node backup procedures, which include full backups (snapshots) and incremental backups. Snapshots are taken with the snapshot operation of the nodetool utility, while incremental backup is enabled in the configuration file. Automatic cleanup of backups that are no longer needed is not implemented.

Compression[09][10][11]


ScyllaDB uses Apache Cassandra's chunked compression on SSTable files. Four dictionary-based compression algorithms are provided: LZ4 (default), Snappy, DEFLATE, and Zstandard. Data must be decompressed before it can be processed during query execution.
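The idea behind chunked compression can be sketched as follows. This is a minimal illustration, not ScyllaDB's SSTable format: the 4096-byte chunk size and the function names are assumptions, and Python's zlib (DEFLATE) stands in for whichever algorithm a table is configured with. Each chunk is compressed independently, so a read only has to decompress the chunks that overlap the requested byte range.

```python
import zlib

CHUNK_SIZE = 4096  # assumed chunk length in bytes, chosen for illustration


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Compress each fixed-size chunk of the input independently."""
    return [
        zlib.compress(data[start:start + chunk_size])
        for start in range(0, len(data), chunk_size)
    ]


def read_range(chunks: list[bytes], offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return `length` uncompressed bytes at `offset`.

    Only the chunks overlapping the requested range are decompressed.
    """
    if length <= 0:
        return b""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    plain = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_size
    return plain[start:start + length]


data = bytes(range(256)) * 64   # 16 KiB of sample data
chunks = compress_chunks(data)  # four independently compressed chunks
```

Smaller chunks reduce the amount of data decompressed per read, while larger chunks usually compress better; the right trade-off depends on the workload.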

Concurrency Control[12][13][14]


ScyllaDB does not support ACID transactions in the RDBMS sense. However, CQL has a BATCH statement that allows multiple update statements belonging to a given partition key to be applied in isolation (note that batches are not a full analogue of SQL transactions). In addition, within UPDATE, INSERT, and DELETE statements, modifications belonging to the same partition key are performed atomically and in isolation. ScyllaDB implements Multi-Version Concurrency Control (MVCC) for partition mutations. Internally, versions are represented by an ordered list of states, where each state is a delta of the current mutation.

Further, ScyllaDB supports compare-and-set (CAS) and strict linearizability of operations using Lightweight Transactions (LWT), which use an underlying Paxos algorithm implementation. As such, it is "ACID" compliant so long as operations are limited to one data partition.
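The compare-and-set semantics that LWT exposes can be illustrated with a single-process sketch. The `Partition` class and its method names are hypothetical, and a local lock stands in for the Paxos agreement among replicas, so this shows only the observable behavior of a conditional write, not the consensus protocol.

```python
import threading


class Partition:
    """Hypothetical single-partition store with conditional writes."""

    def __init__(self):
        self._rows = {}
        # Stand-in for Paxos agreement among replicas (an assumption of this sketch).
        self._lock = threading.Lock()

    def insert_if_not_exists(self, key, value) -> bool:
        """Insert only when the key is absent; report whether it was applied."""
        with self._lock:
            if key in self._rows:
                return False
            self._rows[key] = value
            return True

    def update_if(self, key, expected, new) -> bool:
        """Update only when the current value equals `expected`."""
        with self._lock:
            if key not in self._rows or self._rows[key] != expected:
                return False
            self._rows[key] = new
            return True

    def get(self, key):
        with self._lock:
            return self._rows.get(key)


accounts = Partition()
first = accounts.insert_if_not_exists("alice", 100)   # applied
second = accounts.insert_if_not_exists("alice", 999)  # rejected, row exists
swapped = accounts.update_if("alice", 100, 80)        # applied, value matched
stale = accounts.update_if("alice", 100, 50)          # rejected, value is now 80
```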

Data Model[04][15]


ScyllaDB uses the same wide-column NoSQL data model as Apache Cassandra, which represents data in a key-key-value format (each entry is like a row in an RDBMS). The first key is the partition key; the second is a clustering key used to sort rows within a partition. ScyllaDB organizes a collection of rows as a column family (like a table in an RDBMS). One or more column families are contained in a keyspace (like a database in an RDBMS). Each application is encouraged to use a single keyspace.
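A toy version of the key-key-value layout may make the roles of the two keys clearer. The `Table` class below is a hypothetical in-memory sketch, not ScyllaDB code: the partition key selects a partition, and rows inside it are kept sorted by clustering key so that range slices are cheap.

```python
import bisect
from collections import defaultdict


class Table:
    """Toy column family: partition key -> rows sorted by clustering key."""

    def __init__(self):
        # partition key -> sorted list of (clustering key, value)
        self._partitions = defaultdict(list)

    def put(self, partition_key, clustering_key, value):
        rows = self._partitions[partition_key]
        keys = [ck for ck, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, value)        # overwrite the existing row
        else:
            rows.insert(i, (clustering_key, value))  # keep clustering order

    def slice(self, partition_key, low, high):
        """Rows of one partition with low <= clustering key <= high, in order."""
        rows = self._partitions.get(partition_key, [])
        keys = [ck for ck, _ in rows]
        return rows[bisect.bisect_left(keys, low):bisect.bisect_right(keys, high)]


readings = Table()
readings.put("sensor-1", 30, "c")
readings.put("sensor-1", 10, "a")
readings.put("sensor-1", 20, "b")
readings.put("sensor-2", 10, "z")
```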

Foreign Keys[15]


ScyllaDB is a NoSQL database that does not support table JOINs or foreign keys.

Indexes[16][17]


ScyllaDB supports both primary key and secondary key indexes. For the primary index, ScyllaDB hashes the key and finds the corresponding partition in the consistent hashing ring; within the partition, ScyllaDB finds the row in a sorted data structure (SSTable).

For secondary indexes, ScyllaDB maintains an index table keyed by the secondary index keys, where the value for each key is the set of (primary) partition keys associated with that secondary key. Whenever a secondary index is queried, ScyllaDB first retrieves the partition keys from the index table, then retrieves the records with those partition keys. ScyllaDB supports both local (per node) and global (per cluster) secondary indexes.
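The two-step secondary index lookup described above can be sketched with two dictionaries. The class and column names are hypothetical, and everything lives in one process; a real global index is itself a distributed table.

```python
from collections import defaultdict


class IndexedTable:
    """Toy base table plus an index table on one column (hypothetical names)."""

    def __init__(self, indexed_column: str):
        self.indexed_column = indexed_column
        self.base = {}                 # partition key -> row (dict)
        self.index = defaultdict(set)  # indexed value -> set of partition keys

    def insert(self, partition_key, row: dict):
        old = self.base.get(partition_key)
        if old is not None:
            # Drop the stale index entry before re-indexing the row.
            self.index[old[self.indexed_column]].discard(partition_key)
        self.base[partition_key] = row
        self.index[row[self.indexed_column]].add(partition_key)

    def query_by_index(self, value):
        # Step 1: look up the partition keys in the index table.
        keys = sorted(self.index.get(value, ()))
        # Step 2: fetch the rows from the base table by partition key.
        return [self.base[k] for k in keys]


users = IndexedTable("city")
users.insert(1, {"name": "a", "city": "Oslo"})
users.insert(2, {"name": "b", "city": "Rome"})
users.insert(3, {"name": "c", "city": "Oslo"})
```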

Isolation Levels[18][14][19]


ScyllaDB does not support ACID transactions in the RDBMS sense. However, CQL has a BATCH statement that allows multiple update statements belonging to a given partition key to be applied in isolation (note that batches are not a full analogue of SQL transactions). In addition, within UPDATE, INSERT, and DELETE statements, modifications belonging to the same partition key are performed atomically and in isolation.

ScyllaDB also supports CQL Lightweight Transactions (LWT), which allow for compare-and-set (CAS) operations and strict linearizability using a Paxos consensus algorithm.

ScyllaDB additionally provides a DynamoDB-compatible interface, known as "Alternator." In Alternator, users can choose an isolation level per table.

Joins[20]


ScyllaDB is a NoSQL database; it does not support table JOINs. Each SELECT statement applies to a single table.

Logging[21][22]


ScyllaDB writes a commitlog entry for each incoming write request. Before the mutation is applied to the Memtable, the commitlog entry containing the mutation's data is written to disk to guarantee durability.
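The ordering of the write path (log first, then Memtable) can be sketched as follows. The `Node` class, the JSON-lines log format, and the file name are assumptions made for illustration; ScyllaDB's actual commitlog is a binary, segmented format.

```python
import json
import os
import tempfile


class Node:
    """Toy write path: append to the commitlog, sync it, then update the memtable."""

    def __init__(self, commitlog_path: str):
        self.commitlog_path = commitlog_path
        self.memtable = {}

    def write(self, key: str, value: str) -> None:
        record = json.dumps({"key": key, "value": value})
        with open(self.commitlog_path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())  # the mutation is on disk before it is applied
        self.memtable[key] = value

    def replay(self) -> None:
        """Rebuild the memtable from the commitlog, e.g. after a crash."""
        self.memtable = {}
        with open(self.commitlog_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.memtable[record["key"]] = record["value"]


log_path = os.path.join(tempfile.mkdtemp(), "commitlog.jsonl")
node = Node(log_path)
node.write("k1", "v1")
node.write("k1", "v2")

recovered = Node(log_path)  # simulates a restart with an empty memtable
recovered.replay()
```

Because every acknowledged write is already in the log, replaying it after a restart reproduces the Memtable contents that were lost from memory.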

Parallel Execution[23][24][25][26]


ScyllaDB achieves parallelism by building on the Seastar framework (q.v.), which treats each CPU core (or hyperthread on Intel CPU systems) as a unique "shard" of the database. Each per-core shard has its own pool of NUMA-aware memory and its own Sorted Strings Table (SSTable) files in storage for its associated data partitions. In this manner, ScyllaDB uses asynchronous programming techniques in a shared-nothing approach to avoid file locking and blocking operations. Each database shard in ScyllaDB also includes its own internal CPU scheduler and IO scheduler, so that it does not have to rely on the operating system's CPU scheduling and IO processing.

Because of ScyllaDB's token-aware drivers, the coordinator node for each operation is usually selected from one of the existing replicas of the data. Requests are handled by that replica directly, and are also automatically forwarded on to the other replica shards associated with the relevant data partitions.
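The shard-per-core idea of private state plus explicit message passing can be sketched with asyncio. This is only an analogy: all tasks here share one thread, whereas Seastar pins one thread to each core, and the `Shard` class, the tuple message format, and the CRC32 key-to-shard mapping are assumptions of this sketch.

```python
import asyncio
import zlib


class Shard:
    """One shard: private data plus an inbox; no other shard touches `self.data`."""

    def __init__(self, shard_id: int):
        self.shard_id = shard_id
        self.data = {}
        self.inbox = asyncio.Queue()

    async def run(self):
        while True:
            op, key, value, reply = await self.inbox.get()
            if op == "stop":
                break
            if op == "put":
                self.data[key] = value
                reply.set_result(None)
            elif op == "get":
                reply.set_result(self.data.get(key))


def owner(key: str, shard_count: int) -> int:
    # CRC32 is a stand-in for the real key-to-shard mapping.
    return zlib.crc32(key.encode("utf-8")) % shard_count


async def send(shards, op, key, value=None):
    """Pass a message to the owning shard and await its reply."""
    reply = asyncio.get_running_loop().create_future()
    await shards[owner(key, len(shards))].inbox.put((op, key, value, reply))
    return await reply


async def main():
    shards = [Shard(i) for i in range(4)]
    tasks = [asyncio.create_task(s.run()) for s in shards]
    await send(shards, "put", "user:1", "alice")
    result = await send(shards, "get", "user:1")
    for s in shards:
        await s.inbox.put(("stop", None, None, None))
    await asyncio.gather(*tasks)
    return result, shards


result, shards = asyncio.run(main())
```

Since only the owning shard ever reads or writes its dictionary, no locks are needed; coordination happens entirely through the queued messages.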

Query Compilation


Query Execution[27][26]


ScyllaDB is designed as a shard-per-core architecture built on the highly asynchronous Seastar framework (q.v.), where all shards run in parallel. Full table scan queries on ScyllaDB can be parallelized by splitting the scan into token ranges, using the partitioner and the token function.
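A parallel full table scan can be sketched by dividing the token range into subranges and issuing one range query per subrange. The sketch below only builds the CQL strings; the table and column names are placeholders, the helper names are hypothetical, and the bounds assume a signed 64-bit token range as used by the Murmur3 partitioner.

```python
MIN_TOKEN = -(2 ** 63)  # assumed lower bound of the token range
MAX_TOKEN = 2 ** 63 - 1  # assumed upper bound of the token range


def split_token_range(parts: int) -> list[tuple[int, int]]:
    """Split the full token range into `parts` contiguous (start, end] subranges."""
    if parts < 1:
        raise ValueError("parts must be positive")
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + total * i // parts for i in range(parts)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def scan_queries(table: str, partition_key: str, parts: int) -> list[str]:
    """Build one CQL range query per subrange; each can be run concurrently."""
    queries = []
    for i, (start, end) in enumerate(split_token_range(parts)):
        # The first subrange is closed on the left so the minimum token is not skipped.
        lower = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE token({partition_key}) {lower} {start} "
            f"AND token({partition_key}) <= {end}"
        )
    return queries


ranges = split_token_range(4)
queries = scan_queries("ks.events", "id", 3)
```

Each query string can then be handed to a driver and executed concurrently, with the number of subranges chosen to keep all nodes and shards busy.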

Query Interface[28][29]


ScyllaDB uses the Cassandra Query Language (CQL) as its query interface. Drivers are provided for the following languages: C++, C#, Go, Java, Node.js, PHP, Python, Ruby, and Rust.

Storage Architecture[30][31]


ScyllaDB is a disk-oriented DBMS that stores data in SSTable files. ScyllaDB also supports in-memory tables, which reduce read latency for read-mostly workloads.

Storage Model[32]


Scylla stores data in a sequence of rows.

Storage Organization[33]


ScyllaDB stores data in Sorted Strings Tables (SSTables).

Stored Procedures


System Architecture[16][34]


ScyllaDB uses a shared-nothing model. Nodes in the cluster are organized in a decentralized consistent hashing ring, and data is partitioned into shards by key across all nodes. ScyllaDB uses a shard-per-core architecture, where the thread for each shard runs on its own CPU core, with its own memory and multi-queue network interface controller. Cross-core communication is carried out by explicit message passing. ScyllaDB also uses replicas for fault tolerance.
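How a consistent hashing ring maps a key to its replicas can be sketched as follows. The `Ring` class is hypothetical, each node owns a single ring position for simplicity, and MD5 stands in for the real partitioner; replicas are chosen by walking clockwise from the key's position, which is one simple placement strategy.

```python
import bisect
import hashlib


def token_of(key: str) -> int:
    """Map a key to a ring position (MD5 here, a stand-in for the real partitioner)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy consistent hashing ring with one position per node."""

    def __init__(self, nodes: list[str], replication_factor: int = 3):
        self.replication_factor = min(replication_factor, len(nodes))
        self._ring = sorted((token_of(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str) -> list[str]:
        """Walk clockwise from the key's token, collecting distinct nodes."""
        start = bisect.bisect_left(self._tokens, token_of(key))
        found = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in found:
                found.append(node)
            if len(found) == self.replication_factor:
                break
        return found


ring = Ring(["n1", "n2", "n3", "n4"], replication_factor=3)
owners = ring.replicas("user:42")
```

Adding or removing a node changes ownership only for keys adjacent to that node's ring position, which is why the ring suits a decentralized cluster.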

Views[35]


ScyllaDB supports Materialized Views as of version 2.0, as an experimental feature. Whenever the base table is updated, the materialized view table is updated automatically. Materialized view tables are distributed like normal tables and scale just as well. However, the experimental release still has limitations, including but not limited to the lack of local locking and a local batch log.

Citations

38 sources
  1. ScyllaDB For Real-Time AI scylladb.com
  2. GitHub - scylladb/scylladb: NoSQL data store using the Seastar framework, compatible with Apache Cassandra and Amazon DynamoDB · GitHub github.com
  3. Welcome to ScyllaDB Documentation | ScyllaDB Docs scylladb.com
  4. ScyllaDB - Wikipedia wikipedia.org
  5. ScyllaDB | Modern NoSQL Database Architecture scylladb.com
  6. scylladb/version.hh at master · scylladb/scylladb · GitHub github.com
  7. NoSQL Database Company for Data Intensive Apps - ScyllaDB scylladb.com
  8. Backup your Data | ScyllaDB Docs scylladb.com
  9. Compression in ScyllaDB, Part One - ScyllaDB scylladb.com
  10. Compression in ScyllaDB, Part Two - ScyllaDB scylladb.com
  11. SSTable Compression | ScyllaDB Docs scylladb.com
  12. scylladb/partition_version.hh at 8210f4c982396aba127cfc2511998c502bed39b5 · scylladb/scylladb · GitHub github.com
  13. Data Manipulation | ScyllaDB Docs scylladb.com
  14. Getting the Most out of Lightweight Transactions in ScyllaDB - ScyllaDB scylladb.com
  15. Data Definition | ScyllaDB Docs scylladb.com
  16. ScyllaDB Ring Architecture - Overview | ScyllaDB Docs scylladb.com
  17. Global Secondary Indexes | ScyllaDB Docs scylladb.com
  18. Data Manipulation | ScyllaDB Docs scylladb.com
  19. Upgrade ScyllaDB | ScyllaDB Docs scylladb.com
  20. Data Manipulation | ScyllaDB Docs scylladb.com
  21. scylladb/db/commitlog/commitlog_entry.hh at 1891779e646d770aa825ecdd8f3a8e60847d3ca9 · scylladb/scylladb · GitHub github.com
  22. ScyllaDB scylladb.com
  23. What We’ve Learned after 6 Years of IO Scheduling - ScyllaDB scylladb.com
  24. Implementing a New IO Scheduler Algorithm for Mixed Read/Write Workloads - ScyllaDB scylladb.com
  25. Maximizing Performance via Concurrency While Minimizing Timeouts in Distributed Databases - ScyllaDB scylladb.com
  26. Seastar - Seastar seastar.io
  27. Efficient full table scans with ScyllaDB 1.6 - ScyllaDB scylladb.com
  28. ScyllaDB scylladb.com
  29. ScyllaDB CQL Drivers | ScyllaDB Docs scylladb.com
  30. ScyllaDB scylladb.com
  31. ScyllaDB SSTable Format | ScyllaDB Docs scylladb.com
  32. SSTable Data File | ScyllaDB Docs scylladb.com
  33. ScyllaDB scylladb.com
  34. ScyllaDB Architecture - Fault Tolerance | ScyllaDB Docs scylladb.com
  35. Materialized Views preview in ScyllaDB 2.0 - ScyllaDB scylladb.com
  36. Initial commit github.com
  37. https://github.com/scylladb/scylladb/commit/5372c034dc4c1070f29ed52c13c05391ec29e9c4 github.com
  38. https://github.com/scylladb/scylladb/commit/817c3fb0659dc5718a531de5acc55c22bc607aab github.com
Revision #17 Last Updated: