SWC-DB

SWC-DB ("Super Wide Column Database") is a wide-column DBMS. It has neither tables nor namespaces; instead it supports a very large number of columns, which can be selected by column name and tags. The key of a cell (record) is a structure of Fractions, and binary (byte) data is supported in both the cell's key and its value. SWC-DB can therefore be described as a key-value store whose key is made up of numerous Fractions. Where wide-column databases are known to support a single column qualifier, the key Fractions in SWC-DB allow up to 2^24 qualifiers (minus one for the "row") in a single record key, which consists of the joined fractions. A Fraction is limited to 2^24 in length and the total key to 2^32 bytes.

Although it can be perceived as a key-value store, SWC-DB, like other wide-column databases, supports scanning and selecting cells on intervals; a scan can also cover multiple columns with multiple intervals in each column. The master and meta columns (the range-locator system columns) store, in addition to the key-begin and key-end of a range, the minimum and maximum values of the fractions in an aligned manner. This reduces range-locator latency and enables access to and scans over a multi-dimensional key, so that F(N) can be selected without scanning F(0).

Unlike other databases, SWC-DB supports several key sequences. The column schema defines the byte-sequence option for keys: Lexical (0, 1, 10, 9), Volumetric (0, 1, 9, 10), or ordering over the Fraction count of a key (fewer to more). The cell value is defined in the column schema; the current column-type options are PLAIN, SERIAL and COUNTER_(I8, I16, I32, I64). The scans that are possible over value data depend on the column type. All conditions on a cell's key, a cell's value, a column's name and a column's tags are evaluated with Comparators, which extend to domain-object types such as partially or fully ordered supersets and subsets, plus a volumetric extension of the lexical comparison.
SWC-DB uses a proprietary SQL (Structured Query Language) dialect that suits the requirements of working with its super-wide key. SWC-DB is designed to handle yottabytes and more, on a base of a quadrillion entries.
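
The difference between the Lexical and Volumetric key sequences can be illustrated with a minimal Python sketch. It is not SWC-DB code: the function names are hypothetical, and it assumes that Volumetric means "shorter fraction first, then byte-wise" and that Fraction-count ordering can be combined with either comparator, which matches the (0, 1, 9, 10) example but is not confirmed by the text.

```python
from functools import cmp_to_key

def cmp_lexical(a: bytes, b: bytes) -> int:
    """Plain byte-wise comparison: b"10" sorts before b"9"."""
    return (a > b) - (a < b)

def cmp_volumetric(a: bytes, b: bytes) -> int:
    """Assumed rule: shorter fraction first, byte-wise among equal lengths."""
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    return cmp_lexical(a, b)

def cmp_keys(k1, k2, cmp_fraction, count_first=False) -> int:
    """Compare two keys (tuples of byte fractions) fraction by fraction."""
    if count_first and len(k1) != len(k2):
        # Fraction-count ordering: keys with fewer fractions come first.
        return -1 if len(k1) < len(k2) else 1
    for f1, f2 in zip(k1, k2):
        c = cmp_fraction(f1, f2)
        if c:
            return c
    # All shared fractions are equal: the key with fewer fractions comes first.
    return (len(k1) > len(k2)) - (len(k1) < len(k2))

def sort_keys(keys, cmp_fraction, count_first=False):
    """Sort keys with the chosen fraction comparator (hypothetical helper)."""
    order = cmp_to_key(lambda a, b: cmp_keys(a, b, cmp_fraction, count_first))
    return sorted(keys, key=order)

keys = [(b"10",), (b"9",), (b"0",), (b"1",)]
print(sort_keys(keys, cmp_lexical))     # Lexical order of single-fraction keys
print(sort_keys(keys, cmp_volumetric))  # Volumetric order of the same keys
```

With numeric strings stored as fractions, the Volumetric sequence keeps them in numeric order without zero-padding, which is the practical reason to choose it over Lexical for such keys.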

Query Interface

Custom API, SQL, Command-line / Shell

The main interface to SWC-DB is the "libswcdb.so" C++ client/API library, which SWC-DB also uses itself for internal operations, utility applications and broker-server applications. One of the utility applications is the command-line interface "SWC-DB(client)>", which lets the majority of commands and queries be executed in the form of SQL. As a supporting API layer, SWC-DB implements the well-known Apache Thrift protocol and provides the "swcdbThriftBroker" application to serve Thrift client requests. Thrift clients can be developed in C++, Java, Python, C-Glib, Netstd and Rust, and in any other language supported by Apache Thrift.

Concurrency Control

Multi-version Concurrency Control (MVCC), Deterministic Concurrency Control, Timestamp Ordering

The maximum number of cell versions to retain is defined by the column schema property "cell versions", whereas the order and the revision of a cell are determined by the cell's timestamp and its order (DESC/ASC). The timestamp defaults to AUTO, which is the system time of the swcdbRanger application, and the order defaults to ASC. In addition to the user-supplied timestamp, an SWC-DB cell has an internal cell revision that determines which data is the latest and should be retained or removed. Timestamps in SWC-DB have nanosecond precision, whereas the configurable "cell ttl" property of a column schema is in milliseconds. The schema's cell TTL is applied against the timestamp of a cell, and once the TTL is reached the cell is automatically considered non-existent. Depending on timing and on max-versions, a cell may still exist if the schema's cell TTL is changed before compaction has taken place.
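
The interplay of nanosecond timestamps, a millisecond TTL and a version limit can be sketched as follows. This is an illustration only: the `Cell` class and function names are hypothetical, a TTL of 0 is assumed to mean "never expires", and the internal cell revision and the ASC/DESC time order are ignored for brevity.

```python
from dataclasses import dataclass

NS_PER_MS = 1_000_000  # the TTL is in milliseconds, timestamps in nanoseconds

@dataclass(frozen=True)
class Cell:
    key: tuple          # key fractions
    timestamp_ns: int   # cell timestamp, nanosecond precision
    value: bytes

def is_expired(cell: Cell, ttl_ms: int, now_ns: int) -> bool:
    """A cell is treated as non-existent once timestamp + TTL has passed."""
    if ttl_ms <= 0:     # assumption: 0 disables expiry
        return False
    return cell.timestamp_ns + ttl_ms * NS_PER_MS <= now_ns

def retain(cells, max_versions: int, ttl_ms: int, now_ns: int):
    """Keep at most max_versions of the newest unexpired cells per key."""
    by_key = {}
    for cell in cells:
        if not is_expired(cell, ttl_ms, now_ns):
            by_key.setdefault(cell.key, []).append(cell)
    kept = []
    for versions in by_key.values():
        versions.sort(key=lambda c: c.timestamp_ns, reverse=True)
        kept.extend(versions[:max_versions])
    return kept

SEC = 1_000_000_000
cells = [Cell((b"k",), s * SEC, b"v") for s in (1, 6, 7, 8)]
kept = retain(cells, max_versions=2, ttl_ms=5000, now_ns=10 * SEC)
print([c.timestamp_ns // SEC for c in kept])
```

Because expiry is evaluated against the TTL in force at the time of the check, raising `ttl_ms` before the expired cells are physically removed would make them visible again in this sketch, which mirrors the behaviour described above for a TTL change before compaction.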

Indexes

Skip List, Log-Structured Merge Tree, Block Range Index (BRIN)

A Column-Range (a shard) in SWC-DB is a structure of CommitLog-Fragments, CellStores and Blocks that are indexed in a Block Range Index (BRIN). For faster block access, the Range-Blocks have a supporting Skip-List narrowing index. The underlying object storage of the Current-CommitLog, Log-Fragment, CellStore-Block and Range-Block is the SWC-DB Mutable-Cells container, which has Skip-List index capabilities. The cells data of a Range-Blocks-Block (the object against which select scans are performed) is populated by the SWC-DB Range-Block-Loader, which loads the CellStore-Blocks, Log-Fragments and Current-CommitLog. Loading a block in that order applies the indexing procedure of a Log-Structured Merge Tree. As required by the `swc.rgr.Range.CommitLog.Compact.cointervaling` configuration, CommitLog compaction continuously merges the Log-Fragments on the index basis of a Log-Structured Merge Tree.
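
The general idea of a block range index, keeping only the minimum and maximum key of each block and using them to skip blocks that cannot match a scan interval, can be sketched as below. This is a generic illustration rather than SWC-DB's implementation: the class and field names are hypothetical, keys are simplified to single byte strings, and blocks are assumed to be sorted and non-overlapping.

```python
from bisect import bisect_left
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockEntry:
    key_min: bytes   # smallest key stored in the block
    key_max: bytes   # largest key stored in the block
    block_id: int

class BlockRangeIndex:
    """Summary index over sorted, non-overlapping blocks (assumed layout)."""

    def __init__(self, entries):
        self.entries = sorted(entries, key=lambda e: e.key_min)
        # With non-overlapping blocks the key_max values are sorted as well.
        self.maxes = [e.key_max for e in self.entries]

    def blocks_for_interval(self, begin: bytes, end: bytes):
        """Return ids of blocks whose [key_min, key_max] overlaps [begin, end]."""
        i = bisect_left(self.maxes, begin)  # first block with key_max >= begin
        hits = []
        while i < len(self.entries) and self.entries[i].key_min <= end:
            hits.append(self.entries[i].block_id)
            i += 1
        return hits

index = BlockRangeIndex([
    BlockEntry(b"a", b"f", 0),
    BlockEntry(b"g", b"m", 1),
    BlockEntry(b"n", b"z", 2),
])
print(index.blocks_for_interval(b"e", b"h"))  # only blocks 0 and 1 need loading
```

Only the blocks returned by the index have to be loaded and scanned; the finer-grained lookup inside a loaded block is where a skip list would take over.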

Storage Architecture

Disk-oriented

The file system of SWC-DB is configurable and currently supports Hadoop-JVM, Hadoop (native C++), Ceph, Local, Custom (a user-developed library) and FsBroker. Depending on the SWC-DB release or build type, a file system can be built in or be a dynamically loaded library. The FsBroker file system is a file-system type for SWC-DB applications in which the swcdbFsBroker application has its own underlying fs-type configuration. The swcdbFsBroker can run on instances remote from the hosts running the swcdbManager(s) and/or swcdbRanger(s). The Custom file system is a placeholder type name for a user-developed file-system library based on the SWC::FS::FileSystem interface class, which places no limit on the possible FS implementations. The SWC-DB base FS path is defined by the configuration properties `--swc.fs.TYPE.path.root` and `--swc.fs.path.data`, and is evaluated as `swc.fs.TYPE.path.root="/var/opt/swcdb/"` joined with `swc.fs.path.data="your/cluster/name"`. For example, to place the FS base at the root folder of HDFS, use `swc.fs.hadoop_jvm.path.root=/swcdb/` joined with `swc.fs.path.data="or/no/sub/folders"`. The data topology (the structure of files and folders) within the SWC-DB base path is organized by column-id and range-id. Each such path contains the CellStores and CommitLog files, and at any point a single server is responsible for a given range-id of a column-id under a path root. The CellStores are files that store cells in serialized form as of the latest compaction, whereas the Commit-Log consists of many Fragments of recently added data; fragments are written when a threshold (the schema-configurable log-rollout-ratio) is reached or on shutdown. SWC-DB uses structured ID folder names, so that a single folder never holds more than 2000 entries.
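
How the two path properties combine into the base path can be shown with a small sketch. The helper name `fs_base_path` is hypothetical, and the sketch assumes a plain POSIX-style join; how SWC-DB itself normalizes slashes is not stated in the text.

```python
import posixpath

def fs_base_path(path_root: str, path_data: str) -> str:
    """Join swc.fs.TYPE.path.root with swc.fs.path.data (assumed POSIX join)."""
    # Strip slashes from the data part so it is always treated as relative.
    return posixpath.join(path_root, path_data.strip("/"))

# Values taken from the examples in the text above.
print(fs_base_path("/var/opt/swcdb/", "your/cluster/name"))
print(fs_base_path("/swcdb/", "or/no/sub/folders"))
```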

System Architecture

Shared-Nothing

Compression

Naïve (Page-Level), Naïve (Record-Level), Bit Packing / Mostly Encoding

The available encoders (compressors) are ZSTD, ZLIB and SNAPPY, in addition to PLAIN. At the record level, encoding is configured by updating the Cell-Value with the desired encoder. At the page level, where it is applied to the Cell-Store-Block, the encoder is set either in the default configuration for schemas or in the column-schema definition. Communication transactions are configurable per program (at the application level) through an application-designated configuration property and file. Variable-length integers are used for most numerical primitives, whether for long-term persistent storage or for intermediate run-time consumption.
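
To show why variable-length integers save space, here is a minimal sketch of a common scheme that stores 7 bits per byte with a continuation flag in the high bit. It is an assumption that this resembles SWC-DB's encoding; the exact wire format is not described here, and the function names are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)                      # final byte has the high bit clear
    return bytes(out)

def decode_varint(buf: bytes, pos: int = 0):
    """Decode one integer starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

for value in (0, 127, 128, 300, 2**32):
    encoded = encode_varint(value)
    print(value, len(encoded), decode_varint(encoded)[0] == value)
```

Small values, which dominate lengths and counters in practice, take a single byte instead of a fixed four or eight.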

Website

https://www.swcdb.org

Source Code

https://github.com/kashirin-alex/swc-db

Tech Docs

https://github.com/kashirin-alex/swc-db/tree/master/docs

Developer

Kashirin Alex

Country of Origin

IL

Start Year

2019

Project Type

Commercial, Open Source

Written in

C++

Supported languages

C, C#, C++, Java, Python, Rust

Inspired By

Hypertable

Operating Systems

Linux

Licenses

GPL v3, Proprietary