Elliptics

Elliptics network is a fault-tolerant distributed key/value database system. With its default key generation policy it implements hash table object storage. Compared to traditional database systems, it is better described as a distributed network framework that makes servers work together.
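
The hash table behaviour described above can be illustrated with a small sketch. It is not Elliptics code: the `key_id` function, the `Ring` class, the node names and the choice of SHA-512 as the hash are all assumptions made for the example. The sketch only shows how hashing a key name to a fixed-size ID lets a client decide which server owns an object.

```python
import bisect
import hashlib


def key_id(name: str) -> int:
    """Map an arbitrary key name to a fixed-size integer ID (SHA-512 is assumed here)."""
    return int.from_bytes(hashlib.sha512(name.encode("utf-8")).digest(), "big")


class Ring:
    """Toy hash ring: each node owns the ID range that starts at its own hashed ID."""

    def __init__(self, node_names):
        # Sort nodes by their hashed position on the ring.
        self._nodes = sorted((key_id(n), n) for n in node_names)
        self._starts = [start for start, _ in self._nodes]

    def owner(self, name: str) -> str:
        # Pick the last node whose start is <= the key ID. If the key sorts
        # before every node, the index is -1 and wraps around to the last node.
        idx = bisect.bisect_right(self._starts, key_id(name)) - 1
        return self._nodes[idx][1]
```

With `ring = Ring(["node-a", "node-b", "node-c"])`, calling `ring.owner("photo:42")` always returns the same node for the same key, which is the property a hash-based object store relies on.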

History

Elliptics was initially created in 2007 as part of POHMELFS v1. POHMELFS stands for Parallel Optimized Host Message Exchange Layered File System, a cache-coherent distributed file system developed by Russian Linux hacker Evgeny Polyakov. It can be viewed as a protocol for sharing files between file systems on computers over a LAN. In 2009 Elliptics was separated from POHMELFS and later became a consistent distributed storage system. As of 2014, Elliptics was used in Yandex Maps, Disk, Music, Photos and some infrastructure.

Storage Architecture

Hybrid

The storage layer is called backends in Elliptics. Elliptics has three low-level backends: filesystem (written objects are stored as files), Eblob (fast append-only storage) and Smack (small compressible objects stored in sorted tables).
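
To make the phrase "append-only storage" concrete, here is a minimal sketch of the general technique. It is not the Eblob format or API: the `AppendOnlyStore` class and its methods are hypothetical, and an in-memory buffer stands in for the blob file on disk.

```python
import io


class AppendOnlyStore:
    """Toy append-only store: data is only ever appended; an index tracks the latest record."""

    def __init__(self):
        self._blob = io.BytesIO()  # stands in for the on-disk blob file
        self._index = {}           # key -> (offset, length) of the latest value

    def write(self, key: str, value: bytes) -> None:
        # Always append at the end; existing bytes are never overwritten.
        offset = self._blob.seek(0, io.SEEK_END)
        self._blob.write(value)
        # Repoint the index; the older record stays in the blob as garbage.
        self._index[key] = (offset, len(value))

    def read(self, key: str) -> bytes:
        offset, length = self._index[key]
        self._blob.seek(offset)
        return self._blob.read(length)

    def size(self) -> int:
        """Total bytes in the blob, including superseded records."""
        return self._blob.seek(0, io.SEEK_END)
```

Writes stay sequential, which is what makes this layout fast, at the cost of leaving stale records behind until some separate cleanup step reclaims the space.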

Moreover, Elliptics implements both a generic storage protocol and its own specific protocol, so data stored in other services can be routed to Elliptics. For example, Elliptics can connect to MySQL servers and trigger special commands to read and write data in Elliptics.

System Architecture

Shared-Disk

The whole scheme can be found in the Elliptics documentation: http://doc.reverbrain.com/elliptics:architecture-scheme.

The whole system resembles a set of services supporting a web application. It contains several layers: client-side web application, proxy server, frontends, system core, and backends.

Backends are the data storage level, which holds the system's key-value data or links to an external database. Frontends are the tools and drivers level. The Elliptics core manages distributed nodes to execute commands from clients.

Isolation Levels

Repeatable Read

Two types of isolation are used in Elliptics: Process and CGroup.

Elliptics uses an eventual consistency model to maintain data replicas: the copies of data in a group may differ at a given moment, but they will eventually be synced with each other at some point in the future.
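
The eventual consistency behaviour described above can be sketched as follows. This is an illustration only: the `Replica` class, the `sync` function and the last-writer-wins rule based on timestamps are assumptions made for the example, not a description of how Elliptics reconciles replicas.

```python
class Replica:
    """One copy of the data; each key stores a (timestamp, value) record."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Last-writer-wins (assumed rule): keep the record with the newest timestamp.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        record = self.data.get(key)
        return None if record is None else record[1]


def sync(replicas):
    """Hypothetical recovery pass: push the newest record of every key to all replicas."""
    newest = {}
    for replica in replicas:
        for key, (ts, value) in replica.data.items():
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    for replica in replicas:
        for key, (ts, value) in newest.items():
            replica.write(key, value, ts)
```

If one replica misses an update, a read from it returns the older value until `sync` runs, after which every replica returns the same value. That window of disagreement is what "eventual" refers to.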

Concurrency Control

Not Supported

Elliptics supports a basic lock operation on a key (ID). Locking can be skipped by specifying the NOLOCK flag in the EXEC command. The lock operation is controlled by the client command rather than by the backend system.
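
A per-key lock with an opt-out flag can be sketched as below. The `KeyLocks` class and its `nolock` parameter are hypothetical names chosen to mirror the NOLOCK flag mentioned above; they are not part of the Elliptics API.

```python
import threading
from collections import defaultdict


class KeyLocks:
    """Toy per-key lock table; the caller decides whether an operation takes the lock."""

    def __init__(self):
        self._guard = threading.Lock()              # protects the lock table itself
        self._locks = defaultdict(threading.Lock)   # key -> lock for that key

    def _lock_for(self, key):
        with self._guard:
            return self._locks[key]

    def execute(self, key, operation, nolock=False):
        if nolock:
            # Caller opted out of locking, similar in spirit to a NOLOCK flag.
            return operation()
        with self._lock_for(key):
            return operation()
```

Operations on different keys never block each other, while two locked operations on the same key run one after the other. An operation executed with `nolock=True` skips that serialization entirely, so correctness is then the caller's responsibility.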

Because there is no 2PL, MVCC, OCC or other deadlock prevention mechanism, deadlocks may occur inside the system.

Joins

Not Supported

As a NoSQL key-value database, it does not implement a join operation at the system level.

Indexes

Red-Black Tree

The index structure exposed to the client is called secondary indexes. It is implemented with the STL std::map<> template in C++, which is usually implemented as a red-black tree.

Currently, secondary indexes support two ways to find the items:

  1. AND operation, which finds all objects meeting all of the provided indexes
  2. OR operation, which finds all objects meeting at least one of the provided indexes
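
The two lookup modes listed above amount to set intersection and set union. The sketch below shows this with a hypothetical `SecondaryIndexes` class; it uses a Python dict (a hash table) rather than a red-black tree, and none of the method names come from the Elliptics API.

```python
class SecondaryIndexes:
    """Toy secondary index: maps each index name to the set of object IDs tagged with it."""

    def __init__(self):
        self._index = {}  # index name -> set of object IDs

    def add(self, object_id, index_names):
        for name in index_names:
            self._index.setdefault(name, set()).add(object_id)

    def find_all(self, index_names):
        """AND: objects present in every one of the given indexes."""
        sets = [self._index.get(name, set()) for name in index_names]
        return set.intersection(*sets) if sets else set()

    def find_any(self, index_names):
        """OR: objects present in at least one of the given indexes."""
        return set().union(*(self._index.get(name, set()) for name in index_names))
```

For example, after tagging one object with both `"red"` and `"large"` and another with only `"red"`, `find_all(["red", "large"])` returns just the first object while `find_any(["red", "large"])` returns both.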

Logging

Physical Logging

Elliptics has used replication to ensure data availability from the beginning of its design. To use replication, an administrator binds a group of servers together, and the data is replicated among them.

Logging is implemented with the blackhole logging library, an attribute-based logger optimized for performance. Through blackhole, Elliptics can write logs to a file, syslog, or a socket.

Query Interface

Custom API Command-line / Shell

The API is designed to support C, C++ and Python.

In the current version of the Elliptics documentation, only the link to the Python API works correctly; the links to the C API and C++ API documents are broken. Archived versions of both, dated May 11, 2016, can be retrieved from archive.org.

The Python APIs are designed to configure the Elliptics client and include Logger, Config, Node, Session, etc. The C++ APIs also configure the client side of Elliptics, but offer fewer APIs than the Python library. Node is the main controlling structure of Elliptics.

The C APIs can be used to configure both client and server, covering creation, configuration, server-side processing, cache and backend. On the client side, everything is built on an asynchronous API model, while the server side supports both synchronous and asynchronous calls.

Data Model

Key/Value

Elliptics mainly supports the key-value data model.