HyperDex

View Current Viewing Revision #5 from 05/14/2019 4:14 p.m.

HyperDex is a distributed, searchable, and consistent key-value store. It is mainly developed by Robert Escriva in Cornell University. HyperDex provides serializability consistency and tolerates a threshold of server failures. Another highlight is a novel search primitive that supports query on secondary attributes. The main techniques HyperDex uses are hyperspace hashing and value-dependent chaining. HyperDex has a commercial extension called HyperDex Warp, which supports ACID transactions involving multiple objects. If not specified, "HyperDex" refers to the basic version without warp.

History

The development of HyperDex started in 2011. In the same year, it was publicly available and authors provided a report to show its performance. In 2013, HyperDex 1.0 is officially released with a fault-tolerant coordinator. In 2014, HyperDex supported ACID transactions involving multiple objects with the Warp add-on. However, both development and commercial support has stopped.

Data Model

Key/Value

HyperDex is a key-value store. Each object stored in it complies with an application-provided schema, consisting of a key and zero or more secondary attributes.

Storage Model

Custom

As mentioned in index, HyperDex applies hyperspace hashing to store data. It regards tables as multi-dimensional Euclidean spaces and each dimension corresponds to one attribute. An object is mapped to a position in the hyperspace by hashing every attribute value to a coordinate on the corresponding axis. Each server is assigned a region in the same hyperspace and is responsible to store the data fall within its region. Specifically, the hyperspace is divided into hyperrectangular regions and a fault-tolerance coordinator is responsible for mapping servers to regions. Since tables have different sets of attributes, HyperDex maintain multiple independent hyperspaces. Moreover, to address the problem that the hyperspace expands exponentially as the number of attributes increases. HyperDex partitions tables with many attributes into multiple smaller subspaces of fewer dimensions. Hyperspace hashing does not distinguish between key and other attributes. However, to support efficient key-based operation, HyperDex has a one-dimension subspace dedicated to the key.

Query Execution

Materialized Model

Client first maps a query into hyperspace and determines which servers’ regions intersect the resulting hyperplane. It them sends the request to those servers and collects matching results if necessary.

Concurrency Control

Optimistic Concurrency Control (OCC) Timestamp Ordering

Since HyperDex employs a novel storage model, the concurrency control scheme is also different from typical methods. Moreover, HyperDex and HyperDex Warp use different types of concurrency control. HyperDex uses timestamp ordering. There is a centralized coordinator that assigns version number to every update. All servers apply updates in the same order according to the version number. HyperDex Warp uses a new form of OCC called acyclic transaction, which provides serializability on top of distributed data store.

Joins

Not Supported

HyperDex does not support joins. Since different tables are mapped to different hyperspaces, records from different tables cannot be combined in HyperDex.

Storage Architecture

Disk-oriented

HyperDex does not explicitly state the storage architecture. However, the paper does imply disk-oriented architecture. HyperDex has embeddable storage layer called HyperDisk, which recursively apply hyperspace hashing technique in storage. A region mapped to a server is further partitioned into non-overlapping subregions and objects in each sub-region is stored in a file on disk.

Checkpoints

Consistent Blocking

HyperDex supports blocking, consistent checkpoints. While taking a backup, all servers will coordinate to capture a consistent snapshot. Service will pause for less than one second, waiting for all pending operations complete. Moreover, servers are replicated in each region to provide fault tolerance.

Isolation Levels

Serializable

HyperDex provides serializability for operations. All clients connected to the system are guaranteed to see operations in the same order. Specifically, a search will observe any update happen before it and return the latest version of data. To provide serializability consistency, HyperDex adopts a technique called value-dependent chaining. The system arranges all replicas of an object into a value-dependent chain, whose items are determined by hashing the object into subspaces (see Storage Model). Each update flows through the chain, and remains pending until the tail sends an acknowledgement back. Out-of-order updates is avoided with global version number. Therefore, it ensures that all operations are strictly ordered and reliably committed at all replicas before clients receiving responses.

Foreign Keys

Not Supported

HyperDex is a NoSQL DBMS that does not support foreign keys.

Query Interface

Custom API

HyperDex provides an imperative API. It supports two kinds of queries: 1) basic query to insert, retrieve, update and delete objects identified by keys. 2) search query to retrieve the objects whose attributes match user-specified range (any subset of attributes). HyperDex contains full client bindings for C, C++, and Python, and partial bindings for Java, Node.JS and Ruby.

System Architecture

Shared-Nothing

HyperDex supports distributed shared-nothing architecture. HyperDex contains three components: the coordinator, clients and storage servers. The coordinator is responsible for maintaining state for the system, such as region-server mapping, server failures, latest version number. Though it is logically centralized, it is implemented as a replicated machine. Clients sends requests directly to corresponding servers. Servers are deployed on cluster and each stores a piece of data. All messages are sent through TCP.

Logging

Not Supported

HyperDex does not support logging. The storage layer HyperDisk is itself log-structured, so there is no need for extra logging. To provide fault tolerance, it employs replicas to store one piece of data.

Indexes

Hash Table

HyperDex does not use traditional index. Instead, it applies a new hashing technique called hyperspace hashing. All attributes are mapped to dimensions in a multi-dimensional hyperspace and attribute values are hashed independently. Objects resides at the coordinate specified by the hashes. In order to support range search, attribute order is preserved with hashing. In sum, it is easy and efficient to insert, search, retrieve objects with this hyperspace hashing mapping.

Storage Organization

Log-structured

As mentioned in Storage Architecture, records are stored in files on disk. Files are log-structured. Updates such as insertion are recorded on the tail and search operation linearly scan the whole file.

Revision #5 | Updated 05/14/2019 4:14 p.m.

View Current Viewing Revision #5 from 05/14/2019 4:14 p.m.

Developer

Cornell University

Country of Origin

Start Year

2011

End Year

2016

Project Type

Academic, Open Source

Written in

C++

Supported languages

C, C++, Go, Java, Python, Ruby, Rust

Operating Systems

Linux, OS X

Licenses

BSD

Revision #5 | Updated 05/14/2019 4:14 p.m.

HyperDex

History

Data Model

Storage Model

Query Execution

Concurrency Control

Joins

Storage Architecture

Checkpoints

Isolation Levels

Foreign Keys

Query Interface

System Architecture

Logging

Indexes

Storage Organization

Website

Source Code

Tech Docs

Developer

Country of Origin

Start Year

End Year

Project Type

Written in

Supported languages

Operating Systems

Licenses