Hypertable

Hypertable is a high-performance, open-source, massively scalable database modeled after Bigtable, Google's proprietary distributed database. It runs on top of a distributed file system such as Apache HDFS, GlusterFS, or the CloudStore Kosmos File System (KFS). It is written almost entirely in C++ because its developers believed C++ had significant performance advantages over Java.

History

Hypertable was originally developed at the company Zvents before 2008. Doug Judd was a promoter of Hypertable. In January 2009, Baidu, the Chinese-language search engine, became a project sponsor. Version 0.9.2.1 was described in a blog post in February 2009. Development ended in March 2016.

Data Model

Column Family / Wide-Column

Hypertable groups related columns into sets known as column families. Users may supply an optional column qualifier and specify the qualified column as family:qualifier.
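The family:qualifier notation can be illustrated with a small parsing sketch. This is not Hypertable code: `ColumnRef` and `parse_column` are hypothetical names, and splitting at the first colon is an assumption made for the example.

```python
from typing import NamedTuple, Optional


class ColumnRef(NamedTuple):
    """A column reference: a required family plus an optional qualifier."""
    family: str
    qualifier: Optional[str]  # None when no qualifier was supplied


def parse_column(spec: str) -> ColumnRef:
    """Split a 'family:qualifier' string (hypothetical helper).

    Assumption: the first colon separates family from qualifier, so any
    later colons belong to the qualifier.
    """
    family, sep, qualifier = spec.partition(":")
    if not family:
        raise ValueError(f"missing column family in {spec!r}")
    return ColumnRef(family, qualifier if sep else None)
```

For example, `parse_column("tag:good")` yields family `tag` with qualifier `good`, while `parse_column("tag")` yields the family alone with no qualifier.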

Query Interface

Custom API, Command-line / Shell

The Hypertable Query Language (HQL) allows you to create, modify, and query tables and invoke administrative commands. HQL is interpreted by the following interfaces:

- The hypertable command line interface (ht shell)
- The hql_exec and hql_query Thrift API methods
- The Hypertable::HqlInterpreter C++ class

Concurrency Control

Multi-version Concurrency Control (MVCC)

Hypertable uses Multi-Version Concurrency Control (MVCC) and, by default, auto-assigns each revision number from a timestamp.
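The idea of keeping several revisions per cell, each stamped automatically unless the writer supplies a revision, can be sketched as follows. This is a toy in-memory model under stated assumptions, not Hypertable's implementation: `VersionedCells` and its methods are hypothetical, and a nanosecond wall clock stands in for whatever timestamp source the real system uses.

```python
import time
from collections import defaultdict


class VersionedCells:
    """Toy multi-version store: every write adds a revision, none overwrite."""

    def __init__(self, clock=time.time_ns):
        self._clock = clock                  # timestamp source (assumed)
        self._versions = defaultdict(list)   # key -> [(revision, value), ...]

    def put(self, key, value, revision=None):
        """Store a new version; auto-assign the revision from the clock."""
        if revision is None:
            revision = self._clock()
        self._versions[key].append((revision, value))
        return revision

    def latest(self, key):
        """Return the value with the highest revision, or None if absent."""
        versions = self._versions.get(key)
        if not versions:
            return None
        return max(versions, key=lambda rv: rv[0])[1]

    def history(self, key):
        """Return all versions of a key, newest revision first."""
        return sorted(self._versions.get(key, []),
                      key=lambda rv: rv[0], reverse=True)
```

Because writes never replace earlier versions, readers can keep seeing an older revision while newer ones are being added.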

Storage Model

Custom

Storage Architecture

Disk-oriented

Hypertable is capable of running on top of any filesystem. To achieve this, the system abstracts the filesystem interface by sending all filesystem requests through a File System (FS) broker process. FS brokers have been developed for HDFS, MapR, Ceph, KFS, and the local filesystem.
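The broker abstraction can be pictured as a single interface with one implementation per filesystem. The sketch below is illustrative only: in Hypertable the broker is a separate process, whereas here it is an in-process class, and the names `FsBroker`, `LocalBroker`, `InMemoryBroker`, and `save_cells` are hypothetical.

```python
import os
from abc import ABC, abstractmethod


class FsBroker(ABC):
    """Minimal filesystem interface that the database code depends on."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class LocalBroker(FsBroker):
    """Stores files under a root directory on the local filesystem."""

    def __init__(self, root: str):
        self._root = root

    def _full(self, path: str) -> str:
        return os.path.join(self._root, path.lstrip("/"))

    def write(self, path: str, data: bytes) -> None:
        full = self._full(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def read(self, path: str) -> bytes:
        with open(self._full(path), "rb") as f:
            return f.read()


class InMemoryBroker(FsBroker):
    """Stand-in for a remote filesystem; keeps file contents in a dict."""

    def __init__(self):
        self._files = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = bytes(data)

    def read(self, path: str) -> bytes:
        return self._files[path]


def save_cells(broker: FsBroker, path: str, payload: bytes) -> None:
    """Caller code sees only the FsBroker interface, never the backend."""
    broker.write(path, payload)
```

Supporting another filesystem then means adding one more `FsBroker` subclass, with no change to callers such as `save_cells`.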

System Architecture

Shared-Disk

Hypertable can run on top of any filesystem, and all of its Range Servers share the same underlying filesystem.

Isolation Levels

Snapshot Isolation

Timestamps are used internally to provide snapshot isolation for queries.
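One common way timestamps yield snapshot reads is to fix a snapshot timestamp when the query starts and ignore every version written after it. The sketch below shows that general technique; the source does not describe Hypertable's actual mechanism, so the visibility rule and the `read_snapshot` helper are assumptions.

```python
def read_snapshot(versions, snapshot_ts):
    """Return the newest value whose timestamp is <= snapshot_ts.

    `versions` is a list of (timestamp, value) pairs for one cell.
    Versions written after the snapshot are invisible to the query.
    Returns None when no version existed at the snapshot time.
    """
    visible = [(ts, value) for ts, value in versions if ts <= snapshot_ts]
    if not visible:
        return None
    return max(visible, key=lambda tv: tv[0])[1]
```

A query that started at timestamp 25 keeps reading the version written at 20 even if another writer adds a version at 40 while the query is still running.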