HyperGraphDB

HyperGraphDB is an extensible open-source graph-based data storage engine. It implements the ability to store hypergraph relationships, which make it suitable for complex data and knowledge representation problems. It relies on object called an atom as its unit storage, where an atom is either a node in the graph or an edge that can point to multiple nodes and/or edges. It uses BerkeleyDB as its key value store, and operates as an object-orientated Java database.

History

Originated from an AI project (http://www.opencog.org) in 2007. It's inspired by the paper "Directed Recursive Labelnode Hypergraphs: A new Representation-Language", by Harold Boley, because most AI applications learn a higher-order, general representation, that can not be described in an ordinary graph but rather expressed in a hypergraph.

Data Model

Graph

Each tuple is called an atom, which is either a node or an edge that points to multiple nodes or edges. Each of these atoms are strongly typed, such that there is a specific API that you can call to access that atom's contents. This DBMS uses BerkeleyDB as its key-value storage.

Indexes

B+Tree

HyperGraphDB maintains 3 main indexes for efficiency: mapping from atom to its incidence set, mapping an atom to the set of all its atoms it is pointing to and the mapping of an atom to the value store. Users of the DBMS can implement their own indices as long as they define a mapping from atoms to atoms.

Storage Model

Custom

This DBMS supports both n-ary relationships and higher-order relations, which are edges that can point to multiple nodes at a time, and edges that can point to other edges. Each of these objects, edges or nodes, are referred to as atoms, and can be thought of as Java Objects.

Logging

Logical Logging

HyperGraphDB implements a logical log that stores the type of operation (i.e. Create, Update, Remove, Copy), the value that was updated or stored, and its timestamp. These then get saved to persistent memory after being created.

Parallel Execution

Intra-Operator (Horizontal)

HyperGraphDB allows for unions to be parallelized by running the two graph nodes that want to be joined across threads.

Concurrency Control

Multi-version Concurrency Control (MVCC) Timestamp Ordering Two-Phase Locking (Deadlock Detection)

HyperGraphDB is Atomic, Consistent, and Isolated but not Durable. It implements the Multi-Version Concurrency Control, and is fully transactional. To ensure consistency in peer to peer transactions, they use timestamps to make sure the queries are executed in order. HyperGraphDB also implement automatic deadlock detection that will randomly kill a process if a deadlock is encountered.

Query Execution

Tuple-at-a-Time Model

They implement an iterator API that processes a tuple at a time to allow for graph traversals.

Query Interface

Custom API

HypergraphDB supports a special API specific to their tuples called atoms, but it doesn't support any particular querying language. They support three types of queries: graph traversals, predicate matching, and pattern matching over graph structures.

Storage Architecture

Disk-oriented

HyperGraphDB requires an efficient key-value store that can support multiple ordered values per a single key. They use the BerkeleyDB storage system as their physical storage, which is disk-orientated.

System Architecture

Shared-Nothing

HyperGraphDB is a distributed, Shared-Nothing database that communicates via a P2P protocol. The protocol followed is of the Agent Communication Language (ACL) FIPA standard, which consists of communication using keywords such as propose, accept, inform, request, query etc.

Joins

Sort-Merge Join

Joins are performed on two sorted sets of atoms and the two are joined via a zig-zag algorithm or a merge algorithm.