HyperGraphDB

HyperGraphDB is an extensible open-source graph-based data storage engine. It implements the ability to store hypergraph relationships, which make it suitable for complex data and knowledge representation problems. It relies on object called an atom as its unit storage, where an atom is either a node in the graph or an edge that can point to multiple nodes and/or edges. It uses BerkeleyDB as its key value store, and operates as an object-orientated Java database.

History

Originated from an AI project called OpenCog in 2007. This project aims to build an open source framework for AGI, and HyperGraphDB is a component of the project that will handle long-term storage of complex representations. HyperGraphDB is inspired by the paper "Directed Recursive Labelnode Hypergraphs: A new Representation-Language", by Harold Boley, because most AI applications learn a higher-order, general representation, that can not be described in an ordinary graph but rather better expressed in a hypergraph.

Concurrency Control

Multi-version Concurrency Control (MVCC) Timestamp Ordering Two-Phase Locking (Deadlock Detection)

HyperGraphDB is Atomic, Consistent, and Isolated but not Durable. It implements the Multi-Version Concurrency Control, and is fully transactional. To ensure consistency in peer to peer transactions, they use timestamps to make sure the queries are executed in order. HyperGraphDB also implement automatic deadlock detection that will randomly kill a process if a deadlock is encountered.

Query Interface

Custom API

HypergraphDB supports a special API specific to their data model, atoms, but it doesn't support any particular querying language. They support three types of queries: graph traversals, predicate matching, and pattern matching over graph structures.

Logging

Logical Logging

HyperGraphDB implements a logical log that stores the type of operation (i.e. Create, Update, Remove, Copy), the value that was updated or stored, and its timestamp. They are saved to persistent memory after being created.

Joins

Sort-Merge Join

Joins are performed on two sorted sets of atoms, and they are joined via a zig-zag algorithm or a merge algorithm.

Storage Model

Custom

This DBMS supports both n-ary relationships and higher-order relations, which are edges that can point to multiple nodes at a time, and edges that can point to other edges. Each of these objects, edges or nodes, are referred to as atoms, and can be thought of as Java Objects.

Indexes

B+Tree

HyperGraphDB maintains 3 main indexes for efficiency: mapping from atom to its incidence set, mapping an atom to the set of all its atoms it is pointing to and the mapping of an atom to the value store. Users of the DBMS can implement their own indices as long as they define a mapping from atoms to atoms.

Data Model

Graph

Each tuple is called an atom, which is either a node or an edge that points to multiple nodes or edges. Each of these atoms are strongly typed, such that there is a specific API that you can call to access that atom's contents. This DBMS uses BerkeleyDB as its key-value storage.

Storage Architecture

Disk-oriented

HyperGraphDB requires an efficient key-value store that can support multiple ordered values per a single key. They use the BerkeleyDB storage system as their physical storage, which is disk-orientated.

System Architecture

Shared-Nothing

HyperGraphDB is a distributed, Shared-Nothing database that communicates via a P2P protocol. The protocol followed is of the Agent Communication Language (ACL) FIPA standard, which consists of communication using keywords such as propose, accept, inform, request, and query.

Query Execution

Tuple-at-a-Time Model

This DBMS implements an iterator API that processes a tuple at a time to allow for graph traversals.

Parallel Execution

Intra-Operator (Horizontal)

HyperGraphDB allows for unions to be parallelized by running the two graph nodes that want to be joined across threads.