HyperGraphDB

View Current Viewing Revision #9 from 12/15/2019 7:18 p.m.

HyperGraphDB is an extensible open-source graph-based data storage engine. It implements the ability to store hypergraph relationships, which make it suitable for complex data and knowledge representation problems. It relies on object called an atom as its unit storage, where an atom is either a node in the graph or an edge that can point to multiple nodes and/or edges. It uses BerkeleyDB as its key value store, and operates as an object-orientated Java database.

History

Originated from an AI project (http://www.opencog.org) in 2007. It's inspired by the paper "Directed Recursive Labelnode Hypergraphs: A new Representation-Language", by Harold Boley, because most AI applications learn a higher-order, general representation, that can not be described in an ordinary graph but rather expressed in a hypergraph.

Data Model

Graph

Each tuple is called an atom, which is a subset of a graph (set of nodes and edges). Each of these atoms are strongly typed, such that there is a specific API that you can call to access that atom's contents. This DBMS uses BerkeleyDB as its key-value storage.

Logging

Logical Logging

HyperGraphDB implements a logical log that stores the type of operation (i.e. Create, Update, Remove, Copy), the value that was updated or stored, and its timestamp. These then get saved to persistent memory after being created.

System Architecture

Shared-Nothing

HyperGraphDB is a distributed, Shared-Nothing database that communicates via a P2P protocol. The protocol followed is of the Agent Communication Language (ACL) FIPA standard, which consists of primitive keywords such as propose, accept, inform, request, query etc.

Query Execution

Tuple-at-a-Time Model

They implement an iterator API that processes a tuple at a time to allow for graph traversals.

Concurrency Control

Timestamp Ordering Two-Phase Locking (Deadlock Detection)

HyperGraph is ACI, but not Durable. To ensure consistency in peer to peer transactions, they use timestamps to make sure the queries are executed in order. They also implement automatic deadlock detection that will randomly kill a process if a deadlock is encountered.

Joins

Sort-Merge Join

Joins are performed on two sorted sets of atoms and the two are joined via a zig-zag algorithm or a merge algorithm.

Parallel Execution

Intra-Operator (Horizontal)

HyperGraphDB allows for unions to be parallelized by running the two graph nodes that want to be joined across threads.

Storage Model

Custom

This DBMS supports both n-ary relationships and higher-order relations, which are edges that can point to multiple nodes at a time, and edges that can point to other edges. Each of these objects, edges or nodes, are referred to as atoms, and can be thought of as Java Objects.

Indexes

B+Tree

HyperGraphDB implements 3 main indices for efficiency: mapping from atom to its incidence set, mapping an atom to the set of all its atoms it is pointing to and the mapping of an object to the atoms that actually store the value. Users of the DB can implement their own indices as long as they associate atoms with atoms.

Storage Architecture

Disk-oriented

HyperGraphDB requires an efficient key-value store that can support multiple ordered values per a single key. They use the BerkeleyDB storage system as their physical storage.

Query Interface

Custom API

HypergraphDB supports a special API specific to their tuples called atoms, but it doesn't support any particular querying language. They support three types of queries: graph traversals, predicate matching, and pattern matching over graph structures.

Revision #9 | Updated 12/15/2019 7:18 p.m.