GrapheekDB

Revision #5 from 12/16/2019 8:28 a.m.

GrapheekDB is a lightweight graph database with support for multiple back-end storage managers. It represents only directed graphs, and it is persistent when the chosen back end is a Key/Value store.

History

GrapheekDB was developed in 2014 by Raphaël Braud, a freelance developer from France. It was built for a recommendation system that extracts the contents of documents, tokenizes them, and recommends similar documents based on user queries. A graph database was chosen over a relational database to improve performance by avoiding multiple joins on tables of several million rows. Although it was built for the specific purpose of recommending documents, it exposes a Python-like API (close to Django and Gremlin).

Foreign Keys

Not Supported

System Architecture

Shared-Memory

The database uses a client-server model and communicates over TCP on port 5555. It can be used as a purely in-memory database but is intended to be used with a persistent backend such as Kyoto Cabinet or LMDB.

Compression

Naïve (Page-Level)

The Naive Page Rank compression algorithm is listed as a to-do item in the source code but is not yet supported.

Storage Model

Custom

GrapheekDB is a multi-model document store. Nodes and edges can carry related data, but this is not enforced.

Checkpoints

Not Supported

Joins

Not Supported

A graph database does not need join operations: relationships are stored as edges and followed directly, which avoids the cost of joins. The DBMS is also schemaless.

Query Interface

Custom API Gremlin

The query interface is close to Gremlin and to the Django query API. The DBMS has methods for lookups on graphs that resemble Django lookups, and methods for path traversals over inner and outer vertices and edges that resemble Gremlin traversal methods. The DBMS also has aliasing and collecting methods, as well as aggregation methods such as count and sum, which are implemented using Python's entity iterators.
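The combination of Django-style lookups, Gremlin-style traversal steps, and iterator-based aggregation can be illustrated with a small self-contained sketch. Every name here (`ToyGraph`, `Traversal`, `matches`, `V`, `out_v`, `total`) is hypothetical and chosen for illustration; this is not GrapheekDB's actual API, only a toy model of the style of interface described above.

```python
import operator

# Hypothetical toy classes for illustration; NOT GrapheekDB's real API.
_OPS = {
    "exact": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def matches(data, **lookups):
    """Return True if `data` satisfies every Django-style lookup (e.g. age__gte=30)."""
    for key, expected in lookups.items():
        field, _, op = key.partition("__")
        if field not in data:
            return False
        if not _OPS[op or "exact"](data[field], expected):
            return False
    return True


class ToyGraph:
    def __init__(self):
        self.nodes = {}      # node id -> property dict
        self.out_edges = {}  # node id -> list of (edge property dict, target id)

    def add_node(self, **data):
        node_id = len(self.nodes)
        self.nodes[node_id] = data
        self.out_edges[node_id] = []
        return node_id

    def add_edge(self, source, target, **data):
        self.out_edges[source].append((data, target))

    def V(self, **lookups):
        """Start a traversal at every node matching the lookups."""
        ids = (i for i, d in self.nodes.items() if matches(d, **lookups))
        return Traversal(self, ids)


class Traversal:
    """Lazy, chainable wrapper around an iterator of node ids."""

    def __init__(self, graph, ids):
        self.graph = graph
        self.ids = ids

    def out_v(self, **edge_lookups):
        """Follow outgoing edges that match the lookups to their target nodes."""
        def step():
            for i in self.ids:
                for edge_data, target in self.graph.out_edges[i]:
                    if matches(edge_data, **edge_lookups):
                        yield target
        return Traversal(self.graph, step())

    def count(self):
        return sum(1 for _ in self.ids)

    def total(self, field):
        return sum(self.graph.nodes[i][field] for i in self.ids)


g = ToyGraph()
alice = g.add_node(kind="person", name="Alice", age=34)
bob = g.add_node(kind="person", name="Bob", age=27)
carol = g.add_node(kind="person", name="Carol", age=41)
g.add_edge(alice, bob, action="knows")
g.add_edge(alice, carol, action="knows")
g.add_edge(bob, carol, action="follows")

older = g.V(kind="person", age__gte=30).count()
friends_age = g.V(name="Alice").out_v(action="knows").total("age")
```

Because each step wraps a generator, nothing is evaluated until an aggregation such as `count` or `total` consumes the iterator, which is one plausible reading of "implemented using iterators".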

Indexes

Hash Table

A graph database is index-free in the sense that each element holds direct pointers to its adjacent elements (a property known as adjacency), so GrapheekDB does not need an index to move from a node or edge to its neighbours. However, the latest version of the DBMS does support node and edge indices for lookups on sparse graphs. The current version only supports "exact match indices" and performs a Depth-First Search (DFS) in order to match indices. Storing the indices adds storage overhead and slows down writes in the DBMS.
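The trade-off of an exact-match index (faster lookups in exchange for extra storage and extra work on every write) can be sketched with a plain hash table. The class and method names below (`IndexedNodes`, `add_node`, `find`) are hypothetical and are not taken from GrapheekDB; the sketch also does not model the DFS-based matching mentioned above, only the general idea of a hash index keyed on exact field values.

```python
from collections import defaultdict


class IndexedNodes:
    """Hypothetical node store with an exact-match hash index on chosen fields."""

    def __init__(self, indexed_fields):
        self.indexed_fields = tuple(indexed_fields)
        self.nodes = {}                 # node id -> property dict
        self.index = defaultdict(set)   # (field, value) -> set of node ids

    def add_node(self, **data):
        node_id = len(self.nodes)
        self.nodes[node_id] = data
        # Write overhead: every indexed field costs one extra index update.
        for field in self.indexed_fields:
            if field in data:
                self.index[(field, data[field])].add(node_id)
        return node_id

    def find(self, field, value):
        if field in self.indexed_fields:
            # Indexed field: a single hash lookup (values must be hashable).
            return set(self.index.get((field, value), ()))
        # Non-indexed field: fall back to scanning every node.
        return {i for i, d in self.nodes.items() if d.get(field) == value}


store = IndexedNodes(["kind"])
d1 = store.add_node(kind="document", title="Graphs")
d2 = store.add_node(kind="document", title="Indexes")
t1 = store.add_node(kind="token", text="graph")

documents = store.find("kind", "document")      # served by the index
by_title = store.find("title", "Indexes")       # served by a full scan
```

Only exact values can be served this way; a range or prefix query would need a different structure, which matches the "exact match" restriction described above.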

Data Model

Key/Value Graph

The DBMS is a multi-model document store. Presently it can act either as a graph or as a Key/Value Store (KVS). The DBMS supports several KVS backends, such as Kyoto Cabinet and Symas LMDB. If a KVS backend is used, the stored objects become persistent. There are no strict assertions on data modelling.
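How a graph can be layered on a key/value store is sketched below. The key layout (`node:<id>`, `out:<id>`, `meta:next_id`), the JSON encoding, and the `KVGraph` class are all assumptions made for illustration; they are not GrapheekDB's actual storage format. A plain `dict` stands in for the store, and a persistent KVS with the same mapping interface could in principle be substituted.

```python
import json


class KVGraph:
    """Hypothetical graph layer over any dict-like key/value store."""

    def __init__(self, store):
        self.store = store  # maps string keys to JSON-encoded strings

    def _put(self, key, value):
        self.store[key] = json.dumps(value)

    def _get(self, key, default=None):
        raw = self.store.get(key)
        return default if raw is None else json.loads(raw)

    def add_node(self, **data):
        node_id = self._get("meta:next_id", 0)
        self._put("meta:next_id", node_id + 1)
        self._put(f"node:{node_id}", data)   # node properties
        self._put(f"out:{node_id}", [])      # outgoing adjacency list
        return node_id

    def add_edge(self, source, target, **data):
        out = self._get(f"out:{source}", [])
        out.append({"target": target, "data": data})
        self._put(f"out:{source}", out)

    def node(self, node_id):
        return self._get(f"node:{node_id}")

    def neighbours(self, node_id):
        return [e["target"] for e in self._get(f"out:{node_id}", [])]


backing_store = {}
g = KVGraph(backing_store)
a = g.add_node(kind="document", title="A")
b = g.add_node(kind="document", title="B")
g.add_edge(a, b, relation="similar")

# A second handle on the same store sees the same graph, because all state
# lives in the key/value pairs rather than in the KVGraph object.
reopened = KVGraph(backing_store)
neighbours_of_a = reopened.neighbours(a)
```

Since the graph object itself holds no state, persistence depends entirely on whether the underlying store is persistent, which mirrors the relationship between the graph layer and its KVS backend described above.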

Storage Architecture

In-Memory

The DBMS uses in-memory storage to store the graph.

Logging

Not Supported

Isolation Levels

Serializable

The DBMS was built with serializable execution in mind. This was done to avoid loading the entire data set into memory every time the intended recommendation algorithm ran to produce the list of documents for a user query.
