JanusGraph

Revision #34 from 06/29/2019 3:19 p.m.

JanusGraph is a transactional graph DBMS optimized for storing and querying graphs distributed across a multi-machine cluster.

History

2012 The project was originally called TitanDB and was developed and released by a company called Aurelius.
Early 2015 Aurelius was acquired by DataStax, the company behind the Apache Cassandra database. Development of TitanDB stagnated after the acquisition, and no newer version followed the 1.0 release in Sep 2015.
Late 2015 The open source community behind TitanDB took the project to The Linux Foundation, renamed it JanusGraph, and continued developing it.
Late 2018 The project is still under active development and has recently published a new release.

Concurrency Control

Not Supported

As described in the section on checkpoints, JanusGraph manages transactions in two layers. The first is JanusGraph Server, which sits in the middleware layer. Judging from the documentation page that describes this layer, it provides almost no concurrency control of its own: a transaction is essentially a wrapper around a set of operations so that they can be applied to the backend storage together.

The second layer is the storage backend. JanusGraph relies on a separate backend to store data and execute queries, so how concurrency control is handled at this layer depends largely on which backend is chosen. In most cases JanusGraph assumes that the backend ensures serializability, or something close to it, but a page in the advanced topics of the documentation shows that JanusGraph can also work with eventually consistent backends.

Storage Model

Custom

JanusGraph stores graphs in adjacency list format, which means that a graph is stored as a collection of vertices, each with its adjacency list. The adjacency list of a vertex contains all of the vertex's incident edges and all of its properties. JanusGraph can store this adjacency list representation in any storage backend that supports the Bigtable data model.

This data layout maps directly onto the Bigtable data model. According to the JanusGraph documentation, however, the storage backend must meet one additional requirement: the cells must be sorted by their columns, and the subset of cells specified by a column range must be efficiently retrievable.
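
The sorted-column requirement can be illustrated with a minimal sketch in plain Java. This is not JanusGraph's storage code: the class name SortedRowSketch, its methods, and the column encoding ("prop:<key>" and "edge:<label>:<targetId>") are all assumptions made for illustration. The sketch keeps one row per vertex and uses a sorted map for the cells, so that all edges with a given label can be read as one contiguous column range.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a Bigtable-style store: one row per vertex, cells sorted by column.
// The class name and the column encoding are hypothetical, not JanusGraph's real format.
class SortedRowSketch {
    // row key (vertex id) -> sorted map of column -> value
    private final Map<Long, NavigableMap<String, String>> rows = new HashMap<>();

    private NavigableMap<String, String> row(long vertexId) {
        return rows.computeIfAbsent(vertexId, id -> new TreeMap<>());
    }

    // A property is stored as one cell in the vertex's row.
    void putProperty(long vertexId, String key, String value) {
        row(vertexId).put("prop:" + key, value);
    }

    // An incident edge is also one cell. The label leads the column name so that
    // all edges with the same label are adjacent in sort order.
    void putEdge(long vertexId, String label, long targetId) {
        row(vertexId).put("edge:" + label + ":" + targetId, Long.toString(targetId));
    }

    // Column-range read: fetch only the cells whose column starts with "edge:<label>:".
    List<Long> neighbors(long vertexId, String label) {
        NavigableMap<String, String> cells = rows.get(vertexId);
        if (cells == null) {
            return Collections.emptyList();
        }
        String from = "edge:" + label + ":"; // inclusive lower bound
        String to = "edge:" + label + ";";   // exclusive upper bound (';' sorts right after ':')
        List<Long> result = new ArrayList<>();
        for (String target : cells.subMap(from, to).values()) {
            result.add(Long.parseLong(target));
        }
        return result;
    }

    public static void main(String[] args) {
        SortedRowSketch store = new SortedRowSketch();
        store.putProperty(1L, "name", "Alex");
        store.putEdge(1L, "father", 2L);
        store.putEdge(1L, "father", 3L);
        store.putEdge(1L, "friend", 4L);
        System.out.println(store.neighbors(1L, "father")); // prints [2, 3]
    }
}
```

Because the cells are sorted, the lookup never scans the vertex's properties or its edges with other labels. A backend that cannot serve such a range efficiently would have to read the whole row instead.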

Logging

Command Logging

The logging mechanism in JanusGraph is meant mainly for recording changes, driving downstream updates, and firing triggers, rather than for data recovery as in most DBMSs. Users can define their own logging scheme by implementing a Transaction Log Processor, and enable logging for a transaction with code like
tx = graph.buildTransaction().logIdentifier('logging name').start()

A separate mechanism in JanusGraph lets users enable write-ahead logging for recovery by setting
tx.log-tx = true
Users also need to set up a process that reads the log and prepares for recovery by running
recovery = JanusGraphFactory.startTransactionRecovery(graph, startTime, TimeUnit.MILLISECONDS);
There is no evidence that this mechanism is enabled by default in a normal JanusGraph deployment.

Query Interface

Gremlin

JanusGraph uses the Gremlin graph query language to retrieve and modify data in the graph. Gremlin is a functional language in which traversal operators are chained together to form path-like expressions that express queries or data modifications on graphs.

A Gremlin query is a chain of operations/functions that are evaluated from left to right. A simple example, which queries the names of Alex's grandchildren in his genealogy graph, is provided below. It assumes that each "father" edge points from a father to his child.

g.V().has('name', 'Alex').out('father').out('father').values('name')

The query can be read as
g : for the current graph
V : for the vertices in the graph
has('name', 'Alex') : filter the vertices down to those whose "name" property is "Alex"
out('father') : traverse the outgoing edges of type "father" from "Alex" (note that the result can be more than one vertex)
out('father') : traverse the outgoing edges of type "father" from Alex's children (the result of the previous traversal)
values('name') : get the value of the "name" property
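
To make the left-to-right evaluation concrete, the sketch below imitates the same chain with plain Java streams. It is only an illustration, not how JanusGraph or Gremlin is implemented: the class TraversalSketch, the toy genealogy data, and the helper out are all invented, and the "father" edges are assumed to point from a father to his child.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java imitation of how a chained traversal is evaluated step by step.
// The graph data and all names here are invented for illustration.
class TraversalSketch {
    // vertex id -> value of the "name" property
    static final Map<Integer, String> NAMES = new LinkedHashMap<>();
    // vertex id -> targets of its outgoing "father" edges (father -> child)
    static final Map<Integer, List<Integer>> FATHER_OUT = new LinkedHashMap<>();

    static {
        NAMES.put(1, "Alex");
        NAMES.put(2, "Ben");
        NAMES.put(3, "Carl");
        NAMES.put(4, "Dan");
        NAMES.put(5, "Eli");
        FATHER_OUT.put(1, Arrays.asList(2, 3)); // Alex is the father of Ben and Carl
        FATHER_OUT.put(2, Arrays.asList(4));    // Ben is the father of Dan
        FATHER_OUT.put(3, Arrays.asList(5));    // Carl is the father of Eli
    }

    // Follow the outgoing "father" edges of one vertex.
    static List<Integer> out(int vertexId) {
        return FATHER_OUT.getOrDefault(vertexId, Collections.emptyList());
    }

    // Each stream stage consumes the vertices produced by the stage before it.
    static List<String> grandchildrenOf(String name) {
        return NAMES.keySet().stream()                  // g.V()
                .filter(v -> name.equals(NAMES.get(v))) // has('name', name)
                .flatMap(v -> out(v).stream())          // out('father')
                .flatMap(v -> out(v).stream())          // out('father')
                .map(NAMES::get)                        // values('name')
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(grandchildrenOf("Alex")); // prints [Dan, Eli]
    }
}
```

Each stage maps one set of vertices to the next, which is why a single starting vertex can fan out to several results.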

More complex Gremlin examples can be found in the Complete Gremlin Manual.

There are two ways to query JanusGraph with Gremlin.

The first is the Gremlin Console, an interactive shell that is distributed with JanusGraph. By connecting the Gremlin Console to JanusGraph Server, users can interact directly with graphs stored in JanusGraph Server using Gremlin.
The second is to issue Gremlin queries from your application. Applications can interact with JanusGraph through the Gremlin query language in two different ways:
- Embed JanusGraph into your application, so that JanusGraph is packaged with the application and both run in the same JVM instance. This appears to be possible only for Java applications.
- Use JanusGraph's Gremlin Server: the application sends its Gremlin scripts to a server, which runs them separately, so JanusGraph does not need to be packaged into the application. This is probably the recommended method, as it is more portable and scalable.

Data Model

Graph

JanusGraph's data model has three building blocks: edges, vertices, and properties.

An edge is an entity that connects two vertices and has a label that defines the semantics of the relationship. For instance, an edge labeled friend between vertices A and B encodes a friendship between the two individuals. Most properties can be associated with edges, and some settings, such as multiplicity, are associated with edge labels.
A vertex is an entity that can be associated with an optional label and properties.
A property on a vertex or edge is a key-value pair. For instance, the property name='Daniel' has the key 'name' and the value 'Daniel'. Property keys are part of the JanusGraph schema and can constrain the allowed data types and the cardinality of values.

Moreover, each JanusGraph graph has a schema made up of the edge labels, property keys, and vertex labels used in it. A JanusGraph schema can be defined either explicitly or implicitly. A schema type (i.e., edge label, property key, or vertex label) is assigned to each element in the graph (i.e., edge, property, or vertex, respectively) when the element is first created. The assigned schema type cannot be changed for a particular element, which ensures a stable type system that is easy to reason about.
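
The way property keys constrain data types and cardinality, and the way a label is fixed at creation, can be sketched in plain Java. This is a toy model of an explicitly defined schema, not the JanusGraph management API: SchemaSketch, definePropertyKey, and setProperty are hypothetical names, and only two cardinalities (SINGLE and LIST) are modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of an explicitly defined schema. All class and method names are
// hypothetical; this is not the JanusGraph management API.
class SchemaSketch {
    enum Cardinality { SINGLE, LIST }

    // A property key fixes the data type and cardinality of its values.
    static final class PropertyKey {
        final Class<?> dataType;
        final Cardinality cardinality;

        PropertyKey(Class<?> dataType, Cardinality cardinality) {
            this.dataType = dataType;
            this.cardinality = cardinality;
        }
    }

    static final class Vertex {
        final String label; // assigned at creation and never changed afterwards
        final Map<String, List<Object>> properties = new HashMap<>();

        Vertex(String label) {
            this.label = label;
        }
    }

    private final Map<String, PropertyKey> propertyKeys = new HashMap<>();

    void definePropertyKey(String name, Class<?> dataType, Cardinality cardinality) {
        propertyKeys.put(name, new PropertyKey(dataType, cardinality));
    }

    // Reject values whose key is undefined or whose type does not match the key.
    void setProperty(Vertex vertex, String key, Object value) {
        PropertyKey propertyKey = propertyKeys.get(key);
        if (propertyKey == null) {
            throw new IllegalArgumentException("Undefined property key: " + key);
        }
        if (!propertyKey.dataType.isInstance(value)) {
            throw new IllegalArgumentException("Wrong data type for key: " + key);
        }
        List<Object> values = vertex.properties.computeIfAbsent(key, k -> new ArrayList<>());
        if (propertyKey.cardinality == Cardinality.SINGLE) {
            values.clear(); // SINGLE keeps at most one value per key
        }
        values.add(value);
    }
}
```

For example, after defining "name" as a SINGLE String key, setting name='Daniel' and then name='Dan' on the same vertex leaves only 'Dan', while setting name to a number is rejected.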

System Architecture

Shared-Disk

JanusGraph itself focuses only on graph serialization, graph data modeling, and efficient query execution on graphs, which means that it does not handle storage, indexing, or data analysis itself. Instead, JanusGraph implements a set of interfaces for data storage, data indexing, and client access.

JanusGraph connects through adapters to one or more storage backends and to optional indexing backends, which manage data storage and indexing on its behalf.

Several standard adapters ship with JanusGraph:
* Data storage: Apache Cassandra, Apache HBase, Oracle Berkeley DB Java Edition
* Indexes: Elasticsearch, Apache Solr, Apache Lucene
