ArangoDB

ArangoDB is a multi-model, mostly-memory database. It supports key-value, document, and graph stores using the JSON data format. ArangoDB writes all data to persistent storage for durability. However, to use ArangoDB efficiently, the frequently accessed pages (the working set) should fit in main memory. Unlike most NoSQL databases, ArangoDB supports join operations and lets users choose between multi-collection transactions with ACID properties and standard single-document transactions for better performance.

History

ArangoDB was motivated by the goal of combining the most common uses of NoSQL databases. Other NoSQL databases, such as MongoDB for documents and Neo4j for graphs, natively support only a single data model. ArangoDB combines their use cases into an "all-in-one" database so that users do not need a second database for a different type of data. ArangoDB has been ready for production use since version 1.0, released in spring 2012. It was originally named AvocadoDB; the name was changed to ArangoDB in May 2012 to avoid legal issues.

Checkpoints

Non-Blocking

The MMFiles storage engine of ArangoDB offers two types of synchronization: eventual and immediate. With eventual synchronization, the default, an operation returns success to the user as soon as it finishes, even though its changes may not yet be on disk; a background thread periodically flushes them. With immediate synchronization, all changes are flushed to disk before success is returned to the user. The other storage engine, RocksDB, also provides non-blocking checkpoints.
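The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the idea, not ArangoDB's implementation: the function `write_record`, the flag `wait_for_sync`, and the file name are hypothetical names chosen for this sketch.

```python
import os
import tempfile

def write_record(f, data: bytes, wait_for_sync: bool) -> None:
    """Append a record; optionally force it to disk before returning."""
    f.write(data)
    if wait_for_sync:
        # Immediate synchronization: push buffered bytes to the OS and
        # ask the OS to flush them to the storage device.
        f.flush()
        os.fsync(f.fileno())
    # Eventual synchronization: return right away and let buffered
    # data reach the disk later (here, when the file is closed).

# Hypothetical journal file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "journal.db")
with open(path, "ab") as f:
    write_record(f, b"eventual\n", wait_for_sync=False)
    write_record(f, b"immediate\n", wait_for_sync=True)

with open(path, "rb") as f:
    content = f.read()
```

The immediate mode trades latency for durability: a crash right after the call returns cannot lose the record, whereas in the eventual mode it can.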

Compression

Naïve (Record-Level)

ArangoDB supports record compression by storing only JSON attribute values, not attribute names. Each JSON format is stored as metadata of the collection and identified by a simple ID, so a record needs to store only that ID alongside its values. In addition, the RocksDB engine uses Snappy (developed by Google) for fast data compression.
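A minimal sketch of this idea follows, assuming that a format is identified by the sorted tuple of attribute names. The class `ShapeRegistry` and the functions `encode` and `decode` are hypothetical and do not reflect ArangoDB's actual binary layout.

```python
class ShapeRegistry:
    """Maps each distinct tuple of attribute names to a small integer ID."""

    def __init__(self):
        self._ids = {}      # tuple of names -> ID
        self._shapes = []   # ID -> tuple of names

    def shape_id(self, names):
        if names not in self._ids:
            self._ids[names] = len(self._shapes)
            self._shapes.append(names)
        return self._ids[names]

    def names(self, shape_id):
        return self._shapes[shape_id]

def encode(doc, registry):
    """Store only the shape ID and the values, in attribute-name order."""
    names = tuple(sorted(doc))
    return registry.shape_id(names), [doc[n] for n in names]

def decode(record, registry):
    """Rebuild the document by pairing stored values with the shape's names."""
    shape_id, values = record
    return dict(zip(registry.names(shape_id), values))

registry = ShapeRegistry()
a = encode({"name": "Ada", "age": 36}, registry)
b = encode({"age": 41, "name": "Alan"}, registry)  # same attributes, same shape
```

Both records share one shape ID, so the attribute names `age` and `name` are stored once in the registry rather than once per record.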

Concurrency Control

Two-Phase Locking (Deadlock Detection)

The user must declare which collections a transaction reads and writes. At the beginning of each transaction, ArangoDB acquires all the locks in lexicographical order of the collection names, and it releases them in reverse order after the transaction finishes. If a deadlock occurs, ArangoDB automatically aborts one of the transactions, rolls back its changes, and returns an error to the client.
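The ordered locking scheme can be sketched as follows. This is a simplified illustration that uses one exclusive lock per collection and does not distinguish read locks from write locks; the collection names and the `transaction` helper are hypothetical.

```python
import threading
from contextlib import contextmanager

# One lock per collection (hypothetical collection names).
locks = {name: threading.Lock() for name in ("accounts", "orders", "users")}

@contextmanager
def transaction(collections):
    """Lock the declared collections in lexicographical order,
    then release them in reverse order when the block ends."""
    ordered = sorted(set(collections))
    acquired = []
    try:
        for name in ordered:
            locks[name].acquire()
            acquired.append(name)
        yield ordered
    finally:
        for name in reversed(acquired):
            locks[name].release()

with transaction(["users", "accounts"]) as order:
    held_inside = [locks[name].locked() for name in order]

held_after = [locks[name].locked() for name in order]
```

Because every transaction acquires its locks in the same global order, two transactions that declare overlapping collections cannot each hold a lock the other is waiting for.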

Data Model

Key/Value, Document / XML, Graph

In ArangoDB, a document collection always has a primary key. Without any secondary index, a collection therefore behaves like a key-value store. More generally, documents can have multiple attributes and a collection can have multiple secondary indexes, which makes it a typical document store. By default, the sharding key is the same as the primary key. This helps place similar data on the same shard, so that queries can be processed efficiently and the system scales more linearly. Besides key-value and document stores, ArangoDB also supports a graph store, with operations such as traversals (e.g., breadth-first search and depth-first search) and shortest path.
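As an illustration of the graph operations mentioned above, the following sketch finds a shortest path in an unweighted, directed graph using breadth-first search. It shows the general technique only, not ArangoDB's traversal engine; the edge list and the `shortest_path` function are made up for this example.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search over directed edges; returns the list of
    vertices on a shortest path, or None if the goal is unreachable."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    parents = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start vertex
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e"), ("e", "c")]
path = shortest_path(edges, "a", "c")
```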

Foreign Keys

Not Supported

Since ArangoDB is a NoSQL database, it does not have the concept of a foreign key. However, users who really need this feature can simulate foreign keys with the graph store.

Indexes

Skip List, Hash Table, Inverted Index (Full Text)

By default, a key-value or document collection has a hash index on its primary key. Users can also define other indexes, including skip list, full-text, persistent, and geo-spatial indexes. The graph store adopts a different strategy: it uses a hybrid index that combines a hash index with a doubly linked list to handle graph operations more efficiently.

Isolation Levels

Serializable

The default MMFiles engine supports serializable isolation. In each transaction, users must declare in advance the collections they need to access, and all of these collections are locked at the beginning of the transaction to prevent others from modifying them at the same time. Within these collections, it is guaranteed that there are no reads of uncommitted changes, no unrepeatable reads, and no phantoms. The other storage engine, RocksDB, only disallows write-write conflicts. Therefore, when two transactions read and write the same set of collections at the same time, it is possible to read uncommitted changes.

Joins

Nested Loop Join

ArangoDB's query language, AQL, provides a for-loop syntax that achieves the equivalent of SQL joins. Execution resembles a nested loop join; if the join attribute is indexed in the inner loop, it resembles an index nested loop join.
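The two join strategies can be sketched in Python over in-memory lists. The `users` and `orders` collections and their attributes are hypothetical, and the AQL query in the comment is only an illustrative example of the for-loop syntax.

```python
# An AQL join over these (hypothetical) collections could be written as:
#   FOR u IN users
#     FOR o IN orders
#       FILTER o.user == u._key
#       RETURN { user: u.name, item: o.item }

users = [{"_key": "u1", "name": "Ada"}, {"_key": "u2", "name": "Alan"}]
orders = [
    {"user": "u1", "item": "book"},
    {"user": "u2", "item": "pen"},
    {"user": "u1", "item": "lamp"},
]

def nested_loop_join(outer, inner):
    """Compare every outer document with every inner document."""
    return [
        (u["name"], o["item"])
        for u in outer
        for o in inner
        if o["user"] == u["_key"]
    ]

def index_nested_loop_join(outer, inner):
    """Build a hash index on the inner join attribute, then probe it
    once per outer document instead of scanning the inner collection."""
    index = {}
    for o in inner:
        index.setdefault(o["user"], []).append(o)
    return [
        (u["name"], o["item"])
        for u in outer
        for o in index.get(u["_key"], [])
    ]

plain = nested_loop_join(users, orders)
indexed = index_nested_loop_join(users, orders)
```

Both functions return the same pairs; the indexed variant replaces the inner scan with a dictionary lookup. In a database the index would already exist rather than being built per query.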

Logging

Logical Logging

ArangoDB does not overwrite existing documents. Instead, every write operation (including deletes) creates a new version of the modified document. ArangoDB's write-ahead log (WAL) records all write operations executed on the server. The WAL can be replayed for recovery and for setting up a new replica.
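Replaying such a log can be sketched as follows. The log entry format `(operation, key, document)` and the `replay` function are assumptions made for this example and do not mirror ArangoDB's WAL format.

```python
def replay(log):
    """Rebuild the current state of a collection by applying
    logged write operations in the order they were recorded."""
    store = {}
    for op, key, doc in log:
        if op == "remove":
            store.pop(key, None)
        else:
            # "insert" and "update" both install a new document version.
            store[key] = doc
    return store

# Hypothetical log: each write appends an entry; nothing is overwritten.
wal = [
    ("insert", "a", {"v": 1}),
    ("update", "a", {"v": 2}),
    ("insert", "b", {"v": 9}),
    ("remove", "b", None),
]
state = replay(wal)
```

A new replica that starts empty and replays the same log ends up in the same state as the server that wrote it.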

Query Execution

Materialized Model

Each query first goes to the query optimizer, which generates one or more possible plans according to the current data model and estimates the cost of each. Only the plan with the lowest cost is returned. That plan is then executed in a pipelined manner on execution nodes. Each node receives a job from its parent, then divides it and distributes it to its child nodes. The results from the children are aggregated and returned to the node's parent.
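The plan selection step can be illustrated with a toy cost model. The plan names and cost formulas below are invented for this sketch and are not ArangoDB's actual estimates.

```python
import math
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    estimated_cost: float

def candidate_plans(n_docs: int, has_index: bool) -> list:
    """Enumerate possible plans for a single-attribute lookup.
    Costs are a made-up model: linear for a scan, logarithmic for an index."""
    plans = [Plan("full collection scan", float(n_docs))]
    if has_index:
        plans.append(Plan("index lookup", math.log2(n_docs) + 1))
    return plans

def choose_plan(plans: list) -> Plan:
    """Return the plan with the lowest estimated cost."""
    return min(plans, key=lambda p: p.estimated_cost)

with_index = choose_plan(candidate_plans(1024, has_index=True))
without_index = choose_plan(candidate_plans(1024, has_index=False))
```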

Query Interface

Custom API, HTTP / REST, Command-line / Shell

ArangoDB has its own query language, the ArangoDB Query Language (AQL). AQL queries can be invoked through arangosh (the ArangoDB shell), the web interface, or the HTTP REST API.

Storage Architecture

In-Memory

ArangoDB is a mostly-memory database, which means the working set must fit in main memory for it to perform well. The whole dataset is stored on disk to avoid data loss. Two storage engines are available: the default, MMFiles, is based on memory-mapped files, and the alternative is RocksDB.

Storage Model

Custom

ArangoDB stores JSON data in append-only journal files. When a journal file is full, it is marked as a data file and becomes immutable. Each journal file has a fixed size (32 MB by default), and a collection can span multiple journal files. Each collection also has a "shape" file that maps each JSON format to a shape ID. Within a journal file, JSON records are stored sequentially with only their attribute values, not the names. Records are stored in binary format together with the shape ID needed to deserialize them later.
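The journal rotation described above can be sketched with in-memory buffers standing in for files. The `Collection` class and the tiny journal size are assumptions chosen to keep the example short.

```python
class Collection:
    """Append-only storage: one active journal plus sealed data files."""

    def __init__(self, journal_size: int):
        self.journal_size = journal_size
        self.datafiles = []          # sealed journals, immutable bytes
        self.journal = bytearray()   # active journal, append only

    def append(self, record: bytes) -> None:
        if len(record) > self.journal_size:
            raise ValueError("record larger than a journal file")
        if len(self.journal) + len(record) > self.journal_size:
            # The journal is full: seal it as an immutable data file
            # and start a fresh journal for subsequent records.
            self.datafiles.append(bytes(self.journal))
            self.journal = bytearray()
        self.journal += record

# An 8-byte journal is used here only to trigger rotation quickly.
coll = Collection(journal_size=8)
for record in (b"aaaa", b"bbbb", b"cc"):
    coll.append(record)
```

After the third append, the first two records sit in a sealed data file and the third is in the new active journal.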

Storage Organization

Log-structured

The RocksDB storage engine uses a log-structured organization. The MMFiles storage engine does not specify a storage organization, since it mainly targets data that fits entirely in memory.

Stored Procedures

Supported

ArangoDB allows users to define their own user-defined functions (UDFs). Users can also write JavaScript applications and integrate them with the database as microservices under the Foxx framework. Such applications run inside the database, where they can access data directly and therefore work efficiently. This provides functionality similar to stored procedures.

System Architecture

Shared-Nothing

In a cluster, each ArangoDB instance has its own copy of the data and keeps functioning independently of other node failures. The system adopts a master/master model, which means that every node of a given type can serve the same type of requests. Under a network partition, it prefers consistency over availability, which makes it a "CP" system. There are four roles in a cluster: agents, coordinators, primary DBservers, and secondaries. Agents manage the cluster and use Raft for consensus. Coordinators receive and respond to client requests. Primary DBservers are the main hosts of data and are updated synchronously. The secondaries then apply the same updates asynchronously. For specific application scenarios, users can also choose other replication models, including master/slave and active failover.

Views

Materialized Views

Generally, ArangoDB has no views. However, ArangoSearch, a search engine designed specifically for document search, adopts an idea similar to materialized views in SQL databases. By using pre-processed document information, it can reduce the complexity of execution plans and supports fuzzy search.

Website

https://www.arangodb.com/

Source Code

https://github.com/arangodb/arangodb

Former Name

AvocadoDB

Developer

ArangoDB GmbH

Country of Origin

DE

Start Year

2011

Project Type

Commercial, Open Source

Written in

C++

Supported languages

C#, Java, JavaScript, PHP, Python, Ruby

Operating Systems

Linux, OS X, Solaris, Windows

Licenses

Apache v2

Wikipedia

https://en.wikipedia.org/wiki/ArangoDB