ArangoDB

View Current Viewing Revision #8 from 12/12/2018 6:19 p.m.

ArangoDB is a multi-model, mostly-memory database. It supports key-value, document, and graph stores, all using the JSON data format. ArangoDB writes all data to persistent storage for durability. To perform efficiently, however, the frequently accessed pages (the working set) should fit into main memory. Unlike most NoSQL databases, ArangoDB supports join operations and lets users choose between multi-collection transactions, which provide ACID properties, and standard single-document transactions, which offer better performance.

History

The motivation behind ArangoDB is to combine the most common uses of NoSQL databases. Other NoSQL databases that also use the JSON data format, such as MongoDB for documents and Neo4j for graphs, natively support only a single data model. ArangoDB combines these use cases into an "all-in-one" database so that users do not need a second database for different types of data. ArangoDB has been ready for production use since version 1.0, released in spring 2012. It was originally named AvocadoDB; the name was changed to ArangoDB in May 2012 to avoid legal issues.

Views

Materialized Views

ArangoDB has no general-purpose views. However, ArangoSearch, a search engine designed specifically for document search, adopts an idea similar to materialized views in SQL databases. By using pre-processed document information, it can reduce the complexity of execution plans and allow fuzzy search.

Isolation Levels

Serializable

The default MMFiles engine supports serializable isolation. Each transaction must declare in advance the collections it will access, and all of these collections are locked at the beginning of the transaction to prevent others from modifying them at the same time. Within these collections, uncommitted changes, unrepeatable reads, and phantoms are guaranteed not to occur. The other storage engine, RocksDB, only disallows write-write conflicts. Therefore, when two transactions read and write the same set of collections at the same time, it is possible to read uncommitted changes.

Concurrency Control

Two-Phase Locking (Deadlock Detection)

The user needs to specify which collections a transaction reads and writes. At the beginning of each transaction, ArangoDB acquires all locks in lexicographical order of the collection names, and it releases them in reverse order after the transaction finishes. If a deadlock occurs, ArangoDB automatically aborts one of the transactions, rolls back its changes, and returns an error to the client.

Query Execution

Materialized Model

Each query first goes to the query optimizer, which generates one or more possible plans according to the current data model and estimates the cost of each. Only the plan with the lowest cost is returned. The chosen plan is executed in a pipelined manner on execution nodes. Each node receives a job from its parent, then divides it and distributes it to its child nodes. The results from all children are aggregated and returned to the node's parent.
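The plan-selection step can be pictured with a minimal Python sketch. The `Plan` type, the plan descriptions, and the cost numbers are hypothetical; how ArangoDB actually estimates costs is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate execution plan with an optimizer-assigned cost estimate."""
    description: str
    estimated_cost: float


def choose_plan(candidates):
    """Return the candidate with the lowest estimated cost."""
    if not candidates:
        raise ValueError("the optimizer must produce at least one plan")
    return min(candidates, key=lambda plan: plan.estimated_cost)


# Hypothetical candidates for the same query.
plans = [
    Plan("full collection scan, then filter", 1000.0),
    Plan("hash index lookup", 1.5),
]
best = choose_plan(plans)
```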

Logging

Logical Logging

ArangoDB does not overwrite existing documents. Instead, every write operation (including deletes) creates a new version of the modified document. ArangoDB's write-ahead log records all write operations executed on the server.
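A toy Python model may help make the append-only behavior concrete. It is a sketch under assumptions: the `VersionedStore` class, its log record layout, and the revision counter are all invented for illustration and do not reflect ArangoDB's on-disk format.

```python
import itertools


class VersionedStore:
    """Toy append-only store: writes add versions, nothing is overwritten."""

    def __init__(self):
        self.wal = []        # append-only log of every write operation
        self.versions = {}   # key -> list of (revision, document or None)
        self._revisions = itertools.count(1)

    def _append(self, op, key, document):
        revision = next(self._revisions)
        # Log the operation first, then add the new version.
        self.wal.append({"rev": revision, "op": op, "key": key, "doc": document})
        self.versions.setdefault(key, []).append((revision, document))
        return revision

    def write(self, key, document):
        return self._append("write", key, dict(document))

    def delete(self, key):
        # A delete is also a write: it appends a tombstone version.
        return self._append("delete", key, None)

    def read(self, key):
        history = self.versions.get(key)
        return history[-1][1] if history else None


store = VersionedStore()
store.write("a", {"x": 1})
store.write("a", {"x": 2})   # the first version is kept, not overwritten
```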

Stored Procedures

Supported

ArangoDB allows users to define their own user-defined functions (UDFs). Users can also use the Foxx microservice framework to build their own logic into a microservice that runs inside the database and can access the data it needs. Together, these provide the same functionality as stored procedures.

Storage Organization

Copy-on-Write / Shadow Paging

Indexes

Hash Table

By default, a key-value or document store has a hash index on its primary key. Users can also specify other indexes, including skip list, fulltext, persistent, and geo-spatial indexes. The graph store adopts a different strategy: it uses a hybrid index that combines a hash index with a doubly linked list to handle graph operations more efficiently.
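As a rough picture of the two index styles, the Python sketch below uses a dictionary as a stand-in for the primary-key hash index and a hypothetical `EdgeIndex` class for the graph case. Note the simplification: plain Python lists replace the doubly linked lists mentioned above, and none of the names come from ArangoDB.

```python
from collections import defaultdict

# Primary hash index stand-in: primary key -> document.
primary_index = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}}


class EdgeIndex:
    """Hash lookup from a vertex to the edges that start or end there."""

    def __init__(self):
        self.by_from = defaultdict(list)  # source vertex -> edge keys
        self.by_to = defaultdict(list)    # target vertex -> edge keys

    def add(self, edge_key, from_vertex, to_vertex):
        self.by_from[from_vertex].append(edge_key)
        self.by_to[to_vertex].append(edge_key)

    def outbound(self, vertex):
        return list(self.by_from.get(vertex, []))

    def inbound(self, vertex):
        return list(self.by_to.get(vertex, []))


edges = EdgeIndex()
edges.add("e1", "u1", "u2")
edges.add("e2", "u1", "u3")
```

Looking up all edges of a vertex is then a single hash probe followed by a walk over that vertex's edge list, which is the access pattern graph traversals rely on.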

Data Model

Key/Value, Document / XML, Graph

In ArangoDB, a document collection always has a primary key. Without any secondary index, a collection therefore behaves like a key-value store. With multiple attributes and secondary indexes, it behaves like a common document store. By default, the sharding key is the same as the primary key. This helps place similar data on the same shard so that queries can be processed efficiently and the system scales closer to linearly. Besides key-value and document stores, ArangoDB also supports a graph store, with operations such as traversal (e.g., breadth-first search, depth-first search) and shortest path.
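To illustrate the graph operations named above, here is a minimal Python sketch of a breadth-first search that returns a shortest path in an unweighted graph. The adjacency-dictionary representation and the vertex names are assumptions for the example, not ArangoDB's storage layout or traversal API.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of vertices or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}  # vertex -> vertex it was first reached from
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbor in graph.get(vertex, []):
            if neighbor in previous:
                continue  # already visited
            previous[neighbor] = vertex
            if neighbor == goal:
                # Walk the predecessor chain back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical directed graph as an adjacency dictionary.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
path = shortest_path(graph, "a", "e")
```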

Query Interface

HTTP / REST

ArangoDB has its own query language, the ArangoDB Query Language (AQL). AQL queries can be issued through arangosh (the ArangoDB shell), the web interface, or the HTTP REST API.
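As a sketch of the HTTP route, the Python snippet below builds a request for the cursor endpoint (`POST /_api/cursor`), which accepts an AQL query string and bind variables as JSON. The server address, the `users` collection, and the `build_aql_request` helper are assumptions for illustration; authentication is omitted, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_aql_request(base_url, query, bind_vars=None):
    """Build (but do not send) a POST request that submits an AQL query."""
    payload = {"query": query, "bindVars": bind_vars or {}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_api/cursor",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server address and collection name.
request = build_aql_request(
    "http://localhost:8529",
    "FOR u IN users FILTER u.age >= @minAge RETURN u.name",
    {"minAge": 30},
)

# Sending needs a running server and credentials, so it stays commented out:
# with urllib.request.urlopen(request) as response:
#     names = json.load(response)["result"]
```

Passing values through bind variables such as `@minAge`, rather than concatenating them into the query string, keeps user input out of the query text.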

Storage Architecture

In-Memory

ArangoDB is a mostly-memory database, which means the working set needs to fit into main memory for it to perform well. The whole dataset is stored on disk to avoid data loss. Two storage engines are available: the default, MMFiles, is based on memory-mapped files; the other is RocksDB.

Joins

Nested Loop Join

ArangoDB's query language, AQL, provides a for-loop syntax that achieves the equivalent of SQL joins. The process resembles a nested loop join; if the join attribute is indexed in the inner loop, it resembles an index nested loop join.
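The two join strategies can be compared with a short Python sketch. The `users` and `orders` collections and their attributes are hypothetical, and the index in the second function is built on the fly, whereas a database would reuse an existing index on the join attribute. The AQL in the comment shows the kind of for-loop query being modeled.

```python
# AQL shape being modeled (hypothetical collections):
#   FOR u IN users
#     FOR o IN orders
#       FILTER o.user == u._key
#       RETURN { user: u, order: o }


def nested_loop_join(outer, inner, outer_attr, inner_attr):
    """Compare every outer row with every inner row."""
    return [
        (o, i)
        for o in outer
        for i in inner
        if o[outer_attr] == i[inner_attr]
    ]


def index_nested_loop_join(outer, inner, outer_attr, inner_attr):
    """Probe a hash index on the inner join attribute for each outer row."""
    index = {}
    for row in inner:
        index.setdefault(row[inner_attr], []).append(row)
    return [(o, i) for o in outer for i in index.get(o[outer_attr], [])]


users = [{"_key": "1", "name": "Ann"}, {"_key": "2", "name": "Bob"}]
orders = [
    {"user": "1", "item": "pen"},
    {"user": "1", "item": "ink"},
    {"user": "3", "item": "cap"},
]
plain = nested_loop_join(users, orders, "_key", "user")
indexed = index_nested_loop_join(users, orders, "_key", "user")
```

Both functions return the same pairs; the indexed version replaces the inner scan with a lookup, so its cost grows with the number of matches rather than with the size of the inner collection.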

Revision #8 | Updated 12/12/2018 6:19 p.m.

Website

https://www.arangodb.com/

Source Code

https://github.com/arangodb/arangodb

Developer

ArangoDB GmbH

Country of Origin

Start Year

2011

Former Name

AvocadoDB