OrientDB

OrientDB is a multi-model NoSQL DBMS that supports graph, document, key-value, and object-oriented storage. Rather than layering an API over a single model, OrientDB integrates those models into its core engine. It supports an SQL dialect with few differences from standard SQL and extends it to express complex graph concepts. It is also an ACID-compliant DBMS able to handle transactional workloads, and it supports a multi-master distributed architecture.

History

OrientDB was originally developed by Luca Garulli in 2010, when he rewrote the fast persistent layer of OrientDB ODBMS in Java as OrientDB. Since 2012, OrientDB has been sponsored by OrientDB LTD, a for-profit company formerly named Orient Technologies LTD, whose founder and CEO is Luca Garulli. Andrey Lomakin redeveloped the storage engine of OrientDB, called plocal, from 2012 to 2014. In 2013, he joined OrientDB LTD as co-owner and leader of its R&D department.

Isolation Levels

Read Committed, Repeatable Read

OrientDB supports two isolation levels: Read Committed and Repeatable Read. The default is Read Committed, which is also the only level available over the remote protocol. Repeatable Read is allowed only with the plocal and memory protocols and consumes more memory than Read Committed. Users can change the default isolation level through the Java API.

Joins

Not Supported

OrientDB does not support join syntax. Instead, it introduces the concept of LINKS to represent relationships, and it uses the dot operator (.) to traverse LINKS, which achieves the same goal as a join.
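The effect of dot traversal can be illustrated with a toy in-memory model. This is a minimal sketch, not OrientDB code: the `records` dictionary, the sample record IDs, and the `resolve` helper are all hypothetical, and the only point is that following stored record IDs replaces a join.

```python
# Toy model (hypothetical): each record is a dict keyed by a record ID (RID).
# A LINK is simply the RID of another record stored as a field value.
records = {
    "#12:0": {"name": "Alice", "city": "#13:0"},    # city is a LINK
    "#13:0": {"name": "Rome", "country": "#14:0"},  # country is a LINK
    "#14:0": {"name": "Italy"},
}

def resolve(rid, path):
    """Follow a dotted path such as 'city.country.name' starting at rid."""
    value = rid
    for field in path.split("."):
        record = records[value]  # dereference the current LINK
        value = record[field]    # read the next field (a LINK or a plain value)
    return value

print(resolve("#12:0", "city.country.name"))  # Italy
```

Each step is a direct lookup by record ID, so no matching of key columns between two collections is needed.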

Logging

Physical Logging

OrientDB uses write-ahead logging (WAL). It performs physical logging by recording the changes made to pages. For each page change, OrientDB records the offset and length of the changed bytes together with their before and after values.
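A physical log record of this shape can be sketched as follows. The `PageChange` class and the helper functions are illustrative assumptions, not OrientDB's actual WAL format; the sketch only shows why storing offset, length, and before/after bytes is enough to undo or redo a page change.

```python
from dataclasses import dataclass

@dataclass
class PageChange:
    """Hypothetical physical log record for one change to one page."""
    page_id: int
    offset: int
    before: bytes  # bytes that were overwritten
    after: bytes   # bytes that replaced them

    @property
    def length(self):
        return len(self.after)

def write_with_log(page, page_id, offset, new_bytes, log):
    """Append the change to the log first, then modify the page."""
    before = bytes(page[offset:offset + len(new_bytes)])
    log.append(PageChange(page_id, offset, before, bytes(new_bytes)))
    page[offset:offset + len(new_bytes)] = new_bytes

def undo(page, change):
    """Restore the bytes recorded before the change."""
    page[change.offset:change.offset + change.length] = change.before

def redo(page, change):
    """Reapply the bytes recorded after the change."""
    page[change.offset:change.offset + change.length] = change.after

page = bytearray(b"hello world")
log = []
write_with_log(page, 0, 6, b"WORLD", log)
print(page)  # bytearray(b'hello WORLD')
```

Calling `undo(page, log[0])` restores the original bytes, and `redo(page, log[0])` reapplies the change.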

Compression

Naïve (Record-Level)

OrientDB supports record-level compression. Two algorithms are built in: gzip and snappy. The default is no compression. Users can choose the compression method with the SQL statement ALTER CLUSTER or in the configuration of the storage engine. Users can also define custom compression algorithms by registering them in the class OCompressionFactory. Records are decompressed when they are loaded from the storage engine.
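The record-level scheme can be sketched with a small registry of compressors. This is an illustration under stated assumptions: the `COMPRESSORS` table, the method name `"none"`, and the `store`/`load` helpers are hypothetical, Python's standard `gzip` module stands in for OrientDB's gzip implementation, and snappy is omitted because it is not in the Python standard library.

```python
import gzip

# Hypothetical registry: method name -> (compress, decompress).
# A custom algorithm would be added as one more entry.
COMPRESSORS = {
    "none": (lambda data: data, lambda data: data),
    "gzip": (gzip.compress, gzip.decompress),
}

def store(record, method="none"):
    """Compress a single record before it is written."""
    compress, _ = COMPRESSORS[method]
    return compress(record)

def load(stored, method="none"):
    """Decompress a single record when it is read back."""
    _, decompress = COMPRESSORS[method]
    return decompress(stored)

record = b"name=Alice;" * 100
stored = store(record, "gzip")
print(len(record), len(stored) < len(record))  # 1100 True
```

Because every record is compressed on its own, a single record can be read without decompressing its neighbours.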

Indexes

B+Tree, Hash Table, Inverted Index (Full Text)

OrientDB supports five index algorithms, which belong to three categories. It also allows users to define custom index engines by implementing the OIndexFactory and OIndexEngine classes.

  1. SB-Tree index: The SB-tree index is a variant of the B-tree index with optimizations for data insertion and long range queries. It is the default index type of OrientDB.

  2. Hash index: OrientDB supports two hash index algorithms: the regular hash index and the auto-sharding index, an implementation of a distributed hash table based on the Murmur3 hash function. Both indexes apply the extendible hashing algorithm and do not support range queries.

  3. Lucene engine: Apache Lucene Core is an implementation of the inverted index. OrientDB provides full-text and spatial indexes using the Lucene engine.

OrientDB can handle indexes in the same way as it handles the basic type Class. It uses SQL syntax to manage indexes using the format index:name, where name is the index name. OrientDB has two methods to update indexes, automatic and manual. The default is manual. When creating an index, users should specify the index type and the relevant classes. Users who want the index to be updated automatically must state that explicitly when creating it.
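The extendible hashing scheme mentioned for the hash indexes can be sketched as follows. This is a generic textbook version, not OrientDB's implementation: the `Bucket` and `ExtendibleHash` classes are hypothetical, Python's built-in `hash` stands in for Murmur3, and the sketch assumes keys whose hashes differ in their low bits (for example, small integers).

```python
class Bucket:
    def __init__(self, depth, capacity):
        self.depth = depth        # local depth: hash bits this bucket distinguishes
        self.capacity = capacity
        self.items = {}

class ExtendibleHash:
    def __init__(self, capacity=2):
        self.global_depth = 1
        self.capacity = capacity
        self.directory = [Bucket(1, capacity), Bucket(1, capacity)]

    def _index(self, key):
        # The low global_depth bits of the hash select the directory slot.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._index(key)].items.get(key)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket is full: split it and retry

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Double the directory; the new slots alias the existing buckets.
            self.directory += self.directory
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth, self.capacity)
        high_bit = 1 << (bucket.depth - 1)
        # Repoint the slots whose newly significant bit is set.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        # Redistribute the entries between the old bucket and its sibling.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._index(k)].items[k] = v

index = ExtendibleHash(capacity=2)
for n in range(100):
    index.put(n, f"record-{n}")
print(index.get(42))  # record-42
```

Only the overflowing bucket is split, and the directory doubles only when that bucket already uses every directory bit. Entries are placed by hash value rather than key order, which is why such an index cannot answer range queries.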

Storage Model

N-ary Storage Model (Row/Record)

OrientDB uses the page as the basic unit for storing records, which makes it essentially an N-ary storage model. Clusters are usually stored in two kinds of pages. The first kind stores metadata about records, including RIDs and pointers to the actual content; each entry has a fixed size. The other kind stores the actual content of records. Each record is stored as key/value pairs.
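The split between fixed-size metadata entries and variable-size record content can be sketched as follows. The entry layout (three 32-bit integers), the JSON encoding of key/value pairs, and the `Cluster` class are assumptions made for illustration and do not reflect OrientDB's on-disk format; page size limits are ignored for brevity.

```python
import json
import struct

# Hypothetical fixed-size entry: content page index, offset in page, record length.
ENTRY = struct.Struct("<III")

class Cluster:
    def __init__(self):
        self.position_map = bytearray()  # fixed-size entries, one per record
        self.content = [bytearray()]     # content pages (unbounded in this sketch)

    def insert(self, fields):
        """Store a record's key/value pairs and return its position."""
        data = json.dumps(fields).encode("utf-8")
        page_index = len(self.content) - 1
        offset = len(self.content[page_index])
        self.content[page_index] += data
        self.position_map += ENTRY.pack(page_index, offset, len(data))
        return len(self.position_map) // ENTRY.size - 1

    def read(self, position):
        """Find the entry by arithmetic on its position, then fetch the content."""
        page_index, offset, length = ENTRY.unpack_from(
            self.position_map, position * ENTRY.size)
        data = self.content[page_index][offset:offset + length]
        return json.loads(data.decode("utf-8"))

cluster = Cluster()
first = cluster.insert({"name": "Alice", "age": 30})
second = cluster.insert({"name": "Bob"})
print(cluster.read(second))  # {'name': 'Bob'}
```

Because every entry has the same size, the entry for a given position is found by simple multiplication, while the records themselves can vary in length.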

Query Compilation

JIT Compilation

The query execution planner in OrientDB generates an execution plan consisting of predefined query steps, which are components written in Java. OrientDB therefore relies on the JVM's standard JIT compilation. In addition, query execution plans are cached to avoid recomputing the plan for the same query.

Concurrency Control

Multi-version Concurrency Control (MVCC)

OrientDB applies Multi-version Concurrency Control and checks integrity on commit. The approach is optimistic, and OrientDB does not support pessimistic transactions. When a transaction conflicts with another, OrientDB throws an OConcurrentModificationException and the application can decide whether to abort the transaction. With Graph, OrientDB provides several consistency modes: tx, notx_sync_repair, and notx_async_repair. The tx mode, which is the default, maintains consistency using transactions, while the other two do not use transactions and instead rely on a database repair operation. The notx_sync_repair mode runs the repair synchronously, while notx_async_repair runs it asynchronously to the application.
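The optimistic check on commit can be sketched with per-record version numbers. This is a simplified illustration, not OrientDB's implementation: the `Store` class and `ConcurrentModificationError` are hypothetical stand-ins, with the exception playing the role of OConcurrentModificationException.

```python
class ConcurrentModificationError(Exception):
    """Hypothetical stand-in for OConcurrentModificationException."""

class Store:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key):
        """Return the current (version, value) pair without taking a lock."""
        return self.data[key]

    def commit(self, key, read_version, new_value):
        """Apply the write only if nobody committed since read_version."""
        current_version, _ = self.data[key]
        if current_version != read_version:
            raise ConcurrentModificationError(
                f"{key}: read v{read_version}, found v{current_version}")
        self.data[key] = (current_version + 1, new_value)

store = Store()
store.data["account"] = (0, 100)

# Two transactions read the same version without locking.
version_a, balance_a = store.read("account")
version_b, balance_b = store.read("account")

store.commit("account", version_a, balance_a - 10)      # first commit succeeds
try:
    store.commit("account", version_b, balance_b - 20)  # stale version: conflict
except ConcurrentModificationError:
    # The application decides what to do; here it re-reads and retries.
    version_b, balance_b = store.read("account")
    store.commit("account", version_b, balance_b - 20)

print(store.read("account"))  # (2, 70)
```

No lock is held between read and commit; the conflict is detected only at commit time, which is what makes the scheme optimistic.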

Checkpoints

Consistent

OrientDB supports full checkpoints. A full checkpoint is a simple disk flush. It can be invoked when a cluster is added to the storage, when a cluster changes, or when the storage closes. Users can set timestamps for performing full checkpoints in those scenarios when configuring the storage engine. Support for fuzzy checkpointing is planned and currently under implementation.

Storage Architecture

Hybrid

OrientDB supports in-memory and disk-oriented databases. It has two storage abstractions: Paginated Local Storage (plocal) and Memory Storage. Storage paths are prefixed with plocal: and memory: respectively. OrientDB also supports larger-than-memory databases. The JVM will allocate additional space from swap, and the JVM option -XX:MaxDirectMemorySize can be set to limit the total memory size.

Stored Procedures

Supported

OrientDB introduces the concept of Functions, similar to stored procedures in an RDBMS. Functions can be written in SQL and JavaScript, and they can be executed via SQL, Java, REST, and Studio.

Query Execution

Tuple-at-a-Time Model, Vectorized Model

OrientDB was originally designed to use the iterator model in a lazy manner. However, it allows some fetching strategies to use the vectorized model: some components in execution plans pre-fetch records in a single call and then process them as a batch. This pattern can be considered a vectorized model.
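The difference between the two styles can be sketched with Python generators. The function names (`scan`, `prefetch`, `filter_lazy`, `filter_batch`) are hypothetical and only illustrate the two execution patterns; they are not OrientDB components.

```python
from itertools import islice

def scan(records):
    """Tuple-at-a-time: produce one record per request, lazily."""
    for record in records:
        yield record

def filter_lazy(source, predicate):
    """Iterator-model step: pull one record, test it, pass it on."""
    for record in source:
        if predicate(record):
            yield record

def prefetch(source, batch_size):
    """Pre-fetch up to batch_size records in a single call."""
    iterator = iter(source)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def filter_batch(batches, predicate):
    """Batch step: process every record of a pre-fetched batch at once."""
    for batch in batches:
        yield [record for record in batch if predicate(record)]

def is_even(n):
    return n % 2 == 0

print(list(filter_lazy(scan(range(7)), is_even)))
# [0, 2, 4, 6]
print(list(filter_batch(prefetch(scan(range(7)), 3), is_even)))
# [[0, 2], [4], [6]]
```

Both pipelines return the same records; the batch variant amortizes the per-call overhead across several records at the cost of materializing each batch.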

Views

Materialized Views

OrientDB supports materialized views in the latest version. It uses SQL syntax to create or drop views, and it handles materialized views as the basic type Class. Materialized views can be configured as read-only or updatable; the default is read-only. Users can define an update interval so that views are refreshed periodically. Users can also modify updatable views manually, and the modification is reflected in the corresponding records. Updatable views cannot be created from aggregations.

System Architecture

Shared-Nothing

OrientDB supports a multi-master shared-nothing distributed architecture. It uses the open-source Hazelcast project for discovering nodes automatically, storing runtime configuration, and synchronization.

Data Model

Key/Value, Document / XML, Graph, Object-Oriented

OrientDB is a multi-model DBMS. It supports the graph, document, key-value, and object-oriented models. It combines the features of all four models in the core engine rather than implementing an additional layer of APIs to support the various models.

  1. The graph model represents a network structure in which vertices represent entities and edges represent connections among vertices. Besides the mandatory properties that define vertices and edges, OrientDB allows user-defined properties on both, which makes them resemble documents.

  2. For the document model, the main difference between OrientDB and common document DBMSs, e.g. MongoDB, is that OrientDB introduces the concept of a "LINK" as the relationship among documents. Hence, when users refer to a document, all "LINK"s are resolved automatically by OrientDB, whereas in common document DBMSs this is left to developers.

  3. The key-value model is the simplest of the four models. OrientDB uses Classes or Clusters to organize key-value pairs, similar to Buckets in common key-value stores. The difference is that OrientDB supports richer types of values: it allows graph elements and documents as values.

  4. The object-oriented model is derived from the concepts of object-oriented programming. OrientDB uses the concept of Class to define records, which is close to a Table in the relational model. It supports inheritance and polymorphism of Classes.

Storage Organization

Heaps

The pages are unordered and the size of a page is 64KB. The actual content of records is stored in pages. If a record cannot fit into a single page, it is stored across multiple pages.
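Storing an oversized record across pages can be sketched as follows. The helpers `split_record` and `join_record` are hypothetical, and the sketch ignores page headers and the pointers that would link one chunk to the next in a real storage engine.

```python
PAGE_SIZE = 64 * 1024  # 64 KB, as stated above

def split_record(record, page_size=PAGE_SIZE):
    """Cut a record into page-sized chunks; the last chunk may be shorter."""
    return [record[i:i + page_size] for i in range(0, len(record), page_size)]

def join_record(chunks):
    """Reassemble the record from its chunks, in order."""
    return b"".join(chunks)

record = bytes(150_000)  # a record larger than one page
chunks = split_record(record)
print([len(chunk) for chunk in chunks])  # [65536, 65536, 18928]
```

A record that fits into one page yields a single chunk, so the same code path covers both cases.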