OrientDB

Revision #12 from 12/09/2018 10:09 p.m.

OrientDB is a multi-model NoSQL DBMS that supports graph, document, key-value, and object-oriented storage. Instead of just implementing another layer with an API, OrientDB integrates those models in its core engine. It supports both disk-oriented and in-memory storage. Its query language follows SQL syntax with a few differences from standard SQL and extends SQL to support complex graph concepts. OrientDB is also an ACID-compliant DBMS that can handle transactional workloads, and it supports a multi-master distributed architecture.

History

OrientDB was originally developed by Luca Garulli in 2010. Luca rewrote the fast persistent layer of Orient ODBMS in Java, and the result became OrientDB. Since 2012, OrientDB has been sponsored by OrientDB LTD, whose founder and CEO is Luca. OrientDB LTD is a for-profit company formerly called Orient Technologies LTD. Andrey Lomakin redeveloped the storage engine of OrientDB, called plocal, from 2012 to 2014. In 2013, Andrey joined OrientDB LTD as co-owner and leader of the R&D department. On Sep. 19, 2017, Callidus Software Inc. (NASDAQ:CALD), doing business as CallidusCloud, acquired OrientDB LTD.

Storage Model

N-ary Storage Model (Row/Record)

OrientDB uses the page as the basic unit for storing records, which makes it essentially an N-ary storage model. Records are usually stored in two kinds of pages. The first kind stores metadata about records, including RIDs and pointers to the actual content; each entry has a fixed size. The other kind stores the actual content of records. Each record is stored as key/value pairs.
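
The split between metadata pages and content pages can be illustrated with a small in-memory sketch. Every name here (`RID`, `PositionEntry`, `ContentPage`, `Cluster`) is hypothetical and chosen for illustration; this is not OrientDB's on-disk format, only the lookup pattern described above: a fixed-size entry points at the place where the key/value content lives.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RID:
    """Hypothetical record id: a cluster number plus a position in that cluster."""
    cluster_id: int
    position: int


@dataclass
class PositionEntry:
    """Fixed-size metadata entry: a pointer to the record content."""
    page_index: int
    slot: int


@dataclass
class ContentPage:
    """Page holding actual record content, each record as key/value pairs."""
    records: list = field(default_factory=list)


class Cluster:
    def __init__(self, cluster_id):
        self.cluster_id = cluster_id
        self.position_map = []                  # stands in for the metadata pages
        self.content_pages = [ContentPage()]    # stands in for the content pages

    def insert(self, fields):
        # Append the content to the last content page, then record where it went.
        page_index = len(self.content_pages) - 1
        page = self.content_pages[page_index]
        page.records.append(dict(fields))
        self.position_map.append(PositionEntry(page_index, len(page.records) - 1))
        return RID(self.cluster_id, len(self.position_map) - 1)

    def read(self, rid):
        # Two steps: metadata entry first, then the content it points to.
        entry = self.position_map[rid.position]
        return self.content_pages[entry.page_index].records[entry.slot]
```

Reading a record always goes through the metadata entry first, so the content could be moved to another page by updating only the pointer.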

Logging

Physical Logging

OrientDB applies Write Ahead Logging (WAL). It performs physical logging by recording the changes made to pages. For each page change, OrientDB records the offset and length of the changed bytes together with their before and after values.
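
A physical log record of this kind can be sketched as follows. The `PageChange` class and the helper functions are hypothetical names for illustration, and pages are modeled as `bytearray` objects in a dictionary; the sketch only shows why storing before and after values is enough to redo or undo a change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageChange:
    """Hypothetical physical log record for one change to one page."""
    page_id: int
    offset: int
    before: bytes   # bytes at [offset, offset + length) before the change
    after: bytes    # bytes at the same range after the change

    @property
    def length(self):
        return len(self.after)


def write_with_log(log, pages, page_id, offset, new_bytes):
    """Append the log record first, then modify the page (write-ahead order)."""
    page = pages[page_id]
    before = bytes(page[offset:offset + len(new_bytes)])
    log.append(PageChange(page_id, offset, before, bytes(new_bytes)))
    page[offset:offset + len(new_bytes)] = new_bytes


def redo(pages, change):
    """Reapply a logged change."""
    end = change.offset + change.length
    pages[change.page_id][change.offset:end] = change.after


def undo(pages, change):
    """Roll a logged change back to its before value."""
    end = change.offset + change.length
    pages[change.page_id][change.offset:end] = change.before
```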

Storage Organization

Heaps

The pages are unordered and each page is 64KB in size. The actual content of records is stored in pages. If a record cannot fit into a single page, it is stored across multiple pages.
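
As a rough illustration of a record spanning several pages, the sketch below splits record content into page-sized pieces and joins them again. The function names are hypothetical, and the sketch assumes the full 64KB is available for content, ignoring any page header space a real engine would reserve.

```python
PAGE_SIZE = 64 * 1024  # 64KB, as stated above


def split_into_chunks(record: bytes, capacity: int = PAGE_SIZE):
    """Split record content into pieces of at most `capacity` bytes."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    chunks = [record[i:i + capacity] for i in range(0, len(record), capacity)]
    return chunks or [b""]  # an empty record still occupies one slot


def reassemble(chunks):
    """Concatenate the pieces back into the original record content."""
    return b"".join(chunks)
```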

Data Model

Key/Value, Document / XML, Graph, Object-Oriented

OrientDB is a multi-model DBMS. It supports graph, document, key-value, and object-oriented models. It combines the features of all four models in the core engine rather than implementing an additional layer of APIs for each model.

The graph model represents network structures, with vertices representing entities and edges representing connections among vertices. Besides the mandatory properties that define vertices and edges, OrientDB allows user-defined properties on both, which makes them resemble documents.

For the document model, OrientDB introduces the concept of a "LINK" as the relationship between documents. When users refer to a document, all of its "LINK"s are resolved automatically by OrientDB, instead of by developers as in common document DBMSs.

The key-value model is the simplest of the four. OrientDB organizes key-value pairs much like common key-value stores. The difference is that OrientDB supports richer value types: it allows graph elements and documents as values.

The object-oriented model is derived from object-oriented programming. OrientDB directly uses concepts from object-oriented programming to define records, and it supports inheritance and polymorphism.
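
The idea of resolving "LINK"s on behalf of the developer can be sketched with a toy in-memory store. The `Link` class, the `resolve` function, and the record id strings are all hypothetical; a real engine may load linked documents lazily rather than eagerly as done here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical link value: a reference to another document by record id."""
    rid: str


def resolve(store, rid, path=()):
    """Return a copy of the document with links replaced by the linked documents.

    `path` holds the record ids currently being resolved, so a link that
    points back into the path is left unresolved instead of looping forever.
    """
    path = path + (rid,)
    resolved = {}
    for key, value in store[rid].items():
        if isinstance(value, Link) and value.rid not in path:
            resolved[key] = resolve(store, value.rid, path)
        else:
            resolved[key] = value
    return resolved
```

With a store such as `{"#10:0": {"name": "Alice", "city": Link("#11:0")}, "#11:0": {"name": "Rome"}}`, resolving `"#10:0"` yields a document whose `city` field is the linked document itself rather than a reference the caller must look up.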

Query Execution

Tuple-at-a-Time Model, Vectorized Model

OrientDB was originally designed to use the iterator model. However, it allows some fetching strategies to use a vectorized model: some components in execution plans pre-fetch records in a single call and then process them as a batch.
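
The difference between the two styles can be sketched with Python generators. The function names are hypothetical and the sketch is not taken from OrientDB's executor; it only contrasts producing one record per call with pre-fetching a batch and processing it together.

```python
from itertools import islice


def scan(records):
    """Tuple-at-a-time: each step of the iterator yields a single record."""
    for record in records:
        yield record


def batched_scan(records, batch_size):
    """Pre-fetch up to `batch_size` records per step and yield them as a list."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def filter_batches(batches, predicate):
    """Batch processing: apply the predicate to a whole batch at once."""
    for batch in batches:
        yield [record for record in batch if predicate(record)]
```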

System Architecture

Shared-Nothing

OrientDB supports a multi-master, shared-nothing distributed architecture. It uses the open-source Hazelcast project for this purpose: Hazelcast maintains the lifecycle of every node in the distributed system, and OrientDB also uses a Hazelcast plugin for distributed configuration.

Isolation Levels

Read Committed, Repeatable Read

OrientDB supports two isolation levels: Read Committed and Repeatable Read. The default is Read Committed, which is also the only isolation level available when transactions are performed on remote databases. Repeatable Read is allowed only when transactions are performed on local databases, and it consumes more memory than Read Committed. Users can change the isolation level using the Java API.

Stored Procedures

Supported

OrientDB introduces Functions, a concept similar to stored procedures in an RDBMS. Functions can be written in SQL or JavaScript, and they can be executed via SQL, Java, or REST.

Compression

Naïve (Record-Level)

OrientDB supports record-level compression with two built-in algorithms, gzip and snappy. The default is no compression. Users can choose the compression method using SQL syntax or in the configuration of the storage engine, and they can also define custom compression algorithms. Records are decompressed when they are loaded from the storage engine.
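
Record-level compression can be sketched with Python's standard `gzip` module. The `store_record` and `load_record` helpers and the dictionary standing in for the storage engine are hypothetical; the point is only that each record is compressed on its own and decompressed when it is loaded.

```python
import gzip


def store_record(storage, rid, content: bytes, compress: bool = False):
    """Store one record, optionally gzip-compressing its content.

    A flag is kept next to the payload so the loader knows how to read it.
    """
    payload = gzip.compress(content) if compress else content
    storage[rid] = (compress, payload)


def load_record(storage, rid) -> bytes:
    """Load one record, decompressing it if it was stored compressed."""
    compressed, payload = storage[rid]
    return gzip.decompress(payload) if compressed else payload
```

Because each record is compressed independently, reading one record never requires decompressing its neighbors, at the cost of a lower compression ratio than page-level or block-level schemes would typically achieve.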

Storage Architecture

Hybrid

OrientDB supports both in-memory and disk-oriented databases, with a corresponding abstraction for each kind of storage. It also supports larger-than-memory databases; in that case the JVM allocates additional space from swap.

Views

Materialized Views

OrientDB supports materialized views in the latest version. Views are created and dropped using SQL syntax. A materialized view can be configured as read-only or updatable; the default is read-only. Users can define an update interval so that the view is refreshed periodically. Users can also modify an updatable view manually, and the modification is reflected in the corresponding records. Updatable views cannot be created from aggregations.

Joins

Not Supported

OrientDB does not support join syntax. Instead, it introduces the concept of LINKS to represent relationships, and it traverses LINKS to achieve the same goal as a join.

Checkpoints

Consistent

OrientDB supports full checkpointing. A full checkpoint is a simple disk cache flush: it writes all the content of the disk cache to disk. It can be invoked when a cluster is added to the storage, when a cluster changes, or when the storage closes. When configuring the storage engine, users can set time stamps for performing full checkpoints in those scenarios.
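
A full checkpoint of this kind can be sketched as flushing every dirty page of a cache. The `DiskCache` class is a hypothetical stand-in, with a plain dictionary playing the role of the files on disk; it does not reflect OrientDB's actual cache classes.

```python
class DiskCache:
    """Toy write-back cache: writes land in memory until a checkpoint."""

    def __init__(self, disk):
        self.disk = disk      # dict standing in for durable storage
        self.pages = {}       # cached page contents
        self.dirty = set()    # ids of pages changed since the last flush

    def write(self, page_id, data):
        self.pages[page_id] = data
        self.dirty.add(page_id)

    def full_checkpoint(self):
        """Flush all dirty pages to disk and return the ids that were flushed."""
        flushed = sorted(self.dirty)
        for page_id in flushed:
            self.disk[page_id] = self.pages[page_id]
        self.dirty.clear()
        return flushed
```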

Indexes

B+Tree, Hash Table, Inverted Index (Full Text)

OrientDB supports five index algorithms, which belong to three categories. It also allows users to define custom index engines by implementing specific classes.

SB-Tree index

The SB-tree index is a variant of the B-tree index with optimizations for data insertion and long range queries. It is the default index type of OrientDB.

Hash index

OrientDB supports two hash index algorithms: the regular hash index and the auto-sharding index, an implementation of a distributed hash table based on the Murmur3 hash function. Both indexes apply the extendible hashing algorithm and do not support range queries.

Lucene engine

Apache Lucene Core is an implementation of an inverted index. OrientDB provides full-text and spatial indexes using the Lucene engine.

OrientDB manages indexes through SQL syntax, using a specific prefix to refer to indexes. Indexes can be updated in two ways, automatically or manually; the default is manual. When creating an index, users specify the index type and the relevant classes. Users who want the index to be updated automatically must state that explicitly when creating it.
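
The practical difference between an ordered index and a hash index can be sketched as follows. `SortedIndex` uses a sorted list as a stand-in for a tree and `HashIndex` uses a Python dictionary as a stand-in for a hash table; neither is an SB-tree or an extendible hash, and both class names are hypothetical. The sketch only shows why ordering makes range queries possible while hashing supports point lookups only.

```python
import bisect


class SortedIndex:
    """Ordered index: keys are kept sorted, so ranges are contiguous slices."""

    def __init__(self):
        self._keys = []
        self._rids = []

    def put(self, key, rid):
        # bisect_right keeps entries with equal keys in insertion order.
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._rids.insert(i, rid)

    def get(self, key):
        return self.range(key, key)

    def range(self, low, high):
        """Return record ids whose keys fall in [low, high], both inclusive."""
        i = bisect.bisect_left(self._keys, low)
        j = bisect.bisect_right(self._keys, high)
        return self._rids[i:j]


class HashIndex:
    """Hash index: point lookups only, since hashing discards key order."""

    def __init__(self):
        self._buckets = {}

    def put(self, key, rid):
        self._buckets.setdefault(key, []).append(rid)

    def get(self, key):
        return list(self._buckets.get(key, []))
```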

Concurrency Control

Multi-version Concurrency Control (MVCC)

OrientDB applies multi-version concurrency control (MVCC) and checks integrity at commit time. The approach is optimistic, and OrientDB does not support pessimistic transactions. When a transaction conflicts with another, OrientDB throws an exception and the application decides whether to abort it. For graphs, OrientDB provides three consistency modes. The first mode, which is the default, maintains consistency using transactions. The other two do not use transactions and instead rely on a database repair operation: one runs the repair synchronously with the application, and the other runs it asynchronously.
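
The commit-time check can be sketched with per-record version numbers. The `Store` class and the `ConcurrentModificationError` exception are hypothetical names for illustration, not OrientDB's API; the sketch assumes each record carries a version that is compared against the version the transaction originally read.

```python
class ConcurrentModificationError(Exception):
    """Raised when a record changed after the transaction read it."""


class Store:
    def __init__(self):
        self.records = {}  # rid -> (version, value)

    def read(self, rid):
        """Return (version, value); the caller keeps the version for commit."""
        return self.records[rid]

    def commit(self, writes):
        """Apply `writes` (rid -> (version_read, new_value)) all or nothing.

        Validation happens first for every record, so a conflict on any
        record leaves the store completely unchanged.
        """
        for rid, (version_read, _) in writes.items():
            current_version, _ = self.records[rid]
            if current_version != version_read:
                raise ConcurrentModificationError(rid)
        for rid, (version_read, new_value) in writes.items():
            self.records[rid] = (version_read + 1, new_value)
```

An application that catches the exception can reread the record to obtain the current version and then either retry the commit or give up.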

Query Compilation

JIT Compilation

The query execution planner in OrientDB generates an execution plan consisting of pre-defined query steps, which are components written in Java. OrientDB therefore relies on the JVM's ordinary JIT compilation rather than a query-specific compiler. In addition, query execution plans are cached to avoid recomputing the plan for the same query.
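
Plan caching can be sketched as a dictionary keyed by the query text. `PlanCache` and `toy_planner` are hypothetical, and the step names are made up for illustration; the sketch assumes the cache key is the exact query string, which may differ from how OrientDB keys its cache.

```python
class PlanCache:
    """Cache execution plans so the same query text is planned only once."""

    def __init__(self, planner):
        self._planner = planner
        self._plans = {}
        self.misses = 0  # number of times the planner actually ran

    def get_plan(self, query):
        plan = self._plans.get(query)
        if plan is None:
            self.misses += 1
            plan = self._planner(query)
            self._plans[query] = plan
        return plan


def toy_planner(query):
    """Hypothetical planner: build a plan as a list of pre-defined step names."""
    steps = ["scan"]
    if " where " in query.lower():
        steps.append("filter")
    steps.append("project")
    return steps
```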

Query Interface

SQL, Stored Procedures, GraphQL, Gremlin, HTTP / REST

OrientDB uses SQL as its query language, with extensions to support graph functionality. However, its syntax differs from standard SQL in some respects; for example, it does not support joins or the HAVING keyword. OrientDB also has its own concept similar to the stored procedures of an RDBMS.
