In 2012, Alberto Perdomo, the founder of GrapheneDB, wanted to use Neo4j for a project, but setting up and monitoring servers and finding a place to host the database proved challenging. He therefore created GrapheneDB so that clients could focus on learning Cypher and graph modeling in Neo4j and on developing their applications.
GrapheneDB is a Database-as-a-Service provider built on Neo4j. Neo4j uses non-blocking checkpoints, so it can be backed up while it serves user traffic. Neo4j supports daily or weekly full backups, each of which produces a database image on disk, as well as incremental backups that can run hourly or daily. Combining incremental and full backups provides both safety and efficiency. GrapheneDB offers similar backups: daily, weekly, monthly, or on-demand. Each backup captures a current snapshot of the database for recovery (if needed) and does not require downtime.
GrapheneDB uses "bit shaving" to reduce the number of bits needed to store primitive types in arrays. For example, if the largest value in an int array of size 4 is 4, only 3 bits are required to write that value, so every element in the array is written in 3 bits. Using the same width for every element keeps the elements separable from one another. The values are still "ints", but they are stored more efficiently. Conversely, if an int array contains -1, that value needs the full 32 bits, so every element in the array must also be written in 32 bits. Strings are handled with a set of character classes, each of which has a limit on the number of characters it can hold. For example, the "Numerical, Date, and Hex" class has a 54-character limit. These limits help determine whether a string can be inlined, which allows for compression and reduces the disk space required.
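The arithmetic behind the two int-array cases above can be sketched in a few lines. This is only an illustration of the idea, not Neo4j's actual on-disk encoding; the function names `bits_per_element`, `pack`, and `unpack` are hypothetical, and a 32-bit two's-complement int is assumed.

```python
def bits_per_element(values, width=32):
    """Bits needed per element so that the largest value still fits.

    Negative numbers are viewed as two's-complement `width`-bit ints,
    so any negative value forces the full width.
    """
    mask = (1 << width) - 1
    largest = max((v & mask for v in values), default=0)
    return max(1, largest.bit_length())


def pack(values, width=32):
    """Pack all values into one integer using the same bit width for each."""
    bits = bits_per_element(values, width)
    mask = (1 << width) - 1
    packed = 0
    for v in values:
        packed = (packed << bits) | (v & mask)
    return bits, packed


def unpack(bits, packed, count, width=32):
    """Recover the original ints from a packed integer."""
    element_mask = (1 << bits) - 1
    values = []
    for i in range(count):
        shift = bits * (count - 1 - i)
        raw = (packed >> shift) & element_mask
        # Only a full-width element can be negative; restore its sign.
        if bits == width and raw >> (width - 1):
            raw -= 1 << width
        values.append(raw)
    return values


if __name__ == "__main__":
    print(bits_per_element([1, 2, 3, 4]))   # largest value is 4
    print(bits_per_element([1, -1, 3, 4]))  # -1 forces the full width
```

Under these assumptions, four small ints occupy 12 bits in total instead of 128, while a single -1 in the array removes the saving entirely.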
Two-Phase Locking (Deadlock Detection)
Since GrapheneDB uses Neo4j as its graph database, it inherits Neo4j's concurrency control. Neo4j detects deadlocks and rolls back the transaction that would cause the deadlock so that the other transactions can continue. The user can then retry the rolled-back transaction, either by asking for it to be attempted a certain number of times or by wrapping it in a retry loop.
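A retry loop of the kind described above might look like the following minimal sketch. It does not use any real Neo4j driver API: `DeadlockDetected` is a hypothetical stand-in for whatever error a driver raises when a transaction is rolled back, and `run_with_retry` and its parameters are illustrative names.

```python
import time


class DeadlockDetected(Exception):
    """Hypothetical stand-in for the error raised when a transaction
    is rolled back because of a deadlock."""


def run_with_retry(work, max_attempts=3, backoff_seconds=0.0):
    """Call `work()` until it succeeds or `max_attempts` is reached.

    `work` is assumed to open, run, and commit one transaction.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except DeadlockDetected:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            # Wait a little longer after each failed attempt.
            time.sleep(backoff_seconds * attempt)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_transaction():
        # Simulate a transaction that deadlocks twice, then commits.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise DeadlockDetected()
        return "committed"

    print(run_with_retry(flaky_transaction))
```

Bounding the number of attempts keeps a persistently conflicting transaction from retrying forever.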
GrapheneDB is a Database-as-a-Service that uses Neo4j as the underlying graph database. Graph databases suit highly connected data because they store data as nodes together with the relationships between those nodes. Accessing a node and following its relationships is a constant-time operation, which makes querying fast.
In relational databases, a foreign key is a column that references a row in another table and is used to combine the two tables in a JOIN. In graph databases, relationships are just as important as the data itself, so relationships and adjacent nodes are stored with the data, and foreign keys are not necessary. The graph uses adjacent nodes and connections to reach other data.
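The contrast with foreign keys can be illustrated with a small in-memory sketch in which each node holds direct references to its relationships, so reaching a neighbour means following a reference instead of looking up a key in another table. The `Node`, `Relationship`, `relate`, and `neighbours` names are hypothetical and are not part of Neo4j.

```python
class Node:
    def __init__(self, name):
        self.name = name
        # Relationships are stored with the node itself.
        self.outgoing = []


class Relationship:
    def __init__(self, kind, start, end):
        self.kind = kind
        self.start = start
        self.end = end


def relate(start, kind, end):
    """Create a relationship and attach it to its start node."""
    rel = Relationship(kind, start, end)
    start.outgoing.append(rel)
    return rel


def neighbours(node, kind):
    """Follow stored references; no table scan or key lookup is involved."""
    return [rel.end.name for rel in node.outgoing if rel.kind == kind]


if __name__ == "__main__":
    alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
    relate(alice, "KNOWS", bob)
    relate(alice, "KNOWS", carol)
    relate(alice, "WORKS_WITH", bob)
    print(neighbours(alice, "KNOWS"))
```

The cost of reaching the neighbours depends only on how many relationships the node has, not on how many nodes the whole graph contains.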
B+Tree, Inverted Index (Full Text)
GrapheneDB uses B+ trees as its native index. The key size of an index is limited to 4036 bytes, however, so a transaction that exceeds the maximum key size will fail. When the B+ tree index cannot hold the required keys, a user can change the configuration to use Lucene for that particular index, which requires deleting the index and recreating it. Full-text search is also supported, and the full-text index is kept up to date automatically as new data or indexes are created. These index choices come from Neo4j.
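The idea of a full-text index that stays current as data is added can be sketched with a toy inverted index. This is a simplification under stated assumptions: the `FullTextIndex` class is hypothetical, tokenization is a plain whitespace split, and none of it reflects how Lucene or Neo4j implement their indexes.

```python
from collections import defaultdict


class FullTextIndex:
    """Toy inverted index: maps each term to the ids of nodes containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, node_id, text):
        # Updating the postings on every write keeps the index current.
        for term in text.lower().split():
            self._postings[term].add(node_id)

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))


if __name__ == "__main__":
    index = FullTextIndex()
    index.add(1, "Graph databases store relationships")
    index.add(2, "Relational databases use joins")
    print(index.search("databases"))
    print(index.search("graph"))
```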
Cypher is Neo4j's graph query language, which lets users store and retrieve data from the graph database easily. GrapheneDB follows suit and uses Cypher. Cypher was inspired by SQL and offers similar functionality. It is declarative: queries state what data to select, insert, update, or delete without describing how to do it.
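To make the four operations concrete, the sketch below lists one Cypher statement for each, held as plain Python strings so that no driver or server is assumed. The `Person` label, the `KNOWS` relationship type, and the property names are hypothetical and chosen only for illustration.

```python
# Each statement says what to match or change, not how to traverse the graph.
CYPHER_EXAMPLES = {
    # Read: find the names of everyone Alice knows.
    "select": (
        "MATCH (p:Person {name: 'Alice'})-[:KNOWS]->(friend) "
        "RETURN friend.name"
    ),
    # Insert: create a new node with a label and a property.
    "insert": "CREATE (p:Person {name: 'Bob'})",
    # Update: set a property on a matched node.
    "update": "MATCH (p:Person {name: 'Bob'}) SET p.age = 30",
    # Delete: remove a node together with its relationships.
    "delete": "MATCH (p:Person {name: 'Bob'}) DETACH DELETE p",
}

if __name__ == "__main__":
    for operation, query in CYPHER_EXAMPLES.items():
        print(f"{operation}: {query}")
```

`MATCH ... RETURN` plays roughly the role of SQL's `SELECT`, while `CREATE`, `SET`, and `DELETE` correspond to `INSERT`, `UPDATE`, and `DELETE`.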
GrapheneDB uses Neo4j, which uses a custom storage model. Because Neo4j does not have a schema, nodes, relationships, and key-value properties are kept in store files, each stored at a particular offset in its file. For example, a property record holds 32 bytes (four 8-byte blocks), which can contain a key, a value, or both. Data on disk is organized as linked lists of fixed-size records. Property records are also stored as a linked list. Each node also references the first relationship it has.
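Two ideas from this storage model, fixed-size records found by offset and records chained into a linked list, can be sketched as follows. The record layout here (an in-use flag, a next-record id, and one 8-byte payload, 13 bytes in total) is a made-up simplification and not Neo4j's real record format; all names are hypothetical.

```python
import struct

# Hypothetical layout: in-use flag (1 byte), next record id (4 bytes),
# payload (8 bytes). "<" disables padding, so every record is 13 bytes.
RECORD = struct.Struct("<?iq")
NO_NEXT = -1  # marks the end of a chain


def write_record(store, record_id, next_id, payload):
    """Write a record at the offset implied by its id."""
    offset = record_id * RECORD.size
    if len(store) < offset + RECORD.size:
        store.extend(bytes(offset + RECORD.size - len(store)))
    RECORD.pack_into(store, offset, True, next_id, payload)


def read_record(store, record_id):
    """Fixed-size records make lookup a single offset calculation."""
    return RECORD.unpack_from(store, record_id * RECORD.size)


def walk_chain(store, first_id):
    """Follow next-record ids from the first record to the end of the chain."""
    payloads = []
    current = first_id
    while current != NO_NEXT:
        in_use, next_id, payload = read_record(store, current)
        if not in_use:
            break
        payloads.append(payload)
        current = next_id
    return payloads


if __name__ == "__main__":
    store = bytearray()
    # Records need not be adjacent on disk: 0 -> 2 -> 1.
    write_record(store, 0, 2, 10)
    write_record(store, 2, 1, 20)
    write_record(store, 1, NO_NEXT, 30)
    print(walk_chain(store, 0))
```

Because every record has the same size, the location of record `n` is simply `n * RECORD.size`, so no index is needed to find it.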