CrateDB

Viewing Revision #5 from 2019-12-11 00:07

CrateDB is an open-source, distributed, shared-nothing SQL database system developed by Crate.io, built on top of a variety of open source projects. Some of these projects include Facebook’s Presto SQL parser and Apache’s Lucene search library. It’s written in Java, so it can run on any operating system with a Java 7 VM. The system is intended to process IoT data, and therefore is designed for high scalability. CrateDB is typically used with machine-generated data, particularly for operational analytics applications. There is a free community edition under the Apache 2 license, and an enterprise edition which includes premium features and support options. Crate.io also owns other products: CrateDB Cloud, CrateDB Cloud on Azure, and Crate IoT Data Platform.[05][06][04][07][08]

Website: https://cratedb.com/[01]
Source Code: https://github.com/crate/crate[02] Accessed: Jul 11, 2026 Last Commit: Jul 10, 2026
Tech Docs: https://cratedb.com/docs[03]
Developer: Crate GmbH
Country of Origin: AT
Start Year: 2014
Project Types: Commercial, Open Source
Written in: Java
Derived From: Elasticsearch, PrestoDB
Compatible With: PostgreSQL
Operating System: All OS with Java VM
License: Apache v2
Wikipedia: https://en.wikipedia.org/wiki/CrateDB[04]

History[04][07]

CrateDB started as a standalone project by Jodok Batlogg (who previously contributed to Open Source Initiative Vorarlberg), Christian Lutz, and Bernd Dorn. The founders had previously run a consulting business that helped companies use tools for their data needs, and they turned that knowledge into a product. The team won Judge’s Choice at the GigaOm Structure Launchpad competition in June 2014 and TechCrunch Disrupt Europe in October 2014. The first version was released in September 2016 and was reportedly downloaded millions of times. The second version and the enterprise edition were released in May 2017.

Concurrency Control[09]

Optimistic Concurrency Control (OCC)

Optimistic concurrency control is implemented using each row’s sequence number (_seq_no) and primary term (_primary_term). A row’s sequence number starts at 0 and is incremented with every INSERT/UPDATE/DELETE to its shard (a partition of the table). The primary term is incremented whenever a shard becomes primary. An UPDATE or DELETE must supply the row’s current sequence number and primary term; otherwise the statement has no effect. CrateDB does not support transactions.
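The version check described above can be sketched in a few lines of Python. This is a simplified in-memory model and not CrateDB's storage code: the `Row` and `Shard` classes and the `update_if_current` method are hypothetical names, and the counter is kept per row here for brevity rather than per shard.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """A stored value plus the two version columns the text describes."""
    value: str
    seq_no: int = 0
    primary_term: int = 1


class Shard:
    """Hypothetical in-memory shard used only to illustrate the version check."""

    def __init__(self):
        self.rows = {}
        self.primary_term = 1  # would be incremented when the shard becomes primary

    def insert(self, key, value):
        # A new row starts with sequence number 0.
        self.rows[key] = Row(value, seq_no=0, primary_term=self.primary_term)

    def update_if_current(self, key, new_value, seq_no, primary_term):
        """Apply the update only if the caller read the latest version.

        Returns the number of affected rows (0 on a version mismatch).
        """
        row = self.rows.get(key)
        if row is None or (row.seq_no, row.primary_term) != (seq_no, primary_term):
            return 0
        row.value = new_value
        row.seq_no += 1
        row.primary_term = self.primary_term
        return 1
```

A writer that read the row at sequence number 0 succeeds once; a second writer still holding sequence number 0 then affects no rows and has to re-read before retrying.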

Foreign Keys[10]

Not Supported

CrateDB does not support foreign keys.

Isolation Levels[11][12]

Read Uncommitted, Read Committed, Serializable, Repeatable Read

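CrateDB accepts four isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. The isolation level is specified in the BEGIN statement as a transaction_mode. Because CrateDB does not support transactions, the statement is accepted for compatibility with PostgreSQL clients and the chosen level does not change how statements execute.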

Joins[13]

Nested Loop Join, Hash Join

CrateDB supports two join algorithms: nested loop joins and block hash joins. By default the system uses nested loop joins. Block hash joins can only be applied to inner joins whose condition meets all of the following criteria: it contains at least one equality operator, it contains no OR operators, and every argument of an equality operator references fields from only one relation. The hash join algorithm can be enabled or disabled explicitly. The system supports cross joins, inner joins, and left/right/full outer joins, but joining more than two tables can result in poorly performing execution plans.
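To illustrate why the equality requirement matters, here is a minimal sketch of a block hash join for an inner equi-join. It is a generic textbook version, not CrateDB's implementation; the function name, the list-of-dicts row format, and the `block_size` parameter are assumptions made for the example.

```python
from collections import defaultdict


def block_hash_join(left, right, left_key, right_key, block_size=2):
    """Inner equi-join of two lists of dicts, hashing the left side one block at a time."""
    joined = []
    for start in range(0, len(left), block_size):
        # Build phase: hash one block of the left relation on its join key.
        table = defaultdict(list)
        for row in left[start:start + block_size]:
            table[row[left_key]].append(row)
        # Probe phase: scan the right relation and emit every matching pair.
        for other in right:
            for row in table.get(other[right_key], []):
                joined.append({**row, **other})
    return joined
```

The hash lookup only works because each side of the equality refers to a single relation: one side supplies the key that is hashed, the other supplies the key that is probed. A condition containing OR cannot be answered by a single lookup, which is consistent with the restrictions listed above.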

Logging[14][15][16][17]

Logical Logging

Each shard has its own write-ahead log (called the translog), which is flushed to Lucene’s index storage when the log is full. The translog stores records of the operations performed on a node. Each operation record stores metadata about the query that was executed (such as job_id and used_bytes), and the job that corresponds to the given job_id stores the actual query statement that was run. Additionally, CrateDB supports application logging (with Apache Log4j, a logging library for Java) and JVM garbage collection logging, which CrateDB uses to record garbage collection times.

Query Compilation[18][19][20][21]

JIT Compilation

Each node of the system has an SQL Handler, whose main responsibility is to accept and parse incoming SQL statements and create an execution plan. The handler first parses the query into an AST (abstract syntax tree), which is then used to create the plan. The plan is first created at a logical level as a tree of operators. The planner tries to optimize this tree either by optimizing individual operators or by pushing operators down. The plan is then converted to a physical plan and executed in the cluster by walking the resulting operator trees.
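The idea of pushing an operator down a logical plan tree can be sketched as follows. The `Collect`, `Filter`, and `Limit` classes and the `push_down` function are hypothetical stand-ins written for this example; they are not taken from CrateDB's planner, and predicates are kept as plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Collect:
    table: str
    where: Optional[str] = None  # predicate evaluated while reading the table


@dataclass
class Filter:
    source: object
    predicate: str


@dataclass
class Limit:
    source: object
    count: int


def push_down(plan):
    """Rewrite Filter(Collect) into a Collect that evaluates the predicate itself."""
    if isinstance(plan, Filter):
        source = push_down(plan.source)
        if isinstance(source, Collect) and source.where is None:
            return Collect(source.table, where=plan.predicate)
        return Filter(source, plan.predicate)
    if isinstance(plan, Limit):
        return Limit(push_down(plan.source), plan.count)
    return plan  # leaf operators are returned unchanged


logical = Limit(Filter(Collect("sensors"), "temp > 30"), 10)
optimized = push_down(logical)
```

After the rewrite, rows that fail the predicate are discarded where they are read instead of being passed up the tree first.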

Query Execution[22][23][21]

Materialized Model

Given an operation tree, CrateDB’s Job Execution Service will execute the given plan. Each operation will emit the entire (intermediate) result to other nodes, which will continue the query execution with the given result.
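A materialized execution model can be illustrated with a short sketch in which every operator receives the complete output of its predecessor. The function and operator names below are hypothetical, and the sketch runs in a single process, whereas the text describes results being sent between nodes.

```python
def execute_materialized(operators, rows):
    """Run operators in order; each one consumes the full result of the previous one."""
    result = list(rows)
    for operator in operators:
        # The whole intermediate result is built before the next operator starts.
        result = operator(result)
    return result


def keep_hot(rows):
    return [row for row in rows if row["temp"] > 30]


def order_by_temp_desc(rows):
    return sorted(rows, key=lambda row: row["temp"], reverse=True)


readings = [{"id": 1, "temp": 25}, {"id": 2, "temp": 35}, {"id": 3, "temp": 40}]
hottest_first = execute_materialized([keep_hot, order_by_temp_desc], readings)
```

The trade-off of this model is that each intermediate result must be held in full, in exchange for very simple hand-offs between operators.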

Query Interface[24][25][26]

SQL, HTTP / REST, Command-line / Shell

CrateDB can be queried with regular SQL. One way to query a table is through the CrateDB admin UI, in which the user can construct a query and view its results. Another way is through the CrateDB Shell (called Crash), which is CrateDB’s own command-line shell. An HTTP endpoint is also provided for submitting queries: a client such as HTTPie can send a statement to the endpoint, and CrateDB responds with JSON. There are also numerous third-party client tools that work with CrateDB. Since the system supports the PostgreSQL wire protocol, most third-party tools for PostgreSQL are also compatible with CrateDB.
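As a rough illustration of the HTTP interface, the sketch below builds a request using only the Python standard library. The `/_sql` path, port 4200, and the `stmt` and `args` keys reflect the HTTP endpoint reference as understood here and should be checked against the documentation for the version in use; `build_sql_request` is a hypothetical helper, and the request is constructed but not sent.

```python
import json
import urllib.request


def build_sql_request(stmt, args=None, base_url="http://localhost:4200"):
    """Build (but do not send) a POST request carrying one SQL statement."""
    payload = {"stmt": stmt}
    if args is not None:
        payload["args"] = args  # values for parameter placeholders in stmt
    return urllib.request.Request(
        url=f"{base_url}/_sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sql_request("SELECT name FROM sys.cluster")
# With a node listening at base_url, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

Keeping request construction separate from sending makes the payload easy to inspect without a running cluster.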

Storage Architecture[27]

Disk-oriented

CrateDB stores data on disk, which is retrieved and stored depending on the execution plan. The data is partitioned and stored across many nodes.

Storage Model[28][29]

Hybrid

CrateDB stores data in both row and column store formats. The column store is enabled by default for primitive types (such as integers or booleans) and cannot be turned off. It is also supported for text data up to a limited length, where it can be disabled. It is not supported for other types, specifically compound or geographic types. Each row of a table is stored as a semi-structured document that can contain nested objects, and operations on these documents are atomic.

Storage Organization[30]

Sorted Files

In CrateDB, tables are sharded (partitioned) and the shards are distributed among the nodes. Each shard is stored as a Lucene index, which in turn is stored as files under a directory on a node. Data is appended to files and never removed, which makes replication and recovery easier. On a write, the primary copy of the shard is looked up and the new data is appended there; the operation is then repeated on the replicas.
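The write path above can be modeled with a small append-only sketch. The `Table` class is hypothetical, keeps everything in memory, and assumes that rows are routed to shards by hashing a key, which the text does not specify; it is meant only to show append-only logs with a primary-then-replica write order.

```python
import zlib


class Table:
    """Hypothetical sharded, append-only table with one replica per shard."""

    def __init__(self, num_shards=4):
        # Each shard has a primary log and a replica log; both only ever grow.
        self.primaries = [[] for _ in range(num_shards)]
        self.replicas = [[] for _ in range(num_shards)]

    def shard_for(self, key):
        # Stable hash so that a key always routes to the same shard (assumed routing).
        return zlib.crc32(str(key).encode("utf-8")) % len(self.primaries)

    def write(self, key, doc):
        shard = self.shard_for(key)
        record = (key, doc)
        self.primaries[shard].append(record)  # append to the primary first
        self.replicas[shard].append(record)   # then repeat on the replica
        return shard

    def read(self, key):
        # The newest appended record for a key wins; older ones remain in the log.
        for stored_key, doc in reversed(self.primaries[self.shard_for(key)]):
            if stored_key == key:
                return doc
        return None
```

Because records are only appended, a replica that falls behind can catch up by copying the tail of the primary's log, which is one way to read the claim that append-only files ease replication and recovery.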

Stored Procedures[10]

Not Supported

CrateDB does not support stored procedures.

System Architecture[31]

Shared-Nothing

CrateDB is a shared-nothing distributed system. Every Crate node has the same four components, making all nodes equal in terms of functionality: each can process SQL statements, execute queries, interact with the cluster, and store data. Therefore, any node can receive, process, and execute queries. A Crate cluster is defined as two or more nodes that represent the same database instance but run on different hosts. The cluster state holds metadata such as global settings, discovered nodes, schemas, and the status and location of shards. A single node in the cluster is elected the “Metadata Primary” and is the only node allowed to change the state at runtime. Discovery refers to finding, adding, and removing nodes: a new node pings potential hosts and, after receiving a response that identifies the Metadata Primary, sends a join request to that cluster specifically. Every node in a Crate cluster can communicate with every other node via byte-serialized POJOs (plain old Java objects). This full mesh topology improves reliability and lets messages travel along the shortest possible path, but it limits how far the cluster can grow.

Views[32][33]

Virtual Views

CrateDB supports creating, querying, and dropping views. The view is not materialized, so the query associated with this view is rerun every time the view is used. The enterprise version allows different users to have privileges, so to query a view the user must have DQL privileges on the view. The user who created the view automatically has DQL privileges on all the relations in the view.

Citations

33 sources

[01] CrateDB — Real-Time Analytics Database | SQL at Any Scale cratedb.com Accessed: 2026-06-02
[02] GitHub - crate/crate: CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene. · GitHub github.com Accessed: 2026-06-03
[03] CrateDB: Guide cratedb.com Accessed: 2026-06-02
[04] CrateDB - Wikipedia wikipedia.org Modified: 2026-04-04 Accessed: 2026-06-04
[05] CrateDB | Distributed SQL database for real-time analytics at scale cratedb.com Accessed: 2026-06-02
[06] CrateDB System Properties db-engines.com Accessed: 2026-06-07
[07] Crate Lets Developers Set Up Big Data Backends In Minutes | TechCrunch techcrunch.com Accessed: 2026-06-02
[08] https://thenewstack.io/designed-cratedb-realtime-sql-dbms-internet-things/ thenewstack.io Modified: 2026-06-05 Accessed: 2026-06-07
[09] Optimistic Concurrency Control - CrateDB: Reference cratedb.com Modified: 2026-05-28 Accessed: 2026-06-02
[10] SQL compatibility - CrateDB: Reference cratedb.com Modified: 2026-05-28 Accessed: 2026-06-02
[11] BEGIN - CrateDB: Reference cratedb.com Accessed: 2026-06-07
[12] Chapter 37 – SQL Transaction Concurrency - SQL 99 readthedocs.io Modified: 2026-04-05 Accessed: 2026-06-07
[13] Joins - CrateDB: Reference cratedb.com Modified: 2026-05-28 Accessed: 2026-06-02
[14] Storage and consistency - CrateDB: Reference cratedb.com Modified: 2026-05-28 Accessed: 2026-06-02
[15] Logging - CrateDB: Reference cratedb.com Modified: 2026-05-28 Accessed: 2026-06-02
[16] Apache Log4j :: Apache Log4j apache.org Modified: 2026-03-28 Accessed: 2026-06-07
[17] System information - CrateDB: Reference cratedb.com Accessed: 2026-06-07
[18] Clustering - CrateDB: Reference cratedb.com Accessed: 2026-06-07
[19] crate/sql/src/main/java/io/crate/planner/operators/LogicalPlan.java at 98e5fe3d911c8ffdf605c7259f738b24ef1c4085 · crate/crate · GitHub github.com Accessed: 2026-05-30
[20] crate/sql/src/main/java/io/crate/planner/operators/ExecutionPlanSymbolMapper.java at 98e5fe3d911c8ffdf605c7259f738b24ef1c4085 · crate/crate · GitHub github.com Accessed: 2026-05-30
[21] crate/devs/docs/architecture.rst at master · crate/crate · GitHub github.com Accessed: 2026-05-30
[22] Clustering - CrateDB: Reference cratedb.com Accessed: 2026-06-07
[23] Joins - CrateDB: Reference cratedb.com Accessed: 2026-06-07
[24] Installation - CrateDB: Guide cratedb.com Accessed: 2026-06-07
[25] PostgreSQL wire protocol - CrateDB: Reference cratedb.com Accessed: 2026-06-07
[26] HTTP endpoint - CrateDB: Reference cratedb.com Accessed: 2026-06-07
[27] Clustering - CrateDB: Reference cratedb.com Accessed: 2026-06-07
[28] Storage - CrateDB: Reference cratedb.com Accessed: 2026-06-07
[29] Storage and consistency - CrateDB: Reference cratedb.com Modified: 2026-05-28 Accessed: 2026-06-02
[30] https://crate.readthedocs.io/en/stable/architecture/storage_consistency.html readthedocs.io Dead — Check Archive Accessed: 2026-05-31
[31] https://crate.readthedocs.io/en/stable/architecture/shared_nothing.html readthedocs.io Dead — Check Archive Accessed: 2026-05-31
[32] Views - CrateDB: Reference cratedb.com Modified: 2026-05-28 Accessed: 2026-06-02
[33] CREATE VIEW - CrateDB: Reference cratedb.com Modified: 2026-05-28 Accessed: 2026-06-02

Revision #5 Last Updated: 2019-12-10 19:07