CrateDB

Revision #10 from 2022-10-28 14:16

CrateDB is an open-source, distributed, shared-nothing SQL database system developed by Crate.io and built on top of several open-source projects, including Facebook’s Presto SQL parser and the Apache Lucene search library.[06][07][08][09]

Website: https://cratedb.com/[01]
Source Code: https://github.com/crate/crate[02] Accessed: Jul 11, 2026 Last Commit: Jul 10, 2026
Tech Docs: https://cratedb.com/docs[03]
Developer: Crate GmbH
Country of Origin: AT
Start Year: 2014
Project Types: Commercial, Open Source
Written in: Java
Derived From: Elasticsearch, PrestoDB
Compatible With: PostgreSQL
Operating System: All OS with Java VM
License: Apache v2
Twitter: @cratedb[05]
Wikipedia: https://en.wikipedia.org/wiki/CrateDB[04]

The system's target workload is machine-generated IoT data, particularly for operational analytics applications.

History[09]

CrateDB started as a standalone project by Jodok Batlogg (who previously contributed to Open Source Initiative Vorarlberg), Christian Lutz, and Bernd Dorn. The group had previously run a consulting business that helped companies use tools for their data needs, and turned that experience into a product. The team won Judge’s Choice at the GigaOm Structure Launchpad competition in June 2014 and TechCrunch Disrupt Europe in October 2014.

The first version was released in September 2016. The second version and the enterprise version were released in May 2017.

Concurrency Control[10]

Optimistic Concurrency Control (OCC)

Optimistic concurrency control is implemented using each row’s sequence number (_seq_no) and primary term (_primary_term). A row’s sequence number starts at 0 and is incremented with every INSERT, UPDATE, or DELETE on its shard (a partition of the table). The primary term is incremented whenever a shard becomes primary. An UPDATE or DELETE must specify the current sequence number and primary term of the row; otherwise it has no effect. CrateDB does not support transactions.
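The version check can be illustrated with a small in-memory model. This is a simplified sketch and not CrateDB code: the Shard and Row classes, the update_if_unchanged method, and the single shard-wide counter are assumptions made for illustration. It only shows why a write that carries a stale sequence number has no effect.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    value: str
    seq_no: int = 0
    primary_term: int = 1


@dataclass
class Shard:
    """Toy shard (hypothetical): one sequence counter shared by all its rows."""
    rows: dict = field(default_factory=dict)
    next_seq_no: int = 0
    primary_term: int = 1

    def insert(self, key, value):
        self.rows[key] = Row(value, self.next_seq_no, self.primary_term)
        self.next_seq_no += 1

    def update_if_unchanged(self, key, value, seq_no, primary_term):
        """Apply the update only if the caller saw the current version."""
        row = self.rows.get(key)
        if row is None or (row.seq_no, row.primary_term) != (seq_no, primary_term):
            return 0  # stale version: nothing is changed
        self.rows[key] = Row(value, self.next_seq_no, self.primary_term)
        self.next_seq_no += 1
        return 1


shard = Shard()
shard.insert("sensor-1", "21.5")
seen = shard.rows["sensor-1"]  # version read by two competing writers

first = shard.update_if_unchanged("sensor-1", "22.0", seen.seq_no, seen.primary_term)
second = shard.update_if_unchanged("sensor-1", "23.0", seen.seq_no, seen.primary_term)
```

Both writers read the same version, so only the first update is applied; the second one carries an outdated sequence number and changes nothing.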

Foreign Keys[11]

Not Supported

CrateDB does not support foreign keys.

Isolation Levels[12][13]

Read Uncommitted Read Committed Serializable Repeatable Read

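CrateDB accepts four isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. The isolation level is specified in the BEGIN statement as a transaction_mode. Because CrateDB does not support transactions (see Concurrency Control), the statement is accepted for compatibility with PostgreSQL clients and the chosen level does not change how statements are executed.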

Joins[14]

Nested Loop Join Hash Join

CrateDB supports two join algorithms: nested loop joins and block hash joins. By default the system uses nested loop joins. Block hash joins can be applied only to inner joins whose condition meets all of the following criteria: it contains at least one equality operator, it contains no OR operators, and every argument of an equality operator references fields from only one relation. The hash join algorithm can be enabled or disabled explicitly.

The system supports cross joins, inner joins, and left/right/full outer joins, but its performance is limited when joining two or more tables, which can result in poor execution plans.
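The block hash join mentioned in this section can be sketched as follows. This is a generic textbook version under stated assumptions, not CrateDB's implementation: the function name, the block_size parameter, and the sample tables are all hypothetical. One relation is processed one block at a time; each block is hashed on the join key and the other relation is scanned to probe it.

```python
from collections import defaultdict


def block_hash_join(left, right, left_key, right_key, block_size=2):
    """Inner equi-join: hash one block of `left` at a time, then probe with `right`."""
    results = []
    for start in range(0, len(left), block_size):
        block = left[start:start + block_size]
        # Build phase: hash table over the current block only.
        table = defaultdict(list)
        for row in block:
            table[row[left_key]].append(row)
        # Probe phase: scan the other relation against this block.
        for other in right:
            for match in table.get(other[right_key], []):
                results.append({**match, **other})
    return results


sensors = [
    {"sensor_id": 1, "site": "A"},
    {"sensor_id": 2, "site": "B"},
    {"sensor_id": 3, "site": "C"},
]
readings = [
    {"sid": 1, "temp": 20.5},
    {"sid": 3, "temp": 18.0},
    {"sid": 1, "temp": 21.0},
]
joined = block_hash_join(sensors, readings, "sensor_id", "sid")
```

The lookup by key works only for equality conditions, which is consistent with the restriction that the join condition must contain an equality operator and no OR operators.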

Logging[15][16][17][18]

Logical Logging

Each shard has its own write-ahead log (called the translog), which is flushed to Lucene’s index storage when the log is full. The translog stores records of the operations on nodes. Each operation record stores metadata about the query that was executed (such as job_id and used_bytes). The job that corresponds to a given job ID stores the actual query statement that was run.

Additionally, CrateDB supports application logging (with Log4j) and JVM garbage collection logging. Apache Log4j provides logging functionality for Java. CrateDB uses JVM garbage collection logging to record garbage collection times.
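The log-then-flush behavior described above can be modeled in a few lines. This is a deliberately simplified sketch: the ShardLog class, the max_entries threshold, and the dictionary standing in for Lucene's index storage are assumptions for illustration and do not reflect CrateDB's actual internals.

```python
class ShardLog:
    """Toy write-ahead log: operations are logged first, then flushed in bulk."""

    def __init__(self, max_entries=3):
        self.max_entries = max_entries  # hypothetical "log is full" threshold
        self.translog = []              # pending operations, in arrival order
        self.index = {}                 # stands in for the Lucene index storage

    def write(self, key, value):
        # Record the operation in the log before it reaches the index.
        self.translog.append(("put", key, value))
        if len(self.translog) >= self.max_entries:
            self.flush()

    def flush(self):
        # Apply logged operations to the index, then truncate the log.
        for op, key, value in self.translog:
            if op == "put":
                self.index[key] = value
        self.translog.clear()


log = ShardLog(max_entries=3)
log.write("a", 1)
log.write("b", 2)
pending_before = len(log.translog)   # two operations wait in the log
indexed_before = dict(log.index)     # nothing has been flushed yet
log.write("c", 3)                    # the log is now full, so it is flushed
```

After the third write the log is empty and all three entries are in the index; until then the pending operations exist only in the log, which is what makes replay after a failure possible.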

Query Compilation[19]

Not Supported

CrateDB does not support query compilation.

Query Execution[20][21][22]

Materialized Model

Given an operation tree, CrateDB’s Job Execution Service will execute the given plan. Each operation will emit the entire (intermediate) result to other nodes, which will continue the query execution with the given result.
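As a rough illustration of the materialized model, each operator below consumes a complete input and returns a complete intermediate result before the next operator starts. The operator functions and the sample table are hypothetical and run in a single process; the distribution of intermediate results across nodes is not modeled.

```python
def scan(table):
    """Leaf operator: emits the whole table at once."""
    return list(table)


def filter_rows(rows, predicate):
    """Consumes a full intermediate result and emits a full one."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Keeps only the requested columns of every row."""
    return [{col: row[col] for col in columns} for row in rows]


readings = [
    {"id": 1, "temp": 19.0},
    {"id": 2, "temp": 23.5},
    {"id": 3, "temp": 25.0},
]

# Operation tree: project(filter(scan(readings))), evaluated bottom-up.
warm_ids = project(filter_rows(scan(readings), lambda r: r["temp"] > 20), ["id"])
```

Every step hands a fully built list to its parent, in contrast to an iterator-style model in which rows would be pulled one at a time.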

Query Interface[23][24][25]

SQL HTTP / REST Command-line / Shell

CrateDB can be queried with regular SQL. One way to query a table is through the CrateDB admin UI, in which the user can construct a query and view its results. Another way is through the CrateDB Shell (called Crash), which is its custom command-line shell. An HTTP endpoint is also provided for submitting queries: using a client such as HTTPie, users can send a query to the endpoint, and CrateDB responds with JSON. There are also numerous third-party client tools that work with CrateDB. Since the system supports the PostgreSQL wire protocol, most third-party tools for PostgreSQL are also compatible with CrateDB.
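A request to the HTTP endpoint can be sketched with Python's standard library. The sketch assumes a node listening on localhost:4200 with the SQL endpoint at /_sql and a JSON body that carries the statement under the key "stmt"; the helper name build_sql_request is hypothetical. The request is only built here, since sending it needs a running node.

```python
import json
import urllib.request


def build_sql_request(stmt, args=None, base_url="http://localhost:4200"):
    """Build (but do not send) a POST request for the assumed /_sql endpoint."""
    payload = {"stmt": stmt}
    if args is not None:
        payload["args"] = args  # positional parameters for the statement
    return urllib.request.Request(
        url=f"{base_url}/_sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sql_request("SELECT name FROM sys.cluster")

# Sending requires a running node, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

Passing values through args rather than formatting them into the statement keeps the SQL text and the data separate.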

Storage Architecture[26]

Disk-oriented

CrateDB stores data on disk, which is retrieved and stored depending on the execution plan. The data is partitioned and stored across many nodes.

Storage Model[27][28]

Hybrid

CrateDB stores data in both row store and column store formats. The column store is enabled by default for primitive types (such as integers or booleans) and cannot be turned off. It is also supported for text data up to a limited length, where it can be disabled. It is not supported for other types, specifically compound or geographic types.

Each row of a table is stored as a semi-structured document that can contain nested objects, and operations on these documents are atomic.

Storage Organization[29]

Sorted Files

In CrateDB, tables are sharded (partitioned) and the shards are distributed among the nodes. Each shard is stored as a Lucene index, which is in turn stored as a set of files under a directory on a node. Data is appended to files and never removed, which makes replication and recovery easier. When writing, the primary node for the shard is looked up and the new data is added to its file. The operation is then repeated on the replicas.
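The write path described above (look up the primary, append, repeat on the replicas) can be sketched with a toy model. The ShardCopy class, the routing table, and the node names are hypothetical; real shards are Lucene indexes on disk, not Python lists.

```python
class ShardCopy:
    """One copy of a shard, modeled as an append-only list of records."""

    def __init__(self, node):
        self.node = node
        self.records = []

    def append(self, record):
        self.records.append(record)  # data is only ever appended


# Hypothetical routing table: shard id -> primary copy and replica copies.
routing = {
    0: {"primary": ShardCopy("node-a"), "replicas": [ShardCopy("node-b")]},
    1: {"primary": ShardCopy("node-b"), "replicas": [ShardCopy("node-a")]},
}


def write(shard_id, record):
    """Append to the primary copy first, then repeat the operation on replicas."""
    entry = routing[shard_id]  # look up the primary for this shard
    entry["primary"].append(record)
    for replica in entry["replicas"]:
        replica.append(record)


write(0, {"id": 1, "temp": 21.5})
write(0, {"id": 2, "temp": 19.0})
```

Because every copy receives the same appends in the same order, a replica can stand in for a lost primary without reconciling in-place edits.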

Stored Procedures[11]

Not Supported

CrateDB does not support stored procedures.

System Architecture[30]

Shared-Nothing

CrateDB is a shared-nothing distributed system. Every Crate node has the same four components, making all nodes equal in terms of functionality. They can all process SQL statements, execute queries, interact with the cluster, and store data. Therefore, any node can receive, process, and execute queries.

A Crate cluster is defined as two or more nodes representing the same database instance but running on different hosts. The cluster state holds metadata such as global settings, discovered nodes, schemas, and the status and location of shards. A single node in the cluster is elected the “Metadata Primary” and is the only node allowed to change the state at runtime. Node discovery refers to finding, adding, and removing nodes. A new node pings potential hosts and, after receiving a response that identifies the Metadata Primary, sends a join request to that cluster specifically.

All nodes in a Crate cluster can communicate with any other node in the cluster via byte-serialized POJOs (plain old Java objects). This full mesh topology improves reliability and allows messages to be sent along the shortest possible path, but it limits how far the cluster can grow.
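The growth limit of a full mesh can be made concrete with simple arithmetic: if every node keeps a link to every other node, n nodes need n(n-1)/2 links. The function below is an illustrative calculation only, with a hypothetical name; how many connections CrateDB actually opens between two nodes is not stated in the source.

```python
def mesh_links(nodes: int) -> int:
    """Number of node pairs in a full mesh of `nodes` members."""
    return nodes * (nodes - 1) // 2


small = mesh_links(3)
medium = mesh_links(10)
large = mesh_links(100)
```

Growing a cluster tenfold, from 10 to 100 nodes, multiplies the number of pairs by more than a hundred, which is why this topology becomes costly for very large clusters.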

Views[31][32]

Virtual Views

CrateDB supports creating, querying, and dropping views. The view is not materialized, so the query associated with this view is rerun every time the view is used. The enterprise version allows different users to have privileges, so to query a view the user must have DQL privileges on the view. The user who created the view automatically has DQL privileges on all the relations in the view.
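The behavior of a non-materialized view can be demonstrated with any SQL engine that supports CREATE VIEW. The sketch below uses SQLite only because it ships with Python; it is a stand-in, not CrateDB, and the table and view names are hypothetical. Privileges are not modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER, temp REAL)")
conn.execute("INSERT INTO readings VALUES (1, 19.0), (2, 23.5)")

# The view stores only its defining query, not the query's results.
conn.execute(
    "CREATE VIEW warm AS SELECT sensor_id, temp FROM readings WHERE temp > 20"
)

before = conn.execute("SELECT count(*) FROM warm").fetchone()[0]
conn.execute("INSERT INTO readings VALUES (3, 25.0)")
after = conn.execute("SELECT count(*) FROM warm").fetchone()[0]

conn.execute("DROP VIEW warm")
conn.close()
```

The second count includes the newly inserted row without any refresh step, because the view's query is run again each time the view is read.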

Citations

32 sources

[01] CrateDB - Real-Time Analytics Database | SQL at Any Scale. cratedb.com. Accessed: 2026-06-02
[02] GitHub - crate/crate: CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene. github.com. Accessed: 2026-06-03
[03] CrateDB: Guide. cratedb.com. Accessed: 2026-06-02
[04] CrateDB - Wikipedia. wikipedia.org. Modified: 2026-04-04. Accessed: 2026-06-04
[05] https://twitter.com/cratedb. twitter.com
[06] https://thenewstack.io/designed-cratedb-realtime-sql-dbms-internet-things/. thenewstack.io. Modified: 2026-06-05. Accessed: 2026-06-07
[07] CrateDB | Distributed SQL database for real-time analytics at scale. cratedb.com. Accessed: 2026-06-02
[08] CrateDB System Properties. db-engines.com. Accessed: 2026-06-07
[09] Crate Lets Developers Set Up Big Data Backends In Minutes | TechCrunch. techcrunch.com. Accessed: 2026-06-02
[10] Optimistic Concurrency Control - CrateDB: Reference. cratedb.com. Modified: 2026-05-28. Accessed: 2026-06-02
[11] SQL compatibility - CrateDB: Reference. cratedb.com. Modified: 2026-05-28. Accessed: 2026-06-02
[12] BEGIN - CrateDB: Reference. cratedb.com. Accessed: 2026-06-07
[13] Chapter 37 – SQL Transaction Concurrency - SQL 99. readthedocs.io. Modified: 2026-04-05. Accessed: 2026-06-07
[14] Joins - CrateDB: Reference. cratedb.com. Modified: 2026-05-28. Accessed: 2026-06-02
[15] Storage and consistency - CrateDB: Reference. cratedb.com. Modified: 2026-05-28. Accessed: 2026-06-02
[16] Logging - CrateDB: Reference. cratedb.com. Modified: 2026-05-28. Accessed: 2026-06-02
[17] Apache Log4j :: Apache Log4j. apache.org. Modified: 2026-03-28. Accessed: 2026-06-07
[18] System information - CrateDB: Reference. cratedb.com. Accessed: 2026-06-07
[19] crate/sql/src/main/java/io/crate/planner/operators/LogicalPlan.java at 98e5fe3d911c8ffdf605c7259f738b24ef1c4085 · crate/crate · GitHub. github.com. Accessed: 2026-05-30
[20] Clustering - CrateDB: Reference. cratedb.com. Accessed: 2026-06-07
[21] Joins - CrateDB: Reference. cratedb.com. Accessed: 2026-06-07
[22] crate/devs/docs/architecture.rst at master · crate/crate · GitHub. github.com. Accessed: 2026-05-30
[23] Installation - CrateDB: Guide. cratedb.com. Accessed: 2026-06-07
[24] PostgreSQL wire protocol - CrateDB: Reference. cratedb.com. Accessed: 2026-06-07
[25] HTTP endpoint - CrateDB: Reference. cratedb.com. Accessed: 2026-06-07
[26] Clustering - CrateDB: Reference. cratedb.com. Accessed: 2026-06-07
[27] Storage - CrateDB: Reference. cratedb.com. Accessed: 2026-06-07
[28] Storage and consistency - CrateDB: Reference. cratedb.com. Modified: 2026-05-28. Accessed: 2026-06-02
[29] https://crate.readthedocs.io/en/stable/architecture/storage_consistency.html. readthedocs.io. Dead link. Accessed: 2026-05-31
[30] https://crate.readthedocs.io/en/stable/architecture/shared_nothing.html. readthedocs.io. Dead link. Accessed: 2026-05-31
[31] Views - CrateDB: Reference. cratedb.com. Modified: 2026-05-28. Accessed: 2026-06-02
[32] CREATE VIEW - CrateDB: Reference. cratedb.com. Modified: 2026-05-28. Accessed: 2026-06-02

Revision #10 Last Updated: 2022-10-28 10:16