Druid

Viewing Revision #14 from 2026-06-16 11:29

Apache Druid is an open-source, distributed, real-time analytics database designed for business intelligence (OLAP) queries on streaming and historical data. It is optimized for time-series scans and aggregations. It supports loading data from both deep storage systems such as HDFS and streaming sources such as Kafka. Internally, Druid uses ZooKeeper for cluster node coordination, a relational database such as MySQL or Postgres to keep track of metadata, and a deep storage system such as HDFS for storing data. Druid also keeps the latency between an event's creation and the moment it can be queried low, which makes it attractive for real-time analytics. Druid stores incoming data in its own format, called a segment, to allow fast aggregations over data of arbitrary dimensionality. Druid is commonly used to power GUI-based analytical BI apps via JDBC and as a backend for AI apps via its REST API. It is also used for clickstream analytics, network telemetry analytics, application performance analytics, and advertising analytics.[05][06]


Website: https://druid.apache.org[01]
Source Code: https://github.com/apache/druid[02] Accessed: Jun 24, 2026 Last Commit: Jun 24, 2026
Tech Docs: https://druid.apache.org/docs/latest/design/[03]
Twitter: @druidio
Developer: Metamarkets
Governance: Apache Software Foundation
Country of Origin: US
Start Year: 2011 [07]
Coding Agent: Copilot [17]
Project Type: Open Source
Written in: Java
Supported Languages: Clojure, Java, JavaScript, PHP, Python, R, Ruby, Scala
Operating Systems: AIX, BSD, Hosted, HP-UX, Linux, macOS, Solaris
License: Apache v2
Wikipedia: https://en.wikipedia.org/wiki/Apache_Druid[04]

Database Entry


AI-Assisted OLAP

History[07][08]

Druid was originally developed by engineers at Metamarkets to solve the problem of analyzing high-dimensional data sets in real time. Scanning and aggregating billions of records in a traditional relational database was not fast enough, and pre-computing aggregations with a NoSQL architecture required unacceptably long processing times, which created high latency between an event's occurrence and its availability for querying. Druid was released in April 2011 to address the need for fast, real-time analytics on high-dimensional time-series data. It was open sourced in October 2012 and is under active development.

Concurrency Control[09]

Multi-version Concurrency Control (MVCC)

Data Model

Column Family / Wide-Column

Indexes

B+Tree

Druid indexes data into segments when the data are first ingested.

Joins[10]

Sort-Merge Join

Logging[11]

Not Supported

Query Compilation

Not Supported

Query Execution[12]

Tuple-at-a-Time Model, Vectorized Model

Query Interface[13]

Custom API, SQL, HTTP / REST

Druid uses a custom query interface, expressed in JSON, for metadata, aggregation, and search queries. It also supports SQL via Apache Calcite.
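As a rough illustration of the two interfaces, the sketch below builds a native JSON timeseries query and a JSON payload wrapping a similar SQL statement, then shows how either could be sent over HTTP. The base URL `http://localhost:8888`, the `wikipedia` datasource, the interval, and the helper names are assumptions made for this example; the paths `/druid/v2` and `/druid/v2/sql` are the usual native and SQL endpoints, but check them against your own deployment.

```python
import json
import urllib.request

# Assumed Router address of a local test cluster; adjust for your deployment.
DRUID_URL = "http://localhost:8888"


def native_timeseries_query(datasource, interval):
    """Build a native JSON timeseries query that counts rows per hour."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "hour",
        "intervals": [interval],
        "aggregations": [{"type": "count", "name": "row_count"}],
    }


def sql_query(datasource):
    """Build the JSON payload that wraps a similar hourly count in SQL."""
    sql = (
        "SELECT TIME_FLOOR(__time, 'PT1H') AS hour_start, COUNT(*) AS row_count "
        f'FROM "{datasource}" GROUP BY 1'
    )
    return {"query": sql}


def post_json(path, payload, base_url=DRUID_URL):
    """POST a JSON payload to the cluster and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "wikipedia" and the interval are placeholder values for illustration.
    native = native_timeseries_query("wikipedia", "2016-06-27/2016-06-28")
    print(json.dumps(native, indent=2))
    print(json.dumps(sql_query("wikipedia"), indent=2))
    # With a running cluster, the payloads could be sent like this:
    # post_json("/druid/v2", native)
    # post_json("/druid/v2/sql", sql_query("wikipedia"))
```

Only the payload builders run without a cluster; `post_json` needs a reachable Druid endpoint.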

Storage Architecture[14][15]

Hybrid

Druid was originally built as an all-in-memory system. However, that choice is costly for large amounts of data. Druid then switched to a combination of memory and disk pages, and it allows users to customize this behavior.
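One common way to combine memory and disk pages is to memory-map column files and let the operating system decide which pages stay resident. The sketch below shows that general idea with Python's `mmap` module; the file layout (a flat array of 64-bit integers) and the helper names are assumptions made for this example and are not Druid's actual on-disk format.

```python
import mmap
import os
import struct
import tempfile

VALUE_FORMAT = "<q"  # little-endian signed 64-bit integer (assumed layout)
VALUE_SIZE = struct.calcsize(VALUE_FORMAT)


def write_column(path, values):
    """Write integers to disk as a flat array of fixed-width values."""
    with open(path, "wb") as f:
        for value in values:
            f.write(struct.pack(VALUE_FORMAT, value))


def sum_range(path, start, stop):
    """Sum rows [start, stop) through a read-only memory map.

    Only the pages backing the touched byte range have to be loaded;
    the operating system decides what stays cached in memory.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            total = 0
            for row in range(start, stop):
                (value,) = struct.unpack_from(VALUE_FORMAT, mapped, row * VALUE_SIZE)
                total += value
            return total


def demo():
    """Write a 1000-row column to a temporary file and sum rows 10..19."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "metric.col")
        write_column(path, range(1000))
        return sum_range(path, 10, 20)
```

Calling `demo()` reads only a small slice of the file, which is the point of the design: the working set can be much smaller than the data on disk.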

Storage Model[16]

Decomposition Storage Model (Columnar)

Druid uses segment files to store its index. A segment file is essentially a columnar structure consisting of three basic column types: timestamp columns, dimension columns, and metric columns. This structure allows fast aggregation across different fields.
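To make the three column types concrete, here is a toy in-memory model of a segment. It is a sketch under stated assumptions, not Druid's file format: the class names `ToySegment` and `DimensionColumn` are hypothetical, dictionary encoding of dimension values is an illustrative choice, and every row is assumed to supply the same dimensions and metrics so that the columns stay aligned.

```python
from dataclasses import dataclass, field


@dataclass
class DimensionColumn:
    """Dictionary-encoded string column: each row stores a small integer id."""
    dictionary: list = field(default_factory=list)
    ids: list = field(default_factory=list)

    def append(self, value):
        if value not in self.dictionary:
            self.dictionary.append(value)
        self.ids.append(self.dictionary.index(value))

    def rows_matching(self, value):
        """Return the row numbers whose value equals the given string."""
        if value not in self.dictionary:
            return []
        wanted = self.dictionary.index(value)
        return [row for row, value_id in enumerate(self.ids) if value_id == wanted]


@dataclass
class ToySegment:
    """Hypothetical segment: one timestamp column plus named dimension and metric columns."""
    timestamps: list = field(default_factory=list)
    dimensions: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)

    def add_row(self, timestamp, dims, mets):
        # Assumes every row carries the same dimension and metric names.
        self.timestamps.append(timestamp)
        for name, value in dims.items():
            self.dimensions.setdefault(name, DimensionColumn()).append(value)
        for name, value in mets.items():
            self.metrics.setdefault(name, []).append(value)

    def sum_metric(self, metric, dim, value):
        """Sum one metric over the rows where a dimension equals a value."""
        rows = self.dimensions[dim].rows_matching(value)
        column = self.metrics[metric]
        return sum(column[row] for row in rows)


segment = ToySegment()
segment.add_row(1000, {"country": "US", "page": "A"}, {"clicks": 3})
segment.add_row(2000, {"country": "DE", "page": "A"}, {"clicks": 5})
segment.add_row(3000, {"country": "US", "page": "B"}, {"clicks": 4})
us_clicks = segment.sum_metric("clicks", "country", "US")
```

The aggregation touches only the `country` and `clicks` columns and never reads `page` or the timestamps, which is why a columnar layout suits this kind of query.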

Stored Procedures

Not Supported

System Architecture[08]

Shared-Nothing

Views

Not Supported

Citations

17 sources

[01] Apache Druid | Apache® Druid. apache.org. Modified: 2026-05-08. Accessed: 2026-06-04.
[02] GitHub - apache/druid: Apache Druid: a high performance real-time analytics database. github.com. Accessed: 2026-06-04.
[03] Introduction to Apache Druid | Apache® Druid. apache.org. Modified: 2026-05-08. Accessed: 2026-06-05.
[04] Apache Druid - Wikipedia. wikipedia.org. Modified: 2026-03-31. Accessed: 2026-06-04.
[05] Technology | Apache® Druid. apache.org. Modified: 2026-05-08. Accessed: 2026-06-08.
[06] Powered by Apache Druid | Apache® Druid. apache.org. Modified: 2026-05-08. Accessed: 2026-06-08.
[07] https://druid.apache.org/blog/2011/04/30/introducing-druid.html. apache.org. Dead link (check archive). Accessed: 2026-05-31.
[08] http://static.druid.io/docs/druid.pdf. druid.io. Modified: 2014-03-30. Accessed: 2026-06-07.
[09] https://druid.apache.org/blog/2012/10/24/introducing-druid.html. apache.org. Dead link (check archive). Accessed: 2026-06-07.
[10] Joins | Apache® Druid. apache.org. Modified: 2025-12-15. Accessed: 2026-06-08.
[11] https://druid.apache.org/docs/latest/dependencies/deep-storage.html. apache.org. Dead link (check archive). Accessed: 2026-06-07.
[12] Query execution | Apache® Druid. apache.org. Modified: 2025-12-15. Accessed: 2026-06-08.
[13] Druid SQL overview | Apache® Druid. apache.org. Modified: 2025-12-15. Accessed: 2026-06-08.
[14] Introduction to Apache Druid | Apache® Druid. apache.org. Modified: 2025-12-15. Accessed: 2026-06-08.
[15] Frequently Asked Questions | Apache® Druid. apache.org. Modified: 2025-12-15. Accessed: 2026-06-08.
[16] Segments | Apache® Druid. apache.org. Modified: 2025-12-15. Accessed: 2026-06-07.
[17] https://github.com/apache/druid/commit/3d1e0d03de341b41027dfe3d493800274f5e895c. github.com. Modified: 2026-03-30. Accessed: 2026-06-25.

Revision #14 Last Updated: 2026-06-16 07:29