Druid

Druid is an open-source distributed real-time data store designed for business intelligence (OLAP) queries. It is optimized for time series scans and aggregations. It supports loading data from both deep storage system like HDFS and streaming sources like Kafka. Internally, Druid uses Zookeeper for cluster node coordinations, a relational database like MySQL or Postgres to keep track of metadata, and a deep storage system such as HDFS for storing data. Druid also has low latency between the event creation and when it can be queried, which makes Druid desirable for real-time analytics. Druid stores incoming data in a unique format called segment to allow fast aggregations for arbitrary dimensionalities of data. It is used by various companies including Netflix, eBay, Airbnb, PayPal and Alibaba.

History

Druid was originally developed by engineers at Metamarkets to solve the problem of analyzing high dimensional data set in real-time. Scan and aggregation of billions of records in traditional relational databases are not fast enough, and pre-computing aggregations with NoSQL architecture requires unacceptably long processing time which creates high latency between event occurrence and its availability for querying. Druid was released in April, 2011 to address the need for fast, real-time analytics for high dimensional time series data. It was open sourced in Oct, 2012 and is under active developments.

Data Model

Column Family

Indexes

B+Tree

Druid index documents into data segment when data are first ingested.

Query Compilation

Not Supported

Query Execution

Tuple-at-a-Time Model

Query Interface

Custom API

Druid uses customized query interface expressed in JSON for metadata, aggregation and search.

Storage Architecture

Hybrid

Druid was built with all in-memory. However such choice is costly given large amount of data. It then switches to use a combination of memory and disk pages and allow users to customize the behavior.

Storage Model

Decomposition Storage Model (Columnar)

Druid uses segments files to stores its index. A segment file is a basically a columnar storage model consists of three basic column types: timestamp columns, dimension columns and metric columns. This structure allows fast aggregation across different fields.

Stored Procedures

Not Supported

Revision #6  | 
Druid Logo
Website

http://druid.io/

Source Code

https://github.com/druid-io/druid/

Developer

Metamarkets

Country of Origin

US

Start Year

2011

Project Type

Open Source

Supported languages

Java

Operating Systems

AIX, BSD, HP-UX, Linux, OS X, Solaris

Licenses

Apache v2

Wikipedia

https://en.wikipedia.org/wiki/Druid_(open-source_data_store)

Revision #6  |