Drill

Viewing Revision #10 from 2019-12-11 08:28 View Current

Drill is a distributed, open-source SQL query engine designed for Big Data exploration. It is based on Google's Dremel query system and features a columnar execution engine. Drill does not require schemas to be defined before data can be queried. It supports many NoSQL databases and file systems, and a single query can join data from multiple types of datastores, such as MongoDB and HBase.[05][06][04][01]

Website: https://drill.apache.org[01]
Source Code: https://github.com/apache/drill[02] Accessed: Jul 17, 2026 Last Commit: Jul 14, 2026
Tech Docs: https://drill.apache.org/docs/[03]
Developer: Apache Software Foundation
Country of Origin: US
Start Year: 2012 [18]
Project Type: Open Source
Written in: Java
Supported Languages: SQL
Inspired By: BigQuery
Compatible With: HBase, MongoDB
Operating Systems: Linux, macOS, Windows
License: Apache v2
Wikipedia: https://en.wikipedia.org/wiki/Apache_Drill[04]

History[07][08][04]

In 2010, Google published a paper titled "Dremel: Interactive Analysis of Web-Scale Datasets" that described a scalable database system designed for "interactive analysis of nested data". The Dremel system is available today under Google's BigQuery system. Development of Apache Drill began in 2012, with the goal of replicating the capabilities of Dremel. Initial goals of the system included support for multiple storage systems, file formats, query languages, and data sources, as well as the ability to scale over 10,000 servers and process petabytes of data in seconds.

Checkpoints[09]

Not Supported

Drill adopts optimistic query execution, which assumes that failures are rare during a query. It therefore does not take checkpoints. Under its pipelined query execution model, a query that fails is simply rerun.
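The rerun-on-failure approach can be illustrated with a minimal sketch. Everything named here (`QueryFailed`, `run_with_rerun`, `flaky_query`, and the retry limit) is hypothetical and is not part of Drill; the source does not say whether a rerun is triggered automatically or by the client, so the loop below is an assumption made only to show that no intermediate state is saved or restored.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class QueryFailed(Exception):
    """Hypothetical error raised when any part of a query fails."""


def run_with_rerun(run_query: Callable[[], T], max_attempts: int = 3) -> T:
    """Run a query from scratch on each attempt; no checkpoint is ever taken."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return run_query()
        except QueryFailed as err:
            # Nothing to restore: the next attempt starts the whole query over.
            last_error = err
    raise QueryFailed(f"query failed after {max_attempts} attempts") from last_error


# Usage: a fake query that fails on its first run and succeeds on the second.
attempts = {"count": 0}


def flaky_query() -> list:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise QueryFailed("simulated node failure")
    return [("rows", 42)]


result = run_with_rerun(flaky_query)
```

The trade-off is that a failure late in a long query discards all completed work, which is acceptable only when failures are rare relative to query duration.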

Concurrency Control[10]

Optimistic Concurrency Control (OCC)

Drill supports Optimistic Concurrency Control. It plans queries in fragments, assuming that all of the fragments can be completed in parallel without interfering with each other. Larger fragments are broken into smaller fragments, which are run in clusters until the whole fragment is complete.

Data Model[11][12][01]

Column Family / Wide-Column

Drill features a JSON self-describing data model that supports language independence and loosely defined, weak data typing. This data model uses on-the-fly schema discovery, also known as late binding, to begin the execution of queries without having to know the structure of the data. Through this data model, Drill can handle data with evolving schemas or even no schemas at all.

Drill's internal data representation is columnar and hierarchical, which allows for efficient SQL processing without the need to flatten data into rows. The data model supports queries on complex/nested data as well as evolving data structures.
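As a rough illustration of schema discovery combined with a columnar, hierarchical layout, the sketch below reads JSON records whose fields differ from row to row and builds one value list per field path. It is not Drill's implementation; the function names, the dotted-path convention, and the use of `None` for missing values are assumptions chosen for brevity.

```python
import json
from typing import Any, Iterator


def walk_fields(record: dict, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (dotted path, value) pairs, descending into nested objects."""
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from walk_fields(value, path + ".")
        else:
            yield path, value


def to_columns(json_lines: list[str]) -> dict[str, list[Any]]:
    """Build one column per field path, discovering fields as records arrive."""
    columns: dict[str, list[Any]] = {}
    for row_index, line in enumerate(json_lines):
        record = json.loads(line)
        for path, value in walk_fields(record):
            # A field seen for the first time is back-filled for earlier rows.
            column = columns.setdefault(path, [None] * row_index)
            column.append(value)
        # Columns that this record did not mention get a None for this row.
        for column in columns.values():
            if len(column) <= row_index:
                column.append(None)
    return columns


# Usage: three records with an evolving structure and no declared schema.
lines = [
    '{"id": 1, "name": {"first": "Ada"}}',
    '{"id": 2, "name": {"first": "Lin", "last": "Wu"}, "tags": ["x"]}',
    '{"id": 3}',
]
columns = to_columns(lines)
```

No schema is supplied up front: the set of columns grows as new fields appear, and nested fields keep their path rather than being forced into a fixed row shape.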

Foreign Keys[13][14]

Supported

Drill supports the usage of foreign keys within the schemas of the datastores that it gathers data from.

Indexes[15][16]

Drill's query planner is able to leverage indexes to create index-based query plans for better performance. It works with index types that are supported by the MapR database system. These include simple/composite indexes, hashed/non-hashed indexes, and covering indexes.

Joins[17]

Nested Loop Join, Hash Join, Sort-Merge Join, Broadcast Join

Drill uses both distributed and broadcast joins to perform hash, merge, and nested loop join operations. In a distributed join, both sides of the join are hash distributed on one or more join keys. In a broadcast join, one side of the join is broadcast to all other nodes in the join.
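The difference between the two distribution strategies can be sketched in a single process, with lists standing in for nodes. This is an illustration under stated assumptions, not Drill's planner or operators: the function names are hypothetical, each "node" runs a simple in-memory hash join, and the round-robin placement of the non-broadcast side is an arbitrary choice.

```python
from collections import defaultdict


def hash_partition(rows, key, num_nodes):
    """Send each row to the node chosen by hashing its join key."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key]) % num_nodes].append(row)
    return parts


def local_hash_join(left, right, key):
    """Join two row lists on one node: build on the right side, probe with the left."""
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


def distributed_join(left, right, key, num_nodes):
    """Hash distribute BOTH sides so matching keys land on the same node."""
    left_parts = hash_partition(left, key, num_nodes)
    right_parts = hash_partition(right, key, num_nodes)
    out = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(local_hash_join(lp, rp, key))
    return out


def broadcast_join(left, right, key, num_nodes):
    """Leave the left side spread out and copy the whole right side to every node."""
    left_parts = [left[i::num_nodes] for i in range(num_nodes)]
    out = []
    for lp in left_parts:
        out.extend(local_hash_join(lp, right, key))
    return out


orders = [
    {"cust": 1, "item": "a"},
    {"cust": 2, "item": "b"},
    {"cust": 1, "item": "c"},
    {"cust": 3, "item": "d"},
]
customers = [{"cust": 1, "name": "Ada"}, {"cust": 2, "name": "Lin"}]


def canon(rows):
    """Order-independent view of a join result, for comparison."""
    return sorted((r["cust"], r["item"], r["name"]) for r in rows)


via_distribution = canon(distributed_join(orders, customers, "cust", 3))
via_broadcast = canon(broadcast_join(orders, customers, "cust", 3))
```

Both strategies produce the same rows. Broadcasting avoids reshuffling the large side but copies the other side to every node, so it tends to pay off only when that side is small.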

Parallel Execution[10]

Intra-Operator (Horizontal)

Drill supports parallel execution through intra-operator parallelism. Physical plans are split into phases called major fragments, and each major fragment is then split into minor fragments. The minor fragments run in parallel, each in its own thread, until the major fragment, and eventually the whole plan, is complete.
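A minimal sketch of the major/minor fragment split follows, using a thread pool so that each minor fragment gets its own thread. The names `run_major_fragment`, `count_errors`, and the `width` parameter are hypothetical, and slicing the input round-robin is an assumption; Drill's own fragment scheduling is more involved.

```python
from concurrent.futures import ThreadPoolExecutor


def run_major_fragment(operator, data, width):
    """Split one major fragment into `width` minor fragments, one thread each."""
    slices = [data[i::width] for i in range(width)]
    with ThreadPoolExecutor(max_workers=width) as pool:
        # Every minor fragment applies the same operator to its own slice.
        return list(pool.map(operator, slices))


def count_errors(lines):
    """The operator run by each minor fragment: count lines containing ERROR."""
    return sum(1 for line in lines if "ERROR" in line)


log_lines = [
    "ERROR disk",
    "INFO ok",
    "ERROR net",
    "INFO ok",
    "INFO ok",
    "ERROR cpu",
]
partials = run_major_fragment(count_errors, log_lines, width=4)
total = sum(partials)
```

The major fragment is complete only once every minor fragment has returned, which is why the partial counts are collected before being summed.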

Query Compilation[12]

Code Generation

Drill supports code generation and runtime query compilation. It both compiles and re-compiles queries at runtime, as part of the on-the-fly schema discovery that lets it begin executing a query without knowing the structure of the data. Because the schema is only learned while data is being read, compilation happens during the execution phase rather than before it.
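The idea of compiling during execution, and compiling again when the observed schema changes, can be sketched as follows. This is a toy illustration, not Drill's code generator (which produces Java code): `compile_filter` and `AdaptiveFilter` are hypothetical names, the filter is limited to a single comparison, and treating string columns as numbers is an assumption made only to give the two generated versions different bodies.

```python
ALLOWED_OPS = {"<", "<=", "==", ">=", ">"}


def compile_filter(column: str, op: str, column_type: type):
    """Generate and compile source for `row[column] <op> value`, specialized to a type."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    accessor = f"row[{column!r}]"
    if column_type is str:
        # Assumption: string-typed values hold numbers and must be parsed first.
        accessor = f"float({accessor})"
    source = (
        "def generated_filter(rows, value):\n"
        f"    return [row for row in rows if {accessor} {op} value]\n"
    )
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["generated_filter"]


class AdaptiveFilter:
    """Compile on the first batch, then re-compile whenever the column type changes."""

    def __init__(self, column, op, value):
        self.column, self.op, self.value = column, op, value
        self._compiled_for = None
        self._fn = None
        self.compilations = 0

    def apply(self, batch):
        if not batch:
            return []
        observed = type(batch[0][self.column])
        if observed is not self._compiled_for:
            self._fn = compile_filter(self.column, self.op, observed)
            self._compiled_for = observed
            self.compilations += 1
        return self._fn(batch, self.value)


older_than_40 = AdaptiveFilter("age", ">", 40)
first = older_than_40.apply([{"age": 30}, {"age": 45}])       # ages arrive as ints
second = older_than_40.apply([{"age": "41"}, {"age": "12"}])  # later batch: strings
```

The second batch triggers a second compilation because the type of `age` changed between batches, which mirrors the compile-then-recompile behaviour described above.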

Query Execution[10]

Vectorized Model

Drill supports a vectorized model of query execution. This builds on its parallel execution design, in which a query is broken up into fragments of work that together form a multi-level execution tree. The root fragment reads the query and table metadata, then routes the work to lower levels of the tree, where it is processed in parallel. Partial results are passed up the tree level by level, with higher-level fragments aggregating them further, until the entire query completes.
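The way partial results flow up the execution tree can be sketched with a small aggregation example. The `Fragment` class and the choice of an average as the query are hypothetical, and the children are visited sequentially here for simplicity, whereas the text describes them as running in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    rows: list = field(default_factory=list)       # only leaf fragments hold data
    children: list = field(default_factory=list)   # lower level of the tree

    def partial(self):
        """Return the partial result (sum, count) for this subtree."""
        if not self.children:
            return sum(self.rows), len(self.rows)
        total, count = 0, 0
        for child in self.children:
            child_sum, child_count = child.partial()
            # Higher-level fragments only merge partial results from below.
            total += child_sum
            count += child_count
        return total, count


def average(root: Fragment):
    """The root fragment finalizes the answer from the merged partial result."""
    total, count = root.partial()
    return total / count if count else None


tree = Fragment(children=[
    Fragment(children=[Fragment(rows=[1, 2, 3]), Fragment(rows=[4, 5])]),
    Fragment(children=[Fragment(rows=[6]), Fragment(rows=[7, 8, 9, 10])]),
])
root_partial = tree.partial()
answer = average(tree)
```

Passing `(sum, count)` pairs upward rather than per-fragment averages keeps the merge step correct when leaves hold different numbers of rows.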

Query Interface

SQL

System Architecture

Shared-Disk

Citations

18 sources

[01] https://drill.apache.org. apache.org. Dead link (check archive). Accessed: 2026-06-04
[02] GitHub - apache/drill: Apache Drill is a distributed MPP query layer for self describing data. github.com. Accessed: 2026-06-04
[03] Documentation - Apache Drill. apache.org. Modified: 2025-06-28. Accessed: 2026-06-05
[04] Apache Drill - Wikipedia. wikipedia.org. Modified: 2026-04-27. Accessed: 2026-06-04
[05] Drill Introduction - Apache Drill. apache.org. Modified: 2025-06-28. Accessed: 2026-06-07
[06] https://mapr.com/products/apache-drill. mapr.com. Dead link (check archive). Accessed: 2026-05-31
[07] DrillProposal - INCUBATOR - Apache Software Foundation. apache.org. Accessed: 2026-06-07
[08] The Apache Software Foundation Announces Apache™ Drill™ as a Top-Level Project - The ASF Blog. apache.org. Modified: 2026-06-07. Accessed: 2026-06-07
[09] Architecture - Apache Drill. apache.org. Modified: 2025-02-12. Accessed: 2026-06-07
[10] Drill Query Execution - Apache Drill. apache.org. Modified: 2025-06-28. Accessed: 2026-06-07
[11] JSON Data Model - Apache Drill. apache.org. Modified: 2025-06-28. Accessed: 2026-06-07
[12] Frequently Asked Questions - Apache Drill. apache.org. Modified: 2025-02-12. Accessed: 2026-06-07
[13] LATERAL Join - Apache Drill. apache.org. Modified: 2025-06-28. Accessed: 2026-06-07
[14] [DRILL-4391] browsing metadata via SQLSquirrel shows Postgres indexes, primary and foreign keys as tables - ASF Jira. apache.org. Accessed: 2026-06-07
[15] Types of Indexes - Apache Drill. apache.org. Modified: 2025-06-28. Accessed: 2026-06-07
[16] https://mapr.com/docs/61/MapR-DB/Indexes/indexes-types.html. mapr.com. Dead link (check archive). Accessed: 2026-05-31
[17] Join Planning Guidelines - Apache Drill. apache.org. Modified: 2025-06-28. Accessed: 2026-06-07
[18] First commit. github.com. Modified: 2012-09-03. Accessed: 2026-05-27

Revision #10 Last Updated: 2019-12-11 03:28