Drill

Viewing Revision #20 from 2026-06-16 11:22 View Current

Drill is a database system designed for Big Data exploration. It is an open-source, distributed SQL query system based on Google's Dremel query system, and it features a columnar execution engine. Drill is the only distributed SQL engine in the world that does not require schemas. It supports the querying of many NoSQL databases and file systems, with the ability to join data from different types of datastores within the same query.[04][05][01][06]

Logo Versions

Website: https://drill.apache.org[01]
Source Code: https://github.com/apache/drill[02] Accessed: Jun 24, 2026 Last Commit: Jun 24, 2026
Tech Docs: https://drill.apache.org/docs/[03]
Twitter: @ApacheDrill
Developer: Apache Software Foundation
Governance: Apache Software Foundation
Country of Origin: US
Start Year: 2012 [20]
Coding Agent: Claude [21]
Project Type: Open Source
Written in: Java
Supported Languages: C, C#, C++, Java, JavaScript, Perl, Python
Inspired By: BigQuery
Compatible With: HBase, MongoDB
Operating Systems: Linux, macOS, Windows
License: Apache v2
Wikipedia: https://en.wikipedia.org/wiki/Apache_Drill[04]

Database Entry

Drill

Viewing Revision #20 from 2026-06-16 11:22 View Current

AI-Assisted OLAP

History[07][08][04]

In 2010, Google published a paper titled "Dremel: Interactive Analysis of Web-Scale Datasets" that described a scalable database system designed for "interactive analysis of nested data". This Dremel query engine is now available today under Google's BigQuery system. Development of Apache Drill began in 2012, with the goal of replicating the capabilities of Dremel. Initial goals of the system included support for multiple storage systems, file formats, query languages, and data sources, as well as the ability to scale over 10,000 servers and process petabytes of data in seconds.

Checkpoints[09]

Not Supported

Drill adopts optimistic query execution, which assumes that failures occur rarely during queries. Therefore, it does not take checkpoints. With its pipelined query execution model, single queries are simply reran when they fail.

Concurrency Control[10]

Optimistic Concurrency Control (OCC)

Drill supports Optimistic Concurrency Control. It plans queries in fragments, assuming that all of the fragments can be completed in parallel without interfering with each other. Larger fragments are broken into smaller fragments, which are run in clusters until the whole fragment is complete.

Data Model[01][11][12]

Column Family / Wide-Column

Drill features a JSON self-describing data model that supports language independence and loosely defined, weak data typing. This data model uses on-the-fly schema discovery, also known as late binding, to begin the execution of queries without having to know the structure of the data. Through this data model, Drill can handle data with evolving schemas or even no schemas at all.

Drill's internal data representation is columnar and hierarchical, which allows for efficient SQL processing without the need to flatten data into rows. The data model supports queries on complex/nested data as well as evolving data structures.

Foreign Keys[13][14]

Supported

Drill supports the usage of foreign keys within the schemas of the datastores that it gathers data from.

Indexes[15][16]

Drill's query planner is able to leverage indexes to create index-based query plans for better performance. It works with index types that are supported by the MapR database system. These include simple/composite indexes, hashed/non-hashed indexes, and covering indexes.

Joins[17]

Nested Loop Join Hash Join Sort-Merge Join Broadcast Join

Drill makes use of both distributed and broadcast joins to perform hash, merge, and nested loop join operations. In distributed joins, both sides of the join are hash distributed on one or more join keys. In broadcast joins, one side of the join is broadcasted to all other nodes in the join.

Parallel Execution[10]

Intra-Operator (Horizontal)

Drill supports parallel execution through intra-operator parallelism. Physical plans are split into phases called fragments. Large fragments, known as major fragments, are then split into minor fragments. The minor fragments run in parallel with each other, each in their own thread, until the major fragments, and eventually the plan, is fully complete.

Query Compilation[12]

Code Generation

Drill supports code generation and runtime query compilation. In fact, Drill is the only query engine in the world that both compiles and re-compiles queries at runtime, as part of its on-the-fly schema discovery that allows it to begin executing queries without knowing the structure of the data. Through this mechanism, queries are compiled during their execution phase.

Query Execution[10]

Vectorized Model

Drill supports a vectorized model of distributed query execution. This is due to its parallel execution design, in which queries are broken up into fragments of work. Together, these fragments compose a multi-level execution tree. The root fragment reads queries and table metadata, and then routes them to lower levels in the execution tree for execution to be processed in parallel. Partial results are passed up the tree level by level, with higher-level fragments performing further aggregation of these results, up until the completion of entire queries.

Query Interface[18]

SQL HTTP / REST

Drill supports the running of queries via industry standard ANSI SQL and RESTful APIs. Since data in the queries may be schema-less, Drill will implicitly convert this data into correctly-typed data in some cases. In other cases, casting the data is necessary, and the need for casting varies by query and data source.

Storage Architecture[09][12]

Hybrid

The overall storage architecture of Drill is a hybrid of in-memory and disk. Drill optimizes for columnar storage and execution by utilizing an in-memory data model that is both hierarchical and columnar. Query execution happens in memory as much as possible, persisting to disk only when there is memory overflow.

Storage Model[09][12]

Decomposition Storage Model (Columnar)

Drill supports an in-memory columnar data storage model. The design of this model allows for the optimization of in-memory, parallel query execution. It also provides an execution layer that performs SQL processing on this columnar data without the need for row materialization.

Stored Procedures[19]

Not Supported

Drill does not support stored procedures.

System Architecture[10][09]

Shared-Nothing

The core of Apache Drill is the "Drillbit" service. Drillbits are processes that accept query requests, execution queries, and return the results to the client. This query process is split over clusters of multiple Drillbits, which follow the shared-nothing architecture. Each Drillbit has its own cache and storage engine.

When a query is issued, any one of the Drillbits in the cluster can accept it, turning this Drillbit into the "driving Drillbit". The driving Drillbit will parse the query optimize the query, and then generate a distributed query plan over the available Drillbits in the cluster. Together, these will each execute according to the plan and return data to the driving Drillbit, which ultimately returns the result to the client.

Citations

21 sources

https://drill.apache.org apache.org Dead — Check Archive Accessed: 2026-06-04
GitHub - apache/drill: Apache Drill is a distributed MPP query layer for self describing data · GitHub github.com Accessed: 2026-06-04
Documentation - Apache Drill apache.org Modified: 2025-06-28 Accessed: 2026-06-05
Apache Drill - Wikipedia wikipedia.org Modified: 2026-04-27 Accessed: 2026-06-04
Drill Introduction - Apache Drill apache.org Modified: 2025-06-28 Accessed: 2026-06-07
https://mapr.com/products/apache-drill mapr.com Dead — Check Archive Accessed: 2026-05-31
DrillProposal - INCUBATOR - Apache Software Foundation apache.org Accessed: 2026-06-07
The Apache Software Foundation Announces Apache™ Drill™ as a Top-Level Project - The ASF Blog apache.org Modified: 2026-06-07 Accessed: 2026-06-07
Architecture - Apache Drill apache.org Modified: 2025-02-12 Accessed: 2026-06-07
Drill Query Execution - Apache Drill apache.org Modified: 2025-06-28 Accessed: 2026-06-07
JSON Data Model - Apache Drill apache.org Modified: 2025-06-28 Accessed: 2026-06-07
Frequently Asked Questions - Apache Drill apache.org Modified: 2025-02-12 Accessed: 2026-06-07
LATERAL Join - Apache Drill apache.org Modified: 2025-06-28 Accessed: 2026-06-07
[DRILL-4391] browsing metadata via SQLSquirrel shows Postgres indexes, primary and foreign keys as tables - ASF Jira apache.org Accessed: 2026-06-07
Types of Indexes - Apache Drill apache.org Modified: 2025-06-28 Accessed: 2026-06-07
https://mapr.com/docs/61/MapR-DB/Indexes/indexes-types.html mapr.com Dead — Check Archive Accessed: 2026-05-31
Join Planning Guidelines - Apache Drill apache.org Modified: 2025-06-28 Accessed: 2026-06-07
Query Data Introduction - Apache Drill apache.org Modified: 2025-06-28 Accessed: 2026-06-07
Apache Mail Archives apache.org Modified: 2024-06-04 Accessed: 2026-06-08
First commit github.com Modified: 2012-09-03 Accessed: 2026-05-27
https://github.com/apache/drill/commit/a1986a3fec1634812712e47be0be2565b303ea2d github.com Modified: 2019-04-08 Accessed: 2026-06-25

Revision #20 Last Updated: 2026-06-16 07:22