Impala

Revision #5 from 2018-10-28 21:08

Impala is an open source SQL engine that offers interactive query processing on data stored in Apache Hadoop file formats. Unlike SQL-on-Hadoop systems such as Hive, which are used for long batch jobs, Impala enables interactive exploration and fine-tuning of analytic queries through its Massively Parallel Processing (MPP) model. Impala avoids data movement and lets users interact with the data stored in HDFS through a SQL front-end rather than through traditional Hadoop batch jobs.[04][05]

Website: https://impala.apache.org/[01]
Source Code: https://github.com/apache/impala[02] Accessed: Jul 29, 2026 Last Commit: Jul 29, 2026
Developer: Cloudera, Inc.
Country of Origin: US
Start Year: 2013 [27]
Project Types: Commercial, Open Source
Written in: C++
Operating System: Linux
License: Apache v2
Wikipedia: https://en.wikipedia.org/wiki/Apache_Impala[03]

Compatible Systems

Kudu

Derivative Systems

StarRocks

History[06][07]

The Impala project was announced in October 2012 with the objective of providing a SQL interface and Business Intelligence tools for data scientists. Impala supports various HDFS file formats; however, it is optimized for Parquet, a column-oriented file format that was announced in early 2013. Impala was accepted into the Apache Incubator on December 2, 2015.

Checkpoints[08][09]

Impala does not support query checkpoints. When a host node on which a query is running fails, Impala cancels the query. Additional support for long-running queries is planned so that a query can complete even in the presence of node failures.

Concurrency Control[10]

Not Supported

Impala does not support any concurrency control mechanism. The Hive Metastore (HMS), which is transactional and receives updates on inserts and updates, raises an error in case parallel inserts are made into the same table.

Data Model[11]

Relational

Impala is a massively parallel query engine that is not strongly coupled with the underlying storage layer. Currently, Impala only supports a flat relational schema. The developers plan to add support for nested schemas with complex column types.

Foreign Keys[12][13]

Although foreign keys are not currently supported by Impala, they will be added later for use in cardinality estimation during query planning. However, they will not be enforced by Impala.

Indexes[14][15]

Not Supported

Impala does not support indexes. Although Hive provides limited index capabilities, they are not leveraged by Impala. Since Impala is not a monolithic DBMS, it is often unaware of the data that shows up in the HDFS files. Hence, it is not possible for an index to stay in sync with the base data.

Isolation Levels[16]

Read Uncommitted, Read Committed

Impala supports both Read Committed and Read Uncommitted isolation levels.

Joins[17]

Nested Loop Join, Hash Join, Broadcast Join, Shuffle Join

Impala provides a variety of join options. It does not provide a command to hint at the type of join to be executed in the case of nested loop joins and hash joins; Impala internally decides on the most suitable join mechanism for the query. However, it supports query hints for choosing between broadcast and shuffle joins.
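
The difference between the two hintable strategies can be illustrated with a small in-memory sketch. This is not Impala code: the function names, the sample tables, and the use of Python's built-in hash for partitioning are all assumptions made for illustration. A broadcast join sends a full copy of one side to every node, while a shuffle join hash-partitions both sides on the join key so that matching rows land on the same node.

```python
from collections import defaultdict

def hash_join(left_rows, right_rows, key):
    """Plain in-memory hash join: build on the right side, probe with the left."""
    table = defaultdict(list)
    for rrow in right_rows:
        table[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left_rows for rrow in table.get(lrow[key], [])]

def broadcast_join(left_partitions, right_rows, key):
    """Each node keeps its own left partition and receives a full copy of the right side."""
    out = []
    for partition in left_partitions:
        out.extend(hash_join(partition, right_rows, key))
    return out

def shuffle_join(left_rows, right_rows, key, num_nodes):
    """Both sides are hash-partitioned on the join key, then joined node by node."""
    left_parts = [[] for _ in range(num_nodes)]
    right_parts = [[] for _ in range(num_nodes)]
    for row in left_rows:
        left_parts[hash(row[key]) % num_nodes].append(row)
    for row in right_rows:
        right_parts[hash(row[key]) % num_nodes].append(row)
    out = []
    for lpart, rpart in zip(left_parts, right_parts):
        out.extend(hash_join(lpart, rpart, key))
    return out

# Hypothetical sample data: a larger fact table and a small dimension table.
orders = [
    {"cust_id": 1, "amount": 10},
    {"cust_id": 2, "amount": 20},
    {"cust_id": 1, "amount": 5},
    {"cust_id": 3, "amount": 7},
]
customers = [{"cust_id": 1, "name": "a"}, {"cust_id": 2, "name": "b"}]

broadcast_result = broadcast_join([orders[:2], orders[2:]], customers, "cust_id")
shuffle_result = shuffle_join(orders, customers, "cust_id", num_nodes=2)
```

Both strategies produce the same rows; they differ in how much data moves. Broadcasting is typically attractive when one side is small, while shuffling avoids copying a large table to every node.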

Logging[18]

Since Impala does not support transactions and is suited for analytical queries, it does not support logging.

Query Compilation[11]

Code Generation, JIT Compilation

Impala uses LLVM to perform just-in-time (JIT) query compilation. At runtime, it generates versions of functions that are specialized for the query being executed, which yields performance improvements of more than 5x.
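
The idea behind runtime code generation can be shown with a toy analogue. The sketch below does not use LLVM and is not how Impala is implemented; it only contrasts a generic interpreter that walks an expression tree for every row with a function generated once for a specific expression. The tuple-based expression format and all function names are assumptions made for this example.

```python
def interpret(expr, row):
    """Generic path: walk the expression tree again for every row."""
    op = expr[0]
    if op == "col":
        return row[expr[1]]
    if op == "const":
        return expr[1]
    left, right = interpret(expr[1], row), interpret(expr[2], row)
    if op == "add":
        return left + right
    if op == "mul":
        return left * right
    if op == "gt":
        return left > right
    raise ValueError(f"unknown operator: {op}")

_SYMBOLS = {"add": "+", "mul": "*", "gt": ">"}

def to_source(expr):
    """Translate the expression tree into Python source text."""
    op = expr[0]
    if op == "col":
        return f"row[{expr[1]!r}]"
    if op == "const":
        return repr(expr[1])
    if op not in _SYMBOLS:
        raise ValueError(f"unknown operator: {op}")
    return f"({to_source(expr[1])} {_SYMBOLS[op]} {to_source(expr[2])})"

def generate(expr):
    """Specialized path: compile one function for this exact expression."""
    source = f"lambda row: {to_source(expr)}"
    # Only trusted, internally built expressions should ever reach eval.
    return eval(compile(source, "<generated>", "eval"))

# Hypothetical predicate: price + qty * 2 > 10
predicate = ("gt", ("add", ("col", "price"), ("mul", ("col", "qty"), ("const", 2))), ("const", 10))
specialized = generate(predicate)
sample_rows = [{"price": 5, "qty": 3}, {"price": 1, "qty": 1}]
results = [specialized(r) for r in sample_rows]
```

The generated function contains no branching on operator type and no recursion, which is the kind of per-row overhead that query-specific code generation is meant to remove.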

Query Execution[11][19]

Tuple-at-a-Time Model

Query Interface[20][21]

Custom API, SQL

Impala supports SQL as its query language and provides a high degree of compatibility with the Hive Query Language (HiveQL). It also provides the impala-shell interpreter, which processes all the SQL commands supported by Impala along with a few shell-only commands that can be used for performance tuning.

Storage Architecture[22][11]

Disk-oriented

Impala can access data stored on HDFS in any of the Apache Hadoop file formats, including Parquet, Text, Avro, RCFile, and SequenceFile. It also supports compressed file formats, which reduce disk space and I/O volume at the cost of CPU overhead to decompress the data.
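
The trade-off between I/O volume and CPU work can be seen with any general-purpose codec. The sketch below uses Python's zlib module purely as a stand-in; the codec, the compression level, and the sample payload are assumptions and say nothing about which codecs a given Impala file format uses.

```python
import zlib

def compress_stats(payload: bytes, level: int = 6):
    """Return (raw size, compressed size, round-trip ok) for a payload."""
    compressed = zlib.compress(payload, level)
    # Reading the data back requires a decompression pass, which costs CPU time.
    restored = zlib.decompress(compressed)
    return len(payload), len(compressed), restored == payload

# Hypothetical text-format table data with many repeated values.
sample_payload = b"\n".join(
    b"%d,US,completed" % (i % 50) for i in range(10_000)
)
raw_size, compressed_size, round_trip_ok = compress_stats(sample_payload)
```

Fewer bytes need to be read from disk in the compressed case, but every scan pays for the `zlib.decompress` call, which mirrors the overhead described above.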

Storage Model[11][23]

Custom

Impala does not provide its own storage engine but rather reads data from any of the underlying storage formats. Nonetheless, when data is stored in Parquet, a binary columnar storage format, Impala shows a significant performance improvement because the columnar layout substantially reduces the I/O volume.
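
A minimal sketch of why a columnar layout reduces I/O for analytic queries: an aggregate over one column only has to touch that column's values, whereas a row layout forces every field of every row to be read. The table contents and function names are hypothetical, and counting values touched is only a rough proxy for bytes read from disk.

```python
# Hypothetical three-column table stored two ways.
row_store = [(i, f"user{i}", i * 1.5) for i in range(1000)]
column_store = {
    "id": [r[0] for r in row_store],
    "name": [r[1] for r in row_store],
    "score": [r[2] for r in row_store],
}

def sum_from_rows(rows, col_index):
    """Row layout: each whole row is read even though one field is needed."""
    total, touched = 0, 0
    for row in rows:
        touched += len(row)
        total += row[col_index]
    return total, touched

def sum_from_columns(columns, name):
    """Columnar layout: only the requested column is read."""
    values = columns[name]
    return sum(values), len(values)

row_total, row_touched = sum_from_rows(row_store, 2)
col_total, col_touched = sum_from_columns(column_store, "score")
```

Here the columnar scan touches one third of the values for the same answer; the saving grows with the number of columns the query does not reference.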

Stored Procedures[24]

Supported

Support for stored procedures in Impala was added in the 1.2 release in the form of user-defined functions (UDFs). Users can write UDFs in C++ or reuse Java-based Hive UDFs. C++ UDFs achieve a significant performance improvement over UDFs written in Java. Support for User Defined Table Functions (UDTFs) has not yet been added.

System Architecture[25][26]

Shared-Nothing

Impala is a distributed, Massively Parallel Processing (MPP) query engine that uses a Shared-Nothing architecture. Impala consists of the following three major components:

1. Impala Daemon - A daemon process that runs on each data node, reads and writes data for the accepted queries, and parallelizes the work across the cluster. It transmits the query results to the central coordinator node.
2. Impala Statestore - A daemon process that continuously monitors the health status of the daemons on the data nodes in the cluster. When a data node goes down, it ensures that no requests are made to the unreachable data node. It provides robustness, load balancing, and high availability.
3. Impala Catalog Service - It relays the metadata changes from SQL statements to all the Impala Daemons, so that metadata changes that occur via SQL queries issued through Impala become visible across the cluster.
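
The health-tracking role of the statestore can be sketched as a simple heartbeat registry. This is an illustrative model only: the class, the method names, the timeout value, and the round-robin assignment are assumptions and do not reflect Impala's actual protocol or scheduler.

```python
class Statestore:
    """Toy registry that remembers when each daemon last reported in."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}

    def heartbeat(self, daemon, now):
        """Record that a daemon was alive at time `now` (in seconds)."""
        self.last_seen[daemon] = now

    def healthy(self, now):
        """Daemons whose last heartbeat is within the timeout window."""
        return sorted(
            d for d, seen in self.last_seen.items() if now - seen <= self.timeout_s
        )

def assign_fragments(statestore, num_fragments, now):
    """Spread query fragments round-robin over reachable daemons only."""
    nodes = statestore.healthy(now)
    if not nodes:
        raise RuntimeError("no healthy daemons available")
    return {frag: nodes[frag % len(nodes)] for frag in range(num_fragments)}

store = Statestore(timeout_s=5)
for name in ("daemon-1", "daemon-2", "daemon-3"):
    store.heartbeat(name, now=0)
# daemon-3 stops reporting; the other two keep sending heartbeats.
store.heartbeat("daemon-1", now=10)
store.heartbeat("daemon-2", now=10)

alive = store.healthy(now=12)
plan = assign_fragments(store, num_fragments=4, now=12)
```

Because `daemon-3` has missed its heartbeat window, no fragment is assigned to it, which corresponds to the guarantee that no requests are made to an unreachable node.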

Views[09]

Virtual Views

Impala supports virtual views as lightweight logical constructs that act as query aliases. It does not support materialized views, since data updates in the Hadoop environment make it difficult to keep them up to date.

Citations

27 sources

[01] Impala apache.org Modified: 2024-08-12 Accessed: 2026-07-16
[02] GitHub - apache/impala: Apache Impala · GitHub github.com Accessed: 2026-06-04
[03] Apache Impala - Wikipedia wikipedia.org Modified: 2025-12-30 Accessed: 2026-06-04
[04] https://docs.cloudera.com/documentation/enterprise/latest/topics/impala.html cloudera.com (dead link, check archive) Modified: 2026-06-05 Accessed: 2026-06-07
[05] Introducing Apache Impala apache.org Modified: 2025-03-04 Accessed: 2026-06-07
[06] Apache Impala - Wikipedia wikipedia.org Modified: 2025-12-30 Accessed: 2026-06-04
[07] Introducing Apache Impala apache.org Modified: 2025-03-04 Accessed: 2026-06-07
[08] Impala Frequently Asked Questions | 5.2.x | Cloudera Documentation cloudera.com Modified: 2025-08-18 Accessed: 2026-06-07
[09] https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_faq.html cloudera.com (dead link, check archive) Modified: 2026-06-05 Accessed: 2026-06-07
[10] Archived - Cloudera Community cloudera.com Accessed: 2026-06-07
[11] Impala: A Modern, Open-Source SQL Engine for Hadoop cidrdb.org Modified: 2014-12-01 Accessed: 2026-06-07
[12] https://issues.apache.org/jira/browse/IMPALA-2112 apache.org Accessed: 2026-06-07
[13] https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_reserved_words.html cloudera.com (dead link, check archive) Modified: 2026-06-05 Accessed: 2026-06-07
[14] Impala Frequently Asked Questions | 5.6.x | Cloudera Documentation cloudera.com Modified: 2025-08-18 Accessed: 2026-06-07
[15] Google Groups google.com Accessed: 2026-06-07
[16] impala/thirdparty/hbase-0.94.6-cdh4.3.0/src/main/java/org/apache/hadoop/hbase/client/IsolationLevel.java at master · schubertzhang/impala · GitHub github.com Accessed: 2026-05-20
[17] Joins in Impala SELECT Statements | 5.9.x | Cloudera Documentation cloudera.com Modified: 2025-08-18 Accessed: 2026-06-07
[18] https://www.tutorialspoint.com/impala/impala_overview.html tutorialspoint.com (dead link, check archive) Accessed: 2026-06-07
[19] https://chatwithengineers.com/2016/08/29/a-survey-of-query-execution-engines-from-volcano-to-vectorized-processing chatwithengineers.com (dead link, check archive) Accessed: 2026-06-07
[20] https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_langref.html cloudera.com (dead link, check archive) Modified: 2026-06-05 Accessed: 2026-06-07
[21] https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_impala_shell.html cloudera.com (dead link, check archive) Modified: 2026-06-05 Accessed: 2026-06-07
[22] https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_file_formats.html cloudera.com (dead link, check archive) Modified: 2026-06-05 Accessed: 2026-06-07
[23] Latest Insights on Data and AI | Cloudera Blog cloudera.com Modified: 2026-06-07 Accessed: 2026-06-07
[24] https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_udf.html cloudera.com (dead link, check archive) Modified: 2026-06-05 Accessed: 2026-06-07
[25] Impala Concepts and Architecture apache.org Modified: 2025-03-04 Accessed: 2026-06-07
[26] Impala apache.org Modified: 2024-08-12 Accessed: 2026-06-07
[27] Impala Version and Download Information | 5.x | Cloudera Documentation cloudera.com Modified: 2025-08-18 Accessed: 2026-06-07

Revision #5 Last Updated: 2018-10-28 17:08