PrestoDB

Viewing Revision #18 from 2021-01-05 01:03 View Current

PrestoDB is an open source distributed SQL query engine for running interactive analytic queries against heterogeneous data sources. It was open sourced by Facebook in 2013. It does not manage the storage of data. Instead, Presto is a query engine which allows querying data where it lives, including Hive, Cassandra, Kafka, and relational databases. A single PrestoDB query is able to combine data from multiple sources. Presto was designed, built and optimized for interactive queries. In comparison, both Presto and Hive support SQL queries against HDFS, while Presto is targeted at interactive queries and Hive is suitable for batch processing. Presto supports ANSI-compatible SQL statements.[05][06]

Logo Versions

Website: https://prestodb.io[01]
Source Code: https://github.com/prestodb/presto[02] Accessed: Jul 27, 2026 Last Commit: Jul 25, 2026
Tech Docs: https://prestodb.io/docs/current[03]
Developer: Presto Open Source Community
Country of Origin: US
Start Year: 2013 [21]
Project Type: Open Source
Written in: Java
Supported Languages: SQL
Compatible With: Accumulo, Cassandra, Elasticsearch, Hive, Kudu, MongoDB, MySQL, Pinot, PostgreSQL, Redis, Redshift
Operating System: All OS with Java VM
License: Apache v2
Wikipedia: https://en.wikipedia.org/wiki/Presto_(SQL_query_engine)[04]

Database Entry

PrestoDB

Viewing Revision #18 from 2021-01-05 01:03 View Current

History[07][06]

PrestoDB started out as a project at Facebook in 2012 and it was internally launched to the company in early 2013. Facebook open-sourced Presto in November 2013 under the Apache Software License.

Checkpoints[08]

Not Supported

Presto does not support checkpointing, or any other forms of fault-tolerance as of late 2018. As is the case for many OLAP database management systems, Presto assumes that the nodes will not fail during the execution. The client has to rerun the entire query once failure occurs.

Compression[08]

Dictionary Encoding Run-Length Encoding

Presto can operate on dictionary and run-length-encoded blocks from connectors. When generating intermediate results, Presto also produces compressed data in the form of dictionary or run-length-encoded blocks.

Concurrency Control[09][08]

Not Supported

Presto is a query engine designed for query large amounts of data, and it does not support updating the data that it queries. As a result, concurrency control is not needed for the usages of Presto.

Data Model[10]

Column Family / Wide-Column Relational Key-Value Document / XML

Presto provides connectors to different data sources, and each connector is implemented to be compatible with the data model of the underlying database. Thus, Presto supports different types of data models, including column family, document/XML, key/value and relational.

Indexes[07]

Not Supported

Presto does not support indexes. All intermediate processing and storage are done in the memory to avoid unnecessary I/O overheads.

Isolation Levels[11]

Read Uncommitted Read Committed Serializable Repeatable Read

Depending on the underlying source of data, whether or not transaction is supported depends on the implementation of the specific connector. For the connectors that support transactions, Presto API supports 4 different types of isolation levels. The isolation level is to be specified when a transaction is started.

Joins[12]

Hash Join Broadcast Join Shuffle Join

Presto has two types of join distributions. It can support both broadcast join and partitioned (shuffle) join. The join distribution can either be specified by the user or be decided by the cost-based optimization strategies that are supported by Presto.

At each node level, Presto performs a hash-based join.

Logging[13]

Not Supported

By default, Presto does not support logging. Customized tools can be added to Presto to log the events and queries at the node level.

Query Compilation[08]

Code Generation JIT Compilation

Presto uses code generation targeting JVM bytecode. To do this, it can evaluate expressions, and also use heuristics to generate code that are compatible with the optimization of the JIT compilers, thus providing better performance.

Query Execution[08]

Materialized Model

Presto uses the materialized model and emits the entire result at a time. The result is then stored in the memory and the coordinator decides its usage.

Query Interface[08]

SQL

Presto follows the ANSI SQL specification. To improve usability, Presto also supports anonymous functions and built-in higher-order functions, including transform, filter and reduce.

Storage Architecture[14][15][08]

In-Memory

Presto is designed to query data from sources including Hadoop environments and other relational database systems, so it does not directly take the role of data storage. All data and the intermediate results are stored in-memory whenever possible. For communication between nodes, data is also stored in in-memory buffers and sent through the network. This avoids the high cost of I/O operations and speeds up the execution.

For memory-intensive queries, Presto also offers the functionality of spilling data to disk. But this is not a primary function of Presto and it is assumed that most of the query operations should be performed in-memory completely.

Storage Model[08]

Decomposition Storage Model (Columnar)

To execute a query, Presto splits the assignments to each worker, and the workers fetch the data from the data sources. The unit of data that Presto locally operates on is called a page. The page is a columnar of a sequence of rows.

Stored Procedures[16]

Supported

Stored procedures may be provided by the connectors to data sources. Presto provides a "call" statement to invoke a procedure. However, some systems, such as PostgreSQL, have their own stored procedures that are not directly callable by Presto.

System Architecture[17][08][18][19]

Shared-Nothing

As similar to many classic MPP (massively parallel process) database management systems, Presto utilizes a shared-nothing system architecture.

Presto is deployed on a cluster of nodes. A node can take the role of either a coordinator or a worker. Each node has its own private disk and memory, and the user can configure the memory usage of each node. Since Presto does not store data directly, the disk of each node is used minimally for storing logs only, and all communications are done through the network.

Views[20]

Virtual Views

Presto provides functions to store the view as a logical table that can be used by future queries. The view does not contain any data, meaning that the query that generates the view has to be executed when the view is referenced.

Compatible Systems

TileDB

Derivative Systems

Trino

GeoSpock DB

CrateDB

Citations

21 sources

https://prestodb.io prestodb.io Spam — Check Archive Accessed: 2026-06-05
GitHub - prestodb/presto: The official home of the Presto distributed SQL query engine for big data · GitHub github.com Accessed: 2026-06-05
https://prestodb.io/docs/current prestodb.io Accessed: 2026-06-05
Presto (SQL query engine) - Wikipedia wikipedia.org Modified: 2026-01-05 Accessed: 2026-06-04
https://blog.treasuredata.com/blog/2015/03/20/presto-versus-hive/ treasuredata.com Dead — Check Archive Accessed: 2026-05-25
https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920 facebook.com Dead — Check Archive Accessed: 2026-06-05
https://aws.amazon.com/big-data/what-is-presto amazon.com Dead — Check Archive Accessed: 2026-06-05
https://prestosql.io/Presto_SQL_on_Everything.pdf prestosql.io Dead — Check Archive Accessed: 2026-05-27
https://prestodb.io/docs/current/overview/use-cases.html prestodb.io Accessed: 2026-06-07
https://prestodb.io/docs/current/connector.html prestodb.io Accessed: 2026-06-07
https://prestodb.io/docs/current/sql/start-transaction.html prestodb.io Accessed: 2026-06-07
https://prestodb.io/docs/current/optimizer/cost-based-optimizations.html prestodb.io Accessed: 2026-06-07
https://prestodb.io/presto-admin/docs/current/installation/presto-configuration.html prestodb.io Dead — Check Archive Accessed: 2026-05-30
Using the Spill to Disk Mechanism — Qubole Data Service documentation qubole.com Modified: 2025-01-31 Accessed: 2026-06-07
https://prestodb.io/docs/current/admin/spill.html prestodb.io Accessed: 2026-06-07
https://prestodb.io/docs/current/sql/call.html prestodb.io Accessed: 2026-06-07
https://prestodb.io/docs/current/installation/deployment.html prestodb.io Accessed: 2026-06-07
Apache Presto - Quick Guide tutorialspoint.com Accessed: 2026-06-07
https://prestodb.io/what-is-presto prestodb.io Dead — Check Archive Accessed: 2026-06-05
https://prestodb.io/docs/current/sql/create-view.html prestodb.io Accessed: 2026-06-07
https://www.computerworld.com/article/2485668/business-intelligence/facebook-goes-open-source-with-query-engine-for-big-data.html computerworld.com Dead — Check Archive Accessed: 2026-05-21

Revision #18 Last Updated: 2021-01-04 20:03