DuckDB is an embeddable relational DBMS that focuses on supporting analytical query workloads (OLAP). Similar to SQLite, DuckDB prioritizes simplicity and ease of integration by eliminating external dependencies at both compile time and run time. DuckDB's columnar-vectorized query execution engine reduces the CPU cycles expended per individual value by processing large batches of values in one operation as a vector. This design choice optimizes DuckDB for analytical queries and, in embedded scenarios, enables high-speed data transfer to and from the database.
To provide transactional guarantees (ACID properties), DuckDB employs its custom bulk-optimized multi-version concurrency control (MVCC) and enables data to be stored in persistent, single-file databases. Additionally, DuckDB supports secondary indexes to speed up queries trying to find a single table entry.
DuckDB was created in response to the need for a high-performing, analytical DBMS that could be easily embedded within other processes, specifically targeting data science workloads. The project was initiated in 2018 by Mark Raasveldt, a PhD student at CWI, and his supervisor Hannes Mühleisen. Although not heavily involved in the DuckDB codebase, Peter Boncz, the inventor of Vectorwise, provided valuable advice and insights to the creators. They were inspired by the observation that many data scientists were re-implementing database systems in Python and R, which could be greatly improved by incorporating database research on topics such as query optimization and parallelism.
The core idea behind DuckDB was to retain the simplicity and ease-of-use of SQLite, while enhancing it with fast analytical processing and fast data transfer between R/Python and the RDBMS for OLAP workloads.
The project was named "DuckDB" because the creators believed that ducks were resilient and could live off anything, similar to how they envisioned their database system operating. Additionally, Hannes had a pet duck named Wilbur who inspired the name.
Dictionary Encoding, Delta Encoding, Run-Length Encoding, Bit Packing / Mostly Encoding
DuckDB supports efficient lightweight compression that is automatically used to keep data size down without incurring high costs for compression and decompression. The DBMS implements several compression algorithms, such as constant encoding, run-length encoding, bit packing, frame of reference, dictionary encoding, fast static symbol table (FSST), and Chimp and Patas. These algorithms are especially useful for column store formats, such as DuckDB's native file format or Parquet.
Constant encoding is used when every value in a column segment is identical, in which case only that single value is stored. Run-length encoding (RLE) is used when the data contains many repeated values. Bit packing tracks the maximum value for every 1024 values and determines the bit-packing width from that maximum. Frame of reference is an extension of bit packing that also stores the minimum value found in the set of values and is particularly effective for dates and timestamps. Dictionary encoding is used when storing text columns with many duplicate entries, while FSST extends dictionary compression by extracting repetitions of entire strings as well as repetitions within strings. Finally, Chimp and Patas are used to compress floating-point values.
The PRAGMA storage_info command can be used to inspect the storage layout of tables and columns. In 2023, DuckDB added delta and delta-constant compression algorithms, which are effective for values that are equally spaced.
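As a sketch of how this can be inspected from the Python client (the file, table, and column names are made up for illustration), PRAGMA storage_info reports the compression scheme chosen for each column segment:

    import duckdb

    # Persistent database file so the table is written in DuckDB's native format.
    con = duckdb.connect("example.duckdb")
    con.execute("CREATE OR REPLACE TABLE measurements AS "
                "SELECT range AS id, range % 10 AS sensor, 42 AS reading "
                "FROM range(100000)")
    con.execute("CHECKPOINT")  # flush data so per-segment statistics are available

    # One row per column segment, including the compression scheme chosen for it
    # (e.g. Constant, RLE, BitPacking, Dictionary, ...).
    for row in con.execute("PRAGMA storage_info('measurements')").fetchall():
        print(row)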
Multi-version Concurrency Control (MVCC)
DuckDB supports transactions and provides transactional guarantees (ACID properties) through its custom, multi-version concurrency control (MVCC) scheme. DuckDB is also optimized for bulk operations, so executing many small transactions is not a primary design goal. To ensure ACID compliance, DuckDB implements HyPer's serializable variant of MVCC that is tailored specifically for hybrid OLAP/OLTP systems. This variant updates data in-place immediately and keeps previous states stored in a separate undo buffer for concurrent transactions and aborts. Additionally, because queries in DuckDB might run for considerable time, lock-free MVCC is used to provide multiple consistent views on the same dataset without blocking query progress.
DuckDB supports multiple writer threads within a single process using a combination of MVCC and optimistic concurrency control. If two threads attempt to edit the same row of data simultaneously, the second thread fails because of optimistic concurrency control. However, DuckDB does not support concurrent writes from multiple processes; handling them is left to the application logic, for example by using a cross-process mutex lock, or by running the multi-process transactional workload on a Postgres or SQLite database and periodically executing analytical queries on that data through DuckDB's Postgres scanner or SQLite scanner.
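A minimal sketch of this write-write conflict behavior within one process, using the Python client (the table and values are invented; the exact exception type may vary between versions, so a broad catch is used):

    import duckdb

    con = duckdb.connect()          # in-memory database
    con.execute("CREATE TABLE accounts(id INTEGER, balance INTEGER)")
    con.execute("INSERT INTO accounts VALUES (1, 100)")

    writer_a = con.cursor()         # second connection to the same database
    writer_b = con.cursor()         # third connection to the same database

    writer_a.execute("BEGIN TRANSACTION")
    writer_b.execute("BEGIN TRANSACTION")

    writer_a.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
    try:
        # Optimistic concurrency control: the second concurrent update of the
        # same row is rejected instead of blocking.
        writer_b.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 1")
        writer_b.execute("COMMIT")
    except duckdb.Error as e:
        print("second writer aborted:", e)
        writer_b.execute("ROLLBACK")

    writer_a.execute("COMMIT")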
Adaptive Radix Tree (ART), Block Range Index (BRIN)
DuckDB supports two types of indexes: the min-max index and the Adaptive Radix Tree (ART). The min-max index is automatically created for columns of general-purpose data types, while the ART index is used to enforce primary key constraints and to speed up point and highly selective queries. An ART index is automatically created for columns with a UNIQUE or PRIMARY KEY constraint and can also be defined explicitly using CREATE INDEX. Both index types are persisted on disk in the latest version. Joins on columns with an ART index can use the index join algorithm, and forcing index joins is possible using pragmas. It's worth noting that ART indexes must currently fit in memory. Overall, ART indexes are used to enforce primary key (PK), foreign key (FK), and unique constraints, as well as to speed up filters, point queries, range queries (with high selectivity), and joins.
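For illustration, a hypothetical table showing how an ART index is created implicitly by a PRIMARY KEY constraint and explicitly via CREATE INDEX:

    import duckdb

    con = duckdb.connect()
    # The PRIMARY KEY constraint is enforced through an implicitly created ART index.
    con.execute("CREATE TABLE users(id INTEGER PRIMARY KEY, email VARCHAR)")
    # An explicit ART index to speed up point and highly selective lookups.
    con.execute("CREATE INDEX idx_users_email ON users(email)")

    con.execute("INSERT INTO users VALUES (1, 'a@example.org'), (2, 'b@example.org')")
    # A point query that can be answered via the ART index.
    print(con.execute("SELECT * FROM users WHERE email = 'a@example.org'").fetchall())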
DuckDB provides the snapshot isolation level for transactions, meaning that a transaction sees a snapshot of the database taken at its start, and changes committed by other transactions after that point are not visible to it. This is a higher degree of isolation than read committed, which allows non-repeatable reads, or repeatable read, which allows phantom reads.
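A small sketch of snapshot isolation using two connections to the same in-memory database (the schema is invented for illustration): the reader's open transaction keeps seeing the data as of its start, even after another connection commits new rows.

    import duckdb

    con = duckdb.connect()
    con.execute("CREATE TABLE t(v INTEGER)")
    con.execute("INSERT INTO t VALUES (1)")

    reader = con.cursor()
    reader.execute("BEGIN TRANSACTION")                        # snapshot taken here
    print(reader.execute("SELECT SUM(v) FROM t").fetchone())   # (1,)

    con.execute("INSERT INTO t VALUES (2)")                    # committed by another connection

    # Still (1,): changes committed after the snapshot was taken are not visible.
    print(reader.execute("SELECT SUM(v) FROM t").fetchone())
    reader.execute("COMMIT")

    # A new transaction sees the committed insert: (3,)
    print(reader.execute("SELECT SUM(v) FROM t").fetchone())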
Hash Join, Sort-Merge Join, Index Nested Loop Join
DuckDB supports several join algorithms, including hash join, sort-merge join, and index join. The system also implements join ordering optimization using dynamic programming with a greedy fallback for complex join graphs. It performs flattening of arbitrary subqueries and applies a set of rewrite rules to simplify the expression tree, including common subexpression elimination and constant folding. The result is an optimized logical plan for the query, which the physical planner then transforms into the physical plan, selecting suitable implementations where applicable. In addition, DuckDB introduced a range join algorithm that uses the inequality join (IEJoin) technique to handle range predicates efficiently. IEJoin leverages min-max indexes to reduce the amount of data that needs to be scanned and uses SIMD instructions to further improve performance. Finally, DuckDB also supports an out-of-core hash join for large tables that cannot fit in memory.
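As an illustration of a query that is a candidate for the range-join (IEJoin) path, here is a hypothetical interval-matching join with only inequality predicates; EXPLAIN can be used to check which join operator the planner actually picked:

    import duckdb

    con = duckdb.connect()
    con.execute("CREATE TABLE sessions(user_id INTEGER, start_ts TIMESTAMP, end_ts TIMESTAMP)")
    con.execute("CREATE TABLE events(user_id INTEGER, ts TIMESTAMP)")

    # A pure inequality (range) join: each event is matched to the session
    # interval it falls into. Joins dominated by range predicates like this
    # are candidates for the IEJoin-based range join rather than a hash join.
    rows = con.execute("""
        EXPLAIN
        SELECT e.user_id, e.ts, s.start_ts
        FROM events e
        JOIN sessions s
          ON e.ts >= s.start_ts AND e.ts < s.end_ts
    """).fetchall()
    for row in rows:
        print(row[-1])   # the rendered plan tree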
DuckDB supports data durability through the use of a write-ahead log (WAL) that is persisted on disk. The WAL records all changes made to the database, including inserts, updates, and deletes, and ensures that they are durable in the event of a crash or system failure. The WAL uses a physical logging scheme, which means that it logs changes to the physical pages on disk rather than logging logical operations. In the latest version, the WAL no longer stores the queries for updates/deletes but instead stores the row IDs of the affected tuples, which can reduce the amount of disk space needed to store the log.
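A small sketch of durability in practice with the Python client: changes to a persistent database are recorded in a write-ahead log file next to the database file until they are checkpointed into the main file (the file name here is illustrative):

    import duckdb

    con = duckdb.connect("orders.duckdb")        # hypothetical database file
    con.execute("CREATE TABLE IF NOT EXISTS orders(id INTEGER, amount DECIMAL(10, 2))")

    # Committed changes are recorded in the WAL first, so they survive a crash
    # even before the next checkpoint rewrites the main database file.
    con.execute("INSERT INTO orders VALUES (1, 19.99)")

    # Explicitly merge the WAL into the database file and truncate the log.
    con.execute("CHECKPOINT")
    con.close()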
Intra-Operator (Horizontal), Inter-Operator (Vertical)
DuckDB supports both inter-operator (vertical) and intra-operator (horizontal) parallelism. For intra-operator parallelism, DuckDB supports parallel writing of Parquet and CSV files, parallel data loading into individual tables, and parallel execution of operations such as CREATE INDEX, COUNT(DISTINCT), and GROUP BY. DuckDB also supports query pipelining, where operators in a query are overlapped to reduce overall query execution time. Additionally, the system uses a thread pool to manage parallelism more efficiently, avoiding the overhead of creating and destroying threads for each parallel operation.
DuckDB's parallel execution capabilities allow it to process queries much faster than a single-threaded system, especially for queries involving large data sets or complex calculations. However, it's important to note that not all operations can be parallelized, and the amount of speedup achieved will depend on the specific query and data set being used.
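The degree of parallelism can be controlled from the client; a minimal sketch (the thread count, table, and file names are arbitrary):

    import duckdb

    con = duckdb.connect()
    con.execute("SET threads = 8")               # size of the worker thread pool

    con.execute("CREATE TABLE items AS "
                "SELECT range AS id, range % 100 AS category FROM range(1000000)")

    # Aggregations such as GROUP BY and COUNT(DISTINCT) are executed in
    # parallel across the worker threads.
    print(con.execute(
        "SELECT category, COUNT(DISTINCT id) FROM items "
        "GROUP BY category ORDER BY category LIMIT 5"
    ).fetchall())

    # Writing Parquet output can also proceed in parallel.
    con.execute("COPY items TO 'items.parquet' (FORMAT PARQUET)")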
DuckDB does not support Just-in-Time (JIT) compilation of SQL queries, but instead chooses to use a vectorized model. This is because JIT engines require large compiler libraries such as LLVM, which can create additional transitive dependencies and make the deployment and integration process more complex. DuckDB aims to simplify this process by avoiding external dependencies and using templates for code generation.
DuckDB uses a vectorized interpreted execution engine that supports fixed-length types like integers as native arrays and variable-length values like strings as a native array of pointers into a separate string heap. To represent NULL values, DuckDB uses a separate bit vector that is only present if NULL values appear in the vector. DuckDB supports a vectorized pull-based model that is also known as "vector volcano". In this model, query execution starts by pulling the first chunk of data from the root node of the physical plan. A chunk is a horizontal subset of a result set, query intermediate, or base table. This node will recursively pull chunks from child nodes, eventually arriving at a scan operator that produces chunks by reading from the persistent tables. The process continues until the chunk arriving at the root is empty, indicating that the query has been completed.
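A toy sketch of the pull-based "vector volcano" idea (plain Python, not DuckDB internals): each operator pulls chunks from its child until an empty chunk signals that the query is complete.

    CHUNK_SIZE = 2048  # each chunk holds at most a fixed number of values

    class Scan:
        """Leaf operator: produces chunks by reading from an in-memory 'table'."""
        def __init__(self, rows):
            self.rows, self.pos = rows, 0
        def next_chunk(self):
            chunk = self.rows[self.pos:self.pos + CHUNK_SIZE]
            self.pos += len(chunk)
            return chunk                          # an empty chunk marks the end

    class Filter:
        """Pulls chunks from its child and keeps only qualifying values."""
        def __init__(self, child, predicate):
            self.child, self.predicate = child, predicate
        def next_chunk(self):
            while True:
                chunk = self.child.next_chunk()
                if not chunk:
                    return []                     # child exhausted -> done
                out = [v for v in chunk if self.predicate(v)]
                if out:
                    return out

    # The root of the plan pulls chunks until the scan is exhausted.
    plan = Filter(Scan(list(range(10000))), lambda v: v % 2 == 0)
    result = []
    while True:
        chunk = plan.next_chunk()
        if not chunk:
            break
        result.extend(chunk)
    print(len(result))  # 5000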
DuckDB uses a vectorized processing model that processes batches of columns at a time, which is optimized for CPU cache locality, suitable for SIMD instructions and pipelining, and has small intermediates that ideally fit in L1 cache. The system contains an extensive library of vector operations that support the relational operators, and this library expands code for all supported data types using C++ code templates.
DuckDB offers various client APIs for data loading and executing queries. DuckDB is an embedded database, which means it has no client protocol interface or server process. Instead, its "native" API is implemented in C++, which allows for high performance and efficient data processing. In addition to the C++ API, there are official wrappers for popular programming languages, including C, Python, R, Java, Node.js, WebAssembly/Wasm, and Julia, as well as an ODBC API and a Command Line Interface (CLI). Additionally, there are community-contributed third-party wrappers for Ruby, Go, C#, Rust, and Common Lisp, providing a broader range of language support for developers. DuckDB also provides a SQLite compatibility layer that allows applications previously using SQLite to use DuckDB through re-linking or library overloading.
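A minimal example of the embedded usage model through the official Python wrapper: no server process is started, and the database lives inside the host process (the table and values are invented for illustration).

    import duckdb

    # In-memory database; pass a file path such as "analytics.duckdb" to persist.
    con = duckdb.connect()

    con.execute("CREATE TABLE trips(city VARCHAR, distance_km DOUBLE)")
    con.execute("INSERT INTO trips VALUES ('Amsterdam', 3.2), ('Utrecht', 5.1)")

    # Results can be fetched as plain Python objects ...
    print(con.execute("SELECT city, SUM(distance_km) FROM trips GROUP BY city").fetchall())

    # ... or, if pandas is installed, as a DataFrame for further analysis:
    # df = con.execute("SELECT * FROM trips").df()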
DuckDB uses a hybrid storage architecture that allows it to operate both as a disk-oriented and as an in-memory database management system. It uses a single-file storage format to store data on disk, which provides flexibility and ease of use and performs well for both small and large datasets. At the engine level, DuckDB additionally provides the option to compress temporary in-memory structures such as hash tables using different compression algorithms, which further improves performance and memory efficiency, particularly for complex queries and data manipulation. This hybrid storage architecture makes DuckDB suitable for a variety of applications, ranging from OLTP to OLAP workloads and analytical applications.
DuckDB natively supports a single-file storage format, which is partitioned into fixed-size blocks of 256KB. The first block contains a header that points to the table catalog and a list of free blocks, and the catalog contains pointers to lists of schemas, tables, and views. Each table consists of a number of blocks that hold the data, and checkpoints write new blocks containing updated data to the file and update the root pointer and free list atomically.
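All of this lives in one database file; a short sketch that creates a file-backed database and inspects block usage (the file name is arbitrary, and the exact columns reported by the pragma may vary between versions):

    import duckdb

    con = duckdb.connect("warehouse.duckdb")     # everything is stored in this single file
    con.execute("CREATE TABLE IF NOT EXISTS facts AS SELECT range AS id FROM range(100000)")
    con.execute("CHECKPOINT")                    # write new blocks and update the root pointer atomically

    # Reports database size, block size, total/used/free blocks, WAL size, etc.
    print(con.execute("PRAGMA database_size").fetchall())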
In addition, DuckDB supports Apache Arrow as a data interchange format, allowing for easy data import and export with other Arrow-enabled systems. DuckDB leverages Arrow's vectorized data processing capabilities to accelerate query processing, taking advantage of the cache-efficient memory layout and SIMD instructions provided by Arrow.
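A short sketch of the Arrow integration in the Python client (requires the pyarrow package; the data is invented): an Arrow table in scope can be queried directly, and results can be returned as Arrow data.

    import duckdb
    import pyarrow as pa

    # An Arrow table created in the host process.
    events = pa.table({"user_id": [1, 1, 2], "amount": [10.0, 5.0, 7.5]})

    con = duckdb.connect()
    # DuckDB can scan the Arrow table referenced by its Python variable name.
    result = con.execute(
        "SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id"
    ).arrow()
    print(result)  # the result itself is returned as an Arrow table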
DuckDB also includes a zero-dependency Parquet reader that can directly execute SQL queries on Parquet files without any import or analysis step. The system reads Parquet files in a streaming fashion and automatically detects which columns and rows are required for any given query, enabling analysis of larger and more complex Parquet files without manual optimizations or additional hardware. Additionally, DuckDB can run queries directly on top of Parquet files without requiring an initial loading phase, and it takes advantage of all of Parquet's advanced features to speed up query execution.
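Querying a Parquet file directly, without an import step (the file and column names are illustrative); only the columns and row groups needed for the query are read:

    import duckdb

    con = duckdb.connect()
    # The Parquet file is scanned in a streaming fashion; projection and filter
    # pushdown limit reading to the columns and row groups the query needs.
    print(con.execute("""
        SELECT passenger_count, AVG(fare_amount)
        FROM read_parquet('taxi.parquet')
        WHERE passenger_count > 2
        GROUP BY passenger_count
    """).fetchall())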
Decomposition Storage Model (Columnar)
DuckDB uses a decomposition storage model, specifically a columnar layout, for persistent storage. Logical tables are horizontally partitioned into chunks of columns, and these chunks are compressed into physical blocks using lightweight compression methods. The physical blocks are organized into a read-optimized DataBlocks storage layout, where each block contains min/max indexes for every column, allowing DuckDB to determine whether a block is relevant to a query. Additionally, each block carries a lightweight index for every column, which further restricts the number of values that need to be scanned during query processing.
DuckDB is an embedded analytical system. It enables local storage and querying of large amounts of data while providing transactional guarantees. At the same time, DuckDB is seamlessly integrated with analysis tools for convenient data access and eliminates data transfer bottlenecks. The system is optimized for OLAP workloads while still maintaining OLTP functionality.
https://github.com/duckdb/duckdb
CWI, DuckDB Labs
2018