The initial version of Hyrise, presented in PVLDB in 2011, focused on optimizing the table layout to improve CPU cache utilization for a given workload. It used a workload-aware, highly flexible partitioning approach to cluster data. This flexible data layout came at the cost of many virtual function calls and several indirections for data accesses. Because the code base had grown and the research focus had shifted, a new version of Hyrise was built from scratch starting in 2016. The new version focuses more on topics such as NUMA support (cf. chunked partitioning), NVRAM, SQL optimization (cf. query optimizer), and self-driving database capabilities.
Decomposition Storage Model (Columnar)
Hyrise started as one of the first databases with a hybrid memory layout. With the rewrite, only columnar storage is implemented. Hybrid layouts are planned to come back, but are not a high priority.
Views are stored as logical query plans (LQPs). When a query selects from a view, the view's LQP is inserted, before the optimizer is called, at the position where a stored table would usually appear. This allows the optimizer to optimize across view boundaries, for example by pushing additional predicates down into the view.
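The view substitution described above can be sketched with a toy plan representation. This is a minimal illustration, not Hyrise's LQP classes: `StoredTable`, `Predicate`, `inline_views`, and `predicate_chain` are hypothetical names, and the plan supports only filters over a single table.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class StoredTable:
    """Leaf of the toy plan: a reference to a table (or view) by name."""
    name: str


@dataclass(frozen=True)
class Predicate:
    """Filter node that keeps the rows of `child` matching `condition`."""
    condition: str
    child: "Node"


Node = Union[StoredTable, Predicate]


def inline_views(node: Node, views: Dict[str, Node]) -> Node:
    """Replace every leaf that names a view with that view's stored plan."""
    if isinstance(node, StoredTable):
        view_plan = views.get(node.name)
        if view_plan is None:
            return node  # a real stored table, keep it
        return inline_views(view_plan, views)  # views may build on views
    return Predicate(node.condition, inline_views(node.child, views))


def predicate_chain(node: Node) -> List[str]:
    """List the conditions from the plan root down to the table."""
    conditions = []
    while isinstance(node, Predicate):
        conditions.append(node.condition)
        node = node.child
    return conditions


# Hypothetical view and query, used only for illustration.
views = {"recent_orders": Predicate("year >= 2017", StoredTable("orders"))}
query = Predicate("price > 100", StoredTable("recent_orders"))
plan = inline_views(query, views)
```

After inlining, the query's predicate and the view's predicate sit in one plan over the same stored table, so an optimizer rule that reorders or pushes down predicates can treat them uniformly.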
The SQLPipeline is an interface that takes an SQL string and returns the result table(s). Hyrise uses its own SQL parser to translate SQL queries into an abstract syntax tree (AST). The AST is converted into a logical query plan (LQP), which is optimized, translated into a physical query plan (PQP), and finally executed. The pipeline handles both query plan caching and stored procedures.
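The stage order and the plan cache can be sketched as follows. Everything here is a stand-in chosen for illustration: the class `ToyPipeline`, the stage functions, the `TABLES` data, and the choice to key the cache on the raw SQL string are assumptions, and the toy parser understands only `SELECT * FROM <table>`.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, ...]
PhysicalPlan = Callable[[], List[Row]]

# Hypothetical in-memory data for the example.
TABLES: Dict[str, List[Row]] = {"t": [(1,), (2,), (3,)]}


def parse(sql: str) -> List[str]:
    """Stand-in parser: the 'AST' is just the token list."""
    return sql.strip().rstrip(";").split()


def translate(ast: List[str]) -> Dict[str, str]:
    """AST -> logical plan; only 'SELECT * FROM <table>' is understood."""
    return {"op": "scan", "table": ast[3]}


def optimize(lqp: Dict[str, str]) -> Dict[str, str]:
    """Placeholder for the optimizer; returns the plan unchanged."""
    return lqp


def to_physical(lqp: Dict[str, str]) -> PhysicalPlan:
    """Logical plan -> executable plan (a closure that scans the table)."""
    return lambda: list(TABLES[lqp["table"]])


class ToyPipeline:
    def __init__(self) -> None:
        self._plan_cache: Dict[str, PhysicalPlan] = {}
        self.cache_hits = 0

    def run(self, sql: str) -> List[Row]:
        plan = self._plan_cache.get(sql)
        if plan is None:
            # Cache miss: run every stage once and remember the result.
            plan = to_physical(optimize(translate(parse(sql))))
            self._plan_cache[sql] = plan
        else:
            self.cache_hits += 1
        return plan()


pipeline = ToyPipeline()
first = pipeline.run("SELECT * FROM t;")
second = pipeline.run("SELECT * FROM t;")  # served from the plan cache
```

The second call skips parsing, translation, and optimization and only executes the cached physical plan, which is the benefit a plan cache is meant to provide for repeated queries.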
Just-in-time (JIT) compilation is currently being developed; a first version is available but has to be explicitly enabled. JIT operators are implemented in C++ and can also be executed as regular operators without a JIT compilation step. Code specialization (using LLVM) is used to inline function calls, replace constants, and perform other optimizations. The JIT compilation fuses multiple operators into one tight loop.
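The effect of fusing operators can be illustrated by comparing an operator-at-a-time plan with a hand-fused loop. This sketch shows only the shape of the transformation; it involves no LLVM or code generation, and the function names and the `price`/`qty` columns are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, int]


def scan_then_project(rows: List[Row], threshold: int) -> List[int]:
    """Operator-at-a-time: the filter materializes an intermediate list
    before the projection runs over it."""
    selected = [row for row in rows if row["price"] > threshold]
    return [row["price"] * row["qty"] for row in selected]


def fused_scan_project(rows: List[Row], threshold: int) -> List[int]:
    """Fused: filter and projection run in one loop, so each row is
    touched once and no intermediate result is materialized."""
    output = []
    for row in rows:
        if row["price"] > threshold:
            output.append(row["price"] * row["qty"])
    return output


rows = [
    {"price": 5, "qty": 2},
    {"price": 20, "qty": 3},
    {"price": 15, "qty": 1},
]
```

Both functions return the same result for the same input; the fused variant simply avoids the intermediate list, which is the property a compiled pipeline exploits.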
Multi-version Concurrency Control (MVCC)
For each row, three pieces of information are stored: the commit ID of the transaction that successfully inserted the row (begin cid), the commit ID of the transaction that fully deleted it (end cid), and the transaction ID of the transaction that is currently modifying it. The linked wiki page describes how these are used to determine the row's visibility.
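A visibility check over these three values might look like the sketch below. The exact rule is documented on the wiki; this version is one plausible formulation under stated assumptions: a sentinel `MAX_CID` marks "not yet committed" or "not deleted", a transaction ID of 0 means no transaction is modifying the row, and each reader carries its own transaction ID and a snapshot commit ID. The names `RowVersion` and `is_visible` are hypothetical.

```python
from dataclasses import dataclass

# Assumed sentinel: "no commit has happened yet" for begin/end cid.
MAX_CID = 2**32 - 1


@dataclass
class RowVersion:
    begin_cid: int = MAX_CID  # commit ID of the inserting transaction
    end_cid: int = MAX_CID    # commit ID of the deleting transaction
    tid: int = 0              # transaction currently modifying the row


def is_visible(row: RowVersion, our_tid: int, snapshot_cid: int) -> bool:
    """Return True if the row is visible to the reading transaction."""
    own = row.tid == our_tid
    inserted_before_snapshot = row.begin_cid <= snapshot_cid
    not_deleted_at_snapshot = row.end_cid > snapshot_cid
    # Rows we inserted ourselves but have not committed yet.
    own_insert = own and not inserted_before_snapshot and not_deleted_at_snapshot
    # Rows committed by others before our snapshot and still alive.
    past_insert = (not own) and inserted_before_snapshot and not_deleted_at_snapshot
    return own_insert or past_insert
```

Under this rule, a row that the reading transaction is currently deleting (committed insert, own transaction ID set) is hidden from that transaction, while other transactions keep seeing it until the delete commits.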
Dictionary Encoding, Delta Encoding, Run-Length Encoding, Bit Packing / Mostly Encoding
Hyrise includes a compression framework based on C++ iterators and zero-cost abstractions. These abstractions avoid the runtime overhead of dynamic dispatch at the cost of increased compile times.
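Two of the listed schemes, dictionary encoding and run-length encoding, can be sketched in a few lines. This is a generic illustration of the techniques, not the Hyrise framework: the function names are hypothetical, the dictionary is assumed to be sorted, and Python cannot show the compile-time dispatch that the C++ implementation relies on.

```python
from typing import List, Sequence, Tuple


def dictionary_encode(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct value once and replace values by their index."""
    dictionary = sorted(set(values))
    index = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]


def dictionary_decode(dictionary: Sequence[str], codes: Sequence[int]) -> List[str]:
    """Look every code up in the dictionary to restore the column."""
    return [dictionary[code] for code in codes]


def run_length_encode(values: Sequence[str]) -> List[Tuple[str, int]]:
    """Collapse each run of equal neighbours into (value, run length)."""
    runs: List[Tuple[str, int]] = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs


def run_length_decode(runs: Sequence[Tuple[str, int]]) -> List[str]:
    """Expand every (value, run length) pair back into repeated values."""
    return [value for value, length in runs for _ in range(length)]


column = ["de", "de", "us", "us", "us", "fr"]
dictionary, codes = dictionary_encode(column)
runs = run_length_encode(column)
```

Dictionary encoding pays off when a column has few distinct values, while run-length encoding pays off when equal values are adjacent, for example in sorted columns.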
https://hpi.de/plattner/projects/hyrise.html
https://github.com/hyrise/hyrise
https://github.com/hyrise/hyrise/wiki
Hasso Plattner Institute
2009