Umbra is a relational DBMS designed to support high performance for OLAP and OLTP workloads using flash-based storage. Umbra provides the performance of a pure main-memory DBMS for workloads that fit within main memory, with the scalability of a disk-based system. Umbra's buffer manager is based on LeanStore and supports variable-sized pages, enabling data structures to be accessed directly. Umbra integrates Worst-Case Optimal Joins (WCOJ) into the query optimizer, allowing WCOJ to be used for sub-plans of a query, improving performance for queries with large intermediate results. Umbra extends the query compilation approach from HyPer with a low-latency backend, Flying Start, which emits x86 machine code in a single pass. Umbra also supports User-Defined Operators (UDOs), which let users extend the DBMS with custom algorithms written in a language of their choice.
Joins: Nested Loop Join, Hash Join, Semi Join, Index Nested Loop Join, Worst-Case Optimal Join
Umbra executes queries using traditional binary joins such as hash joins and nested-loop joins. Additionally, Umbra has integrated Worst-Case Optimal Joins (WCOJ) into the query optimizer and execution engine. Worst-case optimal joins provide superior performance to binary joins when the cardinality of intermediate results is large. Therefore, during query optimization, Umbra detects when a portion of the query plan would produce large intermediate results and uses a WCOJ for that portion instead. To execute a WCOJ, Umbra builds hash-trie indexes on the involved relations and performs a multi-way join using these indexes.
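The multi-way join over trie indexes can be illustrated with a minimal sketch for the triangle query R(a, b), S(b, c), T(a, c). This is not Umbra's implementation: the two-level dictionaries stand in for hash tries, and the names `build_trie` and `triangle_join` are hypothetical. The point is that each attribute is bound by intersecting the candidate sets of all relations that contain it, so no binary intermediate result is ever materialized.

```python
from collections import defaultdict


def build_trie(tuples):
    """Build a two-level trie: first attribute -> set of second attributes.

    Assumption: a plain dict of sets stands in for a hash trie.
    """
    trie = defaultdict(set)
    for first, second in tuples:
        trie[first].add(second)
    return dict(trie)


def triangle_join(r, s, t):
    """Return all (a, b, c) with (a, b) in r, (b, c) in s, and (a, c) in t."""
    r_trie = build_trie(r)  # a -> {b}
    s_trie = build_trie(s)  # b -> {c}
    t_trie = build_trie(t)  # a -> {c}
    results = []
    # Bind a: it must appear as a first attribute in both R and T.
    for a in r_trie.keys() & t_trie.keys():
        # Bind b: it must be paired with a in R and be a first attribute in S.
        for b in s_trie.keys() & r_trie[a]:
            # Bind c: it must be paired with b in S and with a in T.
            for c in s_trie[b] & t_trie[a]:
                results.append((a, b, c))
    return sorted(results)


r = [(1, 2), (1, 3), (2, 3)]
s = [(2, 3), (3, 1)]
t = [(1, 3), (2, 1)]
print(triangle_join(r, s, t))  # [(1, 2, 3), (2, 3, 1)]
```

A binary plan would first join two of the relations in full and then filter against the third. The attribute-at-a-time intersection prunes candidates at every level, which is where the advantage on large intermediate results comes from.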
Umbra performs Just-In-Time (JIT) compilation of queries into Umbra IR, a custom intermediate representation (IR) similar to LLVM IR but optimized for use in a database system. Umbra then lowers the generated Umbra IR using one of two backends:
The LLVM backend emits LLVM IR, which is compiled at optimization level -O3. This backend has the longest compilation time but generates the fastest-executing code, making it suitable for long-running queries.
The Flying Start backend emits x86 machine code in a single pass using AsmJit. In addition, the Flying Start backend implements Stack Space Reuse, Machine Register Allocation, Lazy Address Calculation, and Comparison-Branch Fusion optimizations. As a result, the code generated by Flying Start has performance on par with code generated by LLVM -O0 (i.e., with optimizations disabled). Additionally, Flying Start outperforms interpretation of Umbra IR, making Flying Start suitable for all but the longest-running queries.
Umbra supports adaptive execution, pioneered by HyPer, allowing the DBMS to switch execution strategies while processing a single query. Umbra first generates x86 machine code using the Flying Start backend and then switches to the code generated by the LLVM backend for long-running queries.
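The switching behavior can be sketched as a loop over chunks of input that uses the quickly generated code until the optimized code becomes available. This is only an illustration under simplifying assumptions: `run_adaptive`, `quick_fn`, `optimized_fn`, and `ready_after` are hypothetical names, the two functions stand in for Flying Start and LLVM output, and the background compilation is modeled as simply finishing after a fixed number of chunks.

```python
def run_adaptive(chunks, quick_fn, optimized_fn, ready_after):
    """Process chunks, switching to optimized_fn once it is 'compiled'.

    ready_after is a stand-in for compilation latency: the number of chunks
    processed before the optimized code is assumed to be available.
    """
    total = 0
    used = []
    for i, chunk in enumerate(chunks):
        if i >= ready_after:
            fn, label = optimized_fn, "optimized"
        else:
            fn, label = quick_fn, "quick"
        total += fn(chunk)
        used.append(label)
    return total, used


def quick_sum(chunk):
    """Stand-in for low-latency generated code: straightforward loop."""
    acc = 0
    for value in chunk:
        acc += value
    return acc


def optimized_sum(chunk):
    """Stand-in for optimized code: same result, faster built-in path."""
    return sum(chunk)


chunks = [[1, 2], [3, 4], [5, 6], [7, 8]]
total, used = run_adaptive(chunks, quick_sum, optimized_sum, ready_after=2)
print(total, used)  # 36 ['quick', 'quick', 'optimized', 'optimized']
```

A short query finishes before the optimized code is ready and never pays for its compilation, while a long-running query spends most of its time in the optimized code.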