Umbra

Viewing Revision #17 from 2023-04-02 17:43 View Current

Umbra is a relational DBMS designed to deliver high performance for both OLAP and OLTP workloads on flash-based storage. For workloads that fit in main memory, Umbra matches the performance of a pure main-memory DBMS, while retaining the scalability of a disk-based system. Umbra's buffer manager is based on LeanStore and supports variable-sized pages, enabling data structures to be accessed directly. Umbra integrates Worst-Case Optimal Joins (WCOJ) into its query optimizer so that WCOJ can be used for sub-plans of a query, which improves performance for queries with large intermediate results. Umbra extends HyPer's query compilation approach with a low-latency backend, Flying Start, which emits x86 machine code in a single pass. Umbra also supports User-Defined Operators (UDOs), which let users extend the DBMS with custom algorithms written in their language of choice.[01]
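
To illustrate what variable-sized pages mean in practice, the following is a minimal Python sketch of page size-class selection. It assumes that the smallest size class is 64 KiB and that each subsequent class doubles the page size; the constant and function names are hypothetical and this is not Umbra's code. Under this scheme a large object can occupy one contiguous page from a larger class instead of being split across many small pages.

```python
# Minimal sketch of variable-size page size classes (not Umbra's code).
# ASSUMPTION: the smallest size class is 64 KiB and each class doubles.
BASE_PAGE_SIZE = 64 * 1024  # bytes, assumed smallest size class


def size_class(num_bytes: int) -> int:
    """Return the index of the smallest size class whose page fits num_bytes."""
    if num_bytes <= 0:
        raise ValueError("requested size must be positive")
    cls = 0
    # Shift binds tighter than '<', so this compares the class's page size.
    while BASE_PAGE_SIZE << cls < num_bytes:
        cls += 1
    return cls


def page_size(cls: int) -> int:
    """Return the page size in bytes for a size class index."""
    return BASE_PAGE_SIZE << cls
```

For example, `size_class(1024 * 1024)` returns 4, so a 1 MiB object fits on a single page of `page_size(4)` = 1048576 bytes.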

Website: https://umbra-db.com[01]
Developer: Technical University of Munich
Country of Origin: DE
Start Year: 2018 [05]
Project Type: Academic
Written in: C++
Derived From: HyPer
Embeds / Uses: LeanStore
Compatible With: PostgreSQL
Operating System: Linux
License: Proprietary

Derivative Systems

CedarDB

OLAP, OLTP

History

Umbra is the new system built at the Technical University of Munich (TUM) as the successor to the HyPer project.

Concurrency Control

Multi-version Concurrency Control (MVCC)

Data Model

Relational

Foreign Keys

Supported

Indexes

B+Tree

Isolation Levels

Serializable

Joins[02]

Nested Loop Join, Hash Join, Semi Join, Index Nested Loop Join, Worst-Case Optimal Join

Umbra executes queries using traditional binary joins such as hash joins and nested-loop joins. It also integrates Worst-Case Optimal Joins (WCOJ) into its query optimizer and execution engine. Worst-case optimal joins outperform binary joins when intermediate join results are large. During query optimization, Umbra therefore detects when a portion of the query plan would produce large intermediate results and uses a WCOJ for that portion instead. To execute a WCOJ, Umbra builds hash-trie indexes on the involved relations and performs a multi-way join using these indexes.
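
The following Python sketch illustrates the idea on the triangle query Q(a, b, c) = R(a, b), S(b, c), T(a, c), a classic case where binary join plans can produce large intermediate results. It is a simplified generic join under stated assumptions: nested dictionaries stand in for Umbra's hash tries, the variable order a, b, c is fixed by hand, and the function names are hypothetical. Instead of materializing R joined with S, the join binds one attribute at a time and intersects the candidate values offered by every relation that contains that attribute.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for a hash trie: each level maps a value of one
# attribute to the trie for the remaining attributes; leaves are empty dicts.
Trie = Dict[int, "Trie"]


def build_hash_trie(rows: Iterable[Tuple[int, ...]]) -> Trie:
    """Index tuples level by level in attribute order."""
    root: Trie = {}
    for row in rows:
        node = root
        for value in row:
            node = node.setdefault(value, {})
    return root


def triangles(R, S, T) -> List[Tuple[int, int, int]]:
    """Q(a, b, c) = R(a, b), S(b, c), T(a, c) with variable order a, b, c."""
    r, s, t = build_hash_trie(R), build_hash_trie(S), build_hash_trie(T)
    out = []
    # a occurs in R and T: intersect their first-level keys.
    for a in r.keys() & t.keys():
        r_a, t_a = r[a], t[a]
        # b occurs in R (below a) and in S (first level).
        for b in r_a.keys() & s.keys():
            # c occurs in S (below b) and in T (below a).
            for c in s[b].keys() & t_a.keys():
                out.append((a, b, c))
    return sorted(out)
```

For example, with R = {(1, 2), (1, 3), (2, 3)}, S = {(2, 3), (3, 1), (3, 4)} and T = {(1, 3), (2, 1), (1, 4)}, `triangles(R, S, T)` returns `[(1, 2, 3), (1, 3, 4), (2, 3, 1)]`.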

Query Compilation[03][04]

JIT Compilation

Umbra compiles queries Just-In-Time (JIT) by first translating them into Umbra IR, a custom intermediate representation (IR) that resembles LLVM IR but is optimized for use in a database system. The generated Umbra IR is then lowered by one of two backends:

LLVM
Flying Start

The LLVM backend emits LLVM IR, which is compiled at optimization level -O3. This backend has the highest compilation latency but generates the fastest-running code, making it suitable for long-running queries.
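
The trade-off between the two backends can be written as a simple break-even condition. Using notation introduced here for illustration, let c be compilation time and t be execution time, with the subscript FS denoting Flying Start. LLVM-compiled code is preferable only when its total latency is lower:

```latex
\[
c_{\mathrm{LLVM}} + t_{\mathrm{LLVM}} < c_{\mathrm{FS}} + t_{\mathrm{FS}}
\quad\Longleftrightarrow\quad
t_{\mathrm{FS}} - t_{\mathrm{LLVM}} > c_{\mathrm{LLVM}} - c_{\mathrm{FS}}
\]
```

For instance, if the LLVM code ran twice as fast ($t_{\mathrm{LLVM}} = t_{\mathrm{FS}}/2$), it would pay off only once $t_{\mathrm{FS}} > 2\,(c_{\mathrm{LLVM}} - c_{\mathrm{FS}})$, which is why the LLVM backend suits long-running queries.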

The Flying Start backend uses AsmJit to generate x86 machine code in a single pass. It also implements several lightweight optimizations: Stack Space Reuse, Machine Register Allocation, Lazy Address Calculation, and Comparison-Branch Fusion. As a result, code generated by Flying Start performs on par with code generated by LLVM at -O0 (i.e., with optimizations disabled). Flying Start also outperforms interpretation of Umbra IR, making it suitable for all but the longest-running queries.
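
The sketch below shows, in simplified form, what single-pass lowering with Comparison-Branch Fusion means. It assumes a hypothetical three-address IR encoded as Python tuples and emits x86-flavored assembly text in which IR value names stand in for registers; it does not model Umbra IR, AsmJit, register allocation, or stack space reuse. A cheap use-count scan runs first, and emission itself is one forward pass. When a comparison's only use is the branch immediately after it, the emitter branches directly on the flags set by `cmp` instead of materializing the Boolean with `setl` and testing it again.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical three-address IR: ("const", dst, imm), ("add", dst, a, b),
# ("cmp_lt", dst, a, b), ("br", cond, then, else), ("label", name), ("ret", v)
Instr = Tuple[str, ...]


def lower(ir: List[Instr]) -> List[str]:
    # Pre-scan: count how often each value is used as an operand.
    uses: Counter = Counter()
    for ins in ir:
        if ins[0] in ("add", "cmp_lt"):
            uses.update(ins[2:4])
        elif ins[0] in ("br", "ret"):
            uses[ins[1]] += 1

    asm: List[str] = []
    i = 0
    while i < len(ir):  # single forward pass over the IR
        ins = ir[i]
        op = ins[0]
        if op == "const":
            asm.append(f"mov {ins[1]}, {ins[2]}")
        elif op == "add":
            asm += [f"mov {ins[1]}, {ins[2]}", f"add {ins[1]}, {ins[3]}"]
        elif op == "cmp_lt":
            dst, a, b = ins[1:4]
            nxt = ir[i + 1] if i + 1 < len(ir) else None
            if nxt is not None and nxt[0] == "br" and nxt[1] == dst and uses[dst] == 1:
                # Comparison-branch fusion: jump on the flags set by cmp.
                _, _, then_lbl, else_lbl = nxt
                asm += [f"cmp {a}, {b}", f"jl {then_lbl}", f"jmp {else_lbl}"]
                i += 2  # the branch has been consumed as well
                continue
            # Unfused: materialize the Boolean result of the comparison.
            asm += [f"cmp {a}, {b}", f"setl {dst}"]
        elif op == "br":
            asm += [f"test {ins[1]}, {ins[1]}", f"jnz {ins[2]}", f"jmp {ins[3]}"]
        elif op == "label":
            asm.append(f"{ins[1]}:")
        elif op == "ret":
            asm += [f"mov rax, {ins[1]}", "ret"]
        else:
            raise ValueError(f"unknown opcode {op}")
        i += 1
    return asm
```

If the comparison result were also used elsewhere (for example, returned), the fusion check fails and the emitter falls back to `setl` followed by `test` and `jnz`.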

Umbra supports adaptive execution, pioneered by HyPer, which allows the DBMS to switch execution strategies while processing a single query. Umbra first generates x86 machine code with the Flying Start backend and then, for long-running queries, switches to code generated by the LLVM backend.
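
The switching mechanism can be sketched in Python as follows. This is an illustration under assumptions, not Umbra's implementation: the input is processed in chunks ("morsels"), the quickly available kernel runs first while a slower optimizing compile proceeds on a background thread, and the engine switches kernels at the next morsel boundary once the optimized code is ready. The compile functions are hypothetical stand-ins, and `time.sleep` merely models compilation latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Kernel = Callable[[Sequence[int]], int]


def flying_start_compile() -> Kernel:
    # Hypothetical stand-in for fast single-pass code generation.
    return lambda morsel: sum(x * 2 for x in morsel)


def llvm_compile() -> Kernel:
    # Hypothetical stand-in for slow optimizing compilation; the sleep models
    # compile latency. The kernel must compute the same result.
    time.sleep(0.05)
    return lambda morsel: 2 * sum(morsel)


def run_adaptive(data: List[int], morsel_size: int = 1000) -> Tuple[int, List[str]]:
    """Process data morsel by morsel, switching kernels once LLVM is ready."""
    morsels = [data[i:i + morsel_size] for i in range(0, len(data), morsel_size)]
    kernel, mode = flying_start_compile(), "flying_start"
    trace: List[str] = []  # which kernel processed each morsel
    total = 0
    # Note: leaving the with-block waits for an unfinished compile; a real
    # system would abandon it instead.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(llvm_compile)  # optimize in the background
        for morsel in morsels:
            total += kernel(morsel)
            trace.append(mode)
            # At a morsel boundary, switch if the optimized code is ready.
            if mode == "flying_start" and pending.done():
                kernel, mode = pending.result(), "llvm"
    return total, trace
```

For a short query the optimized kernel may never be used at all, which mirrors the point of starting with Flying Start; only queries that run long enough benefit from the switch.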