LucidDB

LucidDB is a DBMS optimized for business intelligence. Its architecture combines column-store storage, bitmap indexing, hash join/aggregation, and page-level multi-versioning. LucidDB is designed for flexible, high-performance data integration and sophisticated query processing, which makes it suitable for data warehousing and OLAP servers. It is part of the Eigenbase project.

History

LucidDB was the first pure-play open source column-store database. Due to a lack of sponsors and community activity, the LucidDB codebase and web pages have not been maintained since 2014.

Concurrency Control

Multi-version Concurrency Control (MVCC)

LucidDB implements page-level concurrency control. When a DML statement modifies an existing page, it creates a new version of the page rather than updating it in place. The old page version is not deleted immediately, because concurrent readers of the old version may still be running. A LucidDB-specific command is available to reclaim old page versions that are no longer needed.
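The versioning scheme described above can be illustrated with a minimal Python sketch. This is a toy model, not LucidDB's implementation: the class VersionedPageStore, its methods, and the use of an integer commit counter as the snapshot id are all hypothetical names and assumptions chosen for illustration.

```python
class VersionedPageStore:
    """Toy page store: writers append page versions, readers pin a snapshot."""

    def __init__(self):
        self.versions = {}        # page_id -> list of (commit_id, data), oldest first
        self.commit_id = 0        # id of the most recent write (assumed scheme)
        self.active_readers = []  # snapshot ids that are still in use

    def write(self, page_id, data):
        # Never update in place: append a new version of the page.
        self.commit_id += 1
        self.versions.setdefault(page_id, []).append((self.commit_id, data))

    def begin_read(self):
        # A reader sees the store as of the latest write at the time it starts.
        self.active_readers.append(self.commit_id)
        return self.commit_id

    def end_read(self, snapshot):
        self.active_readers.remove(snapshot)

    def read(self, page_id, snapshot):
        # Return the newest version that is not newer than the snapshot.
        for commit_id, data in reversed(self.versions.get(page_id, [])):
            if commit_id <= snapshot:
                return data
        return None

    def reclaim(self):
        # Drop versions that neither an active reader nor a new reader can see.
        oldest = min(self.active_readers, default=self.commit_id)
        dropped = 0
        for page_id, chain in list(self.versions.items()):
            keep_from = 0
            for i, (commit_id, _) in enumerate(chain):
                if commit_id <= oldest:
                    keep_from = i  # newest version visible at `oldest`
            dropped += keep_from
            self.versions[page_id] = chain[keep_from:]
        return dropped
```

In this sketch, a reader that started before a write keeps seeing the old page version, and reclaim() only discards that version once the reader has finished.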

Joins

Hash Join, Semi Join

Hash join is one of the most efficient join implementations. In LucidDB, the same hash table support used by hash join is also used for duplicate removal and for aggregation over a single input. LucidDB optimizes star joins by using semijoins, which avoids reading fact table rows that a query does not need.
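As a rough illustration of these ideas, the following Python sketch reuses one hash table builder for a hash join, for duplicate removal, and for a semijoin-style filter on a fact table. The function names and the sample sales and products rows are hypothetical. The sketch filters fact rows in memory, whereas a real system would use the semijoin to avoid reading those rows from disk at all.

```python
from collections import defaultdict

def build_hash_table(rows, key):
    # Group rows by their join key value.
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table

def hash_join(build_rows, probe_rows, key):
    # Build on one input, then probe with the other.
    table = build_hash_table(build_rows, key)
    return [{**b, **p} for p in probe_rows for b in table.get(p[key], [])]

def distinct_keys(rows, key):
    # Duplicate removal falls out of the same hash table.
    return list(build_hash_table(rows, key).keys())

def semijoin_filter(fact_rows, dim_rows, key):
    # Keep only fact rows whose key matches a qualifying dimension row.
    wanted = {d[key] for d in dim_rows}
    return [f for f in fact_rows if f[key] in wanted]

# Hypothetical star schema: one fact table and one filtered dimension.
sales = [
    {"product_id": 1, "amount": 10},
    {"product_id": 2, "amount": 20},
    {"product_id": 1, "amount": 5},
    {"product_id": 3, "amount": 7},
]
products = [{"product_id": 1, "name": "widget"}]  # dimension rows after filtering

needed_sales = semijoin_filter(sales, products, "product_id")
joined = hash_join(products, needed_sales, "product_id")
```

Here the semijoin step shrinks the fact input from four rows to two before the join runs, which is the effect the star join optimization aims for.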

Storage Architecture

Disk-oriented

Data in LucidDB is stored as pages, which are allocated from an operating system file. Pages can be randomly accessed by a unique block ID, which in turn maps to an offset within the physical file. Before a page can be accessed, it must first be read into the buffer pool. The number of pages in the buffer pool is configurable and defaults to 5000 pages (~160MB). Hash buckets are used to locate pages in the buffer pool efficiently.
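A minimal Python sketch of this read path follows. Several details are assumptions rather than documented LucidDB behavior: the page size of 32 KB is inferred from 5000 pages being roughly 160MB, the block ID is mapped to a file offset by simple multiplication, and least-recently-used eviction is chosen only to keep the example short. The class and attribute names are hypothetical.

```python
import io
from collections import OrderedDict

PAGE_SIZE = 32 * 1024  # assumption: 5000 pages * 32 KB is about 160 MB

class BufferPool:
    def __init__(self, file, capacity=5000):
        self.file = file
        self.capacity = capacity
        self.pages = OrderedDict()  # block_id -> page bytes (hash-based lookup)
        self.disk_reads = 0

    def get_page(self, block_id):
        # Fast path: the page is already cached.
        if block_id in self.pages:
            self.pages.move_to_end(block_id)  # mark as most recently used
            return self.pages[block_id]
        # Make room if the pool is full (LRU eviction is an assumption).
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)
        # Assumed mapping from block ID to an offset within the file.
        self.file.seek(block_id * PAGE_SIZE)
        data = self.file.read(PAGE_SIZE)
        self.disk_reads += 1
        self.pages[block_id] = data
        return data

# An in-memory stand-in for the operating system file: four pages,
# where page i is filled with the byte value i.
demo_file = io.BytesIO(b"".join(bytes([i]) * PAGE_SIZE for i in range(4)))
```

Repeated calls to get_page() for the same block ID are served from the pool, so disk_reads only grows on a miss.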

Compression

Bitmap Encoding, Bit Packing / Mostly Encoding

Instead of storing each column value for every row ID (rid) on a page, LucidDB associates a bit-encoded vector with each column value. To allow efficient access to these bit encodings, at the expense of slightly more space, LucidDB encodes the bits using no more than two vectors that contain either 1, 2, 4, 8, or 16 bits. Indexes created on column-store tables are bitmap indexes. A bitmap consists of a series of bytes, where each bit within a byte corresponds to a rid. A set bit indicates that the row with that rid contains the corresponding key value. Compression is achieved by stripping off bytes that contain no set bits.
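The two ideas in this section can be sketched in Python under stated assumptions. For bit packing, the sketch assumes the vector widths are per-entry bit widths, and that the engine picks the combination of at most two widths with the smallest total that still covers the bits needed; the selection rule is a guess for illustration. For bitmaps, the sketch stores only the non-zero bytes together with their byte offsets, which is a simplification of whatever on-disk format LucidDB actually uses. All function names are hypothetical.

```python
from itertools import combinations_with_replacement

WIDTHS = (1, 2, 4, 8, 16)

def vector_widths(distinct_values):
    # Bits needed to number every distinct value on a page.
    bits_needed = max(1, (distinct_values - 1).bit_length())
    candidates = [(w,) for w in WIDTHS]
    candidates += list(combinations_with_replacement(WIDTHS, 2))
    fitting = [c for c in candidates if sum(c) >= bits_needed]
    if not fitting:
        raise ValueError("more than 32 bits cannot be covered by two vectors")
    # Assumed rule: least total bits first, then fewer vectors.
    return min(fitting, key=lambda c: (sum(c), len(c)))

def build_bitmap(rids, num_rows):
    # One bit per rid; bit (rid % 8) of byte (rid // 8).
    bitmap = bytearray((num_rows + 7) // 8)
    for rid in rids:
        bitmap[rid // 8] |= 1 << (rid % 8)
    return bytes(bitmap)

def compress(bitmap):
    # Strip bytes with no set bits, remembering where the others were.
    return [(offset, byte) for offset, byte in enumerate(bitmap) if byte != 0]

def rids_in(compressed):
    # Recover the rids from the surviving bytes.
    return [offset * 8 + bit
            for offset, byte in compressed
            for bit in range(8)
            if byte & (1 << bit)]
```

Under these assumptions, a page with 32 distinct values needs 5 bits per entry, which the sketch covers with a 1-bit vector plus a 4-bit vector, and a sparse 128-byte bitmap with two set bits shrinks to two stored bytes.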

Indexes

B+Tree, Bitmap

LucidDB can automatically shift between bitmap and B-tree representations depending on data distribution, even using both in the same index for different portions of the same table. This leads to good data compression, reduced I/O, and fast evaluation of boolean expressions.

Storage Model

Decomposition Storage Model (Columnar)

Because it reduces I/O, a column store is well suited to OLAP workloads, which consist of read-only queries involving large scans over a subset of attributes.
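The I/O argument can be made concrete with a small Python sketch that counts how many values each layout touches when a query aggregates a single attribute. The table contents and function names are hypothetical, and counting values is only a stand-in for counting pages read from disk.

```python
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 20.0},
    {"id": 3, "region": "EU", "amount": 5.5},
]

def scan_row_store(rows, column):
    # A row store reads every attribute of every row it scans.
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)
        total += row[column]
    return total, values_read

def to_columns(rows):
    # Decompose the table into one list per attribute.
    return {name: [row[name] for row in rows] for name in rows[0]}

def scan_column_store(columns, column):
    # A column store reads only the attribute the query asks for.
    values = columns[column]
    return sum(values), len(values)

row_total, row_values_read = scan_row_store(rows, "amount")
col_total, col_values_read = scan_column_store(to_columns(rows), "amount")
```

Both scans compute the same sum, but the columnar scan touches one value per row while the row-oriented scan touches every attribute, and the gap widens as tables gain more columns.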
