LucidDB

LucidDB is a DBMS optimized for business intelligence. Its architecture combines column-store tables, bitmap indexing, hash join/aggregation, and page-level multi-versioning. LucidDB is designed for flexible, high-performance data integration and sophisticated query processing, making it suitable for data warehousing and OLAP servers. It is part of the Eigenbase project.

History

LucidDB was the first pure-play open source column-store database. Due to a lack of sponsors and community activity, the codebase and web pages of LucidDB have not been maintained since 2014.

Compression

Bitmap Encoding, Bit Packing / Mostly Encoding

Instead of storing each column value for every rid (row ID) on a page, LucidDB associates a bit-encoded vector with each column value. To keep access to these bit encodings efficient, at the expense of slightly more space, LucidDB encodes the bits using no more than two vectors, each 1, 2, 4, 8, or 16 bits wide. Indexes created on column-store tables are bitmap indexes. A bitmap consists of a series of bytes in which each bit corresponds to a rid. A set bit indicates that the row with that rid contains the corresponding key value. Compression is achieved by stripping off bytes that contain no set bits.
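The two ideas above can be illustrated with a minimal Python sketch. It is not LucidDB's on-disk format: the function names, the rule of picking the smallest total width, and the `(byte offset, byte)` segment layout are all assumptions made for illustration.

```python
from itertools import combinations

ALLOWED_WIDTHS = (1, 2, 4, 8, 16)  # vector widths named in the text


def choose_widths(distinct_values: int) -> tuple:
    """Pick at most two allowed widths whose total covers the bits needed.

    Assumption: among all fitting choices, the smallest total width wins.
    """
    needed = max(1, (distinct_values - 1).bit_length())
    candidates = [(w,) for w in ALLOWED_WIDTHS]
    candidates += list(combinations(ALLOWED_WIDTHS, 2))
    fitting = [c for c in candidates if sum(c) >= needed]
    if not fitting:
        raise ValueError("too many distinct values for this sketch")
    return min(fitting, key=lambda c: (sum(c), len(c)))


def build_bitmap(rids, num_rows: int) -> bytes:
    """One bit per rid; a set bit means the row holds the key value."""
    bitmap = bytearray((num_rows + 7) // 8)
    for rid in rids:
        bitmap[rid // 8] |= 1 << (rid % 8)
    return bytes(bitmap)


def compress_bitmap(bitmap: bytes) -> list:
    """Strip bytes with no set bits, keeping (byte offset, byte) pairs."""
    return [(offset, byte) for offset, byte in enumerate(bitmap) if byte]


def rids_from_compressed(segments) -> list:
    """Recover the rids from the compressed segments."""
    return [
        offset * 8 + bit
        for offset, byte in segments
        for bit in range(8)
        if (byte >> bit) & 1
    ]
```

A column with 100 distinct values needs 7 bits. No pair of allowed widths sums to exactly 7, so the sketch falls back to a single 8-bit vector, which is the "slightly more space" trade-off. A bitmap for rids 3, 64, and 65 over 80 rows shrinks from ten bytes to two non-empty segments.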

Concurrency Control

Multi-version Concurrency Control (MVCC)

LucidDB implements page-level concurrency control. When a DML statement modifies an existing page, it creates a new version of the page rather than updating it in place. The old page version is not deleted immediately, because concurrent readers of the old page may still be running. A LucidDB-specific command is available to reclaim old page versions that are no longer needed.
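A minimal Python sketch of page-level versioning follows. The `PageStore` class, its logical clock, and its method names are hypothetical and stand in for LucidDB's storage layer and its reclaim command; only the behavior described above is modeled.

```python
class PageStore:
    """Toy page store: writers add versions, readers see a snapshot."""

    def __init__(self):
        self._versions = {}          # page_id -> [(commit_ts, data), ...] ascending
        self._clock = 0              # hypothetical logical commit clock
        self._active_snapshots = []  # snapshots held by running readers

    def begin_read(self) -> int:
        """Start a reader that sees everything committed so far."""
        self._active_snapshots.append(self._clock)
        return self._clock

    def end_read(self, snapshot: int) -> None:
        self._active_snapshots.remove(snapshot)

    def write(self, page_id, data) -> int:
        """Create a new page version instead of updating in place."""
        self._clock += 1
        self._versions.setdefault(page_id, []).append((self._clock, data))
        return self._clock

    def read(self, page_id, snapshot: int):
        """Return the newest version committed at or before the snapshot."""
        for ts, data in reversed(self._versions.get(page_id, [])):
            if ts <= snapshot:
                return data
        raise KeyError(page_id)

    def reclaim_old_versions(self) -> int:
        """Drop versions that no active reader can still see."""
        horizon = min(self._active_snapshots, default=self._clock)
        freed = 0
        for page_id, versions in list(self._versions.items()):
            keep_from = 0
            for i, (ts, _) in enumerate(versions):
                if ts <= horizon:
                    keep_from = i  # newest version visible at the horizon
            freed += keep_from
            self._versions[page_id] = versions[keep_from:]
        return freed
```

While a reader holds a snapshot taken before an update, reclaiming frees nothing for that page. Once the reader finishes, the superseded version can be dropped.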

Joins

Hash Join, Semi Join

Hash join is one of the most efficient join implementations. In LucidDB, the same hash table support used by hash join is also used for duplicate removal and aggregation over a single input. LucidDB optimizes star joins with semijoins, which avoid reading fact table rows that a query does not need.
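The sketch below shows, in plain Python, how one hash table structure can serve a join and an aggregation, and how a semijoin can restrict which fact rows are read. All function names and the dictionary-based `fact_index` are illustrative assumptions, not LucidDB internals.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            yield {**match, **row}


def hash_aggregate(rows, key, value):
    """Sum `value` per `key` over a single input using a hash table."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def semijoin_rids(dim_rows, dim_key, dim_predicate, fact_index):
    """Return only the fact rids whose key survives the dimension filter.

    `fact_index` maps a key value to the rids of matching fact rows
    (a stand-in for an index on the fact table's join column).
    """
    wanted_keys = {d[dim_key] for d in dim_rows if dim_predicate(d)}
    rids = []
    for key_value in wanted_keys:
        rids.extend(fact_index.get(key_value, ()))
    return sorted(rids)
```

In a star join filtered on a dimension attribute, only the rids returned by `semijoin_rids` need to be fetched from the fact table before the hash join runs.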

Data Model

Column Family / Wide-Column

Because it reduces I/O, a column store is well suited to OLAP workloads, which consist largely of read-only queries that perform large scans over a subset of attributes.
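A small Python sketch makes the I/O argument concrete by counting how many values a scan touches under each layout. The helper names and the value-count cost model are simplifying assumptions; real systems read pages, not individual values.

```python
def to_columns(rows):
    """Pivot a list of row dicts into one list per attribute."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def values_read_row_store(rows):
    """A row store reads every attribute of every scanned row."""
    return sum(len(row) for row in rows)


def values_read_column_store(columns, needed):
    """A column store reads only the attributes the query references."""
    return sum(len(columns[name]) for name in needed)


def column_sum(columns, name):
    """Aggregate a single attribute by scanning just its column."""
    return sum(columns[name])
```

For a three-row table with four attributes, summing one attribute touches 12 values in the row layout and 3 in the column layout.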

Indexes

B+Tree, Bitmap

LucidDB can automatically shift between bitmap and B-tree representations depending on the data distribution, even using both in the same index for different portions of the same table. This leads to better data compression, reduced I/O, and fast evaluation of boolean expressions.
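One way to picture a mixed representation is an index whose entries hold a plain rid list for rare key values and a bitmap for frequent ones. The Python sketch below assumes a fixed `THRESHOLD` and uses integers as bitmaps; both choices, and all names, are hypothetical rather than LucidDB's actual policy.

```python
THRESHOLD = 4  # hypothetical cutoff between rid-list and bitmap entries


def build_index(column):
    """Index a column, choosing a representation per key value."""
    postings = {}
    for rid, value in enumerate(column):
        postings.setdefault(value, []).append(rid)
    index = {}
    for value, rids in postings.items():
        if len(rids) < THRESHOLD:
            index[value] = ("rids", tuple(rids))  # sparse: list of rids
        else:
            index[value] = ("bitmap", sum(1 << r for r in rids))  # dense
    return index


def lookup(index, value) -> int:
    """Return the matching rids as an integer bitmap, whatever the entry kind."""
    kind, payload = index.get(value, ("rids", ()))
    if kind == "bitmap":
        return payload
    return sum(1 << r for r in payload)


def rids_of(bitmap: int) -> list:
    """List the rids whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]
```

Boolean predicates then reduce to bitwise operations: `lookup(a, x) & lookup(b, y)` evaluates `a = x AND b = y`, and `|` evaluates `OR`, without touching the table rows.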
