LucidDB

Revision #11 from 01/04/2022 9:42 a.m.

LucidDB is a DBMS optimized for business intelligence. Its architecture supports column-store, bitmap indexing, hash join/aggregation, and page-level multi-versioning. LucidDB is designed to achieve flexible, high-performance data integration and sophisticated query processing, making it suitable for data warehousing and OLAP servers. It is part of the Eigenbase project.

History

Due to a lack of sponsors and community activity, the LucidDB codebase and web pages have not been maintained since 2014.

Some portions of LucidDB live on as separate projects; for example, the Apache Calcite framework was forked from the LucidDB codebase.

Joins

Hash Join Semi Join

Hash join is one of the most efficient join implementations. In LucidDB, the same hash table used by hash join is also used for duplicate removal and for aggregation over a single input. LucidDB optimizes star joins with semijoins, which avoid reading fact table rows that the query does not need.
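The reuse of one hash table for three operators, plus the semijoin filter for star joins, can be illustrated with the minimal in-memory sketch below. It is not LucidDB code: the `HashTable` class and the functions `hash_join`, `hash_distinct`, `hash_sum`, and `star_semijoin` are hypothetical, and rows are plain Python dicts. A real engine would apply the semijoin key set before fetching fact rows from storage; here it simply filters a list.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List

Row = Dict[str, Any]


class HashTable:
    """One hash table reused for join, duplicate removal, and aggregation."""

    def __init__(self) -> None:
        self._buckets: Dict[Hashable, List[Row]] = defaultdict(list)

    def insert(self, key: Hashable, row: Row) -> None:
        self._buckets[key].append(row)

    def lookup(self, key: Hashable) -> List[Row]:
        # .get() does not create an empty bucket for a missing key
        return self._buckets.get(key, [])

    def groups(self):
        return self._buckets.items()


def hash_join(build: Iterable[Row], probe: Iterable[Row], key: str) -> List[Row]:
    table = HashTable()
    for row in build:  # build phase: hash one input on the join key
        table.insert(row[key], row)
    # probe phase: each probe row looks up its matching build rows
    return [{**b, **p} for p in probe for b in table.lookup(p[key])]


def hash_distinct(rows: Iterable[Row]) -> List[Row]:
    table = HashTable()
    for row in rows:  # identical rows land in the same bucket
        table.insert(tuple(sorted(row.items())), row)
    return [bucket[0] for _, bucket in table.groups()]


def hash_sum(rows: Iterable[Row], group: str, value: str) -> Dict[Hashable, Any]:
    table = HashTable()
    for row in rows:  # one bucket per group key
        table.insert(row[group], row)
    return {k: sum(r[value] for r in bucket) for k, bucket in table.groups()}


def star_semijoin(fact: Iterable[Row], fk: str, dim: Iterable[Row], pk: str,
                  pred: Callable[[Row], bool]) -> List[Row]:
    # Collect keys of qualifying dimension rows, then keep only fact rows whose
    # foreign key is in that set, before any join touches the fact rows.
    wanted = {d[pk] for d in dim if pred(d)}
    return [f for f in fact if f[fk] in wanted]
```

For example, `star_semijoin(sales, "store_id", stores, "store_id", lambda s: s["region"] == "EU")` keeps only the sales rows for EU stores, and only those rows are then passed to `hash_join`.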

Storage Architecture

Disk-oriented

Data in LucidDB is stored in pages, which are allocated from an operating system file in physical storage. Each page can be accessed randomly by its unique block ID, which maps to an offset in the file. Before a page can be accessed, it must first be read into the buffer pool. The number of pages in the buffer pool can be set dynamically and defaults to 5000 pages (about 160MB). The buffer pool uses hash buckets to locate pages.
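The sketch below shows the block-ID-to-offset mapping and a buffer pool that finds cached pages through hash buckets. It rests on stated assumptions: the 32KB page size is only inferred from "5000 pages (about 160MB)", the offset is assumed to be `block_id * page_size`, and the LRU eviction policy and the names `BufferPool`, `get_page`, and `block_offset` are hypothetical. Writing dirty pages back to disk is omitted.

```python
import io
from collections import OrderedDict
from typing import BinaryIO, List, Optional, Tuple

PAGE_SIZE = 32 * 1024       # assumption: 5000 pages ~ 160MB implies ~32KB pages
DEFAULT_POOL_PAGES = 5000   # default buffer pool size quoted in the text


def block_offset(block_id: int, page_size: int = PAGE_SIZE) -> int:
    """Map a block ID to a byte offset in the storage file (assumed layout)."""
    return block_id * page_size


class BufferPool:
    def __init__(self, file: BinaryIO, capacity: int = DEFAULT_POOL_PAGES,
                 page_size: int = PAGE_SIZE, n_buckets: int = 1024) -> None:
        self.file = file
        self.capacity = capacity
        self.page_size = page_size
        # hash buckets: each holds the (block_id, page) pairs that hash to it
        self.buckets: List[List[Tuple[int, bytearray]]] = [[] for _ in range(n_buckets)]
        self.lru: "OrderedDict[int, None]" = OrderedDict()  # hypothetical LRU eviction
        self.disk_reads = 0

    def _bucket(self, block_id: int) -> List[Tuple[int, bytearray]]:
        return self.buckets[hash(block_id) % len(self.buckets)]

    def _find(self, block_id: int) -> Optional[bytearray]:
        for bid, page in self._bucket(block_id):
            if bid == block_id:
                return page
        return None

    def get_page(self, block_id: int) -> bytearray:
        page = self._find(block_id)
        if page is not None:                 # hit: page is already cached
            self.lru.move_to_end(block_id)
            return page
        if len(self.lru) >= self.capacity:   # pool full: evict least recently used
            victim, _ = self.lru.popitem(last=False)
            bucket = self._bucket(victim)
            bucket[:] = [(b, p) for b, p in bucket if b != victim]
        self.file.seek(block_offset(block_id, self.page_size))  # miss: read from file
        page = bytearray(self.file.read(self.page_size))
        self.disk_reads += 1
        self._bucket(block_id).append((block_id, page))
        self.lru[block_id] = None
        return page


if __name__ == "__main__":
    pool = BufferPool(io.BytesIO(b"AAAABBBBCCCC"), capacity=2, page_size=4)
    print(bytes(pool.get_page(1)))
```

Hashing the block ID into a bucket keeps the cache lookup close to constant time regardless of pool size; only a miss touches the file.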

Compression

Bitmap Encoding Bit Packing / Mostly Encoding

LucidDB stores each distinct column value only once; every value in the column is then represented as a bit-encoded code that refers to its distinct value. Compression is achieved by not storing every column value directly.
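A minimal sketch of dictionary encoding with bit packing follows, assuming each code uses the smallest bit width that can number all distinct values. The functions `bits_needed`, `encode`, and `decode` are hypothetical, and the code widths and byte layout LucidDB actually uses may differ.

```python
from typing import Hashable, List, Sequence, Tuple


def bits_needed(n_distinct: int) -> int:
    """Smallest code width (at least 1 bit) that can number n_distinct values."""
    return max(1, (n_distinct - 1).bit_length())


def encode(values: Sequence[Hashable]) -> Tuple[List[Hashable], int, bytes]:
    dictionary = list(dict.fromkeys(values))  # each distinct value stored once
    code_of = {v: i for i, v in enumerate(dictionary)}
    width = bits_needed(len(dictionary))
    packed = 0
    for i, v in enumerate(values):  # append each row's code, `width` bits apiece
        packed |= code_of[v] << (i * width)
    n_bytes = (len(values) * width + 7) // 8
    return dictionary, width, packed.to_bytes(n_bytes, "little")


def decode(dictionary: List[Hashable], width: int, data: bytes, count: int) -> List[Hashable]:
    packed = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [dictionary[(packed >> (i * width)) & mask] for i in range(count)]
```

For the column `["CA", "NY", "CA", "TX", "NY", "CA"]`, `encode` produces a three-entry dictionary and 2-bit codes, so the six row values fit in 12 bits (2 bytes).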
LucidDB uses bitmaps to represent column indexes. Each rid (row ID) maps to one bit in a key value's bitmap; a set bit means that the row with that rid contains the key value. A bitmap is a series of bytes, and to compress it LucidDB strips off the bytes that do not have any set bits.
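The sketch below builds a per-value bitmap index and drops all-zero bytes, keeping each surviving byte together with its position so the rids can be recovered. The `(position, byte)` segment format and the function names are hypothetical stand-ins; LucidDB's actual on-disk segment encoding is not shown here.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Segment = Tuple[int, int]  # (byte position in the full bitmap, byte value)


def build_bitmap(rids: Sequence[int], n_rows: int) -> bytearray:
    bitmap = bytearray((n_rows + 7) // 8)
    for rid in rids:  # rid r sets bit r % 8 of byte r // 8
        bitmap[rid // 8] |= 1 << (rid % 8)
    return bitmap


def strip_zero_bytes(bitmap: bytearray) -> List[Segment]:
    # keep only bytes with at least one set bit, remembering where they were
    return [(pos, byte) for pos, byte in enumerate(bitmap) if byte]


def rids_from_segments(segments: List[Segment]) -> List[int]:
    return [pos * 8 + bit for pos, byte in segments for bit in range(8) if (byte >> bit) & 1]


def bitmap_index(column: Sequence[Hashable]) -> Dict[Hashable, List[Segment]]:
    rids_by_key: Dict[Hashable, List[int]] = defaultdict(list)
    for rid, value in enumerate(column):
        rids_by_key[value].append(rid)
    return {k: strip_zero_bytes(build_bitmap(r, len(column))) for k, r in rids_by_key.items()}
```

For a rare key, most bytes of its bitmap are zero, so stripping them leaves only a few segments to store.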

Concurrency Control

Multi-version Concurrency Control (MVCC)

LucidDB implements page-level multi-version concurrency control. When a DML statement modifies an existing page, it creates a new version of the page rather than updating the original in place. The old page version is not deleted immediately, because concurrent readers may still be accessing it. A LucidDB-specific command is available to reclaim old page versions that are no longer needed.
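A minimal sketch of page-level versioning under timestamp snapshots: writers append versions, readers see the newest version committed at or before their snapshot, and reclamation drops versions no active reader can reach. The `VersionedPage` class, its methods, and the "oldest active snapshot" rule are hypothetical illustrations, not LucidDB's reclamation command or its internal logic.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VersionedPage:
    # (commit timestamp, page contents), oldest first
    versions: List[Tuple[int, bytes]] = field(default_factory=list)

    def write(self, commit_ts: int, data: bytes) -> None:
        # copy-on-write: add a new version instead of overwriting the old one
        if self.versions and commit_ts <= self.versions[-1][0]:
            raise ValueError("commit timestamps must increase")
        self.versions.append((commit_ts, data))

    def read(self, snapshot_ts: int) -> bytes:
        # a reader sees the newest version committed at or before its snapshot
        for ts, data in reversed(self.versions):
            if ts <= snapshot_ts:
                return data
        raise LookupError("no version visible at this snapshot")

    def reclaim(self, oldest_active_snapshot: int) -> int:
        # versions older than the one seen by the oldest active reader are unreachable
        keep_from = 0
        for i, (ts, _) in enumerate(self.versions):
            if ts <= oldest_active_snapshot:
                keep_from = i
        del self.versions[:keep_from]
        return keep_from  # number of versions reclaimed
```

This assumes no new reader can start with a snapshot older than `oldest_active_snapshot`; otherwise reclaiming would remove a version that reader still needs.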

Foreign Keys

Not Supported

Foreign keys are not supported.

Indexes

B+Tree BitMap

LucidDB can automatically switch between bitmap and B+tree representations depending on the data distribution, and both index types may coexist for different parts of the same table. LucidDB calls this intelligent indexing; it does not require a DBA to choose the index type manually.
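One way to picture per-key switching is to store each key's rids in whichever form is smaller: a plain rid list for sparse keys, a bitmap for dense ones. The size-based rule, the 8-byte `RID_BYTES` constant, and the function names below are hypothetical; LucidDB's actual decision criteria are not documented here, and a dict stands in for the B+tree that would hold the entries.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple, Union

RID_BYTES = 8  # hypothetical on-disk size of one rid in a rid list

Entry = Tuple[str, Union[List[int], Tuple[int, bytearray]]]


def choose_representation(rids: List[int]) -> Entry:
    """Pick whichever form is smaller for this key's ascending rids."""
    lo, hi = rids[0], rids[-1]
    bitmap_bytes = (hi - lo) // 8 + 1
    if bitmap_bytes < len(rids) * RID_BYTES:   # dense key: bitmap is cheaper
        bitmap = bytearray(bitmap_bytes)
        for rid in rids:
            off = rid - lo
            bitmap[off // 8] |= 1 << (off % 8)
        return "bitmap", (lo, bitmap)
    return "rid_list", rids                     # sparse key: keep a plain rid list


def build_index(column: Sequence[Hashable]) -> Dict[Hashable, Entry]:
    rids_by_key: Dict[Hashable, List[int]] = defaultdict(list)
    for rid, value in enumerate(column):  # rids are appended in ascending order
        rids_by_key[value].append(rid)
    return {k: choose_representation(r) for k, r in rids_by_key.items()}
```

In a 1000-row column where one value fills almost every row and another appears twice, the frequent value becomes a 125-byte bitmap while the rare value stays a two-element rid list, so both forms coexist in the same index.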

Storage Model

Decomposition Storage Model (Columnar)

A column store reduces I/O for OLAP workloads because a typical OLAP query scans a large number of rows but reads only a subset of the attributes.
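A back-of-the-envelope calculation makes the I/O argument concrete. The table shape (1,000,000 rows, 20 fixed-width 8-byte columns) and the two-column query are hypothetical, and the model ignores compression and page granularity.

```python
from typing import Dict, Iterable


def row_store_bytes(n_rows: int, widths: Dict[str, int]) -> int:
    # a row store reads whole rows, so every attribute is scanned
    return n_rows * sum(widths.values())


def column_store_bytes(n_rows: int, widths: Dict[str, int], used: Iterable[str]) -> int:
    # a column store reads only the columns the query references
    return n_rows * sum(widths[c] for c in used)


widths = {f"c{i}": 8 for i in range(20)}   # hypothetical 20 x 8-byte columns
print(row_store_bytes(1_000_000, widths))                    # 160000000
print(column_store_bytes(1_000_000, widths, ["c0", "c3"]))   # 16000000
```

Touching 2 of 20 equally wide columns cuts the bytes scanned by a factor of 10 in this model.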

Storage Organization

Copy-on-Write / Shadow Paging

The original version of a page is known as the anchor page. An operation that modifies a page causes the page to be versioned. The most recent version of a page is always chained directly from the anchor, and older versions follow it in the chain in decreasing timestamp order. The last version points back to the anchor, closing the loop. Each page version records its timestamp in its metadata entries.
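The circular version chain can be sketched as a singly linked ring: the anchor points to the newest version, each version points to the next older one, and the oldest points back to the anchor. The `PageVersion` and `VersionChain` names are hypothetical, and in-memory objects stand in for pages and their metadata entries.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class PageVersion:
    timestamp: int
    data: bytes
    # repr=False: the chain is a cycle, so printing `next` would recurse forever
    next: Optional["PageVersion"] = field(default=None, repr=False)


class VersionChain:
    def __init__(self, timestamp: int, data: bytes) -> None:
        self.anchor = PageVersion(timestamp, data)  # original version of the page
        self.anchor.next = self.anchor              # no versions yet: loop to itself

    def add_version(self, timestamp: int, data: bytes) -> None:
        newest = self.anchor.next
        if timestamp <= newest.timestamp:
            raise ValueError("new version must have a later timestamp")
        # the newest version is always chained directly from the anchor
        self.anchor.next = PageVersion(timestamp, data, next=newest)

    def timestamps(self) -> List[int]:
        out, node = [], self.anchor.next
        while node is not self.anchor:   # stop when the loop returns to the anchor
            out.append(node.timestamp)
            node = node.next
        return out

    def read(self, snapshot_ts: int) -> bytes:
        node = self.anchor.next
        while node is not self.anchor:   # newest first, so the first match is correct
            if node.timestamp <= snapshot_ts:
                return node.data
            node = node.next
        if self.anchor.timestamp <= snapshot_ts:
            return self.anchor.data
        raise LookupError("no version visible at this snapshot")
```

Because the newest version hangs directly off the anchor, adding a version and reading the latest one are both constant-time; only readers with old snapshots walk further along the ring.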
