Kylin is an open-source distributed data analytics engine built on top of SparkSQL/Hive. It offers a SQL interface for OLAP on large datasets. Kylin is not a replacement for a massively parallel processing engine such as Hive or Presto. Instead, it runs on top of these systems as a query accelerator. Kylin pre-calculates a set of data cubes, stores them in HBase, and looks up results directly in them when it receives a query. If a query cannot be answered from the data cubes, it is executed by the underlying processing engine (such as Hive). The data cubes are built when the dataset is imported. The user is responsible for specifying which data cubes should be built.
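The lookup-or-fallback behavior described above can be illustrated with a minimal sketch. It is not Kylin's implementation: the class `CubeAccelerator`, the helper `build_cuboid`, and the in-memory row list standing in for the underlying engine are all hypothetical names chosen for illustration, and only `SUM` aggregation is modeled.

```python
from collections import defaultdict


def build_cuboid(rows, dims, measure):
    """Pre-aggregate SUM(measure) grouped by the given dimensions."""
    agg = defaultdict(int)
    for row in rows:
        agg[tuple(row[d] for d in dims)] += row[measure]
    return dict(agg)


class CubeAccelerator:
    """Hypothetical accelerator: answers from cuboids, else falls back."""

    def __init__(self, rows, measure, cuboid_specs):
        self.rows = rows          # stands in for the underlying engine's data
        self.measure = measure
        # Cuboids are built once, at import time, only for the dimension
        # sets the user asked for. Keys use a canonical (sorted) order.
        self.cuboids = {}
        for dims in cuboid_specs:
            key = tuple(sorted(dims))
            self.cuboids[key] = build_cuboid(rows, key, measure)

    def sum_by(self, dims):
        """Return (result, source); result keys follow sorted dim order."""
        key = tuple(sorted(dims))
        if key in self.cuboids:
            return self.cuboids[key], "cube"
        # No matching cuboid: scan the raw rows, as the underlying
        # engine would have to do.
        return build_cuboid(self.rows, key, self.measure), "fallback"


rows = [
    {"year": 1994, "city": "Beijing", "sales": 60},
    {"year": 1994, "city": "Beijing", "sales": 40},
    {"year": 1995, "city": "Shanghai", "sales": 70},
]
acc = CubeAccelerator(rows, "sales", cuboid_specs=[("year", "city")])
by_year_city, source_a = acc.sum_by(("year", "city"))
by_year, source_b = acc.sum_by(("year",))
```

Here the `(year, city)` query is served from the precomputed cuboid, while the `(year)` query has no cuboid and is recomputed from the raw rows.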
Kylin applies dictionary encoding to all dimension values in data cubes. Kylin's dictionary is order-preserving and supports mapping in both directions, from keys to values and from values back to keys. The dictionary is implemented as a radix tree. Each node in the radix tree also stores the size of its subtree, which supports mapping values back to keys. Kylin also supports naive compression algorithms in HBase and Hive.
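To show why subtree sizes make a two-way, order-preserving mapping possible, here is a simplified sketch. It assumes the dictionary maps dimension strings to integer IDs equal to their sorted rank, and it uses a plain character trie rather than a compressed radix tree. The names `TrieNode`, `OrderedDictionary`, `encode`, and `decode` are hypothetical and do not come from Kylin's code.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.is_end = False   # True if a stored value ends at this node
        self.size = 0         # number of stored values in this subtree


class OrderedDictionary:
    """Assigns each value the ID equal to its rank in sorted order."""

    def __init__(self, values):
        self.root = TrieNode()
        for value in set(values):
            node = self.root
            node.size += 1
            for ch in value:
                node = node.children.setdefault(ch, TrieNode())
                node.size += 1
            node.is_end = True

    def encode(self, value):
        """Value -> ID: count the stored values that sort before it."""
        node, rank = self.root, 0
        for ch in value:
            if node.is_end:
                rank += 1  # a stored proper prefix sorts before value
            for c, child in node.children.items():
                if c < ch:
                    rank += child.size  # skip whole smaller subtrees
            if ch not in node.children:
                raise KeyError(value)
            node = node.children[ch]
        if not node.is_end:
            raise KeyError(value)
        return rank

    def decode(self, code):
        """ID -> value: descend into the subtree holding that rank."""
        if not 0 <= code < self.root.size:
            raise KeyError(code)
        node, out = self.root, []
        while True:
            if node.is_end:
                if code == 0:
                    return "".join(out)
                code -= 1
            for c in sorted(node.children):
                child = node.children[c]
                if code < child.size:
                    out.append(c)
                    node = child
                    break
                code -= child.size


d = OrderedDictionary(["Shanghai", "1995", "Beijing", "1994"])
codes = [d.encode(v) for v in ["1994", "1995", "Beijing", "Shanghai"]]
```

Because IDs follow sort order, range predicates on the original values can be evaluated on the encoded IDs, and the subtree sizes let `decode` find a value without enumerating the whole dictionary.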
Data cubes are stored as HBase tables. Given a set of dimension columns, Kylin pre-aggregates all possible combinations of their attributes with MapReduce jobs, then encodes the dimensions with dictionary encoding. Finally, Kylin encodes all data cubes into `Rowkey`s in HBase. The format of a `Rowkey` is `cuboid id + attribute`. For example, assume a data cube on `year` and `city` with cuboid id `00000001`, a row `year=1994, city=Beijing, sum(sales)=100`, and a dictionary that maps `1994=0, Beijing=1`. The HBase table will then contain the entry `Rowkey=00000001+01, value=100`.
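The example above can be reproduced with a short sketch. It follows the textual `cuboid id + attribute` notation used here rather than any actual byte layout, and the function `build_rowkey` and the dict standing in for an HBase table are hypothetical.

```python
def build_rowkey(cuboid_id, dimension_values, dictionary):
    """Concatenate the cuboid id with the dictionary-encoded dimensions."""
    encoded = "".join(str(dictionary[v]) for v in dimension_values)
    return f"{cuboid_id}+{encoded}"


# Dictionary and row taken from the example in the text.
dictionary = {"1994": 0, "Beijing": 1}
row = {"year": "1994", "city": "Beijing", "sum_sales": 100}

rowkey = build_rowkey("00000001", [row["year"], row["city"]], dictionary)

# A plain dict stands in for the HBase table: Rowkey -> aggregated value.
hbase_table = {rowkey: row["sum_sales"]}
```

With this layout, a query on `year=1994, city=Beijing` becomes a single key lookup once the filter values have been dictionary-encoded.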
Decomposition Storage Model (Columnar)
Kylin stores its data in HBase, which is a column-family system.
https://github.com/apache/kylin
KylinOLAP
eBay
2013
Open Source