Kylin

Kylin is an open source distributed data analytics engine built on top of Hadoop/Spark. It offers a SQL interface for OLAP on large datasets. Unlike massively parallel processing engines such as Hive and Presto, Kylin pre-calculates a set of data cubes, stores them in HBase, and looks up query results directly in them. If a query cannot be answered from the data cubes, it is executed by the underlying processing engine. For this reason, Kylin is usually used as an accelerator for traditional parallel data processing engines.
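The lookup-then-fall-back behavior described above can be sketched roughly as follows. This is a minimal illustration, not Kylin's code: the function `answer`, the in-memory `cubes` dictionary, and the sample rows are all hypothetical, and the "underlying engine" is simulated by a plain scan over the raw rows.

```python
def answer(query_dims, cubes, raw_rows, measure):
    """Return (source, result) for a 'GROUP BY query_dims, SUM(measure)' query.

    Hypothetical sketch: use a precomputed cuboid when one exists,
    otherwise fall back to aggregating the raw rows.
    """
    key = tuple(sorted(query_dims))
    if key in cubes:
        # Fast path: the aggregate was computed ahead of time.
        return "cube", cubes[key]
    # Slow path: stands in for handing the query to the underlying engine.
    totals = {}
    for row in raw_rows:
        group = tuple(row[d] for d in key)
        totals[group] = totals.get(group, 0) + row[measure]
    return "fallback", totals


raw_rows = [
    {"year": 1994, "city": "Beijing", "sales": 100},
    {"year": 1995, "city": "Shanghai", "sales": 50},
]
# Only the cuboid on `year` has been precomputed in this example.
cubes = {("year",): {(1994,): 100, (1995,): 50}}

by_year = answer(["year"], cubes, raw_rows, "sales")
by_city = answer(["city"], cubes, raw_rows, "sales")
```

Here the query on `year` is served from the precomputed cuboid, while the query on `city` has no matching cuboid and is computed by scanning the raw rows.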

History

The Kylin project was started in 2013 at eBay's R&D in Shanghai, China. It was open sourced on GitHub as "KylinOLAP" in Oct 2014. In Nov 2014, Kylin joined the Apache Software Foundation incubator; in Dec 2015, Apache Kylin became a Top-Level Project.

Query Interface

SQL

Kylin is a pure OLAP engine, so it only supports `SELECT` queries. `INSERT`, `UPDATE` and `DELETE` are not supported.

Joins

Hash Join, Sort-Merge Join

During the cube-building phase, Kylin uses Hive to pre-join the fact table and lookup tables. At query time, table joins are handled by the Apache Calcite query engine.

Storage Model

Custom

The data cubes are stored as Key-Value pairs.

Storage Architecture

Disk-oriented

Foreign Keys

Supported

Kylin supports the star schema. A user needs to specify the fact table and the lookup tables before building cubes. Kylin pre-joins these tables when building data cubes.
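As a rough illustration of what pre-joining a star schema produces, the sketch below flattens a fact table and one lookup table into a single wide table. It is plain Python rather than the Hive job Kylin actually runs; the function `pre_join`, the table contents, and the column names are hypothetical, and inner-join semantics are an assumption made for the example.

```python
def pre_join(fact_rows, lookup_rows, foreign_key, primary_key):
    """Flatten a fact table and a lookup table into one wide table.

    Hypothetical sketch with inner-join semantics: fact rows without a
    matching lookup row are dropped.
    """
    # Index the lookup table by its primary key for constant-time matching.
    index = {row[primary_key]: row for row in lookup_rows}
    flat = []
    for fact in fact_rows:
        match = index.get(fact[foreign_key])
        if match is None:
            continue
        merged = dict(fact)
        # Copy the lookup attributes, skipping the duplicated key column.
        merged.update({k: v for k, v in match.items() if k != primary_key})
        flat.append(merged)
    return flat


sales = [  # fact table
    {"city_id": 1, "year": 1994, "sales": 100},
    {"city_id": 2, "year": 1994, "sales": 30},
    {"city_id": 9, "year": 1995, "sales": 5},  # no matching city
]
cities = [  # lookup table
    {"id": 1, "city": "Beijing"},
    {"id": 2, "city": "Shanghai"},
]
flat_table = pre_join(sales, cities, "city_id", "id")
```

Cube building can then aggregate over `flat_table` without touching the lookup table again.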

Data Model

Key/Value

Data cubes are essentially HBase tables. Given a set of dimension columns, Kylin pre-aggregates all possible combinations of their attributes using MapReduce jobs, then encodes the dimensions with dictionary encoding. Finally, Kylin encodes the rows of each data cube into `Rowkey`s in HBase. The format of a `Rowkey` is `cuboid id + attribute`. For example, assume a data cube on `year` and `city` with cuboid id `00000001`, a row `year=1994, city=Beijing, sum(sales)=100`, and a dictionary that maps `1994=0, Beijing=1`. The HBase table will then contain the entry `Rowkey=00000001+01, value=100`.
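The example above can be reproduced with a small sketch. It assumes an in-memory Python dictionary in place of an HBase table and keeps the `cuboid id + attribute` notation as a literal string, whereas real row keys are byte arrays. The helpers `build_cuboids` and `encode_rowkey` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cuboids(rows, dimensions, measure):
    """Pre-aggregate sum(measure) for every subset of the dimension columns."""
    cuboids = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in dims)] += row[measure]
            cuboids[dims] = dict(totals)
    return cuboids


def encode_rowkey(cuboid_id, key, dictionary):
    """Concatenate the cuboid id with the dictionary code of each attribute."""
    codes = "".join(str(dictionary[value]) for value in key)
    return cuboid_id + "+" + codes


rows = [
    {"year": 1994, "city": "Beijing", "sales": 60},
    {"year": 1994, "city": "Beijing", "sales": 40},
]
dictionary = {1994: 0, "Beijing": 1}  # codes taken from the example above

cuboids = build_cuboids(rows, ("year", "city"), "sales")
hbase_table = {
    encode_rowkey("00000001", key, dictionary): value
    for key, value in cuboids[("year", "city")].items()
}
```

With two dimensions, `build_cuboids` produces four cuboids (no dimension, `year`, `city`, and `year` with `city`). The number of cuboids doubles with every added dimension, which is why the choice of dimension set matters.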

System Architecture

Shared-Disk

Kylin relies on Hive to store raw tables and HBase to store data cubes, both of which store data on HDFS.

Query Compilation

Code Generation

The Apache Calcite query engine does code generation for SQL queries.

Compression

Dictionary Encoding

Kylin applies dictionary encoding to all dimension values in data cubes. Kylin's dictionary is order-preserving and supports lookups in both directions, from keys to values and from values back to keys. The dictionary is implemented as a radix tree. Each node in the radix tree also stores the size of its subtree to support mapping values back to keys. In addition, Kylin can use the native compression algorithms of HBase and Hive.
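To make the role of the subtree sizes concrete, here is a minimal sketch of an order-preserving dictionary. It uses a plain character trie rather than a compressed radix tree, and it is not Kylin's implementation; the class names `TrieNode` and `TrieDictionary` are hypothetical. The id of a string is the number of stored strings that sort before it, so comparing ids gives the same result as comparing the strings.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # character -> TrieNode
        self.is_value = False  # True if a stored string ends at this node
        self.count = 0         # number of stored strings in this subtree


class TrieDictionary:
    def __init__(self, values):
        self.root = TrieNode()
        for value in set(values):  # ignore duplicates
            node = self.root
            node.count += 1
            for ch in value:
                node = node.children.setdefault(ch, TrieNode())
                node.count += 1
            node.is_value = True

    def encode(self, value):
        """Return the id of value: how many stored strings sort before it."""
        node, rank = self.root, 0
        for ch in value:
            if node.is_value:
                rank += 1  # a stored proper prefix sorts before value
            for other, child in node.children.items():
                if other < ch:
                    rank += child.count  # skip whole smaller subtrees
            if ch not in node.children:
                raise KeyError(value)
            node = node.children[ch]
        if not node.is_value:
            raise KeyError(value)
        return rank

    def decode(self, code):
        """Return the string with the given id by descending via subtree sizes."""
        if not 0 <= code < self.root.count:
            raise KeyError(code)
        node, chars = self.root, []
        while True:
            if node.is_value:
                if code == 0:
                    return "".join(chars)
                code -= 1
            for ch in sorted(node.children):
                child = node.children[ch]
                if code < child.count:
                    chars.append(ch)
                    node = child
                    break
                code -= child.count


cities = TrieDictionary(["Shanghai", "Beijing", "Shenzhen", "Beijing"])
codes = [cities.encode(c) for c in ("Beijing", "Shanghai", "Shenzhen")]
names = [cities.decode(i) for i in range(3)]
```

Because each node knows how many strings lie beneath it, both directions skip whole subtrees instead of enumerating every entry.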

Website

http://www.kylin.io/

Source Code

https://github.com/apache/kylin

Tech Docs

http://kylin.apache.org/docs/

Former Name

KylinOLAP

Developer

eBay

Country of Origin

CN

Start Year

2013

Project Type

Open Source

Written in

Java

Licenses

Apache v2

Wikipedia

https://en.wikipedia.org/wiki/Apache_Kylin