Kylin is an open-source distributed data analytics engine built on top of Hadoop/Spark. It offers a SQL interface for OLAP on large datasets.
Unlike massively parallel processing engines such as Hive and Presto, Kylin pre-calculates a set of data cubes, stores them in HBase, and answers queries by looking up results directly in those cubes. If a query cannot be answered from the data cubes, it is executed by the underlying processing engine. Kylin is therefore typically used as an accelerator for traditional parallel data processing engines.
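The lookup-then-fallback behavior described above can be illustrated with a minimal Python sketch. This is not Kylin's code: the rows, the `product` column, and the function names `build_cube`, `full_scan`, and `sum_sales` are hypothetical, and an in-memory dictionary stands in for the HBase-backed cube.

```python
# Hypothetical fact rows; "product" is deliberately NOT a cube dimension.
ROWS = [
    {"year": 1994, "city": "Beijing", "product": "tea", "sales": 60},
    {"year": 1994, "city": "Beijing", "product": "rice", "sales": 40},
    {"year": 1995, "city": "Beijing", "product": "tea", "sales": 30},
]

# Dimensions covered by the pre-computed cube (an assumption for this sketch).
CUBE_DIMS = ("year", "city")


def build_cube(rows, dims):
    """Pre-aggregate sum(sales) for every combination of values of `dims`."""
    cube = {}
    for row in rows:
        key = tuple(row[d] for d in dims)
        cube[key] = cube.get(key, 0) + row["sales"]
    return cube


def full_scan(rows, filters):
    """Stand-in for the underlying engine: scan all raw rows at query time."""
    return sum(
        r["sales"] for r in rows
        if all(r[col] == val for col, val in filters.items())
    )


def sum_sales(cube, rows, filters):
    """Answer from the cube when it covers the query, else fall back to a scan."""
    if set(filters) == set(CUBE_DIMS):
        key = tuple(filters[d] for d in CUBE_DIMS)
        return cube.get(key, 0), "cube"
    return full_scan(rows, filters), "scan"


cube = build_cube(ROWS, CUBE_DIMS)
print(sum_sales(cube, ROWS, {"year": 1994, "city": "Beijing"}))
print(sum_sales(cube, ROWS, {"product": "tea"}))
```

The first query is a single dictionary lookup, while the second touches every raw row because it filters on a column the cube does not cover. That difference is the speedup pre-calculation aims for.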
Kylin is a pure OLAP engine, so it only supports SELECT queries. INSERT, UPDATE, and DELETE are not supported.
Data cubes are essentially HBase tables. Given a set of dimension columns, Kylin pre-aggregates all possible combinations of their attributes using MapReduce jobs, then encodes the dimensions with dictionary encoding. Finally, Kylin encodes all data cubes as Rowkeys in HBase. The format of a Rowkey is cuboid id + attribute. For example, assume a data cube on year and city with cuboid id 00000001, containing a row year=1994, city=Beijing, sum(sales)=100, and a dictionary that maps 1994=0, Beijing=1. The HBase table will then contain an entry Rowkey=00000001+01, value=100.
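The encoding steps in this paragraph can be sketched in Python as follows. This is a simplified illustration, not Kylin's actual on-disk format: the helper names `cuboids`, `aggregate`, and `rowkey` are hypothetical, the Rowkey is kept as a readable string with a literal `+` to mirror the notation in the text, and the dictionary is copied from the example above.

```python
from itertools import combinations

# Dictionary taken from the example in the text: each attribute value
# is replaced by a small integer id.
DICTIONARY = {1994: 0, "Beijing": 1}


def cuboids(dims):
    """Enumerate every non-empty combination of dimension columns.

    Simplification for this sketch: the empty (grand total) combination
    is left out.
    """
    return [
        combo
        for size in range(1, len(dims) + 1)
        for combo in combinations(dims, size)
    ]


def aggregate(rows, dims, measure):
    """Pre-aggregate sum(measure) grouped by the given dimensions."""
    result = {}
    for row in rows:
        key = tuple(row[d] for d in dims)
        result[key] = result.get(key, 0) + row[measure]
    return result


def rowkey(cuboid_id, attribute_values, dictionary):
    """Build 'cuboid id + attribute' using dictionary-encoded values."""
    encoded = "".join(str(dictionary[v]) for v in attribute_values)
    return f"{cuboid_id}+{encoded}"


# Hypothetical raw rows whose aggregate matches the example (60 + 40 = 100).
rows = [
    {"year": 1994, "city": "Beijing", "sales": 60},
    {"year": 1994, "city": "Beijing", "sales": 40},
]

print(cuboids(("year", "city")))
for attrs, total in aggregate(rows, ("year", "city"), "sales").items():
    print(rowkey("00000001", attrs, DICTIONARY), total)
```

Because the number of combinations grows as 2^n with the number of dimensions, the choice of which dimensions go into a cube strongly affects build time and storage.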
https://github.com/apache/kylin
eBay
2013
KylinOLAP