Apache Kudu is an open-source storage engine for structured data that is part of the Apache Hadoop ecosystem. Its primary goal is to let applications run fast analytics on rapidly changing data, and it was designed for fast OLAP query performance. As part of the Hadoop ecosystem, Kudu lets Apache data processing frameworks such as Spark, Impala or MapReduce operate on its tables. Kudu tables can also be joined with data held in other Hadoop storage systems such as HBase and HDFS. To build a Kudu application, developers can use the Java, C++ or Python Kudu APIs, which support NoSQL-style access, or SQL-style frameworks such as Apache Impala.
Prior to Kudu, most data storage engines handled only one kind of structured data well: static or mutable. Storage engines for static data could not change individual records, while storage engines for mutable data had low throughput for sequential reads. Because of this, developers typically used two different storage engines, one for mutating their data and another for performing analytics. Apache Kudu was designed to support both kinds of data and to provide high throughput for both sequential-access and random-access queries. Kudu was developed as an internal project at Cloudera and became open to the public in September 2016.
Dictionary Encoding, Run-Length Encoding, Bit Packing / Mostly Encoding, Prefix Compression
Each column in a Kudu table can be encoded in a way that depends on the column's type. By default, bit packing is used for the int, double and float column types, run-length encoding for bool columns, and dictionary encoding for string and binary columns. Kudu does not compress columns by default, but it supports per-column compression with the LZ4, Snappy or zlib codecs.
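To make two of these encodings concrete, the following is a minimal sketch in plain Python of run-length encoding (suited to bool columns with long runs) and dictionary encoding (suited to string columns with few distinct values). It illustrates the general techniques only; the function names are hypothetical and it does not reflect Kudu's on-disk format or client API.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of consecutive equal values into (value, run_length)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


def dict_encode(values):
    """Replace each value with an integer code into a dictionary of distinct values."""
    dictionary = []   # distinct values, in order of first appearance
    index = {}        # value -> position in `dictionary`
    codes = []        # one small integer per input value
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dict_decode(dictionary, codes):
    """Look each code up in the dictionary to recover the original values."""
    return [dictionary[code] for code in codes]


if __name__ == "__main__":
    flags = [True, True, True, False, False, True]
    print(rle_encode(flags))

    countries = ["us", "de", "us", "us", "fr"]
    print(dict_encode(countries))
```

Both encodings pay off when a column is repetitive: run-length encoding stores one pair per run rather than one entry per row, and dictionary encoding stores each distinct string once and keeps only small integer codes per row.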
Multi-version Concurrency Control (MVCC)
Kudu employs MVCC with an optimistic concurrency model in which readers don't block writers and writers don't block readers. As a result, fewer lock acquisitions are needed during large table scans.
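The idea behind MVCC can be sketched with a toy in-memory store: each write appends a new timestamped version instead of overwriting, and a reader picks a snapshot timestamp and sees only versions committed at or before it. The class and method names below are hypothetical, and this is an illustration of the general technique, not Kudu's implementation. Only writers take a lock here; a real engine needs far more care around concurrent memory access.

```python
import threading


class MVCCStore:
    """Toy multi-version key-value store (illustrative only)."""

    def __init__(self):
        self._versions = {}                  # key -> [(commit_ts, value), ...] ascending
        self._clock = 0                      # timestamp of the latest committed write
        self._write_lock = threading.Lock()  # serializes writers; readers never take it

    def write(self, key, value):
        """Append a new version and return its commit timestamp."""
        with self._write_lock:
            commit_ts = self._clock + 1
            self._versions.setdefault(key, []).append((commit_ts, value))
            self._clock = commit_ts          # publish only after the version exists
            return commit_ts

    def snapshot(self):
        """Return a timestamp that fixes what a reader is allowed to see."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts, else None."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


if __name__ == "__main__":
    store = MVCCStore()
    store.write("row1", "v1")
    snap = store.snapshot()          # a scan starts here
    store.write("row1", "v2")        # a later write does not disturb the scan
    print(store.read("row1", snap))
    print(store.read("row1", store.snapshot()))
```

Because a scan reads against a fixed snapshot, later writes simply add versions the scan ignores, which is why a long scan does not need to hold locks against writers.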
https://github.com/apache/kudu
Cloudera
2016