Kudu

Apache Kudu is an open-source storage engine for structured data that is part of the Apache Hadoop ecosystem. Kudu's primary goal is to let applications perform fast big-data analytics on rapidly changing data: it is designed for high-performance OLAP queries while also supporting OLTP workloads. As part of the Hadoop ecosystem, Kudu tables can be processed with Apache data processing frameworks such as Spark, Impala, and MapReduce, and joined with data stored in other Hadoop storage layers such as HBase and HDFS. To build a Kudu application, developers can use the Java, C++, or Python client APIs, which offer NoSQL-style row operations, or SQL frameworks such as Apache Impala.
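
As a concrete illustration of the NoSQL-style client API mentioned above, the minimal sketch below uses the Kudu Java client to create a table, insert a row, and scan it back. The master address (localhost:7051) and the table name (example_table) are assumptions made only for this example.

    import java.util.Arrays;

    import org.apache.kudu.ColumnSchema;
    import org.apache.kudu.Schema;
    import org.apache.kudu.Type;
    import org.apache.kudu.client.CreateTableOptions;
    import org.apache.kudu.client.Insert;
    import org.apache.kudu.client.KuduClient;
    import org.apache.kudu.client.KuduScanner;
    import org.apache.kudu.client.KuduSession;
    import org.apache.kudu.client.KuduTable;
    import org.apache.kudu.client.RowResult;
    import org.apache.kudu.client.RowResultIterator;

    public class KuduExample {
        public static void main(String[] args) throws Exception {
            // Connect to the Kudu master (address is an assumption for this sketch).
            KuduClient client = new KuduClient.KuduClientBuilder("localhost:7051").build();
            try {
                // Define a relational-style schema with a typed primary key column.
                Schema schema = new Schema(Arrays.asList(
                    new ColumnSchema.ColumnSchemaBuilder("id", Type.INT64).key(true).build(),
                    new ColumnSchema.ColumnSchemaBuilder("value", Type.STRING).build()));

                // Hash-partition the table on the key across 4 tablets.
                CreateTableOptions options = new CreateTableOptions()
                    .addHashPartitions(Arrays.asList("id"), 4);
                client.createTable("example_table", schema, options);

                // Insert a row through the NoSQL-style write path.
                KuduTable table = client.openTable("example_table");
                KuduSession session = client.newSession();
                Insert insert = table.newInsert();
                insert.getRow().addLong("id", 1L);
                insert.getRow().addString("value", "hello kudu");
                session.apply(insert);
                session.close();

                // Scan the table back, projecting both columns.
                KuduScanner scanner = client.newScannerBuilder(table)
                    .setProjectedColumnNames(Arrays.asList("id", "value"))
                    .build();
                while (scanner.hasMoreRows()) {
                    RowResultIterator rows = scanner.nextRows();
                    for (RowResult row : rows) {
                        System.out.println(row.getLong("id") + " -> " + row.getString("value"));
                    }
                }
            } finally {
                client.shutdown();
            }
        }
    }

The same table could also be queried with standard SQL through a framework such as Apache Impala, as noted above.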

History

Prior to Kudu, most data storage engines handled only one kind of structured data: static or mutable. Storage engines for static data could not update individual records, while storage engines for mutable data had low throughput for sequential reads. Because of this, developers typically combined two different storage engines, first mutating their data in one and then copying it into another for analytics. Apache Kudu was designed to serve both workloads in a single system, providing high-throughput sequential scans alongside efficient random-access reads and writes.

Storage Architecture

Disk-oriented, In-Memory

Data Model

Relational

Website

https://kudu.apache.org/

Source Code

https://github.com/apache/kudu

Tech Docs

https://kudu.apache.org/docs/

Developer

Cloudera

Country of Origin

US

Start Year

2016

Project Type

Open Source

Supported languages

C++, Java, Python

Derived From

HBase

Compatible With

Spark SQL

Operating Systems

Linux

Licenses

Apache v2

Wikipedia

https://en.wikipedia.org/wiki/Apache_Kudu