Kudu

Apache Kudu is an open-source storage engine for structured data that is part of the Apache Hadoop ecosystem. Kudu's primary goal is to let applications perform fast big-data analytics on rapidly changing data: it is designed for high-performance OLAP queries while also supporting OLTP workloads. As part of the Hadoop ecosystem, Kudu tables can be processed with Apache data processing frameworks such as Spark, Impala, and MapReduce, and joined with data stored in other Hadoop storage layers such as HBase and HDFS. To build a Kudu application, developers can use the Java, C++, or Python client APIs, which offer NoSQL-style row operations, or SQL frameworks such as Apache Impala.
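
As a concrete illustration of the NoSQL-style client API mentioned above, the minimal sketch below uses the Kudu Java client to create a table, insert a row, and scan it back. The master address (localhost:7051) and the table name (example_table) are assumptions made only for this example.

    import java.util.Arrays;

    import org.apache.kudu.ColumnSchema;
    import org.apache.kudu.Schema;
    import org.apache.kudu.Type;
    import org.apache.kudu.client.CreateTableOptions;
    import org.apache.kudu.client.Insert;
    import org.apache.kudu.client.KuduClient;
    import org.apache.kudu.client.KuduScanner;
    import org.apache.kudu.client.KuduSession;
    import org.apache.kudu.client.KuduTable;
    import org.apache.kudu.client.RowResult;
    import org.apache.kudu.client.RowResultIterator;

    public class KuduExample {
        public static void main(String[] args) throws Exception {
            // Connect to the Kudu master (address is an assumption for this sketch).
            KuduClient client = new KuduClient.KuduClientBuilder("localhost:7051").build();
            try {
                // Define a relational-style schema with a typed primary key column.
                Schema schema = new Schema(Arrays.asList(
                    new ColumnSchema.ColumnSchemaBuilder("id", Type.INT64).key(true).build(),
                    new ColumnSchema.ColumnSchemaBuilder("value", Type.STRING).build()));

                // Hash-partition the table on the key across 4 tablets.
                CreateTableOptions options = new CreateTableOptions()
                    .addHashPartitions(Arrays.asList("id"), 4);
                client.createTable("example_table", schema, options);

                // Insert a row through the NoSQL-style write path.
                KuduTable table = client.openTable("example_table");
                KuduSession session = client.newSession();
                Insert insert = table.newInsert();
                insert.getRow().addLong("id", 1L);
                insert.getRow().addString("value", "hello kudu");
                session.apply(insert);
                session.close();

                // Scan the table back, projecting both columns.
                KuduScanner scanner = client.newScannerBuilder(table)
                    .setProjectedColumnNames(Arrays.asList("id", "value"))
                    .build();
                while (scanner.hasMoreRows()) {
                    RowResultIterator rows = scanner.nextRows();
                    for (RowResult row : rows) {
                        System.out.println(row.getLong("id") + " -> " + row.getString("value"));
                    }
                }
            } finally {
                client.shutdown();
            }
        }
    }

The same table could also be queried with standard SQL through a framework such as Apache Impala, as noted above.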

History

Prior to Kudu, most data storage engines handled only one kind of structured data: static or mutable. Storage engines for static data could not update individual records, while storage engines for mutable data had low throughput for sequential reads. Because of this, developers typically combined two different storage engines, first mutating their data in one and then copying it into another for analytics. Apache Kudu was designed to serve both workloads in a single system, providing high-throughput sequential scans alongside efficient random-access reads and writes.

Storage Architecture

Disk-oriented, In-Memory

Data Model

Relational

Website

https://kudu.apache.org/

Source Code

https://github.com/apache/kudu

Tech Docs

https://kudu.apache.org/docs/

Developer

Cloudera

Country of Origin

US

Start Year

2016

Project Type

Open Source

Supported languages

C++, Java, Python

Derived From

HBase

Compatible With

Spark SQL

Operating Systems

Linux

Licenses

Apache v2

Wikipedia

https://en.wikipedia.org/wiki/Apache_Kudu