Apache Kudu is an open source storage engine for structured data that is part of the Apache Hadoop ecosystem. It is designed and optimized for big data analytics on rapidly changing data, targeting fast performance on OLAP queries. Built for distributed workloads, Kudu supports several ways of partitioning data across multiple servers. As part of the Hadoop ecosystem, Kudu also integrates with data processing frameworks such as Spark, Impala, and MapReduce.
Prior to Kudu, most storage engines handled only one kind of structured data: static or mutable. Engines for static data could not modify individual records, while engines for mutable data had low throughput for sequential reads. Developers therefore typically used one storage engine for mutating data and another for performing analytics. Apache Kudu was designed to support operations on both static and mutable data, providing high throughput for both sequential-access and random-access queries. Kudu began as an internal project at Cloudera and became an open source project in 2016.
Each column in a Kudu table can be encoded differently based on its type. By default, bit packing is used for int, double, and float columns, run-length encoding for bool columns, and dictionary encoding for string and binary columns. Kudu does not compress columns by default, but it supports per-column compression with the LZ4, Snappy, or zlib codecs.
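The idea behind two of these encodings can be sketched in a few lines of Python. This is a toy illustration of run-length and dictionary encoding in general, not Kudu's actual implementation:

```python
def rle_encode(values):
    """Run-length encoding: collapse runs of equal values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

def dict_encode(values):
    """Dictionary encoding: replace repeated strings with small integer codes."""
    dictionary, codes = {}, []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        codes.append(dictionary[v])
    return dictionary, codes

bools = [True, True, True, False, False, True]
print(rle_encode(bools))     # [(True, 3), (False, 2), (True, 1)]

strings = ["us", "us", "uk", "us", "uk"]
print(dict_encode(strings))  # ({'us': 0, 'uk': 1}, [0, 0, 1, 0, 1])
```

Both encodings exploit repetition within a single column, which is why they pay off best in a columnar layout where all values of a column are stored together.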
Kudu employs multi-version concurrency control (MVCC) with an optimistic concurrency model in which readers do not block writers and writers do not block readers. As a result, fewer locks need to be acquired during large table scans.
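The core MVCC idea can be sketched as follows. This is a hypothetical toy store, not Kudu's implementation: each write appends a new version tagged with a timestamp, and a reader at timestamp t sees the latest version at or before t, so readers and writers never contend on a lock:

```python
class MVCCStore:
    """Toy multi-version store: writes append versions, reads select by timestamp."""
    def __init__(self):
        self.versions = {}  # key -> list of (timestamp, value)
        self.clock = 0

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, as_of):
        # Latest version with timestamp <= as_of; a concurrent write never blocks this.
        candidates = [(ts, v) for ts, v in self.versions.get(key, []) if ts <= as_of]
        return max(candidates)[1] if candidates else None

store = MVCCStore()
t1 = store.write("row1", "a")
t2 = store.write("row1", "b")   # a later concurrent update
print(store.read("row1", t1))   # 'a' -- the old snapshot is still readable
print(store.read("row1", t2))   # 'b'
```

A long table scan pinned to one timestamp simply keeps reading old versions while writers create new ones.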
Kudu is a relational database. Unlike traditional relational databases, however, Kudu uses partitioning to split the data into tablets that are stored on individual servers. All rows within a tablet are ordered by the primary key.
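Routing rows to tablets can be sketched like this. A simplified illustration with a toy hash function: Kudu hashes the designated key columns and also supports range partitioning and combinations of both:

```python
# Toy hash partitioning: route each row to a tablet by hashing its primary key.
NUM_TABLETS = 4

def tablet_for(primary_key):
    # Stand-in hash for illustration; Kudu uses a real hash of the key columns.
    return sum(primary_key.encode()) % NUM_TABLETS

tablets = {i: [] for i in range(NUM_TABLETS)}
for key in ["user#1", "user#2", "user#3", "user#42"]:
    tablets[tablet_for(key)].append(key)

# Within each tablet, rows are kept ordered by primary key.
for t in tablets.values():
    t.sort()
```

Keeping rows sorted by primary key inside each tablet is what makes efficient key lookups and range scans possible within a partition.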
Kudu allows the user to choose between two read modes through the Kudu API. By default, Kudu uses the 'Read Committed' isolation level. In the 'Snapshot Isolation' level, the read timestamp can either be set explicitly by the user or assigned by the server; this mode allows for consistent and repeatable reads.
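The difference between the two modes can be sketched with a versioned value. Toy code, not the Kudu API: a snapshot read pins a timestamp and is repeatable, while the default mode always returns the latest committed value:

```python
history = []  # list of (timestamp, value), appended on each committed write

def write(ts, value):
    history.append((ts, value))

def read_latest():
    return history[-1][1]

def read_at_snapshot(ts):
    # Repeatable: always returns the value as of the pinned timestamp.
    return max((t, v) for t, v in history if t <= ts)[1]

write(1, "v1")
snapshot_ts = 1
first = read_at_snapshot(snapshot_ts)   # 'v1'
write(2, "v2")                          # a write lands after the snapshot is taken
second = read_at_snapshot(snapshot_ts)  # still 'v1' -> repeatable read
latest = read_latest()                  # 'v2'
```

Pinning the timestamp is what makes snapshot reads consistent across retries and across replicas.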
Kudu does not support joins on its own. However, when used with the Impala query engine, Kudu tables can be joined with other tables stored in the Hadoop storage system.
Kudu employs a logical log of operations. Operation logs are replicated for each tablet (a partition of the database) and stored separately from the physical data. Operation logs cannot be replayed or shipped. For single-row operations (on a single tablet) Kudu is ACID. For multi-row operations on a single tablet, however, atomicity is not fully preserved: a single failed write does not roll back the entire operation, so per-row errors are possible. Multi-tablet operations are currently not possible with Kudu.
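The per-row error behavior can be sketched as follows. A toy model with a hypothetical apply_batch helper, not Kudu's write path: a failing row produces a per-row error while the other rows in the batch stay applied:

```python
def apply_batch(tablet, rows):
    """Apply a multi-row write; a bad row yields a per-row error, not a rollback."""
    errors = []
    for key, value in rows:
        if key in tablet:                  # e.g. a duplicate-key insert fails
            errors.append((key, "already present"))
        else:
            tablet[key] = value            # successful rows remain applied
    return errors

tablet = {"k1": "old"}
errors = apply_batch(tablet, [("k1", "x"), ("k2", "y"), ("k3", "z")])
print(errors)          # [('k1', 'already present')]
print(sorted(tablet))  # ['k1', 'k2', 'k3'] -- k2 and k3 were not rolled back
```

Client code therefore has to check the per-row error list after a batch write rather than assuming all-or-nothing semantics.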
Kudu uses JIT compilation (via LLVM) specifically for record projection operations. New records are initially stored in MemRowSets, which are in-memory, row-oriented units of a tablet. To compensate for the lower projection performance of this row-oriented layout, JIT compilation is applied to projection operations that may touch these new records.
Kudu has NoSQL-style client APIs for C++, Java, and Python. Kudu can also be used with query processing interfaces such as Impala (which provides SQL), MapReduce, and Spark.
Because Kudu is designed primarily for OLAP queries, it uses a Decomposition Storage Model (columnar storage). This columnar layout allows Kudu to avoid unnecessarily reading entire rows for analytical queries. The strong typing of columns also enables single-data-type compression.
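The advantage can be sketched by comparing the two layouts directly. A toy illustration: with a columnar (DSM) layout, an aggregate over one column touches only that column's values:

```python
# The same table in row-major (NSM) and column-major (DSM) layouts.
rows = [("alice", 30, "us"), ("bob", 25, "uk"), ("carol", 35, "us")]
columns = {
    "name":    ["alice", "bob", "carol"],
    "age":     [30, 25, 35],
    "country": ["us", "uk", "us"],
}

# Row layout: an aggregate over `age` must walk every full row.
avg_from_rows = sum(r[1] for r in rows) / len(rows)

# Column layout: only the `age` column is read.
avg_from_columns = sum(columns["age"]) / len(columns["age"])

assert avg_from_rows == avg_from_columns == 30.0
```

For wide tables the difference is substantial: an analytical query over two columns of a fifty-column table skips the other forty-eight entirely.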
Kudu is designed for distributed workloads and follows a shared-nothing architecture. The data is horizontally partitioned into tablets, so an entire row resides in a single tablet. Each tablet is replicated (typically into 3 or 5 replicas), and each replica is stored on its own tablet server. A tablet server hosts multiple distinct tablets. For each tablet, one of the servers storing it is the leader and the rest are followers; leaders are elected using Raft consensus. Reads for a specific tablet can be served by any of the tablet servers that store it. Writes are sent only to the leader tablet server, and whether the operation is accepted is determined by Raft consensus.
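The majority-acknowledgement rule behind the write path can be sketched in one function. Simplified: real Raft also handles terms, log matching, and leader election:

```python
def write_accepted(acks, num_replicas=3):
    """A write commits once a majority of replicas (leader included) ack it."""
    return acks > num_replicas // 2

# With 3 replicas, the leader plus one follower is enough:
print(write_accepted(2))  # True
print(write_accepted(1))  # False -- a lone leader cannot commit
```

This majority requirement is why replication factors are odd (3 or 5): the tablet stays writable as long as a majority of its replicas are alive.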
C++, Java, Python, SQL