Apache Kudu is an open source storage engine for structured data that is part of the Apache Hadoop ecosystem. Kudu's primary goal is to let applications perform fast big data analytics on rapidly changing data, and it was designed for fast OLAP query performance. Built for distributed workloads, Kudu supports several ways of partitioning data across multiple servers. As part of the Hadoop ecosystem, Kudu also supports Apache data processing frameworks such as Spark, Impala, or MapReduce on its tables.
Prior to Kudu, most data storage engines handled only one kind of structured data, static or mutable. Storage engines for static data could not change individual records, while storage engines for mutable data had low throughput for sequential reads. Because of this, developers typically used two different storage engines: one for mutating their data and another for performing analytics. Apache Kudu was designed to support both kinds of data and to provide high throughput for both sequential-access and random-access queries. Kudu was developed as an internal project at Cloudera and became open to the public in September 2016.
Vectorized Model, Materialized Model
Kudu executes both reads and writes on batches of rows which are actually columnar in-memory (as they are on disk). If there are predicates in the query, lazy materialization is used.
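As a rough illustration of lazy materialization over a columnar batch, the sketch below evaluates a predicate on a single column first and only then assembles the projected columns for the rows that passed. The batch layout, the `scan` function, and the column names are hypothetical and are not part of Kudu's API.

```python
# Hypothetical columnar batch: one list per column, all of the same length.
batch = {
    "host": ["a", "b", "a", "c"],
    "cpu": [91, 15, 72, 88],
    "note": ["hot", "idle", "warm", "hot"],
}


def scan(batch, predicate_col, predicate, projected_cols):
    """Evaluate the predicate on one column, then materialize the projected
    columns only for the row positions that passed."""
    selected = [i for i, v in enumerate(batch[predicate_col]) if predicate(v)]
    return {col: [batch[col][i] for i in selected] for col in projected_cols}


# Only the "cpu" column is read in full; "host" and "note" are touched
# just at the selected positions.
result = scan(batch, "cpu", lambda v: v > 80, ["host", "note"])
print(result)
```

In a real engine the benefit comes from skipping the decoding of unselected cells; here it only shows the order of operations.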
Kudu employs a logical log of operations (insert/update/delete). The operation logs are replicated for each tablet (a partition of the database). The operation log for each tablet is stored separately from the physical data. Operation logs cannot be replayed or shipped. For single-row operations (on a single tablet), Kudu is ACID. However, for multi-row operations (on a single tablet), atomicity is not fully preserved: a single failed write does not roll back the entire operation, so per-row errors are possible. Multi-tablet operations are currently not possible with Kudu.
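The per-row error behavior of a multi-row operation can be sketched as follows. This is a toy model over a Python dictionary, not Kudu's client API; `apply_batch` and the operation tuples are names invented for the illustration.

```python
def apply_batch(table, operations):
    """Apply each operation independently. A failed operation is reported
    but does not roll back the operations that already succeeded.
    Returns a list of (operation_index, error_message) pairs."""
    errors = []
    for i, (kind, key, value) in enumerate(operations):
        if kind == "insert":
            if key in table:
                errors.append((i, f"key {key!r} already present"))
            else:
                table[key] = value
        elif kind == "update":
            if key not in table:
                errors.append((i, f"key {key!r} not found"))
            else:
                table[key] = value
        elif kind == "delete":
            if key not in table:
                errors.append((i, f"key {key!r} not found"))
            else:
                del table[key]
        else:
            errors.append((i, f"unknown operation {kind!r}"))
    return errors


table = {1: "a"}
errors = apply_batch(
    table,
    [("insert", 2, "b"), ("insert", 1, "dup"), ("update", 1, "a2")],
)
print(table, errors)
```

The duplicate insert fails, yet the insert before it and the update after it both take effect, which is the non-atomic behavior described above.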
Multi-version Concurrency Control (MVCC)
Kudu employs MVCC. Kudu uses an optimistic concurrency model in which readers don't block writers and writers don't block readers. As a result, fewer lock acquisitions are needed during large table scans.
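A minimal sketch of the multi-version idea, assuming integer commit timestamps: each write appends a new version instead of overwriting, so a reader pinned to an earlier timestamp keeps seeing the same value without taking a lock. The `MvccCell` class is hypothetical and greatly simplified compared with a real storage engine.

```python
class MvccCell:
    """Keeps every committed version of a value, tagged with its commit timestamp."""

    def __init__(self):
        # List of (timestamp, value); writes are assumed to arrive in timestamp order.
        self.versions = []

    def write(self, timestamp, value):
        self.versions.append((timestamp, value))

    def read(self, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts, or None."""
        visible = None
        for ts, value in self.versions:
            if ts <= snapshot_ts:
                visible = value
        return visible


cell = MvccCell()
cell.write(10, "v1")
cell.write(20, "v2")
before = cell.read(15)   # reader pinned at timestamp 15
cell.write(30, "v3")     # a later write does not disturb that reader
after = cell.read(15)
print(before, after, cell.read(30))
```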
Kudu is designed for distributed workloads, so its system architecture is a shared-nothing master-slave architecture. The database is horizontally partitioned into tablets, so each row is stored entirely within one partition. Each tablet is replicated (typically 3 or 5 times), and each of these replicas is stored on its own tablet server. Each tablet server holds multiple distinct tablets. For each tablet, one of the servers storing it is the leader and the rest are followers. Leaders are elected using Raft consensus. Reads for a specific tablet can be served by any of the tablet servers that store that tablet. Writes are sent only to the leader tablet server, and whether the operation is accepted is determined by Raft consensus.
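Two pieces of this architecture can be sketched in a few lines: mapping a row key to a tablet, and the majority rule that Raft relies on to accept a write. The hash-based mapping below is only one possible partitioning scheme and is assumed for the illustration; `tablet_for` and `is_committed` are hypothetical helpers, not Kudu functions.

```python
import zlib


def tablet_for(key: str, num_tablets: int) -> int:
    """Map a row key to a tablet index with a stable hash, so a given row
    always lands in exactly one tablet."""
    return zlib.crc32(key.encode("utf-8")) % num_tablets


def is_committed(acks: int, replicas: int) -> bool:
    """A write is accepted once a strict majority of replicas has acknowledged it."""
    return acks >= replicas // 2 + 1


print(tablet_for("sensor-42", 4))
print(is_committed(2, 3), is_committed(2, 5))
```

With 3 replicas a write survives the loss of one server, and with 5 replicas the loss of two, which is why odd replication factors are the usual choice.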
Decomposition Storage Model (Columnar)
Because Kudu is designed primarily for OLAP queries, a Decomposition Storage Model is used. Kudu's columnar data storage model allows it to avoid unnecessarily reading entire rows for analytical queries. The strong-type (immutable type) requirement for columns also allows compression to operate on values of a single data type.
Dictionary Encoding, Run-Length Encoding, Bit Packing / Mostly Encoding, Prefix Compression
Each column in a Kudu table can be encoded in certain ways based on the type of that column. By default, bit packing is used for various int, double and float column types, run-length encoding is used for bool column types and dictionary-encoding for string or binary column types. By default Kudu doesn't compress columns but it supports per-column compression using LZ4, Snappy or zlib compression codecs.
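To make two of these encodings concrete, here is a small sketch of run-length encoding for a boolean column and dictionary encoding for a string column. It shows the general techniques only; Kudu's on-disk formats are not described in this entry, and the function names are made up for the example.

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]


def dict_encode(values):
    """Replace each distinct value with a small integer code.
    Returns (dictionary_as_list, codes); codes index into the list."""
    dictionary = {}
    codes = []
    for v in values:
        codes.append(dictionary.setdefault(v, len(dictionary)))
    return list(dictionary), codes


flags = [True, True, True, False, False, True]
runs = rle_encode(flags)
words, codes = dict_encode(["us", "de", "us", "fr", "de"])
print(runs, words, codes)
```

Both encodings pay off when a column has few distinct values or long repeats, which is common in analytical tables.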
Read Committed, Snapshot Isolation, Repeatable Read
Kudu lets the user choose between two read modes through the Kudu API clients. By default, Kudu uses the 'Read Committed' isolation level. For the 'Snapshot Isolation' level, the snapshot timestamp can either be set explicitly by the user or assigned by a server. The 'Snapshot Isolation' read level allows for consistent and repeatable reads.
https://github.com/apache/kudu
Cloudera
2016