Kudu

View Current Viewing Revision #7 from 12/12/2018 11:17 p.m.

Apache Kudu is an open source storage engine for structured data that is part of the Apache Hadoop ecosystem. Apache Kudu is designed and optimized for big data analytics on rapidly changing data. It is designed for fast performance on OLAP queries. Built for distributed workloads, Apache Kudu allows for various types of partitioning of data across multiple servers. Also, being a part of the Hadoop ecosystem, Kudu can be integrated with data processing frameworks like Spark, Impala and MapReduce.

History

Prior to Kudu, most data storage engines were able to store one type of structured data, either static or mutable. Storage engines for static data were unable to make changes to individual records while storage engines for mutable data had a low throughput for sequential reads. Hence, developers typically used one storage engine for mutating data and another one for performing analytics. Apache Kudu was designed to support operations on both static and mutable data types, providing high throughput on both sequential-access and random-access queries. Kudu was developed as an internal project at Cloudera and became an open source project in 2016.

Query Interface

Custom API

Kudu has No-SQL client APIs for C++, Java and Python. Kudu can also be used with SQL-based query processing interfaces like Hadoop's Impact, MapReduce and Spark.

System Architecture

Shared-Nothing

Kudu is designed for distributed workloads so it follows a shared-nothing architecture. The data is horizontally partitioned into tablets (so an entire row is in the same tablet). Each tablet is replicated (typically into 3 or 5 replicas) and each of these replicas is stored in its own tablet server. Each tablet server has multiple unique tablets. For each tablet, one of the servers at which the tablet is stored is the leader and the rest of the servers are the followers. Leaders are elected using Raft consensus. Reads for a specific tablet can be done from any one of the tablet servers that store that tablet. Writes are only sent to the leader tablet server and whether the operation is accepted or not is determined by Raft consensus.

Query Execution

Vectorized Model Materialized Model

Kudu executes both reads and writes using a vectorized model. If there are predicates in the query, lazy materialization is used.

Query Compilation

JIT Compilation

Kudu uses JIT compilation (using LLVM) for record projection operations specifically. The reason for this optimization is that new records are initially stored in MemRowSets, which are in-memory row-stored units of a tablet. To compensate for the lower performance of the row-stored nature of these units, JIT compilation is used for projection operations that may touch these new records.

Storage Model

Decomposition Storage Model (Columnar)

Because Kudu is designed primarily for OLAP queries, a Decomposition Storage Model is used. Kudu's columnar data storage model allows it to avoid unnecessarily reading entire rows for analytical queries. The strong-type requirement for the columns also allows for single-data-type compression.

Foreign Keys

Not Supported

Kudu does not support foreign keys.

Indexes

Not Supported

Kudu does not support indexes. As an alternative, Kudu provides the capability to hash or range partition the data for quicker access.

Logging

Logical Logging

Kudu employs a logical log of operations. The operation logs are replicated for each tablet (partitions of the database). An operation log for each tablet is stored separately from the physical data. Operation logs are not able to be replayed or shipped. For single row operations (on a single tablet) Kudu is ACID. However for multi-row operations (on a single tablet), the Atomic property is not fully preserved since a single failed write will not rollback the entire operation, so per-row errors are possible. Multi-tablet operations are currently not possible with Kudu.

Data Model

Relational

Kudu is a relational database. Unlike traditional relation databases, Kudu utilizes partitioning to split the data into tablets that are stored on individual servers. All rows within a tablet are ordered by a primary key.

Concurrency Control

Multi-version Concurrency Control (MVCC)

Kudu employs MVCC. Kudu uses an optimistic concurrency model in which readers don't block writers and writes don't block readers. As a result, less lock acquisitions are needed during large table scans.

Isolation Levels

Read Committed Snapshot Isolation Repeatable Read

Kudu allows the user to set two different read operation modes through the Kudu API. By default, Kudu uses the 'Read Committed' isolation level. For the 'Snapshot Isolation' level the timestamp can either be set explicitly by the user or assigned by the server. The 'Snapshot Isolation' read level allows for consistent and repeatable reads.

Joins

Not Supported

Joins on Kudu are not supported. However, when used with an Impala query engine, Kudu tables can be joined with other tables stored in the Hadoop storage system.

Storage Architecture

Disk-oriented

Kudu is primarily disk-oriented. Although, there is an experimental version of Kudu that relies on persistent memory in a blocked cache.

Compression

Dictionary Encoding Run-Length Encoding Bit Packing / Mostly Encoding Prefix Compression

Each column in a Kudu table can be encoded in different ways based on the column type. By default, bit packing is used for int, double and float column types, run-length encoding is used for bool column types and dictionary-encoding for string and binary column types. By default Kudu doesn't compress columns but it supports per-column compression using LZ4, Snappy or zlib compression codecs.

Revision #7 | Updated 12/12/2018 11:17 p.m.