Pinot

Pinot is a distributed relational OLAP datastore developed at LinkedIn. It is designed to support large-scale, near-realtime analytics applications in interactive scenarios. It uses a hybrid storage model to balance the trade-offs between different use cases, and it leverages asynchronous I/O for streaming sources. Pinot's external low-level building blocks include ZooKeeper and Apache Helix.

Data Model

Relational

Pinot uses a relational data model. In terms of data types, attributes in a relation can be integers of various lengths, floating-point numbers, strings, booleans, arrays, and timestamps. From an analytical perspective, each attribute is either a dimension or a metric.

Storage Model

Hybrid

Pinot uses a hybrid storage model: it divides rows into segments and stores the data inside each segment in a columnar manner. A segment is the basic unit of replication. It is immutable and typically contains tens of millions of rows.
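
The layout described above can be illustrated with a small sketch. It is not Pinot's actual on-disk format: the function name build_segments, the rows_per_segment parameter, and the sample columns are all hypothetical, chosen only to show rows being cut into segments and each segment being pivoted into per-column arrays.

```python
from typing import Any, Dict, List


def build_segments(
    rows: List[Dict[str, Any]], rows_per_segment: int
) -> List[Dict[str, List[Any]]]:
    """Split rows into fixed-size segments and pivot each one into columns."""
    if rows_per_segment <= 0:
        raise ValueError("rows_per_segment must be positive")
    segments = []
    for start in range(0, len(rows), rows_per_segment):
        chunk = rows[start:start + rows_per_segment]
        # Column-oriented layout: one list of values per column name.
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        segments.append(columns)
    return segments


# Hypothetical sample data; real segments hold millions of rows.
rows = [
    {"country": "US", "browser": "chrome", "views": 3},
    {"country": "DE", "browser": "firefox", "views": 5},
    {"country": "US", "browser": "safari", "views": 2},
]
segments = build_segments(rows, rows_per_segment=2)
```

Reading a single column of a segment, for example segments[0]["views"], touches only that column's values, which is the main benefit of the columnar layout for analytical scans.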

Storage Organization

Heaps

Pinot stores segments in directories on a UNIX filesystem. Each such directory contains a metadata file and an index file. The metadata file stores information about the record columns in the segment. The index file stores the indexes for all the columns. Global metadata about segments, including the mapping from each segment to its location, is maintained by the controller cluster.

System Architecture

Shared-Nothing

Pinot consists of four components: servers, controllers, brokers, and minions. Together they provide data storage, data management, and query processing.

Servers

Servers are responsible for data storage. Segments are distributed across the server nodes. Each segment has multiple replicas, and transactions are executed in an active-active manner.

Controllers

Controllers are responsible for maintaining global metadata. They are implemented with Apache Helix and ZooKeeper.

Brokers

Brokers are responsible for query routing. They control the flow of each query: which nodes it is sent to and how the intermediate results from those nodes are merged into the final result.
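
The routing-and-merge role of a broker can be sketched as a simple scatter-gather loop. This is an illustration under stated assumptions, not Pinot's implementation: the routing table, the segment contents, and both function names are hypothetical, and the servers are simulated by local function calls.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical routing table: which server hosts which segments.
ROUTING_TABLE: Dict[str, List[str]] = {
    "server-1": ["seg-0", "seg-1"],
    "server-2": ["seg-2"],
}

# Hypothetical segment contents: the "country" column of each segment.
SEGMENT_DATA: Dict[str, List[str]] = {
    "seg-0": ["US", "DE"],
    "seg-1": ["US", "FR"],
    "seg-2": ["DE", "US"],
}


def server_count_by_country(server: str) -> Counter:
    """Intermediate result computed by one server over its local segments."""
    partial: Counter = Counter()
    for segment in ROUTING_TABLE[server]:
        partial.update(SEGMENT_DATA[segment])
    return partial


def broker_count_by_country() -> Counter:
    """Send the query to every server, then merge the partial counts."""
    merged: Counter = Counter()
    for server in ROUTING_TABLE:
        merged.update(server_count_by_country(server))
    return merged


result = broker_count_by_country()
```

Because counts are additive, the broker can merge intermediate results in any order; aggregations that are not additive would need a different merge step.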

Minions

Minions are responsible for running maintenance tasks, which are usually time-consuming and should not interfere with running queries.

Query Interface

Custom API

Pinot exposes the PQL query interface, which is a subset of SQL. PQL supports selection, projection, aggregation, and top-n queries. It does not support joins, nested queries, record-level insertion, update, or deletion, or any data definition language (DDL).
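
To make the supported operations concrete, the sketch below emulates a filtered, grouped, top-n aggregation over in-memory rows. The query in the comment only approximates the shape of such a PQL statement; the table name, column names, and the function sum_views_top_n are hypothetical, and the Python code is an emulation of the semantics rather than anything Pinot executes.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Illustrative query shape (table and column names are made up):
#   SELECT SUM(views) FROM pageviews WHERE country = 'US' GROUP BY browser TOP 2
rows: List[Dict[str, Any]] = [
    {"country": "US", "browser": "chrome", "views": 3},
    {"country": "DE", "browser": "firefox", "views": 5},
    {"country": "US", "browser": "safari", "views": 2},
    {"country": "US", "browser": "chrome", "views": 4},
    {"country": "US", "browser": "firefox", "views": 1},
]


def sum_views_top_n(
    rows: List[Dict[str, Any]], country: str, n: int
) -> List[Tuple[str, int]]:
    """Filter by country, sum views per browser, keep the n largest groups."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        if row["country"] == country:               # selection
            totals[row["browser"]] += row["views"]  # projection and aggregation
    # Sort by descending total, breaking ties by browser name.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]                               # top-n


top_browsers = sum_views_top_n(rows, country="US", n=2)
```

Every step here works on a single relation, which matches the restriction that joins and nested queries are not available.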

Compression

Dictionary Encoding, Bit Packing / Mostly Encoding

Pinot applies dictionary encoding and bit packing to the columns in each segment to reduce storage overhead. A segment typically occupies between hundreds of megabytes and several gigabytes.
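
The two techniques combine naturally: dictionary encoding replaces each value with a small integer id, and bit packing stores those ids using only as many bits as the dictionary size requires. The following is a minimal sketch of that idea, not Pinot's actual encoding; all function names and the sample column are hypothetical.

```python
from typing import List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return a sorted dictionary of distinct values and one id per value."""
    dictionary = sorted(set(values))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]


def bits_needed(cardinality: int) -> int:
    """Smallest bit width that can represent ids 0..cardinality-1."""
    return max(1, (cardinality - 1).bit_length())


def bit_pack(ids: List[int], width: int) -> bytes:
    """Pack ids back to back, width bits each, least significant first."""
    packed = 0
    for position, value in enumerate(ids):
        packed |= value << (position * width)
    total_bits = len(ids) * width
    return packed.to_bytes((total_bits + 7) // 8, "little")


def bit_unpack(data: bytes, width: int, count: int) -> List[int]:
    """Inverse of bit_pack: recover count ids of width bits each."""
    packed = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(packed >> (position * width)) & mask for position in range(count)]


# Hypothetical sample column with three distinct values.
column = ["US", "DE", "US", "FR", "US", "DE"]
dictionary, ids = dictionary_encode(column)
width = bits_needed(len(dictionary))
packed = bit_pack(ids, width)
decoded = [dictionary[i] for i in bit_unpack(packed, width, len(ids))]
```

With three distinct values each id fits in two bits, so the six values take two bytes plus the dictionary, instead of one string per row.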

Website

https://engineering.linkedin.com/teams/data/projects/pinot

Source Code

https://github.com/apache/incubator-pinot

Tech Docs

https://github.com/apache/incubator-pinot/wiki

Developer

LinkedIn

Country of Origin

US

Start Year

2015

Project Type

Open Source

Written in

Java

Supported languages

SQL

Operating Systems

All OS with Java VM

Licenses

Apache v2