Pinot

Revision #9 from 2018-12-12 03:13

Pinot is a distributed relational OLAP datastore developed at LinkedIn. It is designed to support large-scale real-time analytics on arbitrary data sets. For use cases that are sensitive to data freshness, Pinot can ingest streaming data directly from Kafka. For applications that can tolerate a data lag of a few hours to a day, Pinot can ingest batch data from Hadoop. It can also dynamically merge data streams that come from both offline and online systems.[04][05]

Website: https://engineering.linkedin.com/teams/data/projects/pinot[01]
Source Code: https://github.com/apache/pinot[02] Accessed: Jul 12, 2026 Last Commit: Jul 11, 2026
Tech Docs: https://github.com/apache/incubator-pinot/wiki[03]
Developer: LinkedIn Corporation
Country of Origin: US
Start Year: 2014 [07]
Project Type: Open Source
Written in: Java
Supported Languages: Java
Operating System: All OS with Java VM
License: Apache v2

Pinot uses a hybrid data model. It divides tables into segments, which are sets of tuples, and organizes the tuples inside each segment in a columnar manner. The segment is the basic unit in Pinot: data from Kafka or Hadoop is processed and cached locally as segments on Pinot server nodes; each segment stores metadata and the necessary zone maps for the tuples inside it; storage optimizations are applied to the tuples in a segment; indexes are built for each segment; and query plans and optimizations are generated and performed on a per-segment basis.
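
The role of per-segment zone maps can be illustrated with a minimal sketch. It assumes a zone map is simply the minimum and maximum of one column within a segment; the segment names, the values, and the helper functions are hypothetical and are not taken from Pinot's code.

```python
def zone_map(values):
    """Summarize a column of one segment as its (min, max) pair."""
    return min(values), max(values)


def may_match(zone, low, high):
    """Return True if the filter range [low, high] overlaps the segment's [min, max]."""
    seg_min, seg_max = zone
    return seg_min <= high and low <= seg_max


# Hypothetical time column values for three segments.
segments = {"seg_0": [1, 5, 9], "seg_1": [20, 25], "seg_2": [8, 30]}
zones = {name: zone_map(values) for name, values in segments.items()}

# A filter on 10 <= time <= 22 only needs the segments whose zone overlaps it.
to_scan = [name for name, zone in zones.items() if may_match(zone, 10, 22)]
print(to_scan)
```

A segment whose zone does not overlap the filter can be skipped without reading any of its tuples, which is why keeping this metadata per segment fits per-segment query planning.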

The external building blocks of Pinot are Zookeeper and Apache Helix.


History[06][07][08][01]

Pinot was first developed by LinkedIn in 2014 as an internal analytics infrastructure. It originated from the need to scale out OLAP systems to support low-latency real-time queries on very large volumes of data. It was open-sourced in 2015 and entered the Apache Incubator in 2018. Pinot is named after Pinot noir, a grape varietal that can produce the most complex wine but is the toughest to grow and process. The name is a portrayal of data: powerful but hard to analyze.

Checkpoints[04]

Not Supported

Pinot uses replicas to provide fault tolerance and high availability. It also uses redundant controller instances to improve availability.

However, checkpoints are not supported because segments are immutable: no writes are made to a segment while queries execute. It is still possible for a segment to be replaced entirely with a newer version.

Compression[04][01]

Dictionary Encoding, Run-Length Encoding, Bitmap Encoding, Bit Packing / Mostly Encoding

Pinot leverages various types of encoding to reduce storage overhead. The typical size of a segment varies from a few hundred megabytes up to a few gigabytes. Different data encoding techniques have different specialized physical operators to optimize query execution.
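
To make the listed encodings concrete, here is a minimal sketch of dictionary encoding, run-length encoding, and the bit width that bit packing would need. It illustrates the general techniques only and is not Pinot's on-disk format; the column values and function names are assumptions made for the example.

```python
def dictionary_encode(values):
    """Map each distinct value to a small integer id; return (dictionary, ids)."""
    dictionary = sorted(set(values))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]


def run_length_encode(ids):
    """Collapse consecutive repeats into [id, run_length] pairs."""
    runs = []
    for i in ids:
        if runs and runs[-1][0] == i:
            runs[-1][1] += 1
        else:
            runs.append([i, 1])
    return runs


def bits_per_id(dictionary):
    """Number of bits needed to bit-pack one dictionary id."""
    return max(1, (len(dictionary) - 1).bit_length())


# Hypothetical low-cardinality column.
country = ["US", "US", "US", "IN", "IN", "DE"]
dictionary, ids = dictionary_encode(country)
runs = run_length_encode(ids)
print(dictionary, ids, runs, bits_per_id(dictionary))
```

With three distinct values, each id fits in two bits instead of a full string, and the repeated ids collapse into three runs. Run-length encoding pays off most when equal values are adjacent, for example in a sorted column.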

Concurrency Control[04]

Not Supported

Pinot moves the execution of queries to segments. There will be no race condition in the server-side query execution since segments are immutable.

Data Model[04]

Relational

Pinot is a relational datastore. An attribute's data type can be an integer of various lengths, a floating-point number, a string, a boolean, an array, or a timestamp. Its column type can be dimension, metric, or time.

Indexes[04][03]

BitMap, Inverted Index (Full Text)

Pinot supports pluggable indexing technologies such as Sorted Index, BitMap Index, and Inverted Index. The BitMap Index is used to optimize queries on categorical data. The Inverted Index is used to support lookup by keyword. They were chosen to exploit the characteristics of social data, which is usually categorical and textual.

An Inverted Index can be built on top of bitmaps, and a BitMap Index can be optimized with various compression techniques. The data can also be physically reordered on a column to optimize specific queries in Pinot, since filters on such a column usually target a contiguous range of the column data.
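
The two ideas above, a bitmap-based inverted index and a physically sorted column, can be sketched as follows. This is an illustration under simplifying assumptions: Python integers stand in for (uncompressed) bitmaps, and the column contents and helper names are hypothetical rather than Pinot's implementation.

```python
from bisect import bisect_left, bisect_right


def build_bitmap_index(column):
    """Map each distinct value to an int whose set bits are the matching row ids."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def rows_of(bitmap):
    """Expand a bitmap into a sorted list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


def sorted_range(sorted_column, value):
    """Return the contiguous [start, end) row range holding value in a sorted column."""
    return bisect_left(sorted_column, value), bisect_right(sorted_column, value)


# Hypothetical categorical columns of one segment.
browser = ["chrome", "firefox", "chrome", "safari", "chrome"]
country = ["US", "US", "IN", "US", "US"]
browser_index = build_bitmap_index(browser)
country_index = build_bitmap_index(country)

# A conjunctive filter becomes a bitwise AND of two bitmaps.
matches = rows_of(browser_index["chrome"] & country_index["US"])
print(matches)

# On a sorted column, an equality filter maps to one contiguous row range.
print(sorted_range(["a", "a", "b", "b", "b", "c"], "b"))
```

The sorted case needs no bitmap at all: two binary searches give the row range, which is why reordering the data on a frequently filtered column helps.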

Joins[04]

Not Supported

The query interface of Pinot, Pinot Query Language (PQL), does not support joins.

Query Execution[04]

Tuple-at-a-Time Model, Vectorized Model

Query Interface[04]

Custom API

Pinot uses the PQL query interface, which is a subset of SQL. PQL supports selection, projection, aggregations, and top-n. It does not support joins, nested queries, record-level creation, updates, or deletion, or any data definition language (DDL).

Storage Model[04]

Hybrid

Pinot uses a hybrid data model, which divides rows into segments and stores the data inside each segment in a columnar manner. The segment is the basic unit of replication. It is immutable and typically contains tens of millions of rows.
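
As a rough sketch of this layout, the following code splits rows horizontally into segments and stores each segment column by column in read-only containers. The `Segment` class, the `build_segments` helper, the naming scheme, and the tiny segment size are assumptions for illustration; real segments hold far more rows and carry metadata and indexes as well.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Segment:
    """An immutable, column-oriented slice of a table."""
    name: str
    columns: MappingProxyType  # column name -> tuple of values
    num_rows: int


def build_segments(table, rows, rows_per_segment):
    """Split rows into fixed-size chunks and pivot each chunk into columns.

    Assumes every row is a dict with the same keys.
    """
    segments = []
    for start in range(0, len(rows), rows_per_segment):
        chunk = rows[start:start + rows_per_segment]
        columns = {name: tuple(row[name] for row in chunk) for name in chunk[0]}
        segments.append(
            Segment(f"{table}_{len(segments)}", MappingProxyType(columns), len(chunk))
        )
    return segments


# Hypothetical rows; a real segment would hold millions of them.
rows = [
    {"member": 1, "views": 10},
    {"member": 2, "views": 5},
    {"member": 3, "views": 7},
]
segments = build_segments("events", rows, rows_per_segment=2)
print(segments[0].columns["views"], segments[1].num_rows)
```

Because the dataclass is frozen and the columns are tuples behind a read-only mapping, a segment in this sketch can only be replaced as a whole, never modified in place.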

Storage Organization[04]

Heaps

Pinot stores segments in directories of a UNIX filesystem. Each such directory contains a metadata file and an index file. The metadata file stores information about the record columns in the segment. The index file stores the indexes for all the columns. The global metadata about segments, including the mapping of a segment to its position, is maintained in controller clusters.

System Architecture[04]

Shared-Nothing

Pinot consists of four parts: servers, controllers, brokers, and minions. They together support the functionality of data storage, data management, and query processing.

Servers

Servers are responsible for data storage. Pinot distributes segments across the server nodes. Each segment has multiple replicas, and transactions are executed in an active-active manner.

Controllers

Controllers are responsible for maintaining global metadata. They are implemented with Apache Helix and Zookeeper.

Brokers

Brokers are responsible for query routing. They control the flow of a query, such as which nodes each query is sent to and how the final result is assembled from the intermediate results of different nodes.
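
The gather side of this flow can be sketched for a grouped sum with top-n. The example assumes each server returns one partial aggregate per segment and the broker merges them; the function names, segment contents, and tie-breaking rule are hypothetical and are not Pinot's actual API.

```python
from collections import Counter


def server_partial(segment, group_col, metric_col):
    """Server side: aggregate one segment into a partial sum per group."""
    partial = Counter()
    for key, value in zip(segment[group_col], segment[metric_col]):
        partial[key] += value
    return partial


def broker_merge(partials, top_n):
    """Broker side: add up the partial sums and keep the top_n groups."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    # Sort by descending sum, breaking ties by group name for a stable result.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical segments living on two different server nodes.
seg_a = {"country": ("US", "IN", "US"), "views": (3, 4, 5)}
seg_b = {"country": ("IN", "DE"), "views": (10, 1)}

partials = [server_partial(seg, "country", "views") for seg in (seg_a, seg_b)]
result = broker_merge(partials, top_n=2)
print(result)
```

Sums merge correctly because addition is associative; an aggregate such as an average would need each server to return a sum and a count rather than a finished value.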

Minions

Minions are responsible for running maintenance tasks, which are usually time-consuming and should not affect running queries.

Compatible Systems

PrestoDB

Citations

8 sources

[01] https://engineering.linkedin.com/teams/data/projects/pinot. linkedin.com (dead link; check archive). Accessed: 2026-05-24
[02] GitHub - apache/pinot: Apache Pinot - A realtime distributed OLAP datastore. github.com. Accessed: 2026-06-04
[03] Home · apache/pinot Wiki. github.com. Accessed: 2026-05-24
[04] http://delivery.acm.org/10.1145/3200000/3190661/p583-im.pdf. acm.org. Accessed: 2026-05-24
[05] Home · apache/pinot Wiki. github.com. Accessed: 2026-05-24
[06] Pinot Project Incubation Status - Apache Incubator. apache.org. Modified: 2025-12-31. Accessed: 2026-06-07
[07] Real-time Analytics at Massive Scale with Pinot | LinkedIn Engineering. linkedin.com. Accessed: 2026-06-07
[08] Open Sourcing Pinot: Scaling the Wall of Real-Time Analytics | LinkedIn Engineering. linkedin.com. Accessed: 2026-06-07

Revision #9 Last Updated: 2018-12-11 22:13