ksqlDB

Viewing Revision #5 from 2019-05-02 04:11 View Current

Kafka SQL, or KSQL, is a streaming SQL engine that provides a SQL interface to streams in Apache Kafka. It is developed by Confluent, Inc. and is built on the Kafka Streams API, which supports joins, aggregations, windowing, and sessionization on streaming data. The data is stored in a Kafka cluster, which is a collection of Kafka brokers, and is organized into topics. Each topic consists of a defined number of partitions, where each partition is an immutable, ordered sequence of messages. Applications can use KSQL as a library to run SQL queries over the stream data stored in the Kafka cluster.[03][01]
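The topic and partition layout described above can be illustrated with a small in-memory model. This is a minimal sketch, not Kafka code: the `Topic` class and its `append` method are hypothetical names, and CRC32 is used only as a stand-in for the partitioner's hash function.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Topic:
    """Toy in-memory model of a Kafka topic (hypothetical, not a Kafka API)."""
    name: str
    num_partitions: int
    partitions: List[List[Tuple[bytes, bytes]]] = field(init=False)

    def __post_init__(self) -> None:
        # One append-only log per partition.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, key: bytes, value: bytes) -> Tuple[int, int]:
        """Append a message and return its (partition, offset)."""
        # Assumption: CRC32 stands in for the real partitioner's hash.
        partition = zlib.crc32(key) % self.num_partitions
        log = self.partitions[partition]
        log.append((key, value))  # existing entries are never modified
        return partition, len(log) - 1


topic = Topic("pageviews", num_partitions=3)
first = topic.append(b"alice", b"/home")
second = topic.append(b"alice", b"/cart")
```

Messages with the same key land in the same partition, and each one receives the next offset in that partition's log.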

Website: https://www.confluent.io/product/ksqldb/[01]
Source Code: https://github.com/confluentinc/ksql[02] Accessed: Jul 29, 2026 Last Commit: Jul 29, 2026
Developer: Confluent, Inc.
Country of Origin: US
Start Year: 2017 [04]
Project Type: Open Source
Written in: Java
Supported Languages: SQL
Compatible With: Spark SQL

Unlike streaming systems such as Spark Streaming, which require Java or Scala for development, KSQL provides a fully interactive, SQL-only interface, which makes it easier to use. KSQL processes one message at a time, making it a true stream processing system rather than a micro-batching system.
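The difference between per-message processing and micro-batching can be sketched with a running count. This is an illustrative sketch only: both function names are hypothetical, and batches are formed here by message count for simplicity, whereas micro-batching systems typically form them by time interval.

```python
from typing import Iterable, Iterator


def per_record_counts(events: Iterable[str]) -> Iterator[int]:
    """Emit an updated running count after every single event."""
    count = 0
    for _ in events:
        count += 1
        yield count


def micro_batch_counts(events: Iterable[str], batch_size: int) -> Iterator[int]:
    """Emit an updated running count once per full batch, then once for any remainder."""
    count = 0
    pending = 0
    for _ in events:
        pending += 1
        if pending == batch_size:
            count += pending
            pending = 0
            yield count
    if pending:
        yield count + pending


events = ["e1", "e2", "e3", "e4", "e5"]
per_record = list(per_record_counts(events))
batched = list(micro_batch_counts(events, batch_size=2))
```

The per-record version produces a result for each of the five events, while the batched version produces results only at batch boundaries, so intermediate states are not observable downstream.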

Derivative Systems

Heroic

Data Model[03]

Relational, Key-Value

The basic unit of storage in Kafka is a message, which consists of a key, a value, a timestamp, a partition number, and its offset within the partition. The key and value are plain byte arrays, so there is no restriction on the type of values they can hold. A schema can be associated with each topic and is imposed on the value part of the message. ToDo: Verify who validates the schema, i.e., whether a schema can be applied on top of an existing Kafka topic.
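The message layout can be written down as a small record type. The sketch below is illustrative and makes several assumptions: the `Message` class, `PAGEVIEW_SCHEMA`, and `decode_value` are hypothetical names, the value is assumed to be JSON, and the schema is checked at read time purely for illustration, without implying that this is where KSQL performs validation.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Message:
    key: bytes        # opaque bytes
    value: bytes      # opaque bytes; a schema gives them meaning
    timestamp: int    # assumed here to be epoch milliseconds
    partition: int
    offset: int


# Hypothetical schema: field name -> expected Python type.
PAGEVIEW_SCHEMA = {"userid": str, "pageid": str, "viewtime": int}


def decode_value(message: Message, schema: Dict[str, type]) -> Dict[str, Any]:
    """Interpret the raw value bytes as JSON and check them against a schema."""
    row = json.loads(message.value.decode("utf-8"))
    for name, expected in schema.items():
        if not isinstance(row.get(name), expected):
            raise ValueError(f"field {name!r} is missing or not {expected.__name__}")
    return row


msg = Message(
    key=b"alice",
    value=b'{"userid": "alice", "pageid": "/home", "viewtime": 1556770260000}',
    timestamp=1556770260000,
    partition=0,
    offset=42,
)
row = decode_value(msg, PAGEVIEW_SCHEMA)
```

Because the stored bytes carry no type information, the same topic could in principle be read with a different schema. The sketch applies the schema only to the value and leaves the key untouched.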

KSQL provides two abstractions for organizing a topic's data: streams and tables. Messages within a stream are unbounded and independent of each other. A table, on the other hand, is a stateful entity in which a new message is treated either as a new entry or as an update to the existing entry with the same key. Hence, for a KSQL table, the messages in the topic can be regarded as a changelog or redo log.
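The two interpretations of the same topic data can be sketched as follows. This is a simplified model, not KSQL code: `as_stream` and `as_table` are hypothetical helper names, and records are reduced to (key, value) pairs.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]  # (key, value)


def as_stream(records: Iterable[Record]) -> List[Record]:
    """Stream view: every record is an independent event, so all are kept."""
    return list(records)


def as_table(records: Iterable[Record]) -> Dict[str, int]:
    """Table view: replay the changelog, keeping the latest value per key."""
    table: Dict[str, int] = {}
    for key, value in records:
        table[key] = value  # insert a new key or overwrite an existing one
    return table


changelog = [("alice", 1), ("bob", 5), ("alice", 3)]
stream_view = as_stream(changelog)
table_view = as_table(changelog)
```

The stream view retains all three records, while the table view holds one row per key, with the second `alice` record replacing the first.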

Query Interface[03]

Custom API, SQL, Command-line / Shell

KSQL is a SQL-like query language with certain extensions for streams. As defined in the data model, tables and streams are the two major abstractions in KSQL.

KSQL provides the following execution modes/interfaces:

Interactive mode: Queries are issued through the command-line interface or the REST API.
Application mode: A list of queries is provided as input to the KSQL jar.
Embedded mode: KSQL queries are embedded within Kafka Streams API code, similar to Spark SQL.
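For interactive mode over the REST API, a statement is sent to the server as a JSON document. The sketch below only builds the request body and does not contact a server. The endpoint URL, the `pageviews` topic, its columns, and the `build_ksql_request` helper are assumptions made for illustration; the body shape follows the documented `/ksql` endpoint and should be checked against the server version in use.

```python
import json
from typing import Dict, Optional

# Assumptions: a KSQL server on localhost:8088 and a topic named "pageviews"
# holding JSON values. Neither comes from the article.
KSQL_ENDPOINT = "http://localhost:8088/ksql"

CREATE_STREAM = (
    "CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR, pageid VARCHAR) "
    "WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');"
)


def build_ksql_request(statement: str,
                       properties: Optional[Dict[str, str]] = None) -> bytes:
    """Serialize a KSQL statement into a JSON request body (hypothetical helper)."""
    body = {"ksql": statement, "streamsProperties": properties or {}}
    return json.dumps(body).encode("utf-8")


payload = build_ksql_request(CREATE_STREAM)
```

To submit the statement, `payload` would be sent in an HTTP POST to `KSQL_ENDPOINT`, for example with `urllib.request`, using the content type `application/vnd.ksql.v1+json`. The same `CREATE STREAM` statement could instead be typed directly into the command-line interface.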

Citations

[01] Database Streaming with ksqlDB | Confluent. confluent.io. Accessed: 2026-07-17.
[02] GitHub - confluentinc/ksql: The database purpose-built for stream processing applications. github.com. Accessed: 2026-06-04.
[03] https://openproceedings.org/2019/conf/edbt/EDBT19_paper_329.pdf. openproceedings.org. Modified: 2019-05-13. Accessed: 2026-06-07. (Link reported dead; check archive.)
[04] Introducing KSQL: Streaming SQL for Apache Kafka | Confluent. confluent.io. Accessed: 2026-06-07. (Link reported dead; check archive.)

Revision #5 Last Updated: 2019-05-02 00:11