HStreamDB is an open-source distributed streaming database designed for accessing, storing, and processing real-time streaming data from sources such as IoT devices. Every record added to the database is appended to an immutable object called a stream, and a database can hold multiple streams at once. HStreamDB aims to provide low-latency access to analyses of the most current data in streams, which it achieves by incrementally updating in-memory materialized views in real time as streaming data is ingested. Multiple client consumers can also read from a single stream through stream subscriptions, which deliver data to clients as it is ingested. HStreamDB supports SQL queries with extensions for streams, and it was built from scratch in Haskell.
HStreamDB is built by EMQ, a company providing open source IoT data infrastructure. It was first open sourced in 2021 and is under active development by the Haskell Team from EMQ.
HStreamDB was developed to apply a data-driven model to processing stream data efficiently in a database. In contrast to the command-driven model of most databases, which analyze data only when a client sends a request, HStreamDB aims to analyze data in real time as it is ingested and to deliver the results with low latency.
Compression is used to reduce network bandwidth utilization when transferring data to and from the database. Compression and decompression are performed entirely by the client, and the compressed data is stored natively in the database. HStreamDB supports both the gzip and zstd compression algorithms.
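The idea of client-side compression can be illustrated with a minimal sketch that uses only Python's standard `gzip` module. It does not use an HStreamDB client: the function names `compress_batch` and `decompress_batch` and the newline-delimited JSON encoding are assumptions chosen for illustration, and the sketch only shows a producer compressing a batch before sending it and a consumer restoring it afterwards.

```python
import gzip
import json


def compress_batch(records: list[dict]) -> bytes:
    """Serialize records as newline-delimited JSON and gzip the result.

    Hypothetical helper: stands in for what a producer-side client would do
    before sending a batch over the network.
    """
    payload = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    return gzip.compress(payload.encode("utf-8"))


def decompress_batch(blob: bytes) -> list[dict]:
    """Reverse of compress_batch, as a consumer-side client would run it."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


# Repetitive sensor readings compress well, which is where the bandwidth
# saving comes from.
readings = [{"device": "sensor-1", "temp": 21.5 + i % 3} for i in range(200)]
raw_size = len("\n".join(json.dumps(r, sort_keys=True) for r in readings).encode("utf-8"))

blob = compress_batch(readings)      # what would travel over the network
restored = decompress_batch(blob)    # what the consumer sees after decompression
```

Because both steps run on the client, the server in this model never needs to inspect the payload; it only stores and forwards the compressed bytes.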
HStreamDB models data as records that are written to streams. Every record has a unique identifier, and the data in a record is either an HRecord or a Raw Record. An HRecord can be thought of as a traditional database tuple with support for nested maps and arrays, and it can be queried using SQL. A Raw Record contains arbitrary binary data that the database neither interprets nor queries. Raw Records are intended to be consumed through subscriptions.
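The record model described above can be sketched with plain Python dataclasses. This is a toy in-memory illustration, not HStreamDB's storage format or client API: the class names `Record` and `Stream`, the identifier scheme, and the `queryable` method are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class HRecord:
    """Structured payload: a tuple-like map that may nest maps and arrays."""
    fields: dict[str, Any]


@dataclass(frozen=True)
class RawRecord:
    """Opaque payload: bytes that the database does not interpret."""
    payload: bytes


@dataclass(frozen=True)
class Record:
    record_id: str                     # unique within this toy stream
    body: Union[HRecord, RawRecord]


class Stream:
    """Append-only log of records (hypothetical in-memory stand-in)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._records: list[Record] = []

    def append(self, body: Union[HRecord, RawRecord]) -> Record:
        # Illustrative id scheme: stream name plus position in the log.
        record = Record(f"{self.name}-{len(self._records)}", body)
        self._records.append(record)
        return record

    def queryable(self) -> list[dict[str, Any]]:
        """Only HRecords expose fields that a query could read."""
        return [r.body.fields for r in self._records if isinstance(r.body, HRecord)]


stream = Stream("sensors")
first = stream.append(HRecord({"device": "s1", "reading": {"temp": 21.5, "tags": ["a", "b"]}}))
second = stream.append(RawRecord(b"\x00\x01\x02"))
rows = stream.queryable()
```

Here `rows` contains only the structured record, mirroring the distinction that HRecords are queryable while Raw Records are passed through untouched.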
HStreamDB can be accessed through either SQL or its custom API. Its SQL dialect is a subset of SQL-92 with extensions to support stream operations. Queries can be executed from a command-line interface or from the Java, Go, and Python clients. The custom API is implemented in these clients and can be used to insert and consume data.
Since HStreamDB is a streaming database, it handles queries differently from a typical database. Queries are treated as running tasks that fetch data from streams and produce results continuously as the streams are updated. HStreamDB also supports subscriptions, where multiple consumers can read data in real-time from a single stream as records are added by producers.
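The difference from a request-driven query can be illustrated with a small in-memory sketch in which a running task updates an aggregate each time a record is appended, while a second subscriber receives the same records. This is not HStreamDB's implementation or client API: `ToyStream`, `RunningAverage`, and the callback-based delivery are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Callable

Row = dict[str, Any]


class ToyStream:
    """Append-only stream that pushes each new record to every subscriber."""

    def __init__(self) -> None:
        self._log: list[Row] = []
        self._subscribers: list[Callable[[Row], None]] = []

    def subscribe(self, callback: Callable[[Row], None]) -> None:
        self._subscribers.append(callback)

    def append(self, record: Row) -> None:
        self._log.append(record)
        for callback in self._subscribers:
            callback(record)          # delivered as soon as it is ingested


class RunningAverage:
    """Continuously maintained per-device average (a tiny materialized view)."""

    def __init__(self) -> None:
        self._count: dict[str, int] = defaultdict(int)
        self._total: dict[str, float] = defaultdict(float)

    def on_record(self, record: Row) -> None:
        # Incremental update: constant work per record, no rescan of the log.
        key = record["device"]
        self._count[key] += 1
        self._total[key] += record["temp"]

    def read(self) -> dict[str, float]:
        return {k: self._total[k] / self._count[k] for k in self._count}


stream = ToyStream()
view = RunningAverage()
seen: list[Row] = []

stream.subscribe(view.on_record)   # the "query" runs as a long-lived task
stream.subscribe(seen.append)      # an independent consumer of the same stream

stream.append({"device": "a", "temp": 20.0})
stream.append({"device": "a", "temp": 22.0})
stream.append({"device": "b", "temp": 30.0})

averages = view.read()
```

Reading `averages` at any point returns the result for all data ingested so far, without recomputing over the whole log.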
https://github.com/hstreamdb/hstream
EMQ Technologies Co., Ltd.
2020