HStreamDB is an open source distributed streaming database designed for accessing, storing, and processing real-time streaming data from sources such as IoT devices. All records added to the database are appended to an immutable object called a stream, and a database can hold multiple streams at once. HStreamDB aims to provide low-latency access to analytical results over the most recent data in streams. It does this by incrementally updating in-memory materialized views in real time as streaming data is ingested. HStreamDB also lets multiple client consumers read from a stream through stream subscriptions, which deliver data to clients as soon as it is ingested into the DBMS. HStreamDB supports SQL queries with extensions for streams, and it was built from scratch in Haskell.
HStreamDB is built by EMQ, a company providing open source IoT data infrastructure. It was first open sourced in 2021 and is under active development by EMQ's Haskell team.
HStreamDB was developed to bring a data-driven model of stream processing into a database. Most databases follow a command-driven model, in which data is analyzed only when a client sends a request. HStreamDB instead analyzes data in real time as it is ingested and delivers the results with low latency.
Compression is used to reduce network bandwidth when transferring data to and from the database. Compression and decompression are performed entirely by the client, and the data is stored in the database in its compressed form. HStreamDB supports both the gzip and zstd compression algorithms.
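To illustrate what client-side compression means here, the sketch below serializes a batch of records and gzips it before it would be sent, then reverses the process on read. The JSON batch format and the function names `compress_payload` and `decompress_payload` are hypothetical and do not reflect HStreamDB's actual wire format or client API. Only gzip is shown because it is in Python's standard library; zstd would need a third-party package.

```python
import gzip
import json

# Hypothetical batch format: a JSON array of records. This is an assumption
# made for illustration, not HStreamDB's real wire format.


def compress_payload(records: list) -> bytes:
    """Serialize a batch on the client and gzip it before sending."""
    raw = json.dumps(records).encode("utf-8")
    return gzip.compress(raw)


def decompress_payload(blob: bytes) -> list:
    """Client-side reverse step: gunzip the bytes, then parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Usage: a repetitive sensor batch compresses well.
batch = [{"sensor": "t1", "temp": 21.5} for _ in range(100)]
blob = compress_payload(batch)
print(f"raw={len(json.dumps(batch).encode('utf-8'))} bytes, gzip={len(blob)} bytes")
```

Because the server only stores and returns the compressed bytes, the CPU cost of compression falls on producers and consumers rather than on the database.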
HStreamDB models data as records that are written to streams. Every record has a unique identifier, and its payload is either an HRecord or a Raw Record. An HRecord can be thought of as a traditional database tuple with support for nested maps and arrays, and HRecords can be queried using SQL. A Raw Record contains arbitrary binary data that the database neither interprets nor queries; Raw Records are intended to be consumed through subscriptions.
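The distinction between the two payload kinds can be modeled as a tagged union. This is a minimal sketch, not HStreamDB's client types: the class names mirror the terms above, while the `record_id` scheme (a random UUID) and the helper `is_sql_queryable` are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass(frozen=True)
class HRecord:
    # Structured payload: a map whose values may be nested maps or arrays.
    fields: Dict[str, Any]


@dataclass(frozen=True)
class RawRecord:
    # Opaque bytes that the database stores but does not interpret.
    payload: bytes


@dataclass(frozen=True)
class Record:
    body: Union[HRecord, RawRecord]
    # Hypothetical ID scheme: a random UUID, used here only to keep IDs unique.
    record_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def is_sql_queryable(record: Record) -> bool:
    """Only HRecords can be queried with SQL; Raw Records are read via subscriptions."""
    return isinstance(record.body, HRecord)


# Usage: one structured record with nested data, one opaque binary record.
reading = Record(HRecord({"device": "d1", "temps": [20.5, 21.0], "site": {"id": "A"}}))
blob = Record(RawRecord(b"\x00\x01\x02"))
```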
HStreamDB supports nested loop joins between two streams, between two materialized views, and between a stream and a materialized view.
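A nested loop join compares every row of one input against every row of the other and keeps the pairs that satisfy the join predicate. The sketch below shows the generic technique over two finite snapshots of rows. It is not HStreamDB's implementation and ignores anything a real stream join would need for unbounded input; the sample field names are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def nested_loop_join(
    outer: Iterable[Row],
    inner: Iterable[Row],
    predicate: Callable[[Row, Row], bool],
) -> List[Tuple[Row, Row]]:
    """Return every (outer_row, inner_row) pair that satisfies the predicate."""
    inner_rows = list(inner)  # materialize the inner side so it can be rescanned
    matches = []
    for o in outer:           # outer loop: one pass over the first input
        for i in inner_rows:  # inner loop: full rescan for each outer row
            if predicate(o, i):
                matches.append((o, i))
    return matches


# Usage: join sensor readings to device metadata on the "device" field.
readings = [{"device": "d1", "temp": 21.0}, {"device": "d2", "temp": 19.5}]
devices = [{"device": "d1", "site": "A"}, {"device": "d3", "site": "B"}]
joined = nested_loop_join(readings, devices, lambda r, d: r["device"] == d["device"])
```

The cost is proportional to the product of the two input sizes, which is why this strategy suits inputs that are small or bounded.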
Clients interact with HStreamDB through either SQL or its custom API. It uses a SQL dialect that is a subset of SQL-92 with extensions to support stream operations. Queries can be executed from a command line interface and from the Java, Go, and Python clients. HStreamDB's custom API is implemented in its clients and can be used to insert and consume data.
Since HStreamDB is a streaming database, it handles queries differently from a typical database. Queries are treated as running tasks that fetch data from streams and produce results continuously as the streams are updated. HStreamDB also supports subscriptions, where multiple consumers can read data in real-time from a single stream as records are added by producers.
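The two ideas in the paragraph above, continuous queries and independent subscribers on one stream, can be illustrated with an append-only log where each subscription keeps its own read offset. This is a simplified sketch under stated assumptions: the `Stream`, `Subscription`, and `query_step` names are hypothetical, and acknowledgements, persistence, and load balancing among consumers are not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Stream:
    """Append-only log: records are only ever added at the end."""
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the newly appended record


@dataclass
class Subscription:
    """Each subscription tracks its own read position in the shared stream."""
    stream: Stream
    offset: int = 0  # index of the next record not yet delivered

    def poll(self) -> List[Any]:
        new = self.stream.records[self.offset:]
        self.offset += len(new)
        return new


def query_step(sub: Subscription, predicate: Callable[[Any], bool]) -> List[Any]:
    """One step of a continuous query: evaluate only records that arrived since the last step."""
    return [r for r in sub.poll() if predicate(r)]
```

Because each subscriber only advances its own offset, a slow consumer never affects what a fast consumer sees, and a continuous query does work proportional to new data rather than to the whole stream.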
HStreamDB is a disk-oriented database that uses the RocksDB storage engine. Because data is kept on disk rather than only in memory, HStreamDB can support large-scale data streams.
HStreamDB does not implement its own storage layer, and instead relies on RocksDB as a key-value store. All of its data is eventually processed and stored in the key-value file format implemented in RocksDB.
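One common way to store an append-only stream in an ordered key-value store is to encode the stream name and record offset into a key whose byte order matches append order. The sketch below is a hypothetical layout for illustration only, not HStreamDB's or RocksDB's actual key format; a plain dictionary plus `sorted` stands in for RocksDB's ordered iteration, which by default compares keys bytewise.

```python
import struct
from typing import Dict, List

SEP = b"\x00"  # assumes stream names never contain a NUL byte


def encode_key(stream: str, offset: int) -> bytes:
    """Hypothetical key layout: stream name, separator, 8-byte big-endian offset.

    Big-endian encoding makes bytewise key order match numeric offset order,
    which lets an ordered key-value store return a stream's records in order.
    """
    return stream.encode("utf-8") + SEP + struct.pack(">Q", offset)


def scan_stream(store: Dict[bytes, bytes], stream: str) -> List[bytes]:
    """Emulate a prefix scan: visit keys in bytewise order, keep one stream's values."""
    prefix = stream.encode("utf-8") + SEP
    return [store[k] for k in sorted(store) if k.startswith(prefix)]


# Usage: insert out of order, then read one stream back in offset order.
store: Dict[bytes, bytes] = {}
for off, payload in [(300, b"c"), (2, b"a"), (10, b"b")]:
    store[encode_key("sensors", off)] = payload
store[encode_key("sensors2", 0)] = b"other stream"
```

A little-endian offset would break this ordering (offset 256 would sort before offset 2), which is why the sketch packs offsets big-endian.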
HStreamDB supports incrementally updated materialized views. As data is added to streams, views are updated in real time, so querying a view is fast because it always holds the latest result. Unlike streams, views are stored only in memory.
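Incremental maintenance means each new record updates the view's stored aggregates directly instead of recomputing them from the whole stream. The sketch below keeps a per-group count and running sum so that the average can be read without rescanning anything. It is a generic illustration of the technique; the `GroupedAvgView` class and its field names are hypothetical, not HStreamDB internals.

```python
from collections import defaultdict
from typing import Any, Dict


class GroupedAvgView:
    """In-memory view maintaining COUNT and AVG of one field per group key.

    Each insert updates the view in constant time; reads never rescan the stream.
    """

    def __init__(self, key_field: str, value_field: str) -> None:
        self.key_field = key_field
        self.value_field = value_field
        self.count: Dict[Any, int] = defaultdict(int)
        self.total: Dict[Any, float] = defaultdict(float)

    def on_insert(self, record: Dict[str, Any]) -> None:
        """Apply the delta from one newly ingested record."""
        k = record[self.key_field]
        self.count[k] += 1
        self.total[k] += record[self.value_field]

    def query(self, key: Any) -> Dict[str, Any]:
        """Read the current result for one group; avg is None for unseen keys."""
        n = self.count.get(key, 0)
        return {"count": n, "avg": self.total[key] / n if n else None}
```

Storing the sum rather than the average is the key design choice: a sum and a count can both be updated from a single new record, while an average on its own cannot.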
EMQ Technologies Co., Ltd.