IoTDB

IoTDB is a specialized database management system for time series data generated by networks of IoT devices with low computational power. It targets workloads with high-frequency writes, large-volume data storage, and complex analytical queries. IoTDB supports the queries commonly used for monitoring and collecting metrics from IoT devices, namely filtering by predicates, querying by time range, group aggregation, and data sampling. Data in IoTDB is stored in TsFile, a file format designed for accessing, compressing, and storing time series data. Storage is organized in an LSM-based structure that favors write throughput. Users can access and manage IoTDB from Java programs through the JDBC driver or by installing IoTDB into a local Maven repository. IoTDB can also be accessed through a command-line interface and a Python wrapper client API. It provides official documentation for integrating with data analysis systems such as Spark, Hadoop, Hive, and Grafana.

History

IoTDB was started in 2017 by Prof. Jianmin Wang’s group in the School of Software at Tsinghua University and China’s National Engineering Laboratory for Big Data Software. The project entered the Apache Incubator on [Nov. 18th, 2018](https://incubator.apache.org/projects/iotdb.htm). It evolved from TsFile, an earlier project by the same group. TsFile is a columnar storage format optimized for time series data, and IoTDB uses it as its underlying storage format.

Query Interface

Custom API

SQL-like customized query language.

Concurrency Control

Not Supported

Because IoTDB does not support transactions, it has only a bare-bones concurrency control implementation based on read locks and write locks. The implementation does not follow a two-phase locking protocol: there are cases where a lock is acquired after another lock has already been released in the same function (an example is included in citation 1 of this section). IoTDB uses Java's native ReentrantReadWriteLock in the implementation. To avoid access conflicts when users or roles are read or written concurrently, IoTDB implements HashLock for the user manager and role manager. A HashLock is a wrapper around a fixed number of ReentrantReadWriteLock locks; by default, it is initialized with an array of 100 of them. Each applicable database object maps to one lock in the array according to the object's hash value. This avoids conflicts from concurrent access to the same database object (a user or role in this case) while limiting the resources needed to manage the locks.
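The lock-striping idea behind HashLock can be illustrated with a minimal sketch. This is not IoTDB's source code: the class name `HashLockSketch`, its method names, and the use of `Math.floorMod` to pick a slot are assumptions made for illustration. Only the use of `ReentrantReadWriteLock` and the default array size of 100 come from the description above.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of a striped read/write lock; not IoTDB's actual HashLock class. */
class HashLockSketch {
    // Default number of locks, as stated in the text above.
    private static final int DEFAULT_LOCK_COUNT = 100;

    private final ReentrantReadWriteLock[] locks;

    HashLockSketch() {
        this(DEFAULT_LOCK_COUNT);
    }

    HashLockSketch(int lockCount) {
        locks = new ReentrantReadWriteLock[lockCount];
        for (int i = 0; i < lockCount; i++) {
            locks[i] = new ReentrantReadWriteLock();
        }
    }

    // Map an object to a slot in the array. floorMod keeps the index
    // non-negative even when hashCode() is negative (an assumption of this sketch).
    int slotFor(Object key) {
        return Math.floorMod(key.hashCode(), locks.length);
    }

    void readLock(Object key) {
        locks[slotFor(key)].readLock().lock();
    }

    void readUnlock(Object key) {
        locks[slotFor(key)].readLock().unlock();
    }

    void writeLock(Object key) {
        locks[slotFor(key)].writeLock().lock();
    }

    void writeUnlock(Object key) {
        locks[slotFor(key)].writeLock().unlock();
    }
}
```

A caller would wrap an update to a user named `alice` in `writeLock("alice")` and `writeUnlock("alice")`, ideally in a `try`/`finally` block. Two different keys may hash to the same slot and then block each other unnecessarily; that is the price of keeping the number of locks fixed.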

Logging

Logical Logging

Physical query plans are serialized and stored as log records. IoTDB uses write-ahead logging with REDO records only.
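As a rough illustration of redo-only logical logging, the sketch below serializes each plan to a log before applying it and rebuilds state by replaying the log. Everything here is hypothetical: the `InsertPlan` fields, the byte layout, and the in-memory `ByteArrayOutputStream` standing in for a log file are assumptions, not IoTDB's plan classes or log format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical redo-only write-ahead log; names and layout are illustrative. */
class RedoLogSketch {
    /** A made-up insert plan: one value for one series at one timestamp. */
    static final class InsertPlan {
        final String series;
        final long timestamp;
        final double value;

        InsertPlan(String series, long timestamp, double value) {
            this.series = series;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // In-memory buffer standing in for the on-disk log file.
    private final ByteArrayOutputStream log = new ByteArrayOutputStream();
    private final Map<String, TreeMap<Long, Double>> state = new HashMap<>();

    // Write-ahead: append the serialized plan to the log, then apply it.
    void execute(InsertPlan plan) {
        try {
            DataOutputStream out = new DataOutputStream(log);
            out.writeUTF(plan.series);
            out.writeLong(plan.timestamp);
            out.writeDouble(plan.value);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        apply(state, plan);
    }

    // Recovery: replay (redo) every logged plan in order. No undo records are needed
    // because nothing is rolled back.
    static Map<String, TreeMap<Long, Double>> recover(byte[] logBytes) {
        Map<String, TreeMap<Long, Double>> rebuilt = new HashMap<>();
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(logBytes));
            while (in.available() > 0) {
                apply(rebuilt, new InsertPlan(in.readUTF(), in.readLong(), in.readDouble()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rebuilt;
    }

    private static void apply(Map<String, TreeMap<Long, Double>> target, InsertPlan plan) {
        target.computeIfAbsent(plan.series, k -> new TreeMap<>()).put(plan.timestamp, plan.value);
    }

    byte[] logBytes() {
        return log.toByteArray();
    }

    Map<String, TreeMap<Long, Double>> state() {
        return state;
    }
}
```

Because the log records the operation (the plan) rather than the resulting bytes on disk, replaying it against an empty store reproduces the same state.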

Isolation Levels

Serializable

Query Execution

Tuple-at-a-Time Model

Stored Procedures

Supported

A SQL-like PREPARE statement is supported.

Storage Architecture

Disk-oriented

Foreign Keys

Not Supported

Data Model

Column Family

System Architecture

Embedded

Overall, IoTDB follows a client-server architecture. The IoTDB client resides on the sensors (IoT devices) of the system, where it collects data and sends it to the IoTDB server. Using the Sync Tool, a client can synchronize its collected data with the server at a user-configured interval; this lets sensor data be persisted continuously on the server, where it can be queried natively or shipped to other open-source platforms for data analysis. Currently, only single-node server deployment is supported. The group is working on support for a shared-nothing cluster. IoTDB currently supports writing to HDFS.

Query Compilation

Code Generation

Apache Thrift

Compression

Delta Encoding, Run-Length Encoding, Naïve (Record-Level)

- Encoding
  IoTDB uses different encoding methods for different data types.
  * RLE (Run-Length Encoding): INT32, INT64, FLOAT, DOUBLE, BOOLEAN
    Suitable for the sequence of integer values and low-precision floating-point values that appear monotonic.
  * TS_2DIFF (Second-order Differential Encoding): INT32, INT64
    Default encoding for time series data.
  * Regular Data Encoding: INT32, INT64
    Suitable for fixed interval increasing sequence like time series.
  * GORILLA Encoding: FLOAT, DOUBLE
    Suitable for floating-point values with small variance.
- Compression
  After encoding, data is cast to a binary stream; the binary stream is then compressed with SNAPPY.
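To show why second-order differencing suits timestamps, here is a simplified sketch of the general technique. It is not IoTDB's TS_2DIFF on-disk format (block headers and bit packing are not reproduced), and the class and method names are hypothetical. The first output is the first value, the second is the first delta, and every later output is the difference between consecutive deltas.

```java
/** Hypothetical sketch of second-order (delta-of-delta) differencing. */
class SecondOrderDiffSketch {
    // encoded[0] = v0, encoded[1] = v1 - v0,
    // encoded[i] = (v[i] - v[i-1]) - (v[i-1] - v[i-2]) for i >= 2.
    static long[] encode(long[] values) {
        long[] encoded = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            if (i == 0) {
                encoded[i] = values[0];
            } else if (i == 1) {
                encoded[i] = values[1] - values[0];
            } else {
                encoded[i] = (values[i] - values[i - 1]) - (values[i - 1] - values[i - 2]);
            }
        }
        return encoded;
    }

    // Invert encode(): rebuild the running delta, then the running value.
    static long[] decode(long[] encoded) {
        long[] values = new long[encoded.length];
        long delta = 0;
        for (int i = 0; i < encoded.length; i++) {
            if (i == 0) {
                values[0] = encoded[0];
            } else {
                delta = (i == 1) ? encoded[1] : delta + encoded[i];
                values[i] = values[i - 1] + delta;
            }
        }
        return values;
    }
}
```

For timestamps sampled at a nearly fixed interval, such as 1000, 1010, 1020, 1030, 1041, the encoded form is 1000, 10, 0, 0, 1. Most entries are zero or close to it, so a later bit-packing or compression stage has very little to store.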

Storage Organization

Log-structured

Storage is based on an LSM (log-structured merge) tree to provide better write throughput.
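The write path of an LSM-style store can be sketched in a few lines. This is a generic illustration, not IoTDB's storage engine: the class `LsmSketch`, the size-based flush threshold, and the in-memory sorted maps standing in for flushed files are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical LSM-style store: writes go to a memtable, which is flushed as a sorted run. */
class LsmSketch {
    private final int memTableLimit;
    private TreeMap<Long, Double> memTable = new TreeMap<>();
    // Flushed sorted runs, oldest first. In a real system these would be files on disk.
    private final List<TreeMap<Long, Double>> runs = new ArrayList<>();

    LsmSketch(int memTableLimit) {
        this.memTableLimit = memTableLimit;
    }

    // Writes touch only the in-memory table, which is what makes them cheap.
    void put(long timestamp, double value) {
        memTable.put(timestamp, value);
        if (memTable.size() >= memTableLimit) {
            flush();
        }
    }

    // Freeze the memtable as a new run and start a fresh one.
    void flush() {
        if (memTable.isEmpty()) {
            return;
        }
        runs.add(memTable);
        memTable = new TreeMap<>();
    }

    // Reads check the memtable first, then runs from newest to oldest.
    Double get(long timestamp) {
        Double value = memTable.get(timestamp);
        if (value != null) {
            return value;
        }
        for (int i = runs.size() - 1; i >= 0; i--) {
            value = runs.get(i).get(timestamp);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Merge all runs into one; newer runs overwrite older entries with the same key.
    void compact() {
        TreeMap<Long, Double> merged = new TreeMap<>();
        for (TreeMap<Long, Double> run : runs) {
            merged.putAll(run);
        }
        runs.clear();
        if (!merged.isEmpty()) {
            runs.add(merged);
        }
    }

    int runCount() {
        return runs.size();
    }
}
```

The trade-off is visible in `get`: a read may have to consult several runs, which is why such designs periodically merge runs in the background.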

Website

https://iotdb.apache.org

Source Code

https://github.com/apache/incubator-iotdb

Tech Docs

https://iotdb.incubator.apache.org/#/Documents

Developer

Tsinghua University

Country of Origin

CN

Start Year

2017

Project Type

Academic, Open Source

Written in

Java

Supported languages

Java, Python

Compatible With

Hive, Spark SQL

Operating Systems

Linux, OS X, Windows

Licenses

Apache v2