IoTDB

Revision #7 from 12/10/2019 12:43 a.m.

IoTDB is a specialized database management system for time series data generated by networks of IoT devices with low computational power. It targets workloads with high-frequency writes, large-volume data storage, and complex analytical queries. IoTDB supports the queries common in monitoring and metric collection for IoT devices, namely filtering by predicate, querying by time range, grouped aggregation, and data sampling. Data in IoTDB is stored in TsFile, a file format designed for accessing, compressing, and storing time series data. Storage is organized in an LSM-based structure that favors write throughput.

Users can access and manage IoTDB from Java programs, either through the JDBC driver or by installing IoTDB's Java libraries into a local Maven repository. IoTDB can also be accessed through a command-line interface and a Python wrapper client API. The official documentation covers integration with data analysis systems such as Spark, Hadoop, Hive, and Grafana.

History

IoTDB was started in 2017 by Prof. Jianmin Wang’s group in the School of Software at Tsinghua University and China’s National Engineering Laboratory for Big Data Software. The project entered the Apache Incubator on Nov. 18, 2018. It evolved from the same group's earlier project, TsFile, a columnar storage format optimized for time series data, which IoTDB uses as its underlying storage format.

Logging

Logical Logging

Physical query plans are serialized and stored as log records. IoTDB uses write-ahead logging with REDO records only.
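The redo-only scheme above can be illustrated with a small, self-contained sketch. It is not IoTDB's code: the class RedoLogSketch, its method names, and the "timestamp,value" record format are all hypothetical, and an in-memory list stands in for a durable log file. The point is only the ordering (log the serialized plan first, then apply it) and that recovery replays every record without any undo step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration of redo-only write-ahead logging; not IoTDB's implementation.
class RedoLogSketch {
    // Stand-in for a durable log file; each entry is one serialized insert plan.
    private final List<String> log = new ArrayList<>();
    // Stand-in for the volatile in-memory write buffer: timestamp -> value.
    private final TreeMap<Long, Double> memTable = new TreeMap<>();

    // Log the plan first, then apply it to the in-memory state.
    void insert(long timestamp, double value) {
        log.add(timestamp + "," + value);
        memTable.put(timestamp, value);
    }

    // Simulate a crash: volatile state is lost, the log survives.
    void crash() {
        memTable.clear();
    }

    // Recovery replays every logged plan in order; there are no UNDO records to process.
    void recover() {
        for (String record : log) {
            String[] parts = record.split(",");
            memTable.put(Long.parseLong(parts[0]), Double.parseDouble(parts[1]));
        }
    }

    // Returns the value stored at the timestamp, or null if none is present.
    Double read(long timestamp) {
        return memTable.get(timestamp);
    }
}
```

Calling insert twice, then crash and recover, brings back both points, since every applied write has a matching log record.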

Data Model

Column Family / Wide-Column

Concurrency Control

Not Supported

Because IoTDB does not support transactions, its concurrency control is a bare-bones implementation built on read locks and write locks. The implementation does not follow a two-phase locking protocol: there are cases where a lock is acquired after another lock has already been released within the same function (an example is included in citation 1 of this section). The locks are Java's native ReentrantReadWriteLock.

To avoid conflicts when users or roles are read or written concurrently, IoTDB implements a HashLock for the user manager and the role manager. A HashLock is a wrapper around a fixed number of ReentrantReadWriteLock locks; by default it is initialized with an array of 100. Each applicable database object maps to one lock in the array according to the object's hash value. This prevents conflicts from concurrent access to the same database object (a user or role in this case) while bounding the resources needed to manage the locks.
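The hash-to-lock mapping described above is a form of lock striping, and a minimal sketch may make it concrete. This is an illustration under stated assumptions, not IoTDB's HashLock source: the class name HashLockSketch and its method names are hypothetical, and only the default of 100 locks and the use of ReentrantReadWriteLock come from the description.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of a hash-striped read/write lock; not IoTDB's actual HashLock.
class HashLockSketch {
    private static final int DEFAULT_LOCK_COUNT = 100; // default size stated in the text
    private final ReentrantReadWriteLock[] locks;

    HashLockSketch() {
        this(DEFAULT_LOCK_COUNT);
    }

    HashLockSketch(int lockCount) {
        if (lockCount <= 0) {
            throw new IllegalArgumentException("lockCount must be positive");
        }
        locks = new ReentrantReadWriteLock[lockCount];
        for (int i = 0; i < lockCount; i++) {
            locks[i] = new ReentrantReadWriteLock();
        }
    }

    // Map a key to a slot; floorMod keeps the index non-negative for negative hash codes.
    int slotOf(Object key) {
        return Math.floorMod(key.hashCode(), locks.length);
    }

    void readLock(Object key) {
        locks[slotOf(key)].readLock().lock();
    }

    void readUnlock(Object key) {
        locks[slotOf(key)].readLock().unlock();
    }

    void writeLock(Object key) {
        locks[slotOf(key)].writeLock().lock();
    }

    void writeUnlock(Object key) {
        locks[slotOf(key)].writeLock().unlock();
    }
}
```

A caller would wrap an update to a user named "alice" in writeLock("alice") and writeUnlock("alice"), with the unlock in a finally block. Two different names that hash to the same slot share a lock, so they occasionally block each other. That is the trade-off for keeping the number of locks fixed.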

Query Compilation

Code Generation

Apache Thrift

System Architecture

Embedded

Overall, IoTDB follows a client-server architecture. The IoTDB client resides on the sensors (IoT devices) of the system, where it handles data collection and sends data to the IoTDB server. A client can sync its collected data with the server at a user-configured interval using the Sync Tool; this lets sensor data be persisted continuously on the server, where it can be queried natively or shipped to other open-source platforms for analysis. Only single-node server deployment is currently supported, and the group is working on support for a shared-nothing cluster. IoTDB currently supports writing to HDFS.