InfluxDB

Revision #10, updated 12/14/2018 3:08 p.m.

InfluxDB is an open source time series database built by InfluxData. Optimized for the storage and retrieval of time series data, it is used for monitoring and recording performance metrics and analytics. InfluxDB is written in Go and has no external dependencies.

History

InfluxDB was created by Errplane in late 2013. Backed by Y Combinator, Errplane was initially a SaaS company centered around detecting anomalies in data. After discovering a gap in the market, the company pivoted to focus on its open source time series database, which eventually became InfluxDB. Errplane rebranded as InfluxData Inc. in late 2015.

Data Model

Column Family / Wide-Column

The InfluxDB data model consists of points, with each point having four components: a measurement, a tagset, a fieldset, and a timestamp. The tagset is a dictionary of key-value pairs whose values are stored as strings and are indexed. The fieldset holds the data being recorded by the point; its values can be floats, ints, strings, or booleans and are not indexed. The measurement groups points that may have varying tagsets or fieldsets. An individual point is stored in a single database with exactly one retention policy, which specifies the storage duration, the number of copies of the points, and the time range covered by shard groups. The retention policy, measurement, and tagset are collectively called a series.
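
As an illustration of the model described above, the following minimal sketch represents a point and derives the series it belongs to. The class name `Point`, the function `series_key`, and the integer timestamp are assumptions made for this example and are not InfluxDB's own types.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# Field values may be floats, ints, strings, or booleans.
FieldValue = Union[float, int, str, bool]


@dataclass(frozen=True)
class Point:
    """Hypothetical in-memory representation of a single point."""
    measurement: str
    tags: Dict[str, str]           # tag values are strings and are indexed
    fields: Dict[str, FieldValue]  # field values are not indexed
    timestamp: int                 # integer timestamp (unit is an assumption)


def series_key(retention_policy: str, point: Point) -> Tuple:
    """A series is identified by retention policy, measurement, and tagset.

    Tags are sorted so that the same tagset always yields the same key.
    """
    return (retention_policy, point.measurement, tuple(sorted(point.tags.items())))


a = Point("cpu", {"host": "a", "region": "us"}, {"usage": 0.42}, 1)
b = Point("cpu", {"region": "us", "host": "a"}, {"usage": 0.57}, 2)
c = Point("cpu", {"host": "b", "region": "us"}, {"usage": 0.11}, 2)
```

Here `a` and `b` fall into the same series because only their fields and timestamps differ, while `c` has a different tagset and therefore belongs to a different series.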

Indexes

Inverted Index (Full Text), Log-Structured Merge Tree

InfluxDB consists of two databases in one: a time series data store for the series data and an inverted index for measurement, tag, and field metadata. The inverted index is the Time Series Index (TSI), a log-structured merge tree-based database consisting of the Index, Partition, LogFile, and IndexFile. For a single shard, the Index contains the entire index dataset, the Partition contains a sharded partition of the data, the LogFile contains newly written series persisted as a WAL, and the IndexFile is an immutable, memory-mapped index either built from a LogFile or merged from two IndexFiles. Following a write, the series is looked up and a series ID is returned. The series is then sent to the Index, where the series ID is added to a roaring bitmap of series IDs, or ignored if it has already been created. The series is then hashed and sent to a Partition, which writes it to the LogFile. Finally, the LogFile writes the series to a WAL file on disk and adds the series to an in-memory index.
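
The write path described above can be sketched roughly as follows. This is a simplified model, not InfluxDB's implementation: a Python `set` stands in for the roaring bitmap, a list stands in for the on-disk WAL file, CRC32 is an arbitrary choice of hash, and the partition count of 4 is hypothetical.

```python
import zlib


class LogFile:
    """Holds newly written series: a WAL plus an in-memory index."""

    def __init__(self):
        self.wal = []        # stand-in for the WAL file on disk
        self.in_memory = {}  # series ID -> series key

    def add_series(self, series_id, key):
        self.wal.append((series_id, key))  # persist first
        self.in_memory[series_id] = key    # then index in memory


class Partition:
    def __init__(self):
        self.log_file = LogFile()


class Index:
    def __init__(self, num_partitions=4):
        self.series_ids = set()  # stand-in for a roaring bitmap of series IDs
        self.partitions = [Partition() for _ in range(num_partitions)]
        self._ids = {}           # series key -> series ID lookup

    def write(self, key):
        # Look up (or assign) the series ID for this series key.
        series_id = self._ids.setdefault(key, len(self._ids) + 1)
        if series_id in self.series_ids:
            return series_id     # series already created: nothing to do
        self.series_ids.add(series_id)
        # Hash the series key to choose a partition.
        slot = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[slot].log_file.add_series(series_id, key)
        return series_id


index = Index()
first = index.write("cpu,host=a")
again = index.write("cpu,host=a")
other = index.write("cpu,host=b")
wal_entries = sum(len(p.log_file.wal) for p in index.partitions)
```

Writing the same series twice reaches the LogFile only once, so `wal_entries` counts one entry per distinct series.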

Storage Architecture

Disk-oriented

The Time Series Index stores its index data on disk.

Logging

Physiological Logging

Each Write Ahead Log consists of blocks of writes and deletes with points that are serialized and then compressed using Snappy. A WAL entry has one byte representing the entry type of either write or delete and a uint32 denoting the length of the compressed block, followed by the compressed block itself. InfluxDB utilizes group commits, where optimally 5,000-10,000 points are batched together before a WAL file is fsync'd.
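
The entry layout described above (one type byte, a uint32 length, then the compressed block) can be sketched as below. Several details are assumptions for illustration: the type values `0x01` and `0x02`, the big-endian byte order, and the use of `zlib` from the Python standard library as a stand-in for Snappy.

```python
import struct
import zlib

WRITE_ENTRY = 0x01   # hypothetical type value for a write
DELETE_ENTRY = 0x02  # hypothetical type value for a delete

HEADER = ">BI"       # 1-byte entry type + uint32 length (byte order assumed)
HEADER_SIZE = struct.calcsize(HEADER)


def encode_entry(entry_type: int, payload: bytes) -> bytes:
    """Compress the serialized points and prefix them with the header."""
    block = zlib.compress(payload)  # stand-in for Snappy compression
    return struct.pack(HEADER, entry_type, len(block)) + block


def decode_entry(buf: bytes):
    """Read one entry from the start of buf and return (type, payload)."""
    entry_type, length = struct.unpack_from(HEADER, buf, 0)
    block = buf[HEADER_SIZE:HEADER_SIZE + length]
    return entry_type, zlib.decompress(block)


entry = encode_entry(WRITE_ENTRY, b"cpu,host=a usage=0.42 1")
decoded_type, decoded_payload = decode_entry(entry)
```

Because the length prefix records the size of the compressed block, a reader can skip from one entry to the next without decompressing anything.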

Storage Model

Decomposition Storage Model (Columnar)

InfluxDB stores data in a columnar format, further organized into time-bounded chunks. This makes expired data easier to delete from the filesystem, since removing it does not require a large update of persisted data. The columnar layout also suits common time series queries, such as scans across a time range followed by computations of functions like mean, max, or moving windows.
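
To make the benefit of time-bounded columnar chunks concrete, here is a toy sketch in which expiry drops whole chunks and a mean is computed by scanning only the chunks that overlap the requested range. The class `ColumnarStore` and its methods are hypothetical and say nothing about InfluxDB's actual file layout.

```python
class ColumnarStore:
    """Toy store: one chunk per time window, each holding parallel columns."""

    def __init__(self, chunk_seconds):
        self.chunk_seconds = chunk_seconds
        self.chunks = {}  # chunk start time -> {"time": [...], "value": [...]}

    def write(self, ts, value):
        start = ts - ts % self.chunk_seconds
        chunk = self.chunks.setdefault(start, {"time": [], "value": []})
        chunk["time"].append(ts)
        chunk["value"].append(value)

    def expire(self, cutoff):
        """Drop every chunk whose time range ends at or before cutoff."""
        expired = [s for s in self.chunks if s + self.chunk_seconds <= cutoff]
        for start in expired:
            del self.chunks[start]  # whole chunk removed, no rewrite of others

    def mean(self, t0, t1):
        """Mean of values with t0 <= timestamp < t1, or None if no data."""
        total, count = 0.0, 0
        for start, chunk in self.chunks.items():
            if start + self.chunk_seconds <= t0 or start >= t1:
                continue  # chunk lies entirely outside the range
            for ts, value in zip(chunk["time"], chunk["value"]):
                if t0 <= ts < t1:
                    total += value
                    count += 1
        return total / count if count else None


store = ColumnarStore(chunk_seconds=10)
for ts, value in [(1, 2.0), (5, 4.0), (12, 6.0)]:
    store.write(ts, value)
mean_before = store.mean(0, 20)
store.expire(cutoff=10)
mean_after = store.mean(0, 20)
```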

Stored Procedures

Not Supported

Joins

Not Supported

InfluxDB does not support traditional relational joins.

Query Interface

Custom API, SQL, HTTP / REST

The HTTP API is the primary way to query data. Queries are written in InfluxQL, a variation of SQL. As of this revision, a new custom query language called Flux is being developed.
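
As a rough illustration of querying over HTTP, the sketch below builds a request URL for the `/query` endpoint of the InfluxDB 1.x HTTP API, passing the database as `db` and the InfluxQL statement as `q`. The host, port, database name, and measurement are hypothetical, and the helper `build_query_url` is written for this example only. No request is sent.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, database: str, influxql: str) -> str:
    """Return a URL that runs an InfluxQL statement against one database."""
    params = urlencode({"db": database, "q": influxql})
    return f"{base_url}/query?{params}"


url = build_query_url(
    "http://localhost:8086",        # hypothetical local instance
    "mydb",                         # hypothetical database name
    "SELECT mean(value) FROM cpu",  # hypothetical measurement and field
)
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen(url)`, against a running instance.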

Storage Organization

Log-structured

InfluxDB uses the Time-Structured Merge Tree (TSM) organizational structure; each TSM file contains compressed and sorted series data. A FileStore is also used to mediate access to all TSM files on disk, which ensures atomic installation of all TSM files upon file replacement and removes TSM files that are no longer used.

Query Execution

Tuple-at-a-Time Model

For a query, the query engine instantiates an iterator for each series per shard. These iterators are nested and form a tree which is executed bottom-up. Data is read, filtered, and merged to compute the result set.
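
The nested iterator tree described above can be approximated with Python generators: one leaf iterator per series per shard, a filter over each leaf, and a merge at the root that pulls points from its children one at a time. The function names and the `(timestamp, value)` tuple are assumptions for this sketch.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Sample = Tuple[int, float]  # (timestamp, value)


def series_iterator(points: Iterable[Sample]) -> Iterator[Sample]:
    """Leaf: reads the points of one series in timestamp order."""
    yield from points


def filter_iterator(source: Iterator[Sample], t0: int, t1: int) -> Iterator[Sample]:
    """Passes through only points with t0 <= timestamp < t1."""
    for ts, value in source:
        if t0 <= ts < t1:
            yield ts, value


def merge_iterator(sources: List[Iterator[Sample]]) -> Iterator[Sample]:
    """Root: merges already-sorted children into one sorted stream."""
    return heapq.merge(*sources)


def run_query(shards, t0: int, t1: int) -> List[Sample]:
    # One filtered leaf iterator per series per shard.
    leaves = [
        filter_iterator(series_iterator(series), t0, t1)
        for shard in shards
        for series in shard
    ]
    return list(merge_iterator(leaves))


shards = [
    [[(1, 1.0), (4, 4.0)]],  # shard 0 with one series
    [[(2, 2.0), (9, 9.0)]],  # shard 1 with one series
]
result = run_query(shards, 0, 5)
```

Nothing is read until the root is consumed, so each point flows up the tree individually rather than being materialized in batches.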

Foreign Keys

Not Supported

InfluxDB does not support foreign keys.

Views

Materialized Views

InfluxDB supports continuous queries, which are conceptually similar to materialized views in that expensive query results are precomputed and stored. A continuous query can be used to automatically downsample high precision data that is commonly queried to a lower precision. As a result, queries on the lower precision data will require fewer resources and run faster.
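
The downsampling that a continuous query performs can be pictured with the following sketch, which reduces high precision points to one mean value per time window. It models only the computation; the function `downsample_mean` is hypothetical and is not how continuous queries are defined or scheduled in InfluxDB.

```python
def downsample_mean(points, window):
    """Group (timestamp, value) points into windows and average each one.

    Returns a list of (window_start, mean) pairs sorted by window start.
    """
    buckets = {}  # window start -> (running total, count)
    for ts, value in points:
        start = ts - ts % window
        total, count = buckets.get(start, (0.0, 0))
        buckets[start] = (total + value, count + 1)
    return [
        (start, total / count)
        for start, (total, count) in sorted(buckets.items())
    ]


# Points every 30 seconds, downsampled to 60-second windows.
raw = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]
downsampled = downsample_mean(raw, window=60)
```

Storing `downsampled` alongside the raw data means later queries at the coarser precision scan half as many points in this example.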

Query Compilation

Not Supported

Compression

Dictionary Encoding, Run-Length Encoding, Bit Packing / Mostly Encoding

The compression strategy varies with the shape of the data. For timestamps, for instance, run-length encoding, Simple8B, or raw value encoding is used. Snappy, a dictionary compression scheme, is used for strings.
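
To show why run-length encoding works well for regularly spaced timestamps, the sketch below stores the first timestamp followed by runs of identical deltas. Encoding deltas rather than raw values is an assumption of this example, and the function names are hypothetical; Simple8B and raw encoding are not shown.

```python
def rle_encode_timestamps(timestamps):
    """Return (first_timestamp, [(delta, run_length), ...])."""
    if not timestamps:
        return None, []
    runs = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        if runs and runs[-1][0] == delta:
            runs[-1][1] += 1         # extend the current run
        else:
            runs.append([delta, 1])  # start a new run
    return timestamps[0], [tuple(run) for run in runs]


def rle_decode_timestamps(first, runs):
    """Inverse of rle_encode_timestamps."""
    if first is None:
        return []
    out = [first]
    for delta, count in runs:
        for _ in range(count):
            out.append(out[-1] + delta)
    return out


stamps = [10, 20, 30, 40, 45]
first, runs = rle_encode_timestamps(stamps)
restored = rle_decode_timestamps(first, runs)
```

A series sampled at a perfectly regular interval collapses to a single run, which is the case where run-length encoding pays off most.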

Revision #10 | Updated 12/14/2018 3:08 p.m.


Developer

InfluxData

Country of Origin

Start Year

2013

Project Type

Commercial, Open Source

Written in

Supported languages

Elixir, Erlang, Go, Haskell, Java, JavaScript, Lisp, Matlab, Perl, PHP, Python, R, Ruby, Rust, Scala

Licenses

MIT

Wikipedia

https://en.wikipedia.org/wiki/InfluxDB
