InfluxDB is an open source time series database built by InfluxData. Optimized for the storage and retrieval of time series data, it is used for monitoring and for recording performance metrics and analytics. InfluxDB is written in Go and has no external dependencies.
InfluxDB was created by Errplane in late 2013. Backed by Y Combinator, Errplane was initially a SaaS company centered on anomaly detection in data. After discovering a large gap in the market, the company pivoted to focus on its open source time series database, which eventually became InfluxDB. Errplane rebranded as InfluxData Inc. in late 2015.
The HTTP API is the primary way to query data. Queries can also be written in InfluxQL, a variation of SQL. A new custom query language called Flux is currently being developed.
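As a rough illustration of querying over HTTP, the sketch below builds a request URL carrying an InfluxQL statement. It assumes the `/query` endpoint with `db` and `q` parameters as exposed by InfluxDB 1.x; the host, database name, and measurement are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, database: str, influxql: str) -> str:
    """Return a GET URL for the (assumed) InfluxDB 1.x /query endpoint."""
    # urlencode percent-escapes the statement so it is safe in a query string.
    params = urlencode({"db": database, "q": influxql})
    return f"{base_url.rstrip('/')}/query?{params}"


# Hypothetical server, database, and measurement names.
url = build_query_url(
    "http://localhost:8086",
    "mydb",
    'SELECT mean("value") FROM "cpu" WHERE time > now() - 1h',
)
print(url)
```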
InfluxDB supports continuous queries, which are conceptually similar to materialized views in that expensive query results are precomputed and stored. A continuous query can be used to automatically downsample high precision data that is commonly queried to a lower precision. As a result, queries on the lower precision data will require fewer resources and run faster.
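The downsampling idea can be sketched outside the database. The example below is not InfluxDB code; it is a minimal Python illustration, with hypothetical names, of what a continuous query precomputes: raw points are grouped into fixed time windows and each window is reduced to its mean.

```python
from collections import defaultdict
from statistics import mean


def downsample_mean(points, interval_s):
    """Average (timestamp_s, value) pairs over fixed windows of interval_s seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        window_start = ts - (ts % interval_s)
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Five raw points collapse into two one-minute averages.
raw = [(0, 1.0), (10, 3.0), (20, 5.0), (60, 10.0), (90, 20.0)]
downsampled = downsample_mean(raw, 60)
print(downsampled)
```

Later queries that only need one-minute resolution can read the two stored averages instead of rescanning the five raw points.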
InfluxDB does not support traditional relational joins.
InfluxDB stores data in a columnar format, further organized into time-bounded chunks. This results in easier deletion from the filesystem when data expires, since a large update of persisted data does not need to be done. The columnar data storage also supports common time series queries, such as scans across a time range followed by computations of functions like mean, max, or moving windows.
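To see why time-bounded chunks make expiry cheap, consider the simplified sketch below. The class and method names are hypothetical and the chunks live in memory rather than on the filesystem; the point is only that expired data is removed by dropping whole chunks, without rewriting the remaining ones.

```python
class ChunkedStore:
    """Toy store that groups points into fixed-width, time-bounded chunks."""

    def __init__(self, chunk_duration):
        self.chunk_duration = chunk_duration
        self.chunks = {}  # chunk start time -> list of (timestamp, value)

    def write(self, ts, value):
        start = ts - (ts % self.chunk_duration)
        self.chunks.setdefault(start, []).append((ts, value))

    def drop_expired(self, now, retention):
        """Delete every chunk whose whole time range is older than the retention window."""
        cutoff = now - retention
        expired = sorted(s for s in self.chunks if s + self.chunk_duration <= cutoff)
        for start in expired:
            del self.chunks[start]  # one cheap delete per chunk, no per-point updates
        return expired


store = ChunkedStore(chunk_duration=100)
for ts in (5, 50, 150, 250):
    store.write(ts, float(ts))
dropped = store.drop_expired(now=300, retention=100)
print(dropped, sorted(store.chunks))
```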
For a query, the query engine instantiates an iterator for each series per shard. These iterators are nested and form a tree which is executed bottom-up. Data is read, filtered, and merged to compute the result set.
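The nested-iterator design can be approximated with Python generators. This is a loose sketch with hypothetical function names, not the engine's actual iterator types: leaf iterators read and filter one series each, an inner node merges them by time, and the root reduces the merged stream to a result.

```python
import heapq


def series_iterator(points, start, end):
    """Leaf: yield the (timestamp, value) points of one series within [start, end)."""
    for ts, value in points:
        if start <= ts < end:
            yield ts, value


def merge_iterator(children):
    """Inner node: merge time-sorted child iterators into one time-sorted stream."""
    return heapq.merge(*children, key=lambda point: point[0])


def mean_of(child):
    """Root: reduce the merged stream to the mean of its values."""
    total, count = 0.0, 0
    for _, value in child:
        total += value
        count += 1
    return total / count if count else None


# One series per shard; the point at t=9 falls outside the queried range.
shard_a = [(1, 2.0), (3, 4.0)]
shard_b = [(2, 6.0), (9, 100.0)]
leaves = [series_iterator(s, 0, 5) for s in (shard_a, shard_b)]
result = mean_of(merge_iterator(leaves))
print(result)
```

Because generators are lazy, pulling from the root is what drives the leaves here, so data still flows from the bottom of the tree to the top.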
InfluxDB uses a Time Series Index (TSI), a log-structured merge tree-based database, for its series data. TSI consists of the Index, Partition, LogFile, and IndexFile. The Index contains the entire index dataset for a single shard; a Partition holds a sharded partition of that data; the LogFile holds newly written series, persisted as a write-ahead log (WAL); and the IndexFile is an immutable, memory-mapped index either built from a LogFile or merged from two IndexFiles. On a write, the series is looked up and a series ID is returned. The series is then sent to the Index, where its ID is added to a roaring bitmap of series IDs, or ignored if the series has already been created. The series is then hashed and sent to a Partition, which writes it to the LogFile. Finally, the LogFile writes the series to a WAL file on disk and adds it to an in-memory index.
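The write path described above can be sketched as follows. Everything here is a simplification under stated assumptions: a Python `set` stands in for the roaring bitmap, a list stands in for the WAL file on disk, the series key is assumed to be the value that gets hashed, and CRC32 is an arbitrary choice of hash. Class and method names mirror the component names but are otherwise hypothetical.

```python
import zlib


class LogFile:
    def __init__(self):
        self.wal = []           # stand-in for the WAL file on disk
        self.in_memory = set()  # in-memory index of series IDs

    def add_series(self, series_id, key):
        self.wal.append((series_id, key))  # persist first
        self.in_memory.add(series_id)      # then make it visible in memory


class Partition:
    def __init__(self):
        self.log_file = LogFile()

    def add_series(self, series_id, key):
        self.log_file.add_series(series_id, key)


class Index:
    def __init__(self, n_partitions=4):
        self.series_ids = set()  # stand-in for a roaring bitmap
        self.partitions = [Partition() for _ in range(n_partitions)]
        self._ids_by_key = {}

    def lookup_series_id(self, key):
        """Return the ID for a series key, assigning the next ID if it is new."""
        return self._ids_by_key.setdefault(key, len(self._ids_by_key) + 1)

    def write(self, key):
        """Index a series; return False if it had already been created."""
        series_id = self.lookup_series_id(key)
        if series_id in self.series_ids:
            return False
        self.series_ids.add(series_id)
        # Assumption: the series key is hashed to pick a partition.
        slot = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[slot].add_series(series_id, key)
        return True


index = Index()
first = index.write("cpu,host=a")
second = index.write("cpu,host=a")  # duplicate series is ignored
print(first, second)
```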
The Time Series Index stores index data on disk.
The InfluxDB data model consists of points, each with four components: a measurement, a tagset, a fieldset, and a timestamp. The tagset is a dictionary of key-value pairs whose values are stored as strings and are indexed. The fieldset holds the data recorded by the point; field values can be floats, integers, strings, or booleans and are not indexed. The measurement groups points that may have varying tagsets or fieldsets. An individual point is stored in a single database under exactly one retention policy, which specifies the storage duration, the number of copies of the points, and the time range covered by shard groups. The retention policy, measurement, and tagset are collectively called a series.
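A minimal sketch of this data model is shown below. The `Point` class, the `series_key` helper, the key format, and the retention policy name used as a default are illustrative assumptions rather than InfluxDB's internal representation; the sketch only shows that fields and timestamps do not affect which series a point belongs to.

```python
from dataclasses import dataclass
from typing import Dict, Union

FieldValue = Union[float, int, str, bool]


@dataclass
class Point:
    measurement: str
    tags: Dict[str, str]           # indexed; values are always strings
    fields: Dict[str, FieldValue]  # the recorded data; not indexed
    timestamp_ns: int
    retention_policy: str = "autogen"  # illustrative default name


def series_key(point: Point) -> str:
    """Identify a series by retention policy, measurement, and sorted tagset."""
    parts = [point.measurement]
    parts += [f"{k}={v}" for k, v in sorted(point.tags.items())]
    return f"{point.retention_policy}/" + ",".join(parts)


p1 = Point("cpu", {"region": "us", "host": "a"}, {"usage": 0.42}, 1_000)
p2 = Point("cpu", {"host": "a", "region": "us"}, {"usage": 0.57, "idle": True}, 2_000)
print(series_key(p1))
print(series_key(p1) == series_key(p2))
```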
Compression strategy varies based on the shape of the data. For timestamps, for instance, run-length encoding is used in the best case, [Simple8B](https://godoc.org/github.com/jwilder/encoding/simple8b) in a good case, and raw value encoding in the worst case. [Snappy](https://en.wikipedia.org/wiki/Snappy_(compression)), a dictionary compression scheme, is used for strings.
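The choice between the three timestamp encodings can be illustrated with a small sketch. It assumes that timestamps are delta-encoded first, that run-length encoding applies when every delta is identical, and that Simple8B applies when each delta fits in 60 bits; these thresholds and all function names are assumptions for illustration, not the engine's code.

```python
SIMPLE8B_MAX = (1 << 60) - 1  # assumed largest value Simple8B can pack


def choose_timestamp_encoding(timestamps):
    """Pick an encoding name based on the deltas between sorted timestamps."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(set(deltas)) <= 1:
        return "rle"       # best case: perfectly regular intervals
    if all(0 <= d <= SIMPLE8B_MAX for d in deltas):
        return "simple8b"  # good case: irregular but small deltas
    return "raw"           # worst case: store values uncompressed


def rle_encode(timestamps):
    """Encode a non-empty, regularly spaced run as (first, delta, count)."""
    delta = timestamps[1] - timestamps[0] if len(timestamps) > 1 else 0
    return timestamps[0], delta, len(timestamps)


def rle_decode(first, delta, count):
    return [first + i * delta for i in range(count)]


regular = [0, 10, 20, 30]
print(choose_timestamp_encoding(regular))
print(rle_encode(regular))
```

With regular sampling, any number of timestamps collapses to three integers, which is why evenly spaced data compresses so well.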
InfluxDB uses the Time-Structured Merge Tree (TSM) organizational structure; each TSM file contains compressed and sorted series data. A FileStore is also used to mediate access to all TSM files on disk, which ensures atomic installation of all TSM files upon file replacement and removes TSM files that are no longer used.
Commercial, Open Source