TileDB

TileDB is an efficient multi-dimensional array management system that introduces a novel on-disk format for storing both dense and sparse array data, with support for fast updates. Its features include efficient compression, high I/O performance on multiple storage backends (e.g., HDFS and S3), and easy integration with tools widely used by data scientists (e.g., Python NumPy).
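As a rough illustration of the difference between dense and sparse storage, a dense layout keeps every cell and leaves its coordinates implicit in the cell's position. A sparse layout keeps only the non-empty cells, each paired with explicit coordinates. This is a conceptual sketch only: it does not reproduce TileDB's actual on-disk format or its tiling, and the function names are hypothetical.

```python
# Conceptual sketch (hypothetical helpers, NOT TileDB's on-disk format):
# compare how a dense and a sparse layout would store the same 2-D array.

def dense_layout(matrix):
    """Store every cell in row-major order; coordinates are implicit."""
    return [value for row in matrix for value in row]

def sparse_layout(matrix, empty=0):
    """Store only non-empty cells, each with explicit (row, col) coordinates."""
    coords, values = [], []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != empty:
                coords.append((i, j))
                values.append(value)
    return coords, values

grid = [
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [7, 0, 0, 5],
]

print(dense_layout(grid))   # [0, 0, 3, 0, 0, 0, 0, 0, 7, 0, 0, 5]
print(sparse_layout(grid))  # ([(0, 2), (2, 0), (2, 3)], [3, 7, 5])
```

When most cells are empty, the sparse form avoids storing them at the cost of storing coordinates. The dense form is more compact when the array is mostly full.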

History

TileDB was originally created at the Intel Science and Technology Center for Big Data, a collaboration between Intel Labs and MIT. The research project was published in a VLDB 2016 paper. TileDB, Inc. was founded in February 2017 to continue the development and maintenance of the TileDB software.

Compression

Dictionary Encoding, Run-Length Encoding
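The two encoding techniques listed above can be sketched in a few lines of Python. This is a generic illustration of the techniques, not TileDB's implementation, and the function names are hypothetical. The example chains dictionary encoding into run-length encoding only because that is a common pattern. This entry does not state that TileDB combines them this way.

```python
# Generic sketches of dictionary encoding and run-length encoding (RLE).
# Hypothetical helper names; not TileDB's implementation.

def dictionary_encode(values):
    """Replace each value with a small integer code into a table of distinct values."""
    dictionary, codes, index = [], [], {}
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def dictionary_decode(dictionary, codes):
    return [dictionary[c] for c in codes]

def rle_encode(values):
    """Collapse runs of identical adjacent values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

column = ["NY", "NY", "NY", "SF", "SF", "NY"]
dictionary, codes = dictionary_encode(column)  # ["NY", "SF"], [0, 0, 0, 1, 1, 0]
runs = rle_encode(codes)                       # [(0, 3), (1, 2), (0, 1)]
```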

TileDB offers a variety of compressors to choose from: GZIP, Zstandard, LZ4, RLE, Bzip2, and double-delta. TileDB implements its own version of double-delta compression, similar to the one presented in Facebook’s Gorilla system. The difference is that TileDB uses a single fixed bit width for all values, whereas Gorilla uses a variable bit width. This makes the implementation simpler and also makes it possible to compute directly on the compressed data, which the TileDB developers have described as future work.
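The following sketch illustrates double-delta encoding with one fixed bit width. It assumes the input is a list of at least two integers, such as timestamps. It is not TileDB's code: the layout (first value, first delta, then two's-complement double deltas packed at a fixed width) is an assumption chosen for clarity, and the function names are hypothetical. Because every double delta occupies the same number of bits, the k-th one sits at bit offset `k * bit_width`. That is the property that distinguishes a fixed-width scheme from Gorilla's variable-width one.

```python
# Sketch of double-delta encoding with ONE fixed bit width for all values.
# Hypothetical helper names and layout; not TileDB's actual implementation.

def double_delta_encode(values):
    """Encode >= 2 ints as (first, first_delta, bit_width, count, packed)."""
    first = values[0]
    first_delta = values[1] - values[0]
    double_deltas = []
    prev_delta = first_delta
    for i in range(2, len(values)):
        delta = values[i] - values[i - 1]
        double_deltas.append(delta - prev_delta)
        prev_delta = delta
    # Fixed width: bits for the largest magnitude plus one sign bit.
    max_mag = max((abs(dd) for dd in double_deltas), default=0)
    bit_width = max_mag.bit_length() + 1
    mask = (1 << bit_width) - 1
    packed = 0
    for k, dd in enumerate(double_deltas):
        # Store each double delta in two's complement at offset k * bit_width.
        packed |= (dd & mask) << (k * bit_width)
    return first, first_delta, bit_width, len(double_deltas), packed

def read_double_delta(packed, bit_width, k):
    """Random access: fetch the k-th double delta without scanning earlier ones."""
    raw = (packed >> (k * bit_width)) & ((1 << bit_width) - 1)
    if raw >> (bit_width - 1):  # sign bit set: sign-extend to a negative int
        raw -= 1 << bit_width
    return raw

def double_delta_decode(first, first_delta, bit_width, count, packed):
    values = [first, first + first_delta]
    delta = first_delta
    for k in range(count):
        delta += read_double_delta(packed, bit_width, k)
        values.append(values[-1] + delta)
    return values

timestamps = [1000, 1010, 1020, 1031, 1041, 1050]
encoded = double_delta_encode(timestamps)
print(encoded)  # (1000, 10, 2, 4, 244)
assert double_delta_decode(*encoded) == timestamps
```

For these timestamps, the four double deltas (0, 1, -1, -1) fit in 2 bits each, so they pack into 8 bits. A variable-width scheme might spend fewer bits on the zeros, at the cost of having to scan from the start to locate a given element.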

Website

http://tiledb.org/

Source Code

https://github.com/TileDB-Inc/TileDB

Tech Docs

https://people.csail.mit.edu/stavrosp/papers/vldb2017/VLDB17_TileDB.pdf

Developer

TileDB, Inc., Intel Labs

Country of Origin

US

Start Year

2017

Acquired By

TileDB, Inc.

Project Type

Commercial, Open Source

Written in

C++

Supported languages

C, C++, Go, Java, Python, R

Operating Systems

Linux, OS X, Windows

Licenses

MIT