TileDB is a storage engine designed to store and access both dense and sparse multi-dimensional arrays. Its key idea is to store array elements in collections called fragments, which can be either dense or sparse. Each fragment stores its data in data tiles. In dense fragments, the capacity of a data tile is bounded by a fixed chunk size. In sparse fragments, it is bounded by a fixed number of elements. TileDB also supports parallel I/O and is fully multi-threaded.
TileDB is designed to store many different types of data, such as genomic data, machine learning model parameters, imaging data, and LiDAR data.
TileDB was invented at the Intel Science and Technology Center for Big Data. The research center was a collaboration between Intel Labs and MIT. The research project was published in a VLDB 2017 paper. TileDB, Inc. was founded in February 2017 to further develop and maintain the DBMS.
By default, TileDB uses a disk-oriented storage architecture (POSIX filesystem or HDFS). TileDB also supports data storage on object stores such as AWS S3, Azure Blob Storage, Google Cloud Storage, and MinIO. TileDB can also be configured to store data in memory via a RAM backend.
Dictionary Encoding, Delta Encoding, Run-Length Encoding
TileDB supports the following compressors: bzip2, dictionary, double-delta, gzip, LZ4, RLE, and Zstandard. It also supports several data filters that usually function as compressors, such as the bit width reduction filter, float scaling filter, positive delta filter, and WebP filter. The custom compressors and filters are detailed in the list below:
• The double delta compressor is a compressor that is similar to Facebook's Gorilla system. However, TileDB's compressor uses a fixed bit-size instead of a variable bit-size.
• The dictionary encoding filter is a lossless compressor that builds a dictionary of all the unique strings in the input data and stores dictionary indexes instead of the strings themselves.
• The bit width reduction filter takes input data of an unsigned integer type and compresses it to a smaller bit width if possible.
• The float scaling filter is a lossy compressor that takes input data of a floating point type. Given a scale factor, an offset, and a byte width, the filter computes round((input_data[i] - offset) / scale), casts the result to an integer type with the specified byte width, and stores that integer.
• The positive delta filter is a delta encoding filter that ensures that it only stores positive deltas. On negative deltas, this filter's execution will return with an error.
• The WebP filter takes raw colorspace values and converts them to WebP format. This filter supports lossy compression of imaging data.
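To make three of the filters above concrete, here is a minimal pure-Python sketch. It is not TileDB's implementation: the function names are hypothetical, the float scaling sketch assumes a signed target integer and raises on overflow, and the bit width sketch only picks among 1, 2, 4, and 8 bytes based on the largest value. Note that Python's round() rounds halves to the nearest even integer, which may differ from the rounding used by the real filter.

```python
from itertools import accumulate


def float_scale_encode(values, scale, offset, byte_width):
    """Lossy: keep round((v - offset) / scale) as a signed integer.

    Assumption of this sketch: the target type is signed and values that
    do not fit in byte_width bytes raise OverflowError.
    """
    bits = 8 * byte_width
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    encoded = []
    for v in values:
        q = round((v - offset) / scale)
        if not lo <= q <= hi:
            raise OverflowError(f"{q} does not fit in {byte_width} byte(s)")
        encoded.append(q)
    return encoded


def float_scale_decode(encoded, scale, offset):
    """Approximate inverse: the precision lost to rounding is not recovered."""
    return [q * scale + offset for q in encoded]


def positive_delta_encode(values):
    """Store the first value, then differences; reject any negative delta."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        if cur < prev:
            raise ValueError("negative delta: input must be non-decreasing")
        deltas.append(cur - prev)
    return deltas


def positive_delta_decode(deltas):
    """Running sum restores the original values."""
    return list(accumulate(deltas))


def min_byte_width(values):
    """Smallest of 1, 2, 4, 8 bytes that holds every unsigned value."""
    largest = max(values, default=0)
    for width in (1, 2, 4, 8):
        if largest < 1 << (8 * width):
            return width
    raise OverflowError("value needs more than 8 bytes")
```

For example, float_scale_encode([10.0, 10.5, 11.26], scale=0.5, offset=10.0, byte_width=1) keeps one byte per value, and decoding returns 11.5 rather than 11.26 for the last element, which is where the lossiness shows up.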
TileDB does not provide transactional support, as it is a storage engine. It only guarantees atomic reads and writes. Users can build a transaction manager on top of TileDB for concurrency control. TileDB also supports data versioning, which is not MVCC but can provide some of the functionality of MVCC.
TileDB's data model supports the storage of both dense and sparse arrays.
The data model of TileDB arrays allows it to support any number of dimensions. For dense arrays, the dimension types must be uniform, and they all must be either integer types, datetime types, or time types, which are all internally stored as integer types. TileDB only supports integer type dimensions for dense arrays to allow coordinates to be implicitly defined. For sparse arrays, the dimension types in a domain can be heterogeneous (e.g. they can be float or string), and coordinates are explicitly stored in memory. A set of dimensions for an array is called a domain.
An array element is defined by a unique set of dimension values or coordinates, and it is called a cell. In dense arrays, all cells must store exactly one value. In sparse arrays, cells can be empty, store one value, or store multiple values. Each logical cell contains the data from the defined attributes in the array schema. Attributes can have heterogeneous types for both sparse and dense arrays.
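The difference between implicit and explicit coordinates can be illustrated with a small sketch. The two classes below are hypothetical and hold a single attribute; they only model the idea that a dense array derives a cell's position from its integer coordinates, while a sparse array stores each coordinate tuple next to its value and may hold zero, one, or several values for the same coordinates.

```python
class DenseArray2D:
    """Every cell holds exactly one value; coordinates are implied by position."""

    def __init__(self, rows, cols, fill=0):
        self.rows, self.cols = rows, cols
        self.values = [fill] * (rows * cols)  # no coordinates are stored

    def _position(self, row, col):
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError((row, col))
        return row * self.cols + col  # row-major position of the cell

    def write(self, row, col, value):
        self.values[self._position(row, col)] = value

    def read(self, row, col):
        return self.values[self._position(row, col)]


class SparseArray2D:
    """Only non-empty cells are stored, each with its explicit coordinates."""

    def __init__(self):
        self.coords = []  # coordinate tuples, may mix types (e.g. str, float)
        self.values = []

    def write(self, coord, value):
        self.coords.append(coord)
        self.values.append(value)

    def read(self, coord):
        # An empty list means the cell is empty; several entries mean
        # several values were written at the same coordinates.
        return [v for c, v in zip(self.coords, self.values) if c == coord]
```

Because the dense class never stores coordinates, it only works with integer-like dimensions, which mirrors the restriction described above.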
TileDB uses an R-tree as an index to implement sparse array slicing. On write, TileDB builds an R-tree index on the non-empty cells of the sparse array. To do this, it groups the coordinates of the non-empty cells into minimum bounding rectangles, then recursively groups these rectangles into a tree structure. On read, TileDB determines which minimum bounding rectangles overlap the query coordinates. Then, it uses parallel processing to collect these rectangles, decompress them, individually check the coordinates of the data collected, and retrieve the attribute data that matches the query.
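The write and read paths described above can be sketched in a few lines. This is a simplified, single-threaded illustration with hypothetical names, not TileDB's code: it assumes 2D integer coordinates, groups cells in sorted coordinate order, and leaves out decompression and parallelism.

```python
def mbr_of(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of 2D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def union(boxes):
    """Smallest rectangle covering all given rectangles."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def overlaps(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build(cells, capacity=2, fanout=2):
    """Build a tree from a non-empty list of ((x, y), value) cells."""
    if not cells:
        raise ValueError("need at least one non-empty cell")
    cells = sorted(cells, key=lambda cell: cell[0])
    # Leaf level: group consecutive cells and record each group's MBR.
    nodes = []
    for i in range(0, len(cells), capacity):
        group = cells[i:i + capacity]
        nodes.append({"mbr": mbr_of([c for c, _ in group]), "cells": group})
    # Upper levels: recursively group rectangles until one root remains.
    while len(nodes) > 1:
        nodes = [
            {"mbr": union([n["mbr"] for n in nodes[i:i + fanout]]),
             "children": nodes[i:i + fanout]}
            for i in range(0, len(nodes), fanout)
        ]
    return nodes[0]


def query(node, box):
    """Return the cells whose coordinates fall inside the query box."""
    if not overlaps(node["mbr"], box):
        return []  # prune the whole subtree
    if "cells" in node:
        # The MBR overlaps, so check each coordinate individually.
        return [(c, v) for c, v in node["cells"]
                if overlaps((c[0], c[1], c[0], c[1]), box)]
    found = []
    for child in node["children"]:
        found.extend(query(child, box))
    return found
```

The leaf-level check matters because a rectangle can overlap the query region even when only some of its cells lie inside it.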
TileDB uses intra-operator parallel execution for both its read and write queries. The main operations that TileDB parallelizes heavily are I/O (reading and writing) and tile filtering/unfiltering. When executing I/O tasks, reading and writing are parallelized per attribute, and each attribute is parallelized per data tile. When executing tile filtering tasks, filtering and unfiltering are parallelized per attribute, each attribute is parallelized per data tile, and each data tile is parallelized per filter chunk. The chunk size is a parameter that defaults to 64 KB.
This parallelism is implemented via static thread pools. TileDB uses both a compute task thread pool and an I/O task thread pool to help parallelize execution. It includes two thread pools to ensure that I/O tasks do not overload CPU-bound tasks during execution. Internally, a TileDB thread pool is an array of std::thread, and tasks to be executed with this thread pool are kept in a queue.
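The nesting of parallelism (per attribute, per data tile, per filter chunk) with separate I/O and compute pools can be sketched with Python's standard library. This is an analogy rather than TileDB's C++ implementation: ThreadPoolExecutor stands in for the array of std::thread plus task queue, zlib stands in for a filter pipeline, a dict stands in for storage, and all function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024  # default chunk size mentioned in the text


def filter_tile(tile, compute_pool):
    """Split one data tile into chunks and compress the chunks in parallel."""
    chunks = [tile[i:i + CHUNK_SIZE] for i in range(0, len(tile), CHUNK_SIZE)]
    return list(compute_pool.map(zlib.compress, chunks))


def unfilter_tile(chunks, compute_pool):
    """Decompress the chunks of one tile in parallel and reassemble it."""
    return b"".join(compute_pool.map(zlib.decompress, chunks))


def write_array(attributes, storage, io_pool, compute_pool):
    """attributes maps attribute name -> list of data tiles (bytes)."""

    def write_one(key, tile):
        storage[key] = filter_tile(tile, compute_pool)

    # One I/O task per (attribute, data tile) pair.
    futures = [
        io_pool.submit(write_one, (name, index), tile)
        for name, tiles in attributes.items()
        for index, tile in enumerate(tiles)
    ]
    for future in futures:
        future.result()  # propagate any exception from the workers


def read_array(storage, io_pool, compute_pool):
    """Read every tile back, again one I/O task per (attribute, tile)."""
    keys = sorted(storage)
    futures = {
        key: io_pool.submit(unfilter_tile, storage[key], compute_pool)
        for key in keys
    }
    result = {}
    for name, index in keys:
        result.setdefault(name, []).append(futures[(name, index)].result())
    return result
```

In this sketch, I/O tasks wait on compute tasks but never the other way around, so keeping the two kinds of work in separate pools means blocked I/O workers cannot occupy the threads that the chunk-level tasks need.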
TileDB has APIs in the following languages: C, C++, C#, Python, Java, R, and Go. There are several ways to run SQL on top of TileDB. One is TileDB-SQL-Py, a Python package that lets users run SQL queries from a Python environment. In addition, the MariaDB client REPL, the TileDB-Presto connector, and the TileDB-Trino connector can be used to run SQL queries directly.
TileDB supports interoperation functionality with Apache Arrow.
TileDB's main storage format is a multi-file format that stores the array schema, fragments, consolidated fragment metadata, commits, and the array metadata. The array schema directory contains multiple files, each labelled with a timestamp. Because TileDB supports array schema modification, these timestamped files are needed to read data written at different times with the appropriate schema. Fragments are timestamped writes to TileDB arrays, and each fragment has its own directory. This directory stores the attribute and dimension data as well as the fragment metadata, a file that contains important information about the fragment, such as the name of its array schema and index information. The consolidated fragment metadata is stored mainly as a read query optimization; this small file contains the footers of all the fragment metadata files. The commit files mainly serve as indicators that fragment creation was successful. Lastly, the array metadata files store user-defined key-value pairs that can be extracted by querying the TileDB array.
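Because fragments are timestamped writes, reading an array "as of" a point in time can be thought of as overlaying every fragment written at or before that time, in timestamp order. The sketch below illustrates that idea only. The read_at function and the in-memory fragment representation are hypothetical, and schema evolution, commit files, and consolidation are left out.

```python
def read_at(fragments, timestamp):
    """Return the array contents visible at the given timestamp.

    fragments is a list of (write_timestamp, {coordinates: value}) pairs.
    Later writes overwrite earlier ones at the same coordinates.
    """
    view = {}
    for written_at, cells in sorted(fragments, key=lambda f: f[0]):
        if written_at > timestamp:
            break  # this fragment and all later ones are not visible yet
        view.update(cells)
    return view


# Three writes to the same small array at times 10, 20, and 30.
fragments = [
    (10, {(0, 0): "a", (0, 1): "b"}),
    (30, {(1, 1): "d"}),
    (20, {(0, 1): "c"}),
]
latest = read_at(fragments, 30)
earlier = read_at(fragments, 15)
```

Here earlier still sees the value "b" at (0, 1), while latest sees the overwrite "c" together with the cell added at time 30.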
Decomposition Storage Model (Columnar)
TileDB uses a decomposition storage model (DSM) to store attribute data. This attribute data is stored in global order on disk. Global order is determined first by tile order and then by cell order. Recall that TileDB arrays have multiple (say, n) dimensions. Each of these dimensions comes with a tile extent. Together, the tile extents determine the size of a tile and effectively group the data into smaller blocks. Tile order determines the order of these blocks, and cell order determines the order of the cells within a tile. Each of the two orders can be either row-major or column-major.
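As an illustration of how tile order and cell order combine, the following sketch enumerates the coordinates of a dense array in global order. It is a simplified model with hypothetical names: the domain is assumed to start at 0 in every dimension, and tiles at the edge are simply clipped to the array shape.

```python
from itertools import product


def ordered(ranges, order):
    """Enumerate coordinate tuples over the given ranges in the given order."""
    if order == "row-major":
        return list(product(*ranges))  # last dimension varies fastest
    if order == "col-major":
        # First dimension varies fastest: iterate reversed, then flip back.
        return [t[::-1] for t in product(*ranges[::-1])]
    raise ValueError(f"unknown order: {order}")


def global_order(shape, extents, tile_order="row-major", cell_order="row-major"):
    """List all cell coordinates: tiles in tile order, cells in cell order."""
    # Number of tiles per dimension (ceiling division).
    tiles_per_dim = [range(-(-s // e)) for s, e in zip(shape, extents)]
    result = []
    for tile in ordered(tiles_per_dim, tile_order):
        cell_ranges = [
            range(t * e, min((t + 1) * e, s))
            for t, e, s in zip(tile, extents, shape)
        ]
        result.extend(ordered(cell_ranges, cell_order))
    return result
```

For a 4x4 array with 2x2 tiles and row-major tile and cell orders, the first tile contributes (0, 0), (0, 1), (1, 0), (1, 1) and the next cells come from the tile to its right, starting at (0, 2). Switching the tile order to column-major makes the tile below it come second instead.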