TileDB

Revision #40 from 2023-05-02 02:41

TileDB is an embedded storage engine designed to store and access both dense and sparse multi-dimensional arrays. Its key idea is that array elements are stored in collections called fragments, which can be either dense or sparse. Each fragment stores its data in data tiles. In dense fragments, the capacity of a data tile is limited by a fixed chunk size; in sparse fragments, it is limited by a fixed number of elements. TileDB also supports parallel I/O and is completely multi-threaded. It is designed to store many different types of data, such as genomic data, machine learning model parameters, imaging data, and LiDAR data.[04][05]

Website: https://www.tiledb.com/[01]
Source Code: https://github.com/TileDB-Inc/TileDB[02] Accessed: Jun 27, 2026 Last Commit: Jun 15, 2026
Tech Docs: https://docs.tiledb.com/main[03]
Twitter: @tiledb
Developers: Intel Corporation, TileDB
Country of Origin: US
Start Year: 2014 [25][03]
Acquired By: TileDB [03]
Project Types: Commercial, Open Source
Written in: C++
Supported Languages: C, C#, C++, Go, Java, Python, R
Inspired By: SciDB
Compatible With: MariaDB, PrestoDB
Operating Systems: Linux, macOS, Windows
License: MIT License

TileDB, Inc. also has a cloud offering called TileDB Cloud SaaS, a closed-source version of TileDB with additional features, such as serverless UDFs and task graphs for building custom workflows of TileDB tasks. The architecture of TileDB Cloud is centered around a REST API service and uses the company's embedded, open-source storage engine.

Derivative Systems

GenomicsDB

NoSQL

History[06]

TileDB was invented at the Intel Science and Technology Center for Big Data in collaboration with Intel Labs and MIT. The research project was published in a VLDB 2017 paper. TileDB, Inc. was founded in February 2017 to further develop and maintain the DBMS.

Checkpoints[07]

Not Supported

TileDB does not support checkpoints.

Compression[05][08][09][10]

Dictionary Encoding Delta Encoding Run-Length Encoding

TileDB supports compressors, which operate on data tiles. The types of compressors it supports include bzip2, dictionary, double-delta, gzip, LZ4, RLE, and Zstandard. It also supports a few data filters that reduce data size, such as the bit width reduction filter, float scaling filter, positive delta filter, and WebP filter.

The custom compressors and filters are detailed below:

• The double delta compressor uses the timestamp data compression scheme first mentioned in the VLDB paper on the Gorilla time series DBMS. However, TileDB's compressor uses a fixed bit-size instead of a variable bit-size.

• The dictionary encoding filter is a lossless compressor that computes a dictionary of all the unique strings in the input data and stores the indexes of the dictionary instead of the strings themselves in memory.

• The bit width reduction filter takes in input data with an unsigned integer type and compresses it to a smaller bit width if possible.

• The float scaling filter is a lossy compressor that takes in input data with a floating point type. Given arguments for a scale factor, an offset factor, and a byte width, the filter computes round((input_data[i] - offset) / scale), casts the result to an integer type with the specified byte width, and stores that in main memory.

• The positive delta filter is a delta encoding filter that ensures that it only stores positive deltas.

• The WebP filter takes raw colorspace values and converts them to WebP image format. This filter supports lossy compression of imaging data.
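The float scaling and double-delta schemes described above can be illustrated with a small sketch. This is not TileDB's implementation: the function names are hypothetical, the byte-width cast of the float scaling filter and the fixed bit-size packing of the double-delta compressor are not modeled, and Python's round() rounds halves to even, which may differ from the filter's rounding.

```python
def float_scale_encode(values, scale, offset):
    """Lossy step: map each float to round((v - offset) / scale)."""
    return [round((v - offset) / scale) for v in values]


def float_scale_decode(codes, scale, offset):
    """Approximate inverse: map each stored integer back to a float."""
    return [c * scale + offset for c in codes]


def double_delta_encode(values):
    """Keep the first value and first delta, then deltas of deltas."""
    if len(values) < 2:
        return list(values)
    deltas = [b - a for a, b in zip(values, values[1:])]
    delta_of_deltas = [b - a for a, b in zip(deltas, deltas[1:])]
    return [values[0], deltas[0]] + delta_of_deltas


def double_delta_decode(encoded):
    """Rebuild the original sequence by accumulating deltas twice."""
    if len(encoded) < 2:
        return list(encoded)
    out = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dd in encoded[2:]:
        delta += dd
        out.append(out[-1] + delta)
    return out
```

Regularly spaced timestamps such as 1000, 1010, 1020, 1031 encode to 1000, 10, 0, 1, so most stored values are small and need few bits.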

Concurrency Control[11][12][07]

Not Supported

TileDB does not provide transactional support, as it is a storage engine. It only guarantees atomic reads and writes. TileDB also supports data versioning, which is not MVCC, but can provide some of the functionality of MVCC. Support for data versioning within TileDB is built into the file format. The TileDB file format stores an array write as a separate fragment, which includes timestamp information. With this information, it is possible to read an array that has writes only within a specified time interval.
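As a rough model of this versioning scheme, the sketch below treats each write as a timestamped fragment and reads only the fragments inside a time interval. The Fragment class and read_at function are hypothetical names for illustration, and the rule that later writes overwrite earlier ones for the same cell is an assumption of this sketch, not a statement about the TileDB file format.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    timestamp: int  # time of the write that produced this fragment
    cells: dict     # coordinate tuple -> value


def read_at(fragments, start, end):
    """Merge fragments with start <= timestamp <= end, oldest first."""
    result = {}
    for frag in sorted(fragments, key=lambda f: f.timestamp):
        if start <= frag.timestamp <= end:
            result.update(frag.cells)  # assumed: later writes win
    return result


fragments = [
    Fragment(1, {(0, 0): "a", (0, 1): "b"}),
    Fragment(2, {(0, 1): "c"}),
    Fragment(3, {(1, 1): "d"}),
]
```

Calling read_at(fragments, 0, 2) returns the array as it looked after the second write, without the cell written at timestamp 3.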

Data Model[13][14][05]

Array / Matrix

TileDB's data model supports the storage of both dense and sparse arrays.

TileDB arrays can have any number of dimensions, and the set of dimensions for an array is called a domain. For dense arrays, the dimension types must be uniform, and they must all be either integer types, datetime types, or time types, which are all internally stored as integer types. TileDB only supports integer type dimensions for dense arrays so that coordinates can be implicitly defined. For sparse arrays, the dimension types in a domain can be heterogeneous (e.g. float or string), and coordinates are explicitly stored in memory.

An array element ("cell") is defined by a unique set of dimension values or coordinates. In dense arrays, all cells must store exactly one value. In sparse arrays, cells can be empty, store one value, or store multiple values. Each logical cell contains the data from the defined attributes in the array schema. Attributes can have heterogeneous types for both sparse and dense arrays.
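The difference between implicit and explicit coordinates can be sketched with plain Python lists. All names below are hypothetical and the layout is a simplification: a dense cell is found by arithmetic on its integer coordinates, while a sparse cell is found by searching the stored coordinates.

```python
# Dense 2x3 array: only values are stored, in row-major order.
dense_shape = (2, 3)
dense_values = [10, 11, 12, 13, 14, 15]


def dense_get(row, col):
    """Locate a dense cell by arithmetic on its integer coordinates."""
    return dense_values[row * dense_shape[1] + col]


# Sparse array: coordinates are stored explicitly next to the values,
# so the dimensions may have different types (here float and string).
sparse_coords = [(0.5, "x"), (2.25, "y")]
sparse_values = [7, 9]


def sparse_get(coord):
    """Locate a sparse cell by searching the stored coordinates."""
    try:
        return sparse_values[sparse_coords.index(coord)]
    except ValueError:
        return None  # the cell is empty
```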

Foreign Keys[07]

Not Supported

TileDB does not support foreign keys.

Hardware Acceleration

TileDB does not rely on specialized hardware to speed up query execution.

Indexes[12]

R-Tree

TileDB uses an R-tree as an index to implement sparse array slicing. On array write, TileDB builds an R-tree index on the non-empty cells of the sparse array. To do this, it groups the coordinates of the non-empty cells into minimum bounding rectangles, then recursively groups these rectangles into a tree structure. On read, TileDB determines which minimum bounding rectangles overlap the query coordinates. Then, it uses parallel processing to collect these rectangles, decompress them, individually check the coordinates of the data collected, and retrieve the attribute data that matches the query.
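The read path described above can be sketched in two dimensions. This simplified example builds only the leaf level of the index (one minimum bounding rectangle per group of cells) rather than a recursive tree, scans the rectangles sequentially instead of in parallel, and skips decompression. The function names and the capacity parameter are assumptions made for illustration.

```python
def mbr(coords):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def overlaps(a, b):
    """True if rectangles a and b intersect (borders included)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_leaves(coords, capacity):
    """Group sorted coordinates into tiles and compute each tile's MBR."""
    ordered = sorted(coords)
    tiles = [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]
    return [(mbr(tile), tile) for tile in tiles]


def query(leaves, box):
    """Check only tiles whose MBR overlaps the box, then test each cell."""
    hits = []
    for rect, cells in leaves:
        if overlaps(rect, box):
            hits.extend(
                c for c in cells
                if box[0] <= c[0] <= box[2] and box[1] <= c[1] <= box[3]
            )
    return hits
```

Because an overlapping rectangle can still contain cells outside the query, the per-cell coordinate check after the rectangle test is what makes the result exact.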

Isolation Levels[07]

Not Supported

TileDB does not provide transaction support, so no transaction isolation is guaranteed. However, it guarantees both atomic reads and writes.

Joins[07]

Not Supported

TileDB does not support join operations.

Logging[07]

Not Supported

TileDB does not support logging.

Parallel Execution[15][16][17]

Intra-Operator (Horizontal)

TileDB uses intra-operator parallel execution for both its read and write queries. The main operations that TileDB parallelizes are I/O (reading and writing) and tile filtering/unfiltering. When executing I/O tasks, the reading/writing is parallelized per attribute, and each attribute is parallelized per data tile. When executing tile filtering tasks, the filtering/unfiltering is parallelized per attribute, each attribute is parallelized per data tile, and each data tile is parallelized per filter chunk. The chunk size is a parameter that defaults to 64 KB.

This parallelism is implemented via static thread pools. TileDB uses both a compute task thread pool and an I/O task thread pool to help parallelize execution. It includes two thread pools to ensure that I/O tasks do not overload CPU-bound tasks during execution.
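The two-pool arrangement can be sketched with Python's standard concurrent.futures module. This is an illustration only: the in-memory dictionary stands in for stored tiles, zlib stands in for the filter pipeline, the pool sizes are arbitrary, and per-chunk parallelism within a tile is not modeled.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stored tiles: attribute name -> list of compressed tiles.
stored = {
    "a1": [zlib.compress(b"aaaa"), zlib.compress(b"bbbb")],
    "a2": [zlib.compress(b"cccc"), zlib.compress(b"dddd")],
}


def read_tile(attr, idx):
    """Stand-in for an I/O task that fetches one tile of one attribute."""
    return stored[attr][idx]


def unfilter_tile(raw):
    """Stand-in for a CPU-bound unfiltering (decompression) task."""
    return zlib.decompress(raw)


def parallel_read(attrs):
    """Read every tile of every attribute using separate I/O and CPU pools."""
    with ThreadPoolExecutor(max_workers=4) as io_pool, \
            ThreadPoolExecutor(max_workers=4) as compute_pool:
        io_futures = {
            (a, i): io_pool.submit(read_tile, a, i)
            for a in attrs
            for i in range(len(stored[a]))
        }
        cpu_futures = {
            key: compute_pool.submit(unfilter_tile, fut.result())
            for key, fut in io_futures.items()
        }
        return {key: fut.result() for key, fut in cpu_futures.items()}
```

Keeping the pools separate means a burst of slow reads occupies only I/O workers, so decompression work is never queued behind them.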

Query Compilation

Not Supported

TileDB does not support query compilation.

Query Interface[18][19][03][15]

Custom API SQL

TileDB has direct APIs in the following languages: C, C++, C#, Python, Java, R, and Go. There are several ways to run SQL on TileDB arrays. One is TileDB-SQL-Py, a Python package that allows users to run SQL queries in the Python environment. In addition, the MariaDB client REPL, the TileDB-Presto connector, and the TileDB-Trino connector can be used to run SQL queries directly.

Storage Architecture[20]

Disk-oriented In-Memory

By default, TileDB uses a disk-oriented storage architecture (POSIX filesystem or HDFS). TileDB also supports data storage on object stores such as AWS S3, Azure Blob Storage, Google Cloud Storage, and MinIO. TileDB can be configured to store data in-memory via a RAM backend.

Storage Format[12][20][21][22]

Arrow Custom

TileDB supports interoperation functionality with Apache Arrow.

TileDB's main storage format is a multi-file format that stores the array schema, fragments, consolidated fragment metadata, commits, and array metadata.

The array schema directory contains multiple files, each labelled with a timestamp. TileDB supports array schema modification, in which attributes are added or dropped after the array has been written to, so the timestamp label is needed to read data at different times using the appropriate schema.

Fragments are timestamped writes to a TileDB array, and each fragment has its own directory. This directory stores the attribute and dimension data as well as the fragment metadata, a file that contains important data about the fragment, such as the name of its array schema and index information.

The consolidated fragment metadata contains the footers of all the fragment metadata files. It is stored as a read query optimization, because retrieving the fragment metadata footer from each fragment individually can be time-consuming when an array has many fragments. The commit files mainly serve as indicators that fragment creation was successful. Lastly, the array metadata files store user-defined key-value pairs that can be accessed by querying a TileDB array.

Storage Model[23]

Decomposition Storage Model (Columnar)

TileDB uses a decomposition storage model (DSM) to store attribute data. This attribute data is stored in global order on disk. Global order is determined first by tile order and then by cell order. TileDB arrays have multiple dimensions, each of which comes with a tile extent. The tile extents of the dimensions determine the size of a tile and effectively group the data into smaller blocks. Tile order determines the order of these blocks, and cell order determines the order of the cells within a tile; each can be either row-major or column-major.
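To make the ordering concrete, the sketch below enumerates the cells of a small array in global order. It assumes a zero-based integer domain and uses hypothetical names (global_order, tile_order, cell_order); it models only the ordering, not how TileDB lays out bytes on disk.

```python
from itertools import product


def _ordered(ranges, order):
    """Enumerate coordinates from per-dimension ranges in the given order."""
    if order == "row":
        # product() varies the last dimension fastest (row-major).
        return list(product(*ranges))
    # Column-major: vary the first dimension fastest.
    return [t[::-1] for t in product(*ranges[::-1])]


def global_order(shape, extents, tile_order="row", cell_order="row"):
    """List all cell coordinates: tiles in tile order, cells in cell order."""
    tile_origins = [range(0, n, e) for n, e in zip(shape, extents)]
    cells = []
    for origin in _ordered(tile_origins, tile_order):
        cell_ranges = [
            range(o, min(o + e, n))
            for o, e, n in zip(origin, extents, shape)
        ]
        cells.extend(_ordered(cell_ranges, cell_order))
    return cells
```

For a 4x4 array with 2x2 tile extents and row-major orders, the first tile's four cells come first, followed by the four cells of the tile to its right, so cells that are close in the array stay close on disk.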

Storage Organization[12]

Sorted Files

TileDB's storage manager stores data according to the coordinates of the cell being inserted, which makes the sorted files model the closest match to its storage manager implementation.

Stored Procedures[07]

Not Supported

System Architecture[24]

Embedded

TileDB is an embeddable storage library.

Views[07]

Not Supported

Citations

25 sources

[01] TileDB • Designed for Discovery tiledb.com Accessed: 2026-06-05
[02] https://github.com/TileDB-Inc/TileDB github.com Accessed: 2026-06-05
[03] https://docs.tiledb.com/main tiledb.com Accessed: 2026-06-05
[04] https://docs.tiledb.io/en/stable/index.html tiledb.io Accessed: 2026-06-05
[05] Academy • TileDB tiledb.com Accessed: 2026-06-07
[06] Academy • TileDB tiledb.io Accessed: 2026-06-05
[07] Questions about tileDB features - TileDB Forum tiledb.com Accessed: 2026-06-07
[08] TileDB/tiledb/sm/compressors at main · TileDB-Inc/TileDB · GitHub github.com Accessed: 2026-06-02
[09] TileDB/tiledb/sm/filter at main · TileDB-Inc/TileDB · GitHub github.com Accessed: 2026-06-02
[10] TileDB/format_spec/filters at main · TileDB-Inc/TileDB · GitHub github.com Accessed: 2026-06-02
[11] tutorials_sparse_versioning • TileDB tiledb.com Accessed: 2026-06-02
[12] Academy • TileDB tiledb.com Accessed: 2026-06-02
[13] TileDB/tiledb/sm/array_schema/array_schema.cc at main · TileDB-Inc/TileDB · GitHub github.com Accessed: 2026-06-02
[14] https://people.csail.mit.edu/stavrosp/papers/vldb2017/VLDB17_TileDB.pdf mit.edu Modified: 2016-11-25 Accessed: 2026-06-07
[15] Academy • TileDB tiledb.com Accessed: 2026-06-02
[16] Threading Model · TileDB-Inc/TileDB Wiki · GitHub github.com Accessed: 2026-06-02
[17] TileDB/tiledb/common/thread_pool at main · TileDB-Inc/TileDB · GitHub github.com Accessed: 2026-06-02
[18] GitHub - TileDB-Inc/TileDB-Presto: TileDB Connector for PrestoDB · GitHub github.com Accessed: 2026-06-02
[19] GitHub - TileDB-Inc/TileDB-Trino: TileDB Connector for TrinoDB · GitHub github.com Accessed: 2026-06-02
[20] Academy • TileDB tiledb.com Accessed: 2026-06-02
[21] Powered by | Apache Arrow apache.org Modified: 2026-05-08 Accessed: 2026-06-02
[22] Academy • TileDB tiledb.com Accessed: 2026-06-02
[23] Academy • TileDB tiledb.com Accessed: 2026-06-02
[24] https://docs.tiledb.io/en/stable/introduction.html?highlight=distributed tiledb.io (dead link) Accessed: 2026-06-07
[25] https://tiledb.com/blog/tiledb-a-refresher-on-what-and-why tiledb.com Accessed: 2026-06-01

Revision #40 Last Updated: 2023-05-01 22:41