PosDB is a distributed disk-based column-store that leverages late materialization idea to deal with complex analytical workloads. Posdb stands for "Positional database", which refers to its unique indexing technique. It uses the positional indexing technique to efficiently store and retrieve data. PosDB is a columnar DBMS with SQL support designed for analytic workloads. Unlike classic columnar systems, query execution plans in column stores can include operators that work not only with values but also with positions. In this way, PosDB features fluent data-position handling and produces two types of intermediate representations that can be used during query processing. This leads to unconventional query plans that conserve disk bandwidth and enable sophisticated query processing techniques. It also features other natural benefits of column-stores, such as efficient column-level compression.
The PosDB project started in 2017 in Russia as a research effort to study late materialization-oriented query processing. It was developed by the collaboration of St. Petersburg State University and JetBrains Research. The developers focused on query processing issues not studied in PosDB’s predecessors, namely: LM and processing of aggregation queries (including window function processing), distributed processing in LM environments, subqueries and LM, etc.
PosDB focuses on read-only workloads with bulk loading of data. Therefore, there are no checkpoints for transactions now. As for reliability, PosDB has some homebrew mechanisms for handling network failures, including reestablishing the connection with the node and continuing executing the query. In the case of a network reader, the client sends to the server the replica identifier and the current block of positions. If the client reconnects, then a new thread that takes on this load and continues its processing is created. To be more specific, once the connection has been reestablished, the interaction will proceed from where it was interrupted, if possible. If not, the query plan will be sent again and a required amount of data will be skipped. In the future, the developers plan to implement other strategies, including choosing another available node.
Yes, PosDB supports database compression and the pages are decompressed as soon as they arrive from the disk. Moreover, it uses a generalized I/O-RAM approach in which data is stored on disk as a collection of compressed pages and decompressed as soon as they are loaded into the buffer manager. In PosDB compression can be applied on a per-column basis, with only corresponding column files and catalog files being modified. In other words, there are no other changes from the users' point of view that will occur. As for compression algorithms, PosDB has implemented Brotli, PFOR, VByte, SIMD-FastPFOR128, and SIMD-BinaryPacking128, and conducted experiments to obtain top-performing algorithms in terms of decoding speed. The above algorithms can be divided into two categories: Brotli is a heavy-weight algorithm while the other four are light-weight. Compared to light-weight algorithms, heavy-weight compression schemes require significant effort to perform decompression while offering significantly higher compression ratios. Among the four light-weight compression algorithms, SIMD-FastPFOR128, and SIMD-BinaryPacking128 are SIMD-enabled.
Nested Loop Join Hash Join Sort-Merge Join Broadcast Join Shuffle Join Semi Join
PosDB supports both local and distributed joins with arbitrary partitioning and replication schemas. The system supports hash, nested-loop, and sort-merge algorithms.
Intra-Operator (Horizontal) Inter-Operator (Vertical)
PosDB can store data on several machines, as well as process each query using several threads and a set of machines (inter- and intra-query parallelism).
Using the Volcano pull-based execution model, PosDB represents each query by a tree of operators and each operator supports an iterator interface and can produce either positions or tuples exclusively. As for intra-query parallelism, PosDB supports a plan-level parallelism and implemented an n-ary node which processes its subtrees in separate threads and merges their results into one stream in arbitrary order. There are two operators for supporting intraquery parallelism: Asynchronizer and UnionAll. Asynchronizer is a unary operator that only executes the child operator in a thread, while UnionAll is a polyadic operator that executes all child operators and then merges the results. Moreover, since 2017, the developers have been exploring various aspects of LM-oriented processing including aggregation, window functions, compression, intermediate result caching, distributed processing.
Using the Volcano execution model, PosDB represents each query by a tree of operators that support the iterator interface. Other than iterators, PosDB also supports vectorized processing. PosDB supports parallel operators including SyncReader. PosDB has two operators for supporting intraquery parallelism: Asynchronizer and UnionAll. Asynchronizer is a unary operator that only executes the child operator in a thread while UnionAll is a polyadic operator that executes all child operators and then merges the results.
On the other hand, tuple-based operators target aggregation, which needs multiple operations for each wide row. The first (lowest) of these operators is a materialization point: positional data is transformed into tuples. The transformation is sometimes coupled with grouping and window functions to reduce the amount of materialized data.
PosDB supports inter- and intra- query parallelism. To implement the latter PosDB uses two special operators. Asynchronizer allows it to execute a single operator tree in a separate thread and UnionAll is used to collect data from several subtrees that are executed in their own threads.
PosDB is a natively distributed DBMS in terms of both data and query execution. Each table may be fragmented and replicated across multiple nodes. A number of table-level fragmentation strategies are supported: round-robin, hash and range partitioning. Distributed query execution allows PosDB to run a query on multiple nodes, with each node processing an arbitrary part of the query plan. Both positional and tuple operators can be executed on arbitrary nodes, regardless of where their children reside. Also, several operators support internal distribution embedded into their core algorithms, like distributed join and aggregation.
PosDB supports a subset of SQL which consists of selection, projection, join, ordering, aggregation, and window functions (a naive rule-based optimizer is run). It does not support nested queries, updates, deletion, and data definition language (DDL).
It also provides a C++ interface that allows manual query plan construction with exact details of used algorithms and distributed schema.
Since PosDB is a disk-based system all data resides on disk. Disk subsystem efficiency is provided via columnar storage and a buffer manager. The former reduces disk load and helps in homogenizing processed data. A buffer manager allows important data to stay in memory for longer, including the case of several concurrently running queries.
Indexed Sequential Access Method (ISAM)
The data is stored on disk and the system has a storage manager which manages the data and provides access to them. Each replica is stored in a separate file without paging and the buffer manager can help with efficient query execution.
PosDB supports materialized views and implements several approaches to materialization including early materialization, late materialization and hybrid materialization. By enabling the transfer of both data and locations between operators and reverting to late materialization just after the Join operator, hybrid materialization helps to decrease the network communication overhead that occurs in the distributed situation during the processing of position-based queries. PosDB has also proposed ultra-late materialization that combines late materialization in selections and joins. Since PosDB needs to operate on positions, ultra-late materialization makes possible operating on positions and thus deferring tuple reconstruction for as long as possible.
https://pos-db.com/#Publications
PosDB team (St. Petersburg State University, JetBrains Research)
2017