Yellowbrick is a relational data warehouse that is optimized for flash-based storage.
Yellowbrick uses append-only MVCC with vacuum garbage collection.
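As a rough illustration of append-only MVCC with vacuum, the sketch below keeps every version of a row and only discards superseded versions once no snapshot can still see them. The class and field names (AppendOnlyStore, created_by, deleted_by) are hypothetical, and the model assumes monotonically increasing transaction IDs that commit in order; it is not Yellowbrick's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    key: str
    value: str
    created_by: int                   # transaction id that appended this version
    deleted_by: Optional[int] = None  # transaction id that superseded it, if any


class AppendOnlyStore:
    """Toy append-only version store (hypothetical, for illustration only)."""

    def __init__(self):
        self.versions = []

    def write(self, txid, key, value):
        # Never overwrite in place: mark the live version as superseded,
        # then append the new version.
        for v in self.versions:
            if v.key == key and v.deleted_by is None:
                v.deleted_by = txid
        self.versions.append(Version(key, value, txid))

    def read(self, snapshot_txid, key):
        # A version is visible if it was created at or before the snapshot
        # and had not been superseded as of the snapshot.
        for v in self.versions:
            if (v.key == key and v.created_by <= snapshot_txid
                    and (v.deleted_by is None or v.deleted_by > snapshot_txid)):
                return v.value
        return None

    def vacuum(self, oldest_active_txid):
        # Garbage-collect versions that no active snapshot can still see.
        before = len(self.versions)
        self.versions = [
            v for v in self.versions
            if v.deleted_by is None or v.deleted_by > oldest_active_txid
        ]
        return before - len(self.versions)
```

In this sketch, a reader holding an older snapshot keeps seeing the old value until vacuum is called with a transaction ID at or past the superseding write.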
Yellowbrick supports the boolean, integer, decimal, floating point, string, date/time, and UUID types available in PostgreSQL, as well as new data types for IP and MAC addresses.
Yellowbrick’s on-premises servers use a dual-core FPGA to accelerate table scans by performing file parsing, decompression, predicate evaluation, and Bloom filtering. The FPGA accelerator is also used for shuffling data between nodes, which happens via RDMA.
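To show what Bloom filtering during a scan buys, here is a minimal software sketch: a filter built from one side's keys lets the scan drop rows that cannot possibly match before they are sent anywhere. The names BloomFilter and scan_with_filter, the bit count, and the hash scheme are all assumptions for illustration; the source describes this step as running on the FPGA, which this code does not model.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter (illustrative sizes, not tuned)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


def scan_with_filter(rows, key_index, bloom):
    # Keep only rows whose key might be in the filter. False positives
    # are possible, so a later exact check is still required.
    return [r for r in rows if bloom.might_contain(r[key_index])]
```

Because a Bloom filter never produces false negatives, the filtered scan always returns a superset of the truly matching rows.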
Yellowbrick does not support indexes.
Yellowbrick universally uses the READ COMMITTED isolation level.
Yellowbrick supports hash, sort-merge, and nested loop joins.
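For readers less familiar with these algorithms, the sketch below contrasts a hash join with a nested loop join on an equality condition. It is a textbook illustration over lists of dictionaries, with hypothetical function and parameter names, and says nothing about how Yellowbrick implements or chooses between them.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    # Build phase: index the (ideally smaller) input by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: look up each probe row's key in the hash table.
    out = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            out.append({**match, **row})
    return out


def nested_loop_join(inner_rows, outer_rows, inner_key, outer_key):
    # Compare every outer row against every inner row.
    out = []
    for outer in outer_rows:
        for inner in inner_rows:
            if inner[inner_key] == outer[outer_key]:
                out.append({**inner, **outer})
    return out
```

Both functions return the same rows for an equi-join; the hash join does one pass over each input, while the nested loop compares every pair.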
Yellowbrick uses intra-operator parallelism, where each thread operates on a different chunk of data, and threads are synchronized so that they all execute the same operator at the same time. Yellowbrick schedules the execution operators that process a given packet of data as close to each other as possible to minimize data movement.
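The core idea of intra-operator parallelism can be sketched as several threads running the same operator over different chunks of one input. The function names and the chunking scheme below are assumptions, and the sketch does not model the lock-step synchronization or locality-aware scheduling described above.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_and_sum(chunk):
    # The same operator logic runs on every chunk.
    return sum(x for x in chunk if x % 2 == 0)


def parallel_sum_even(values, num_threads=4):
    # Split the input into roughly equal chunks, one per thread.
    size = max(1, -(-len(values) // num_threads))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(filter_and_sum, chunks))
    # Combine the per-chunk partial results.
    return sum(partials)
```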
Yellowbrick partitions query plans into segments and converts them into C++ code. Segments are then compiled into machine code in parallel using a modified version of LLVM which is memory-resident with its ASTs pre-loaded. Compiled object files are cached and reused.
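The caching of compiled segments can be pictured as a lookup keyed by a hash of the generated source. In the sketch below, Python's built-in compile() stands in for the C++ and LLVM pipeline, and SegmentCache is a hypothetical name; the real cache key and storage format are not described in the source.

```python
import hashlib


class SegmentCache:
    """Cache compiled code objects by a hash of their source text."""

    def __init__(self):
        self._cache = {}
        self.compilations = 0  # counts cache misses

    def get(self, source):
        key = hashlib.sha256(source.encode()).hexdigest()
        if key not in self._cache:
            # Cache miss: compile once and remember the result.
            self._cache[key] = compile(source, "<segment>", "eval")
            self.compilations += 1
        return self._cache[key]
```

Requesting the same segment source twice compiles it only once; the second request returns the cached object.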
Yellowbrick also has a specialized pattern compiler for SIMILAR TO, regular expressions, and date/time parsing. Yellowbrick generates finite state machines for these patterns and compiles them to machine code using LLVM.
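To make the finite-state-machine idea concrete, the sketch below turns a pattern that uses only the % (any sequence) and _ (any single character) wildcards into a state machine whose states are positions in the pattern, then simulates it over the input. This covers only a small subset of SIMILAR TO, interprets the machine rather than compiling it to machine code, and uses the hypothetical name compile_pattern.

```python
def compile_pattern(pattern):
    """Return a matcher function for a pattern with % and _ wildcards."""
    n = len(pattern)  # state n is the accepting state

    def closure(states):
        # '%' may match the empty string, so a state sitting on '%'
        # can also advance past it without consuming input.
        out = set(states)
        stack = list(states)
        while stack:
            s = stack.pop()
            if s < n and pattern[s] == "%" and s + 1 not in out:
                out.add(s + 1)
                stack.append(s + 1)
        return out

    def matches(text):
        current = closure({0})
        for ch in text:
            nxt = set()
            for s in current:
                if s == n:
                    continue  # accepting state has no outgoing edges
                p = pattern[s]
                if p == "%":
                    nxt.add(s)        # '%' consumes ch and stays put
                elif p == "_" or p == ch:
                    nxt.add(s + 1)    # consume one character and advance
            current = closure(nxt)
            if not current:
                return False
        return n in current

    return matches
```

The pattern is analyzed once and the returned matcher can be reused for every row, which is the property a compiled state machine exploits.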
Unlike systems which constrain their query plans to be trees, Yellowbrick uses graph query plans, which allow for execution nodes to have more than one consumer. The execution engine operates on a push-based model, passing cache-resident buffers between operators. Yellowbrick uses AVX SIMD instructions to evaluate expressions and predicate filters.
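A push-based plan whose nodes can have more than one consumer might be sketched as follows: a producer pushes each batch to every registered consumer, so one scan can feed two downstream branches without being repeated. The Operator, Scan, Filter, and Collect classes are hypothetical, and the sketch ignores cache-resident buffers and SIMD evaluation.

```python
class Operator:
    def __init__(self):
        self.consumers = []

    def add_consumer(self, op):
        self.consumers.append(op)
        return op

    def emit(self, batch):
        # Push the batch to every consumer; more than one is allowed,
        # which makes the plan a graph rather than a tree.
        for consumer in self.consumers:
            consumer.push(batch)

    def push(self, batch):
        raise NotImplementedError


class Scan(Operator):
    def __init__(self, rows, batch_size=2):
        super().__init__()
        self.rows = rows
        self.batch_size = batch_size

    def run(self):
        # The source drives execution by pushing small batches downstream.
        for i in range(0, len(self.rows), self.batch_size):
            self.emit(self.rows[i:i + self.batch_size])


class Filter(Operator):
    def __init__(self, predicate):
        super().__init__()
        self.predicate = predicate

    def push(self, batch):
        kept = [r for r in batch if self.predicate(r)]
        if kept:
            self.emit(kept)


class Collect(Operator):
    def __init__(self):
        super().__init__()
        self.rows = []

    def push(self, batch):
        self.rows.extend(batch)
```

Attaching two Filter operators to one Scan reads the input once and produces two independent result streams.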
Yellowbrick is compatible with the PostgreSQL dialect and wire protocol, and it uses the PostgreSQL JDBC, ODBC, and ADO.NET drivers.
Yellowbrick’s on-premises servers persist data in NVMe SSDs using a custom file system called BBFS (Big Block File System). BBFS is fully asynchronous and stores file system metadata in in-memory indexes. On top of BBFS sits ParityFS, a cluster-level file system which implements n+2 erasure coding at the file level, allowing it to tolerate up to two concurrent failures. If a node fails, its files are virtually reassigned to the remaining nodes, which lazily reconstruct the files when they are read.
In the cloud version of Yellowbrick, data is persisted to an object store such as Amazon S3. Worker nodes use custom object store connectors built on a custom HTTP stack that runs asynchronously in userspace. Local SSDs operate as write-around caches: when a node writes to a shard, it writes directly to the object store, notifying other workers. Caches use a modified LRU eviction policy with scan resistance, and shards are assigned to workers using rendezvous hashing.
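Rendezvous (highest-random-weight) hashing, mentioned above for shard assignment, can be sketched in a few lines: every worker is scored against the shard and the highest score wins. The worker and shard identifiers and the use of SHA-256 are assumptions for illustration; the source does not say which hash function or naming scheme Yellowbrick uses.

```python
import hashlib


def _score(worker, shard):
    # Deterministic pseudo-random weight for a (worker, shard) pair.
    digest = hashlib.sha256(f"{worker}|{shard}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign_shard(shard, workers):
    # The worker with the highest weight for this shard owns it.
    return max(workers, key=lambda w: _score(w, shard))


def assign_all(shards, workers):
    return {shard: assign_shard(shard, workers) for shard in shards}
```

The useful property is that removing a worker only moves the shards that worker owned; every other shard keeps its assignment, so the remaining workers' caches stay warm.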
Yellowbrick natively supports reading from Apache Parquet and CSV files.
Yellowbrick uses a column store as the main storage format, with tables horizontally partitioned into 200MB shards.
Yellowbrick also has a log-structured row store for streaming ingestion and small inserts which is periodically flushed to the column store. High-throughput streams, bulk loads, and INSERT INTO ... SELECT operations are committed directly to the column store.
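The split between the row store and the column store can be sketched as a table with two write paths: small inserts append to a row log that is later flushed into columns, while bulk loads write columns directly. HybridTable and its flush_threshold are hypothetical; the source says only that the flush is periodic, so the size-based trigger here is an assumption.

```python
class HybridTable:
    def __init__(self, columns, flush_threshold=3):
        self.columns = columns
        self.row_store = []                           # append-only log of rows
        self.column_store = {c: [] for c in columns}  # one list per column
        self.flush_threshold = flush_threshold        # assumed flush trigger

    def _append_columns(self, rows):
        for row in rows:
            for name, value in zip(self.columns, row):
                self.column_store[name].append(value)

    def insert(self, row):
        # Small inserts land in the row store first.
        self.row_store.append(row)
        if len(self.row_store) >= self.flush_threshold:
            self.flush()

    def bulk_load(self, rows):
        # Bulk loads bypass the row store entirely.
        self._append_columns(rows)

    def flush(self):
        self._append_columns(self.row_store)
        self.row_store = []

    def scan(self, column):
        # Readers see flushed column data plus not-yet-flushed rows.
        idx = self.columns.index(column)
        return self.column_store[column] + [r[idx] for r in self.row_store]
```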
Yellowbrick supports PL/pgSQL stored procedures (CREATE PROCEDURE) but not user-defined functions (CREATE FUNCTION). Unlike in PostgreSQL, stored procedures in Yellowbrick can return values and be called from SELECT statements, but only when there is no table-referencing FROM clause.
Triggers are not supported.
Yellowbrick employs a shared-nothing microservice architecture managed by Kubernetes. Each Yellowbrick cluster consists of elastically scaling worker, compiler, and bulk load pods, which are supported by pods for management, logging, and the user interface.
Yellowbrick supports virtual views only.