BlazingSQL

BlazingSQL is a distributed, GPU-accelerated SQL engine that integrates with data lakes and with formats such as Apache Arrow and Apache Parquet. It is ACID-compliant. BlazingSQL targets ETL workloads and aims to provide efficient read I/O and OLAP querying. BlazingDB is the name of the company; BlazingSQL is the name of the product. The product is currently under active development by a team of 15 employees, with offices in San Francisco and Peru.

History

BlazingSQL started as a GPU table joiner for multi-terabyte databases. The Aramburu brothers, Rodrigo and Felipe, founded a company in 2013 that provided analytical solutions, and they needed to speed up joins for pension fraud detection. The system is closed-source, with a free community binary. It integrates with RAPIDS, the open-source GPU data science initiative, which relies on NVIDIA GPUs.

Joins

Hash Join

BlazingSQL supports hash joins, e.g. on strings. It is not clear what other join types are supported.
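
As an illustration of the technique only, the following is a minimal CPU-side sketch of a build/probe hash join on string keys. It is not BlazingSQL's GPU implementation, which the source does not describe; the function name `hash_join` and the sample tables are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Inner equi-join of two lists of dict rows using a build/probe hash join."""
    # Build phase: index the left input by its (string) join key.
    buckets = defaultdict(list)
    for row in left:
        buckets[row[left_key]].append(row)

    # Probe phase: look up each right row's key and emit the merged rows.
    joined = []
    for row in right:
        for match in buckets.get(row[right_key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data, keyed on string identifiers.
members = [
    {"member_id": "A17", "name": "Ana"},
    {"member_id": "B42", "name": "Luis"},
]
payments = [
    {"payee_id": "A17", "amount": 120},
    {"payee_id": "A17", "amount": 80},
    {"payee_id": "Z99", "amount": 50},
]

result = hash_join(members, payments, "member_id", "payee_id")
```

Building the hash table on the smaller input keeps memory use low, which matters even more when the table has to fit in limited GPU memory.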

Compression

Dictionary Encoding, Delta Encoding, Run-Length Encoding, Bit Packing / Mostly Encoding

BlazingSQL supports compressing and decompressing directly on the GPU. It accepts a variety of input formats such as Apache Parquet, BlazingDB Simpatico (GPU-compressed distributed files), and GDF (GPU dataframes built on Apache Arrow). Data is then sent to the GPU compressed. It is able to operate directly on compressed data.
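
To make "operating directly on compressed data" concrete, here is a small sketch that combines two of the listed schemes, dictionary encoding and run-length encoding, and then answers an equality predicate from the runs alone. It is a generic CPU illustration, not BlazingSQL's format or GPU kernels; all function names and the sample column are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code plus a lookup table."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def run_length_encode(codes):
    """Collapse consecutive repeats into [code, run_length] pairs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return runs


def count_equal(runs, dictionary, target):
    """Count rows equal to `target` without decompressing the column."""
    if target not in dictionary:
        return 0
    code = dictionary.index(target)
    # Each matching run contributes its whole length in one step.
    return sum(length for value, length in runs if value == code)


# Hypothetical sample column.
column = ["PE", "PE", "PE", "US", "US", "PE"]
dictionary, codes = dictionary_encode(column)
runs = run_length_encode(codes)
matches = count_equal(runs, dictionary, "PE")
```

The predicate touches one entry per run rather than one per row, which is why a query can be cheaper on the compressed representation than on the raw column.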

Isolation Levels

Snapshot Isolation

BlazingSQL supports Snapshot Isolation. It is unclear whether other isolation levels are supported.

Logging

Physical Logging

When importing data, BlazingSQL always writes it to disk, compresses it, and keeps it in a query-ready state.

Storage Architecture

Hybrid

BlazingSQL loads data to disk, but ultimately operates on the data on the GPU.

Hardware Acceleration

GPU

BlazingSQL is hardware-accelerated with NVIDIA GPUs. Relevant columnar data is compressed, cached, and sent to the GPU. The GPUs are used to speed up transforms and predicate evaluation (including running predicates while skipping metadata) and to perform accelerated joins.

Query Interface

SQL

BlazingSQL exposes a Python connector for executing SQL commands.

System Architecture

Shared-Nothing

BlazingSQL worker nodes push information to each other whenever required. There is a notion of a distributed cache, and nodes can ask each other for cached data-lake data.

Views

Virtual Views

BlazingSQL 1.3 supported the CREATE VIEW command. It is unclear if the views are virtual or materialized.
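
The difference between the two kinds of view can be seen in a short sketch. It uses Python's built-in `sqlite3` module purely as a stand-in engine (SQLite views are virtual) and says nothing about how BlazingSQL implements `CREATE VIEW`; the table and view names are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (payee TEXT, amount INTEGER)")
conn.execute("INSERT INTO payments VALUES ('ana', 120), ('luis', 40)")

# A virtual view stores the query text, not the query result.
conn.execute(
    "CREATE VIEW large_payments AS "
    "SELECT payee, amount FROM payments WHERE amount >= 100"
)

query = "SELECT payee FROM large_payments ORDER BY payee"
before = conn.execute(query).fetchall()

# A row added to the base table later is visible through the view,
# because the view's query is re-evaluated on every read.
conn.execute("INSERT INTO payments VALUES ('rosa', 300)")
after = conn.execute(query).fetchall()
conn.close()
```

A materialized view would instead keep returning the stored result until it was refreshed, so running a test like this against a system is one way to tell the two apart.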

Data Model

Relational

BlazingSQL is a relational database. It accepts multiple input formats (e.g. Apache Parquet files and Arrow-based GPU DataFrames) and provides a SQL interface for querying the data.

Stored Procedures

Not Supported

As of BlazingSQL 1.3, stored procedures do not appear to be supported.

Query Execution

Vectorized Model

BlazingSQL operations are vectorized on the GPU (SIMD).

Query Compilation

Not Supported

BlazingSQL does not appear to currently do query compilation.

Storage Model

Decomposition Storage Model (Columnar)

BlazingSQL is a column-store. To execute a query, it compresses and transmits relevant columns to the GPU. On the GPU, data is represented as a GPU DataFrame (GDF). GDFs are built on top of Apache Arrow, which is a columnar in-memory format.
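
The benefit of shipping only the relevant columns can be sketched in a few lines. This is a plain-Python illustration of the decomposition storage model, not the GDF or Arrow layout itself; the helper names and the sample rows are hypothetical.

```python
def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def columns_for_query(columns, needed):
    """Select only the columns a query references, as would be sent to the GPU."""
    return {name: columns[name] for name in needed}


# Hypothetical sample table.
rows = [
    {"payee": "ana", "region": "PE", "amount": 120},
    {"payee": "luis", "region": "PE", "amount": 40},
    {"payee": "rosa", "region": "US", "amount": 300},
]

columns = to_columns(rows)

# SELECT SUM(amount) FROM payments touches a single column.
shipped = columns_for_query(columns, ["amount"])
total = sum(shipped["amount"])
```

In a row-store the same query would have to read every field of every row, whereas here the `payee` and `region` columns are never transferred.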

Concurrency Control

Multi-version Concurrency Control (MVCC)

BlazingSQL supports snapshot isolation, which is most likely achieved with MVCC.
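
For readers unfamiliar with the connection between the two, here is a minimal sketch of how MVCC can provide snapshot reads: each write appends a timestamped version, and a reader sees only versions committed at or before its snapshot. It is a generic, single-threaded illustration with a hypothetical `VersionedStore` class, and the source does not confirm that BlazingSQL works this way.

```python
class VersionedStore:
    """Toy multi-version store: writes append versions, reads pick by timestamp."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit counter

    def write(self, key, value):
        """Commit a new version of `key` and return its commit timestamp."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Return a timestamp that fixes which versions a reader may see."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before `snapshot_ts`."""
        visible = None
        for commit_ts, value in self._versions.get(key, []):
            if commit_ts <= snapshot_ts:
                visible = value
        return visible


store = VersionedStore()
store.write("balance", 100)
snap = store.snapshot()        # a reader starts here
store.write("balance", 250)    # a concurrent writer commits later

old_value = store.read("balance", snap)
new_value = store.read("balance", store.snapshot())
```

Because older versions are retained rather than overwritten, the reader's view stays stable without blocking the writer.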

Indexes

Not Supported

BlazingSQL does not appear to support indexes.

Storage Organization

Log-structured

BlazingSQL appears to be log-structured.

Website

https://blazingdb.com/

Tech Docs

https://docs.blazingdb.com/

Developer

BlazingDB

Country of Origin

PE

Start Year

2015

Project Type

Commercial

Written in

C++

Supported languages

SQL

Operating Systems

Linux

Licenses

Proprietary