AresDB

AresDB is a GPU-based real-time analytics database with low memory overhead, real-time upserts with primary key deduplication, and time series aggregations on both streaming and finite dimensional data, including geofences.

History

Uber developed AresDB to replace Elasticsearch as its analytical database. Uber identified three problems with Elasticsearch: its inverted indexes were not optimized for Uber's "time range-based storage and filtering," storing data as JSON documents added unnecessary overhead, and it was JVM-based and "[did] not support joins and its query execution runs at a higher memory cost." Uber chose to accelerate AresDB with GPUs because it expected GPUs' higher core count, "greater computational throughput," and "greater compute-to-storage (ALU to GPU global memory) data access throughput (not latency) compared to [CPUs]" to further speed up its analytical queries.

Query Interface

Custom API

AresDB uses a custom query language called Ares Query Language (AQL), which is based on JSON. Any programming language that can produce JSON can therefore issue AQL queries.
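As a rough illustration, a JSON-based query can be assembled as an ordinary data structure and serialized before it is sent to the server. The sketch below is in Python. The table name `trips`, its columns, and the field names (`queries`, `table`, `dimensions`, `measures`, `rowFilters`, `timeFilter`) are assumptions made for illustration and should be checked against the AQL documentation in the AresDB wiki.

```python
import json

# Hypothetical AQL request: hourly count of completed trips over the last day.
# Table, column, and field names are illustrative assumptions, not verified
# against the AresDB wiki.
aql_request = {
    "queries": [
        {
            "table": "trips",
            "dimensions": [{"sqlExpression": "request_at", "timeBucketizer": "hour"}],
            "measures": [{"sqlExpression": "count(*)"}],
            "rowFilters": ["status = 'completed'"],
            "timeFilter": {"column": "request_at", "from": "24 hours ago"},
        }
    ]
}

# Serialize to JSON text, which is what a client would send to the server.
payload = json.dumps(aql_request, indent=2)
print(payload)
```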

Views

Materialized Views

AresDB uses late materialization for its joins, meaning that a join may not be physically executed until a foreign key is accessed.

Logging

Logical Logging

Log files contain descriptions of database upserts, which must be replayed to rebuild the database after a crash.
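The replay idea can be sketched in a few lines of Python. This is a minimal illustration, not AresDB's log format: an in-memory list stands in for the log file, and the record layout and the helper names `append_upsert` and `replay` are hypothetical.

```python
import json

def append_upsert(log, table, rows):
    """Append one logical record describing an upsert batch."""
    log.append(json.dumps({"table": table, "rows": rows}))

def replay(log, primary_key):
    """Rebuild table state by re-applying every logged upsert in order."""
    tables = {}
    for line in log:
        record = json.loads(line)
        table = tables.setdefault(record["table"], {})
        for row in record["rows"]:
            # Later upserts overwrite earlier rows with the same primary key.
            table.setdefault(row[primary_key], {}).update(row)
    return tables

log = []
append_upsert(log, "trips", [{"id": 1, "status": "requested"}])
append_upsert(log, "trips", [{"id": 2, "status": "requested"}])
append_upsert(log, "trips", [{"id": 1, "status": "completed"}])

# After a simulated crash, the state is recovered from the log alone.
rebuilt = replay(log, primary_key="id")
print(rebuilt["trips"])
```

Because the log records the operations rather than the resulting storage layout, the same log can rebuild the state regardless of how rows are laid out internally.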

Joins

Hash Join

AresDB supports hash joins from fact tables (unbounded streaming data such as rides) to dimension tables (finite data sets such as cities). The database also supports geospatial joins (i.e., overlap with a geographically bounded area) and normal foreign key joins. Note that AresDB uses late materialization for its joins, so a join may not be executed until a foreign key is accessed.
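A hash join of this shape has a build phase and a probe phase. The Python sketch below hashes a small, finite table of cities and probes it with a stream of trips. The tables, column names, and the aggregation are made up for illustration and do not reflect AresDB's GPU implementation.

```python
# Small, finite table keyed by its primary key.
cities = [
    {"city_id": 1, "name": "San Francisco"},
    {"city_id": 2, "name": "Amsterdam"},
]

# Stream of events, each carrying a foreign key into the cities table.
trips = [
    {"trip_id": 100, "city_id": 1, "fare": 12.5},
    {"trip_id": 101, "city_id": 2, "fare": 8.0},
    {"trip_id": 102, "city_id": 1, "fare": 20.0},
]

# Build phase: hash the small table on its key.
city_by_id = {city["city_id"]: city for city in cities}

# Probe phase: look up each event's foreign key and aggregate.
fare_by_city = {}
for trip in trips:
    city = city_by_id.get(trip["city_id"])
    if city is None:
        continue  # no matching city row; skipped in this sketch
    fare_by_city[city["name"]] = fare_by_city.get(city["name"], 0.0) + trip["fare"]

print(fare_by_city)
```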

Query Execution

Vectorized Model

AresDB works with vector batches that are efficiently processed in parallel using the Thrust library.

Stored Procedures

Not Supported

Indexes

Hash Table

AresDB uses hash tables primarily for primary key deduplication.
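The role of the hash table in deduplication can be shown with a toy row store in Python. The class name `LiveTable` and its layout are hypothetical. The only point is that the index maps each primary key to the position of its existing row, so a repeated key updates that row instead of creating a duplicate.

```python
class LiveTable:
    """Toy append-only row store with a hash index on the primary key."""

    def __init__(self, primary_key):
        self.primary_key = primary_key
        self.rows = []    # row storage
        self.index = {}   # primary key value -> position in self.rows

    def upsert(self, row):
        key = row[self.primary_key]
        position = self.index.get(key)
        if position is None:
            # First time this key is seen: append and remember where it lives.
            self.index[key] = len(self.rows)
            self.rows.append(dict(row))
        else:
            # Duplicate key: update the existing row in place instead of appending.
            self.rows[position].update(row)

table = LiveTable(primary_key="trip_id")
table.upsert({"trip_id": 100, "status": "requested"})
table.upsert({"trip_id": 101, "status": "requested"})
table.upsert({"trip_id": 100, "status": "completed"})
print(table.rows)
```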

Storage Architecture

Hybrid

Data within the archival delay of a table is kept uncompressed in live batches, while older data is stored in compressed archival batches. Newly ingested data that falls outside the archival delay is added to an archival backfill queue and merged into the archived batches asynchronously.
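The routing decision described above can be sketched as a comparison of each record's event time against a cutoff. In the Python sketch below, the function `route`, the field `event_time`, and the two-day delay are assumptions for illustration. Compression and the asynchronous merge are left out.

```python
from collections import deque

SECONDS_PER_DAY = 86_400

def route(record, now, archival_delay, live_batch, backfill_queue):
    """Send a record to the live batch or the backfill queue by event time."""
    cutoff = now - archival_delay
    if record["event_time"] >= cutoff:
        live_batch.append(record)       # recent: kept in the uncompressed live batch
    else:
        backfill_queue.append(record)   # late arrival: merged into the archive later

live_batch, backfill_queue = [], deque()
now = 10 * SECONDS_PER_DAY
archival_delay = 2 * SECONDS_PER_DAY  # hypothetical per-table setting

route({"id": 1, "event_time": now - 3_600}, now, archival_delay, live_batch, backfill_queue)
route({"id": 2, "event_time": now - 5 * SECONDS_PER_DAY}, now, archival_delay, live_batch, backfill_queue)

print([r["id"] for r in live_batch], [r["id"] for r in backfill_queue])
```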

Checkpoints

Fuzzy

Snapshots are triggered when a table reaches either a certain number of mutations or a certain elapsed time; the thresholds are specific to each table.
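The trigger condition amounts to a disjunction of two thresholds. A minimal Python sketch follows. The function and parameter names are hypothetical and do not correspond to AresDB configuration keys.

```python
def should_snapshot(mutations_since_last, seconds_since_last,
                    max_mutations, max_interval_seconds):
    """Return True when either per-table threshold has been reached."""
    return (mutations_since_last >= max_mutations
            or seconds_since_last >= max_interval_seconds)

# Hypothetical per-table settings: 1,000 mutations or 10 minutes.
by_mutations = should_snapshot(1_500, 30, max_mutations=1_000, max_interval_seconds=600)
by_time = should_snapshot(10, 900, max_mutations=1_000, max_interval_seconds=600)
neither = should_snapshot(10, 30, max_mutations=1_000, max_interval_seconds=600)
print(by_mutations, by_time, neither)
```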

Foreign Keys

Supported

AresDB supports foreign key joins.

Data Model

Relational

System Architecture

Shared-Disk

The CPU is used only to load information from storage into CPU memory and to distribute this data to GPU memory. The database system delegates each operator in a query to a GPU, so it can use multiple GPUs by delegating different operations to different GPUs, each of which has completely separate memory. There are plans to implement a proper distributed design, but AresDB is currently limited to a single system with multiple GPUs.

Query Compilation

Not Supported

Compression

Run-Length Encoding

AresDB compresses only columns that appear in the user-defined sort order and have low cardinality.
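Run-length encoding pays off on sorted, low-cardinality data because equal values become adjacent and collapse into long runs. The following Python sketch shows the general technique on a made-up column. It is not AresDB's on-disk encoding.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]

# A low-cardinality column after sorting: long runs, so few pairs.
city_ids = sorted([2, 1, 1, 2, 1, 3, 1, 2])
runs = rle_encode(city_ids)
print(runs)
```

Here eight values are stored as three pairs. On an unsorted or high-cardinality column the runs would be short and the encoding could be larger than the input.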

Hardware Acceleration

GPU

AresDB uses GPUs for its query execution.

Storage Organization

Sorted Files

Archived data is sorted by a user-specified column order, and files are organized by UTC day and Unix time cutoffs.
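The day-based organization and column-order sorting can be sketched as follows in Python. Timestamps are assumed to be Unix seconds, so integer division by 86,400 yields the UTC day. The function `organize` and the column names are hypothetical, and the cutoff component of the file naming is not modeled.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86_400

def organize(rows, time_column, sort_columns):
    """Group rows by UTC day, then sort each day's rows by the given columns."""
    by_day = defaultdict(list)
    for row in rows:
        by_day[row[time_column] // SECONDS_PER_DAY].append(row)
    return {
        day: sorted(day_rows, key=lambda r: tuple(r[c] for c in sort_columns))
        for day, day_rows in sorted(by_day.items())
    }

rows = [
    {"ts": 86_400 + 50, "city_id": 2, "status": "completed"},
    {"ts": 10, "city_id": 1, "status": "requested"},
    {"ts": 86_400 + 5, "city_id": 1, "status": "requested"},
    {"ts": 86_400 + 9, "city_id": 1, "status": "completed"},
]
batches = organize(rows, time_column="ts", sort_columns=["city_id", "status"])
for day, day_rows in batches.items():
    print(day, [(r["city_id"], r["status"]) for r in day_rows])
```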

Parallel Execution

Intra-Operator (Horizontal)

AresDB executes queries with the one operation per kernel (OOPK) model.
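In the OOPK model, each step applies a single operation to an entire column vector and writes an intermediate vector that the next step consumes. The plain-Python sketch below imitates that structure on the CPU. The `kernel_*` functions are hypothetical stand-ins for GPU kernels, in which every element would be processed in parallel.

```python
# Each function stands in for one GPU kernel: it applies a single operation
# to a whole column vector and materializes an intermediate result.
def kernel_greater_than(column, threshold):
    return [value > threshold for value in column]

def kernel_equals(column, target):
    return [value == target for value in column]

def kernel_and(left, right):
    return [a and b for a, b in zip(left, right)]

def kernel_masked_sum(column, mask):
    return sum(value for value, keep in zip(column, mask) if keep)

fare = [12.5, 8.0, 20.0, 3.0]
city_id = [1, 2, 1, 1]

# Expression: SUM(fare) WHERE fare > 5 AND city_id = 1, one operation per step.
mask_a = kernel_greater_than(fare, 5)
mask_b = kernel_equals(city_id, 1)
mask = kernel_and(mask_a, mask_b)
total = kernel_masked_sum(fare, mask)
print(total)
```

The trade-off of this design is that every step materializes an intermediate vector, in exchange for simple kernels that parallelize across all elements of a column.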

Website

https://eng.uber.com/aresdb/

Source Code

https://github.com/uber/aresdb

Tech Docs

https://github.com/uber/aresdb/wiki

Developer

Uber

Country of Origin

US

Start Year

2018

Project Type

Open Source

Written in

C, C++, Go

Inspired By

Elasticsearch, Kinetica, Ocelot, OmniSci, Pinot

Operating Systems

Linux

Licenses

Apache v2