Velox

Revision #3 from 05/01/2023 7:27 p.m.

OLAP

Velox is a reusable, vectorized database execution engine. It can be used to build compute engines for analytical workloads, including batch (Spark, Presto), interactive (PyVelox), stream processing, log processing, and AI/ML.

Unlike a complete database, Velox cannot be used directly by end users. Rather, it is designed as a general-purpose execution component that database developers can embed in their own systems.

History

Meta's data infrastructure contains dozens of specialized data computation engines, most of which were developed independently. Maintaining and enhancing each of them is difficult, especially given how quickly workload requirements and hardware change.

Velox was created in 2020 and open-sourced in 2021 to address this problem by serving as a unified execution engine. It is under active development and is already in various stages of integration with several systems, including Presto, Spark, and PyTorch (the latter through a data preprocessing library called TorchArrow). Additional contributions have come from Intel, ByteDance, and Ahana.