Tajo

Apache Tajo is an open-source, distributed relational data warehouse system for big data. It provides fault-tolerant analytical processing of large-scale datasets, runs on Apache Hadoop with data stored in HDFS, and supports standard SQL, including complex queries, joins, and aggregations.

Tajo is designed to scale out, processing very large datasets by distributing work across the nodes of a cluster. It supports several file formats, including CSV, TSV, ORC, and Parquet. A Tajo cluster follows a master-worker architecture: the master nodes manage the cluster and coordinate query execution, while the worker nodes perform the actual data processing.
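The master-worker split described above can be illustrated with a toy two-phase aggregation. This is a minimal, hypothetical sketch, not Tajo code. The names `split_into_fragments`, `worker_partial_sum`, `master_merge` and `run_query` are invented for illustration, and threads stand in for worker nodes. The model is simplified on purpose: the master merges the partial results itself, whereas a real distributed engine typically runs that merge as another distributed stage.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input rows of (region, amount), standing in for a table on HDFS.
ROWS = [("asia", 10), ("europe", 5), ("asia", 7), ("america", 3),
        ("europe", 8), ("asia", 1), ("america", 4)]

def split_into_fragments(rows, n):
    """Master side: divide the input into n roughly equal fragments."""
    return [rows[i::n] for i in range(n)]

def worker_partial_sum(fragment):
    """Worker side: compute a partial SUM(amount) ... GROUP BY region."""
    partial = defaultdict(int)
    for region, amount in fragment:
        partial[region] += amount
    return dict(partial)

def master_merge(partials):
    """Master side (simplified): combine the workers' partial aggregates."""
    total = defaultdict(int)
    for partial in partials:
        for region, amount in partial.items():
            total[region] += amount
    return dict(total)

def run_query(rows, num_workers=3):
    """Coordinate one query: split the input, run workers in parallel, merge."""
    fragments = split_into_fragments(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_partial_sum, fragments))
    return master_merge(partials)

if __name__ == "__main__":
    print(sorted(run_query(ROWS).items()))
    # → [('america', 7), ('asia', 18), ('europe', 13)]
```

Because SUM is associative, the result does not depend on how many workers the input is split across. This property is what lets an engine add nodes to process more data.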

Tajo also supports user-defined functions (UDFs), which let users extend Tajo with custom logic. It also provides a web-based user interface and a command-line interface for managing and querying data.

History

In September 2020, the project was retired and moved to the Apache Software Foundation Attic.

Data Model

Relational

Storage Architecture

Disk-oriented

System Architecture

Shared-Disk

Website

http://tajo.apache.org/

Source Code

https://git-wip-us.apache.org/repos/asf?p=tajo.git

Tech Docs

http://tajo.apache.org/docs/current/

Twitter

@ApacheTajo

Developer

Korea University

Country of Origin

KR

Start Year

2012

End Year

2020

Project Type

Open Source

Licenses

Apache v2