Tajo

OLAP

Apache Tajo was inspired by Google's paper "Dremel: Interactive Analysis of Web-Scale Datasets". Dremel is a distributed system that combines column-oriented storage with a SQL query engine to process very large amounts of data interactively. Tajo likewise provides a column-oriented data store and a SQL query engine.
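The following Java sketch shows why a column-oriented layout suits analytical queries. It stores the same small table twice, once row by row and once column by column, and computes `SUM(amount)` over each copy. It is a hypothetical in-memory example: the `ColumnarDemo` class, its fields, and the data are made up, and it does not use Tajo's storage formats or APIs. On disk, a row store must read whole records to answer this query, whereas a columnar file can read only the `amount` column.

```java
import java.util.List;

// Hypothetical illustration of row-oriented vs column-oriented layouts.
// Not Tajo's storage API; class names and data are invented for the example.
public class ColumnarDemo {

    // Row-oriented: each record keeps all of its fields together.
    record Row(long id, String country, double amount) {}

    // Column-oriented: each field is stored in its own contiguous array.
    static final class ColumnarTable {
        final long[] id;
        final String[] country;
        final double[] amount;

        ColumnarTable(List<Row> rows) {
            int n = rows.size();
            id = new long[n];
            country = new String[n];
            amount = new double[n];
            for (int i = 0; i < n; i++) {
                Row r = rows.get(i);
                id[i] = r.id();
                country[i] = r.country();
                amount[i] = r.amount();
            }
        }
    }

    // SELECT SUM(amount) over the row layout: every Row object is visited.
    static double sumAmountRows(List<Row> rows) {
        double total = 0;
        for (Row r : rows) total += r.amount();
        return total;
    }

    // Same query over the column layout: only the amount array is read.
    static double sumAmountColumns(ColumnarTable t) {
        double total = 0;
        for (double a : t.amount) total += a;
        return total;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row(1, "KR", 10.0),
            new Row(2, "US", 20.5),
            new Row(3, "KR", 4.5));
        ColumnarTable table = new ColumnarTable(rows);
        System.out.println(sumAmountRows(rows));
        System.out.println(sumAmountColumns(table));
    }
}
```

Both methods print `35.0`. The two layouts differ in how much data a disk-based engine would have to read to produce that result, not in the result itself.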

Apache Tajo is an open-source, distributed, relational data warehouse system for big data that provides fault-tolerant analytical processing on large-scale datasets. It is compatible with Apache Hadoop and HDFS and supports standard SQL, including complex queries, joins, and aggregations.

Apache Tajo is designed to scale out and can process massive datasets on clusters of tens of thousands of nodes. It supports various file formats, including CSV, TSV, ORC, and Parquet. A Tajo cluster follows a master-worker architecture: master nodes manage the cluster and coordinate query execution, while worker nodes perform the actual data processing.
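The master-worker division of labor can be sketched in Java as a scatter-gather computation. A master splits a column into fragments, each worker scans only its own fragment, and the master merges the partial results. This is a simplified, single-process illustration: threads from an `ExecutorService` stand in for worker nodes, and the `MasterWorkerDemo` class and its methods are hypothetical, not part of Tajo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical scatter-gather sketch of master/worker query execution.
// Threads stand in for worker nodes; this is not Tajo code.
public class MasterWorkerDemo {

    // Split [0, length) into at most numFragments contiguous ranges.
    static List<int[]> split(int length, int numFragments) {
        List<int[]> ranges = new ArrayList<>();
        int size = (length + numFragments - 1) / numFragments; // ceiling division
        for (int start = 0; start < length; start += size) {
            ranges.add(new int[] {start, Math.min(start + size, length)});
        }
        return ranges;
    }

    // Master: plan fragments, dispatch them to workers, merge partial sums.
    static long distributedSum(long[] column, int numWorkers) {
        ExecutorService workers = Executors.newFixedThreadPool(numWorkers);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int[] range : split(column.length, numWorkers)) {
                // Worker task: scan only the assigned fragment.
                partials.add(workers.submit(() -> {
                    long sum = 0;
                    for (int i = range[0]; i < range[1]; i++) sum += column[i];
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : partials) total += f.get(); // merge step
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while merging", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker task failed", e.getCause());
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] column = new long[1000];
        for (int i = 0; i < column.length; i++) column[i] = i + 1;
        System.out.println(distributedSum(column, 4));
    }
}
```

With the values 1 through 1000 and four workers, the program prints `500500`. A real cluster must also deal with data locality, network transfer, and worker failures, all of which this sketch leaves out.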

Tajo also supports user-defined functions (UDFs), which let users extend Tajo with their own custom logic. It also includes a web-based user interface and a command-line interface for managing and querying data. For query optimization, Tajo provides a cost-based optimizer and an extensible set of rewrite rules. A comparable system is Impala, originally developed by Cloudera.
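As a rough illustration of how UDFs extend a query engine, the sketch below keeps a registry that maps SQL function names to Java lambdas and invokes them by name, as an executor might when evaluating `SELECT mask_email(email) FROM users`. The registry, the `mask_email` function, and the `users` table are all hypothetical. Tajo's actual UDF interfaces differ from this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a scalar UDF registry. The class and method names
// are illustrative only and do not reflect Tajo's real UDF API.
public class UdfRegistryDemo {

    private final Map<String, Function<Object[], Object>> functions = new HashMap<>();

    // Register a user-defined function under a case-insensitive SQL name.
    void register(String name, Function<Object[], Object> fn) {
        functions.put(name.toLowerCase(Locale.ROOT), fn);
    }

    // Look up and invoke a function, as an executor would for SELECT f(col).
    Object call(String name, Object... args) {
        Function<Object[], Object> fn = functions.get(name.toLowerCase(Locale.ROOT));
        if (fn == null) {
            throw new IllegalArgumentException("unknown function: " + name);
        }
        return fn.apply(args);
    }

    public static void main(String[] args) {
        UdfRegistryDemo registry = new UdfRegistryDemo();
        // Hypothetical custom function: keep the first character of the local part.
        registry.register("mask_email", a -> {
            String email = (String) a[0];
            int at = email.indexOf('@');
            return at <= 1 ? email : email.charAt(0) + "***" + email.substring(at);
        });
        for (String email : List.of("alice@example.com", "b@example.org")) {
            System.out.println(registry.call("MASK_EMAIL", email));
        }
    }
}
```

The program prints `a***@example.com` and then `b@example.org`, which is left unchanged because its local part is a single character. Names are lower-cased with `Locale.ROOT`, so lookups behave the same regardless of the JVM's default locale.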

History

2012: Started by Hyunsik Choi and Jihoon Son as a project of Korea University's DB Lab.

2013-03: Developers from Gruter, Korea University, LinkedIn, NASA, Hortonworks, and Intel joined the project, and it was accepted into the Apache Incubator.

2014-03: Became an Apache Top-Level Project (TLP)

2019-12: Released latest stable version (Tajo 0.12.0)

2020-09: The project was retired and moved to the Apache Attic.

Data Model

Relational

Storage Architecture

Disk-oriented

System Architecture

Shared-Disk

Website

http://tajo.apache.org/

Source Code

https://git-wip-us.apache.org/repos/asf?p=tajo.git

Tech Docs

http://tajo.apache.org/docs/current/

Twitter

@ApacheTajo

Developer

Hyunsik Choi and Jihoon Son, who were members of Korea University's DB Laboratory

Country of Origin

KR

Start Year

2012

End Year

2020

Acquired By

Apache

Project Type

Open Source

Written in

Java

Supported languages

Bash, Java, Perl, Python, Ruby

Compatible With

Hive

Operating Systems

Any OS with a Java VM

Licenses

Apache v2