MammothDB

MammothDB is a distributed, relational OLAP DBMS that runs on the Apache Hadoop Distributed File System. It is primarily used for business analytics and data warehousing. After its company's acquisition by MariaDB, a company that provides a DBMS product that supports analytical and transactional queries, MammothDB got killed.

History

MammothDB was founded in 2012 in Sofia, Bulgaria by Steve Keil, Alex Aldev and Angel Mitev. The mission of MammothDB was to "democratize" enterprise-level data warehousing and analytics tools by making them cheaper and more accessible. The company got acquired by MariaDB on March 27, 2018, resulting in the end of MammothDB.

Data Model

Relational

MammothDB is a relational database. Part of the reason why is to allow clients to easily expand from their initial analytics storage engines (like Microsoft Excel).

Checkpoints

Not Supported

There is no custom checkpointing for MammothDB but users can use the fault tolerant HA block storage option for the Hadoop storage nodes for safety.

Concurrency Control

Not Supported

MammothDB does not have a concurrency control protocol since it has a single worker thread per core on each of its nodes, so no two tasks ever share the same worker thread. The lack of a concurrency control protocol is primarily because it is meant for low-contention OLAP workloads and not high-contention OLTP workloads. Due to MySQL's limitation on the number of concurrent client sessions, MammothDB can serve up to 5000 concurrent client sessions.

System Architecture

Shared-Nothing

Data is loaded into tables and the tables are distributed across Hadoop storage nodes. Tables small enough to fit on each node (dimension tables) are replicated onto every node. Tables that are too large to fit on a node (fact tables) are partitioned by rows and each subset of rows is stored on a separate node, with every node getting around the same number of rows.

Storage Organization

Heaps

MammothDB stores data on each Hadoop node using block storage, with no particular order in which the entries within the blocks or the blocks are stored.

Query Interface

SQL

As of now, to LOAD DATA a SQL query or a data-loading API can be used. For now only SQL queries are supported.

Views

Virtual Views

Each query when re-written is separated into three components - one is the query that is run on each node's local data, another that is the map-reduce specification for the results from the nodes and the last one that is the final query run on MySQL. The third component will access a virtual view (stored on the pluggable storage engine) which will be specified by the first and second components and be filled after the queries on each of the nodes are aggregated by the map-reduce.

Storage Architecture

Disk-oriented

For each Hadoop storage node, when the data is distributed, it is stored primarily on disk.

Logging

Not Supported

Logging is not supported by MammothDB. However if a node fails, copies of data are stored on multiple nodes so the query could continue to run.

Stored Procedures

Not Supported

MammothDB does not support stored procedures as of September 2015. The purpose is to allow for generic-language queries in the future.