MammothDB is a distributed, relational OLAP DBMS that runs on the Apache Hadoop Distributed File System. It is primarily used for business analytics and data warehousing. After its company's acquisition by MariaDB, a company that provides a DBMS product that supports analytical and transactional queries, MammothDB got killed.
MammothDB was founded in 2012 in Sofia, Bulgaria by Steve Keil, Alex Aldev and Angel Mitev. The mission of MammothDB was to "democratize" enterprise-level data warehousing and analytics tools by making them cheaper and more accessible. The company got acquired by MariaDB on March 27, 2018, resulting in the end of MammothDB.
As of now, to LOAD DATA a SQL query or a data-loading API can be used. For now only SQL queries are supported.
Each query when re-written is separated into three components - one is the query that is run on each node's local data, another that is the map-reduce specification for the results from the nodes and the last one that is the final query run on MySQL. The third component will access a virtual view (stored on the pluggable storage engine) which will be specified by the first and second components and be filled after the queries on each of the nodes are aggregated by the map-reduce.
MammothDB does not have a concurrency control protocol since it has a single worker thread per core on each of its nodes, so no two tasks ever share the same worker thread. The lack of a concurrency control protocol is primarily because it is meant for low-contention OLAP workloads and not high-contention OLTP workloads. Due to MySQL's limitation on the number of concurrent client sessions, MammothDB can serve up to 5000 concurrent client sessions.
Logging is not supported by MammothDB. However if a node fails, copies of data are stored on multiple nodes so the query could continue to run.
MammothDB stores the data on each of its Hadoop storage nodes using a columnar data-store.
MammothDB does not support stored procedures as of September 2015. The purpose is to allow for generic-language queries in the future.
MammothDB does not support indexes since they result in an additional storage and computation overhead.
For each Hadoop storage node, when the data is distributed, it is stored primarily on disk.
There is no custom checkpointing for MammothDB but users can use the fault tolerant HA block storage option for the Hadoop storage nodes for safety.
MammothDB is a relational database. Part of the reason why is to allow clients to easily expand from their initial analytics storage engines (like Microsoft Excel).
Data is loaded into tables and the tables are distributed across Hadoop storage nodes. Dimension tables are replicated onto every node and are small enough to fit on a node. Fact tables are partitioned by rows and each subset of rows is stored on a separate node, with every node getting around the same number of rows.
MammothDB stores data on each Hadoop node using block storage, with no particular order in which the entries within the blocks or the blocks are stored.