Vertica is a distributed, infrastructure-independent analytics platform that can be deployed on many platforms, such as AWS, GCP, and Azure. It is designed to deliver higher query performance than traditional DBMSs, and it achieves high availability and good scalability on commodity hardware. It also integrates with Hadoop, Spark, and Kafka, which lets users choose where they want to analyze their data.
Vertica was founded by Michael Stonebraker and Andrew Palmer in 2005. It is derived from C-Store, a research prototype developed at MIT together with a few other universities, such as Brown. Hewlett Packard acquired Vertica in 2011, and Vertica became part of Micro Focus in 2017 when Micro Focus merged with Hewlett Packard Enterprise's software business.
Delta Encoding, Run-Length Encoding
Vertica uses both run-length encoding (RLE) and delta encoding. RLE is most effective when runs of repeated values are long, and the execution engine processes RLE-encoded data run by run rather than value by value. Delta encoding applies to INTEGER, DATE, TIME, TIMESTAMP, and INTERVAL columns: each value is stored as its difference from the smallest value in the data block, so only small offsets need to be kept.
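A minimal Python sketch of the two schemes, for illustration only. The function names and the list-based "block" are assumptions, not Vertica's internal API; the delta variant follows the description above of storing each value as an offset from the block minimum, and `rle_sum` shows what processing data run by run means for an aggregate.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def rle_sum(runs: List[Tuple[int, int]]) -> int:
    """Aggregate directly on runs: one multiply per run, no decompression."""
    return sum(value * length for value, length in runs)


def delta_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Store the block minimum once, then each value as its offset from it."""
    base = min(values)
    return base, [v - base for v in values]


def delta_decode(base: int, offsets: List[int]) -> List[int]:
    """Rebuild the original values by adding the base back to each offset."""
    return [base + d for d in offsets]


if __name__ == "__main__":
    sorted_col = [3, 3, 3, 3, 7, 7, 9, 9, 9]
    runs = rle_encode(sorted_col)
    print(runs)             # [(3, 4), (7, 2), (9, 3)]
    print(rle_sum(runs))    # 53

    ts = [1700000105, 1700000100, 1700000130]  # hypothetical epoch-second timestamps
    base, offsets = delta_encode(ts)
    print(base, offsets)    # 1700000100 [5, 0, 30]
    assert delta_decode(base, offsets) == ts
```

Because `rle_sum` does one step per run instead of one per row, its cost shrinks as runs get longer, which is why RLE pays off mainly on sorted, low-cardinality columns.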
Multi-version Concurrency Control (MVCC)
Vertica uses multi-version concurrency control (MVCC) to keep data consistent. Transactions can see previous versions of the data as well as the current one, so reads and writes do not conflict, which provides transaction isolation. Vertica also adopts a shared-nothing parallel processing architecture, which helps avoid locking overhead.
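The sketch below illustrates why readers and writers need not conflict under MVCC. It assumes a single-node, in-memory store with a global epoch counter, and the class and method names are hypothetical; it is a toy model of the idea, not Vertica's storage layer. Writers append new versions tagged with a commit epoch, and a reader sees only versions committed at or before its snapshot epoch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VersionedStore:
    """Toy multi-version key-value store (hypothetical, for illustration)."""
    current_epoch: int = 0
    # key -> list of (commit_epoch, value), appended in commit order
    versions: Dict[str, List[Tuple[int, Optional[str]]]] = field(default_factory=dict)

    def commit(self, writes: Dict[str, Optional[str]]) -> int:
        """Install all writes as new versions under a fresh epoch (None = delete)."""
        self.current_epoch += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.current_epoch, value))
        return self.current_epoch

    def snapshot(self) -> int:
        """A reader pins the latest committed epoch as its snapshot."""
        return self.current_epoch

    def read(self, key: str, as_of: int) -> Optional[str]:
        """Return the newest version committed at or before `as_of`."""
        result = None
        for epoch, value in self.versions.get(key, []):
            if epoch > as_of:
                break
            result = value
        return result


if __name__ == "__main__":
    store = VersionedStore()
    store.commit({"acct": "100"})
    reader_snapshot = store.snapshot()   # a reader starts here
    store.commit({"acct": "250"})        # a later writer; old version is kept
    print(store.read("acct", reader_snapshot))    # 100
    print(store.read("acct", store.snapshot()))   # 250
```

The writer never overwrites the version the reader is using, so neither side has to wait on a lock held by the other.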
Vertica uses a columnar store to speed up sequential access, at the cost of slower access to individual records. Whereas a row-oriented database must read entire rows, Vertica reads only the few columns a query actually references, which raises throughput by reducing I/O.
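A small sketch of the trade-off described above, assuming a hypothetical three-column schema held in plain Python lists. An aggregate over one column scans a single contiguous array, while reconstructing one record has to touch every column array, which is where single-record access becomes more expensive than in a row store.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]
COLUMNS = ("id", "region", "amount")  # hypothetical example schema


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Decompose row tuples into one array per column (same position = same row)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def sum_amount(columns: Dict[str, list]) -> float:
    """A sequential scan that reads only the 'amount' column."""
    return sum(columns["amount"])


def fetch_row(columns: Dict[str, list], pos: int) -> Row:
    """Rebuilding one record touches every column array: the cost of point access."""
    return tuple(columns[name][pos] for name in COLUMNS)


if __name__ == "__main__":
    rows = [(1, "east", 10.0), (2, "west", 5.5), (3, "east", 4.5)]
    cols = to_columns(rows)
    print(sum_amount(cols))     # 20.0
    print(fetch_row(cols, 1))   # (2, 'west', 5.5)
```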
In Vertica, each node maintains its own checkpoints and transaction logs, and users can tune how often this state is synchronized. If a single node fails, it is recovered from the other nodes. If all nodes fail, the database is recovered to the latest checkpoint that every node has completed, that is, the earliest of the nodes' most recent checkpoints. Once a new checkpoint starts, no new entries are appended to the transaction log.
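The rule for a full-cluster failure can be expressed as "the minimum over nodes of each node's latest checkpoint". The sketch below assumes checkpoints are identified by integers that increase monotonically and are comparable across nodes; the function name and data layout are hypothetical and do not reflect Vertica's actual recovery code.

```python
from typing import Dict, List


def recovery_checkpoint(node_checkpoints: Dict[str, List[int]]) -> int:
    """Latest checkpoint every node has completed: the minimum of the per-node maxima."""
    if not node_checkpoints or any(not cps for cps in node_checkpoints.values()):
        raise ValueError("every node needs at least one completed checkpoint")
    return min(max(cps) for cps in node_checkpoints.values())


if __name__ == "__main__":
    checkpoints = {
        "node1": [10, 20, 30],
        "node2": [10, 20],      # this node fell behind before the failure
        "node3": [10, 20, 30],
    }
    print(recovery_checkpoint(checkpoints))  # 20
```

Checkpoint 30 cannot be used because node2 never completed it, so 20 is the newest state that is consistent on all nodes.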
Decomposition Storage Model (Columnar)
Vertica stores data in column format to improve read performance, since this avoids a large amount of disk I/O.
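A back-of-the-envelope sketch of the I/O saving, assuming fixed-width uncompressed columns and ignoring page, index, and compression effects. The function names and the 20-column, 8-byte-per-value table are hypothetical.

```python
from typing import Dict, Iterable


def bytes_scanned_row_store(col_widths: Dict[str, int], n_rows: int) -> int:
    """A row store reads whole rows, so every column's bytes are read."""
    return n_rows * sum(col_widths.values())


def bytes_scanned_column_store(col_widths: Dict[str, int], n_rows: int,
                               referenced: Iterable[str]) -> int:
    """A column store reads only the columns the query references."""
    return n_rows * sum(col_widths[c] for c in set(referenced))


if __name__ == "__main__":
    widths = {f"c{i}": 8 for i in range(20)}  # 20 columns, 8 bytes each
    n = 1_000_000
    row_bytes = bytes_scanned_row_store(widths, n)
    col_bytes = bytes_scanned_column_store(widths, n, ["c0", "c5"])
    print(row_bytes, col_bytes)  # 160000000 16000000
```

A query touching 2 of 20 equally wide columns reads one tenth of the data; compression of each column, which columnar layouts make easier, would widen the gap further.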