Vertica is a distributed, infrastructure-independent analytics platform that can be deployed on various platforms such as AWS, GCP, and Azure. It is designed to achieve high performance on OLAP workloads, especially large ones, while also providing high availability and good scalability. It also integrates with Hadoop, Spark, and Kafka, which lets users choose freely where to analyze their data.
Vertica was founded by Michael Stonebraker and Andrew Palmer in 2005. It is derived from C-Store, a column-store prototype developed by MIT, Brown, and a few other universities around 2005. Vertica was acquired by Hewlett-Packard in 2011 and became part of Micro Focus in 2017, when Micro Focus merged with HPE's software business.
Vertica uses a column store, which speeds up sequential scans at the cost of slower single-row access. A row-oriented database must read entire rows, including columns a query does not need; Vertica reads only the columns referenced by a query, which raises throughput by reducing disk I/O.
Decomposition Storage Model (Columnar)
Vertica stores data in columnar format to improve read performance, since much of the disk I/O that a row layout would require can be avoided.
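To make the I/O argument concrete, the back-of-envelope sketch below compares how many bytes a row layout and a column layout must read for a query that touches 2 of 10 columns. The table size and 8-byte fixed-width columns are hypothetical assumptions, and the model deliberately ignores compression, block granularity, and caching.

```python
# Hypothetical table: 1,000,000 rows, 10 fixed-width columns of 8 bytes each.
NUM_ROWS = 1_000_000
NUM_COLUMNS = 10
BYTES_PER_VALUE = 8


def bytes_scanned_row_store(num_rows: int, num_columns: int) -> int:
    # A row store reads whole rows, so every column is read
    # even when the query needs only a few of them.
    return num_rows * num_columns * BYTES_PER_VALUE


def bytes_scanned_column_store(num_rows: int, columns_in_query: int) -> int:
    # A column store reads only the columns the query references.
    return num_rows * columns_in_query * BYTES_PER_VALUE


# e.g. SELECT SUM(price) FROM sales WHERE region = 'EU' touches 2 of the 10 columns.
row_bytes = bytes_scanned_row_store(NUM_ROWS, NUM_COLUMNS)
col_bytes = bytes_scanned_column_store(NUM_ROWS, 2)
print(row_bytes, col_bytes, row_bytes // col_bytes)  # 80000000 16000000 5
```

The saving scales with the fraction of columns a query skips; in practice compression of sorted columns usually widens the gap further.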
In Vertica, each node maintains its own checkpoints and transaction logs, and users can tune how often they are synchronized. If a single node fails, it can be recovered from the data held by other nodes. If all nodes fail, the cluster can be recovered to the most recent checkpoint that every node completed while in a good state. While a new checkpoint is in progress, new entries cannot be appended to the transaction log.
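The full-cluster recovery rule can be illustrated with a simplified model: if each node reports the number of its most recent completed checkpoint, the only point every node can restore to is the minimum of those numbers. The function name, node names, and integer checkpoint numbers below are hypothetical; this is a sketch of the idea, not Vertica's recovery implementation.

```python
from typing import Dict


def cluster_recovery_point(last_checkpoint_by_node: Dict[str, int]) -> int:
    """Return the latest checkpoint that every node has completed.

    Hypothetical model: each node reports the number of its most recent
    completed checkpoint. After a full-cluster failure, the cluster can only
    roll back to a point that all nodes have, i.e. the minimum across nodes.
    """
    if not last_checkpoint_by_node:
        raise ValueError("cluster has no nodes")
    return min(last_checkpoint_by_node.values())


# node02 lags behind, so checkpoint 40 is the newest state all nodes share.
nodes = {"node01": 42, "node02": 40, "node03": 41}
print(cluster_recovery_point(nodes))  # 40
```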
Delta Encoding, Run-Length Encoding
Vertica uses both Run-Length Encoding (RLE) and delta encoding. RLE is used only when values repeat many times in a row, as is typical of sorted, low-cardinality columns. Delta encoding applies to INTEGER, DATE, TIME, TIMESTAMP, and INTERVAL columns: instead of the actual values, it stores each value's difference from the smallest value.
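Both schemes are easy to sketch in a few lines. The example below shows RLE as (value, run length) pairs and delta encoding in the form the text describes, storing the minimum once plus each value's offset from it. These are illustrative functions, not Vertica's on-disk format; the sample data is hypothetical.

```python
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Run-length encode: collapse each run of equal values into (value, count)."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))  # start a new run
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    out: List[int] = []
    for v, count in runs:
        out.extend([v] * count)
    return out


def delta_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Store the smallest value once, plus each value's difference from it.

    Assumes a non-empty input; min() raises ValueError on an empty list.
    """
    base = min(values)
    return base, [v - base for v in values]


def delta_decode(base: int, offsets: List[int]) -> List[int]:
    return [base + d for d in offsets]


# Sorted, low-cardinality column: long runs, so RLE pays off.
region_ids = [3, 3, 3, 3, 7, 7, 9]
print(rle_encode(region_ids))  # [(3, 4), (7, 2), (9, 1)]

# Large timestamps that are close together: small offsets need fewer bits.
timestamps = [1_700_000_120, 1_700_000_000, 1_700_000_060]
print(delta_encode(timestamps))  # (1700000000, [120, 0, 60])
```

RLE degrades when values rarely repeat (every run has length 1), which is why it is reserved for columns with long runs.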
Multi-version Concurrency Control (MVCC)
Vertica uses MVCC to keep data consistent. Both current and previous versions of the data are retained, so each transaction can read a consistent snapshot. Because readers see a snapshot, read and write operations do not conflict, which provides transaction isolation. Vertica's shared-nothing MPP architecture also helps it avoid lock-related overhead.
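The core MVCC idea, keeping old versions so a reader's snapshot stays stable while writers commit, can be shown with a toy in-memory store. The class, its methods, and the global commit counter ("epoch") are hypothetical illustrations, not Vertica's internal structures; a real system also garbage-collects versions no snapshot can see.

```python
from typing import Dict, List, Optional, Tuple


class VersionedStore:
    """Toy multi-version store: each key keeps a list of (commit_epoch, value)."""

    def __init__(self) -> None:
        self.current_epoch = 0  # hypothetical global commit counter
        self.versions: Dict[str, List[Tuple[int, Optional[str]]]] = {}

    def commit_write(self, key: str, value: Optional[str]) -> int:
        # Each committed write gets a new epoch; older versions are kept, not overwritten.
        self.current_epoch += 1
        self.versions.setdefault(key, []).append((self.current_epoch, value))
        return self.current_epoch

    def snapshot(self) -> int:
        # A reader remembers the latest committed epoch at the moment it starts.
        return self.current_epoch

    def read(self, key: str, snapshot_epoch: int) -> Optional[str]:
        # Return the newest version committed at or before the reader's snapshot.
        # Versions are appended in epoch order, so the last match is the newest.
        visible: Optional[str] = None
        for epoch, value in self.versions.get(key, []):
            if epoch <= snapshot_epoch:
                visible = value
        return visible


store = VersionedStore()
store.commit_write("price", "10")
reader = store.snapshot()           # a reader starts here
store.commit_write("price", "12")  # a concurrent writer commits a newer version
print(store.read("price", reader))            # 10 (the reader's snapshot is unchanged)
print(store.read("price", store.snapshot()))  # 12 (a new reader sees the latest value)
```

The writer never waits for the reader and the reader never sees a half-applied change, which is the read/write non-conflict the paragraph describes.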
Read Uncommitted, Read Committed, Serializable, Repeatable Read
Vertica supports two isolation levels: Read Committed, which is the default, and Serializable. Requests for Read Uncommitted and Repeatable Read are automatically treated as Read Committed and Serializable, respectively.
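The mapping from requested to effective isolation level is small enough to write down as a lookup table. The enum and function names below are hypothetical helpers for illustration; they only encode the behavior stated in the paragraph above.

```python
from enum import Enum
from typing import Dict, Optional


class Isolation(Enum):
    READ_UNCOMMITTED = "READ UNCOMMITTED"
    READ_COMMITTED = "READ COMMITTED"
    REPEATABLE_READ = "REPEATABLE READ"
    SERIALIZABLE = "SERIALIZABLE"


# Requested level -> level that actually governs the transaction.
EFFECTIVE_LEVEL: Dict[Isolation, Isolation] = {
    Isolation.READ_UNCOMMITTED: Isolation.READ_COMMITTED,
    Isolation.READ_COMMITTED: Isolation.READ_COMMITTED,
    Isolation.REPEATABLE_READ: Isolation.SERIALIZABLE,
    Isolation.SERIALIZABLE: Isolation.SERIALIZABLE,
}

DEFAULT_LEVEL = Isolation.READ_COMMITTED


def effective_isolation(requested: Optional[Isolation] = None) -> Isolation:
    """Return the isolation level a transaction would actually run with."""
    if requested is None:
        requested = DEFAULT_LEVEL  # no explicit level requested: use the default
    return EFFECTIVE_LEVEL[requested]


for level in Isolation:
    print(f"{level.value:17} -> {effective_isolation(level).value}")
```

Mapping each weaker request up to the next supported level never gives a transaction less isolation than it asked for.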
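In Vertica, queries are executed against projections, which physically store a table's columns in a chosen sort order and are either segmented across nodes or replicated on every node. The query optimizer selects the most suitable projections for a given query when it builds the query plan, and Vertica's Database Designer can help design projections for a workload. Different projections affect query performance differently in terms of memory, CPU utilization, I/O, and network usage.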
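A heavily simplified picture of projection selection: only projections containing every column a query needs are candidates, and among those a projection sorted on the filter column, then one with fewer columns to read, is preferred. The `Projection` class, the scoring rule, and the sample projections are hypothetical; Vertica's optimizer uses a far richer cost model.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple


@dataclass
class Projection:
    name: str
    columns: Set[str]
    sort_order: List[str]


def choose_projection(projections: Sequence[Projection],
                      needed: Set[str],
                      filter_column: Optional[str] = None) -> Optional[Projection]:
    # Only projections that contain every column the query needs can answer it.
    candidates = [p for p in projections if needed <= p.columns]
    if not candidates:
        return None

    def score(p: Projection) -> Tuple[bool, int]:
        # Lower is better: prefer a projection sorted on the filter column
        # (cheaper range scans), then the one with fewer columns (less data).
        sorted_on_filter = bool(p.sort_order) and p.sort_order[0] == filter_column
        return (not sorted_on_filter, len(p.columns))

    return min(candidates, key=score)


# Hypothetical projections: one holding all columns, one narrow and sorted by region.
sales_super = Projection("sales_super", {"id", "region", "price", "ts"}, ["id"])
sales_by_region = Projection("sales_by_region", {"region", "price"}, ["region"])

print(choose_projection([sales_super, sales_by_region], {"region", "price"}, "region").name)
# sales_by_region
print(choose_projection([sales_super, sales_by_region], {"price", "ts"}).name)
# sales_super (the only projection that has the ts column)
```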
Vertica uses a shared-nothing architecture, in which nodes share neither memory nor disk storage. Shared-nothing systems are easier to scale because nodes do not contend for shared resources. Vertica also uses a Massively Parallel Processing (MPP) architecture, which improves the throughput of operations such as joins by spreading the work across multiple machines.
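One way shared-nothing and MPP combine for joins: if both tables are hash-segmented on the join key with the same function, matching rows always land on the same node, so each node can join its own segments independently and in parallel. The sketch below simulates nodes with threads; the function names, the crc32-based placement, and the sample rows are hypothetical choices for illustration, not Vertica's segmentation scheme.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (join_key, payload)


def node_for(key: str, num_nodes: int) -> int:
    # crc32 is deterministic across runs (unlike Python's salted str hash).
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def segment(rows: List[Row], num_nodes: int) -> List[List[Row]]:
    # Each simulated node keeps only the rows whose key hashes to it.
    parts: List[List[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[node_for(row[0], num_nodes)].append(row)
    return parts


def local_hash_join(left: List[Row], right: List[Row]) -> List[Tuple[str, str, str]]:
    # Ordinary single-node hash join: build on the left input, probe with the right.
    table: Dict[str, List[str]] = defaultdict(list)
    for key, payload in left:
        table[key].append(payload)
    return [(key, l, r) for key, r in right for l in table.get(key, [])]


def colocated_join(left: List[Row], right: List[Row], num_nodes: int) -> List[Tuple[str, str, str]]:
    # Both inputs use the same segmentation function, so no rows move between nodes.
    left_parts = segment(left, num_nodes)
    right_parts = segment(right, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        per_node = pool.map(local_hash_join, left_parts, right_parts)
        return sorted(row for rows in per_node for row in rows)


customers = [("c1", "Alice"), ("c2", "Bob"), ("c3", "Carol")]
orders = [("c1", "order-10"), ("c3", "order-11"), ("c1", "order-12"), ("c9", "order-13")]
print(colocated_join(customers, orders, num_nodes=3))
# [('c1', 'Alice', 'order-10'), ('c1', 'Alice', 'order-12'), ('c3', 'Carol', 'order-11')]
```

If the two tables were segmented on different keys, rows would first have to be reshuffled over the network before the local joins could run.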