Vertica

View Current Viewing Revision #11 from 12/08/2018 12:19 p.m.

Vertica is a distributed infrastructure-independent analytics platform. It can be deployed on various platforms like AWS,GCP,Azure... It is designed to support a high OLAP performance compared with others. High availability and good scalability can be achieved as well. Also, it provides good integration with Hadoop, Spark, Kafka, which makes user select where they want to analyze their data freely.

History

Vetica was founded by Michael Stonebraker and Andrew Palmer in 2015. It is derived from C-Store, which is a prototype developed by MIT, Brown, and few other universities. It was acquired by Hewlett Packard in 2011 and joined Micro Focus in 2017 due to the merger between Micro Focus and HP.

System Architecture

Shared-Nothing

Shared-nothing architecture is used in Vertica, where all nodes don't share anything in terms of memory and disk storage. Shared-nothing architecture are easier to scale, since there is no race or contention between nodes. Moreover, massively Parallel Processing (MPP) architecture is used in Vertica to improve the throughput of joins requiring multiple machines together.

Storage Architecture

Hybrid

Hybrid data store are supported in Vertica. Write Optimized Store(WOS) is about storing data in memory, which does not support compression and indexing. Read Optimized Store(ROS) is about storing data on disk, where data is sorted and segmented.

Query Interface

Custom API SQL

Data Model

Column Family / Wide-Column

Columnar store is used in Vertica to improve the performance of sequential access by sacrificing the performance of single record. Compared with row-oriented databases which scan the whole table, only few columns are retrieved based on given queries in Vertica, which can improve throughput by reducing disk I/O costs.

Concurrency Control

Multi-version Concurrency Control (MVCC)

Vertica supports MVCC to achieve data consistency. Both current and previous status are stored and visible to transactions. Transaction isolations can be achieved since no conflict between the read and write operations exist. A shared-nothing MPP architecture is used in Vertica, which can prevent the overheads caused by locks.

Query Execution

Materialized Model

Projections in Vertica have been used for query execution. Query optimizer is responsible for designing and selecting the suitable projections based on the given query plan. Various plans have different influence on query performance in terms of memory, CPU utilization, I/O, Network..

Logging

Physical Logging

Checkpoints

Blocking Fuzzy

In Vertica, each node maintains checkpoints and transaction logs separately. The synchronization duration can be tuned by users as well. For a single-node failure, it can be from other nodes. If all nodes fail, it can be recovered to the earliest checkpoints when all nodes are good. new transaction log cannot be appended, if a new checkpoint starts.

Compression

Delta Encoding Run-Length Encoding

Both Run-Length Encoding and Delta encoding are used in Vertica. RLE encoding is only used when the number of repetition is large. Delta encoding works for INTEGER/DATE/TIME/TIMESTAMP/INTERVAL type, where the variations from the smallest value are stored instead of the real values.

Stored Procedures

Not Supported

Stored Procedures are not support in Vertica. External Procedures such as R and Bash scripts can be used.

Isolation Levels

Read Uncommitted Read Committed Serializable Repeatable Read

Read Committed and Serializable are used in Vertica. Read Committed is the default isolation level. Read Uncommitted and Repeatable Read are treated automatically as Read Committed and Serializable respectively.

Foreign Keys

Supported

Vertica allows users to use foreign key constraints. Foreign keys should be defined when tables are created or using "ALTER TABLE".

Views

Materialized Views

The projections in Vertica are similar to materialized view in other databases. Projections can be created on the same table so that some optimizations such as pre-storing required data can be done for some specific queries.

Indexes

Not Supported

Indexes are not support in Vertica. Projections are used to improve query performance in Vertica.

Storage Model

Decomposition Storage Model (Columnar)

Data is stored in Vertica in column format to improve the performance of read operations, since a lot of amount of disk I/O can be prevented.

Joins

Hash Join Sort-Merge Join

Both merge join and hash join are supported in Vertica. Merge Join is faster in general and requires less memory, but pre-sorted data is required.Hash join requires more memory, but it is faster if the inner table can fit in the memory.

Query Compilation

Not Supported

Revision #11 | Updated 12/08/2018 12:19 p.m.

View Current Viewing Revision #11 from 12/08/2018 12:19 p.m.

Website

http://www.vertica.com/

Developer

Andrew Palmer and Michael Stonebraker

Country of Origin

Start Year

2005

Acquired By

Vertica

History

System Architecture

Storage Architecture

Query Interface

Data Model

Concurrency Control

Query Execution

Logging

Checkpoints

Compression

Stored Procedures

Isolation Levels

Foreign Keys

Views

Indexes

Storage Model

Joins

Query Compilation

Website

Developer

Country of Origin

Start Year

Acquired By

Project Type

Written in

Supported languages

Derived From

Inspired By

Operating Systems

Licenses

Wikipedia