Vertica

Acquired Company Forked Code OLAP

Vertica is a distributed, shared-nothing column-store DBMS.

History

Vetica was founded by Michael Stonebraker and Andy Palmer in 2005. It was based on the C-Store prototype developed by MIT, Brown, and Brandeis. The source code was originally derived as a fork of PostreSQL.

It was acquired by Hewlett Packard in 2011. HP then sold it to MicroFocus in 2017.

Checkpoints

Blocking Fuzzy

In Vertica, each node maintains checkpoints and transaction logs separately. The synchronization duration can be tuned by users as well. For a single-node failure, it can be recovered from other nodes. If the entire cluster fails, it can be recovered up to the earliest checkpoints when all nodes are good. New transaction log cannot be appended when a new checkpoint begins.

Compression

Dictionary Encoding Delta Encoding Run-Length Encoding

Both Run-Length Encoding and Delta encoding are used in Vertica. RLE encoding is only used when the number of repetitions is large. Delta encoding works for INTEGER/DATE/TIME/TIMESTAMP/INTERVAL type, where the variations from the smallest value are stored instead of the real values to save more space. Also there are various types of dictionary encoding - distinct column values are compiled into a dictionary and then the dictionary and a list of indexes to represent the data block are stored. Finally simple LZO, GZIP and ZSTANDARD encodings are available for character datatypes.

Concurrency Control

Multi-version Concurrency Control (MVCC)

Vertica supports MVCC to achieve data consistency. Both current and previous statuses are stored and visible to transactions. There is Ancient History Mark(configurable), which represents how long history is available.

Data Model

Relational

Columnar store is used in Vertica to improve the performance of sequential access by sacrificing the performance of single access. Compared with row-oriented databases which has to scan the whole table, only few needed columns are retrieved based on given queries in Vertica, which can improve throughput by reducing disk I/O costs.

Foreign Keys

Supported

Vertica allows users to use foreign key constraints. Foreign keys should be defined when tables are created or "ALTER TABLE" is used. Constraints are disabled by default except CHECK constraints, so they are not evaluated synchronously, can be analyze using SELECT ANALYZE_CONSTRAINTS() statement asynchronously. They can be enabled by keyword ENABLE, so they are evaluated synchronously.

Indexes

Not Supported

Indexes are not support in Vertica. Projections are used to improve query performance in Vertica. There can be more projections on top of single table. Sort order and segmentation can be specified to define by which columns data are presorted and distributed in database cluster. If query joins or aggregates data and underlying projections are presorted / segmented in compatible way, Vertica can use optimized operators, so much less memory is used and performance is significantly better. Projections can also be "UNSEGMENTED", so all table data are stored on all nodes in cluster. It is obviously recommended for smaller tables.

Isolation Levels

Read Committed Serializable

Read Committed and Serializable are supported in Vertica. Read Committed is the default isolation level. Read Uncommitted and Repeatable Read are treated automatically as Read Committed and Serializable respectively in vertica.

Joins

Hash Join Sort-Merge Join

Both merge join and hash join are supported in Vertica. Merge join is faster in general and requires less memory, but data is required to be sorted before. Hash join requires more memory, but it is faster if the inner table can fit in the memory. In multi-node cluster, if join key is not compatible with segmentation of underlying projections, RESEGMENT / BROADCAST operators must be executed and performance significantly degrades. UNSEGMENTED projections on small tables are recommended to prevent RESEGMENT / BROADCAST.

Vertica supports query via SQL and its custom API (GeoSpatial, Timeseries, Event Series Joins, Machine Learning, ...). Vertica also provides connectors for external services, such as Hadoop, Spark, Kafka, S3, Google Cloud Storage. Moreover, Vertica also supports C++, Java, Python, R SDK.

Storage Architecture

Disk-oriented

Enterprise (legacy) mode supports both Read and Write Optimized Storage (WOS, ROS). WOS is in memory, optimized for fast writes, ROS is on disk, encoded/compressed, for fast reads. EON mode stops supporting WOS. Persistent storage is separated (S3 like), but there is local cache called "Depot" (LRU).

Storage Model

Decomposition Storage Model (Columnar)

Data is stored in Vertica in columnar format to improve query performance, since a lot of disk I/O can be avoided.

Stored Procedures

Not Supported

Stored Procedures are not support in Vertica. External Procedures such as R,C++ can be used in Vertica.

System Architecture

Shared-Nothing

Shared-nothing architecture is used in Vertica, where all nodes don't share anything in terms of memory and disk storage. Shared-nothing architecture are easier to scale, since there is no race or contention caused by locks. Moreover, Massively MPP(Massive Parallel Processing) architecture is used in Vertica, which can improve query performance such as increasing the throughput of large joins when multiple machines are involved. Newly EON mode has been developed to separate compute from storage, so they can be scaled independently. S3 like storage can be used.

Views

Materialized Views

The projections in Vertica are similar to materialized view in other DBMS. Various projections can be created on the same table so that some optimizations such as sorting data can be done for some specific queries in advance. Moreover there is support for pre-aggregated projections.

Revision #22 | Updated 06/16/2024 10:33 p.m.

Vertica

History

Checkpoints

Compression

Concurrency Control

Data Model

Foreign Keys

Indexes

Isolation Levels

Joins

Logging

Query Compilation

Query Interface

Storage Architecture

Storage Model

Stored Procedures

System Architecture

Views

People Also Viewed

Website

Tech Docs

Twitter

Developer

Country of Origin

Start Year

Acquired By

Project Type

Written in

Supported languages

Derived From

Inspired By

Operating Systems

Licenses

Wikipedia

People Also Viewed