C-Store

C-Store is a column-oriented DBMS designed for read-optimized OLAP workloads. It adopts a column-store architecture, explores several DSM compression schemes together with matching query optimization strategies, stores data in overlapping collections of projections for both performance and availability, and employs other optimizations specific to column stores.

History

C-Store is an academic project led by Michael Stonebraker and Daniel Abadi, involving researchers from Brown University, Brandeis University, MIT, and the University of Massachusetts Boston. It was later commercialized as Vertica.

Concurrency Control

Two-Phase Locking (Deadlock Detection)

The system maintains a distributed lock table. Deadlocks are resolved with timeouts, by aborting one of the deadlocked transactions.
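The timeout-based resolution described above can be illustrated with a minimal single-process sketch. It is not C-Store's implementation: the `LockTable` class, its method names, the exclusive-only locks, and the logical clock passed in as `now` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LockTable:
    """Toy exclusive-lock table (hypothetical); waits are timed with a logical clock."""
    timeout: int                                  # max ticks a transaction may wait
    owners: dict = field(default_factory=dict)    # object -> transaction holding it
    waiting: dict = field(default_factory=dict)   # transaction -> (object, tick wait began)

    def acquire(self, txn, obj, now):
        """Grant the lock if it is free or already held by txn; otherwise record the wait."""
        holder = self.owners.get(obj)
        if holder is None or holder == txn:
            self.owners[obj] = txn
            self.waiting.pop(txn, None)
            return True
        self.waiting.setdefault(txn, (obj, now))
        return False

    def release_all(self, txn):
        """Drop every lock held by txn (used at commit or abort)."""
        for obj in [o for o, t in self.owners.items() if t == txn]:
            del self.owners[obj]
        self.waiting.pop(txn, None)

    def abort_one_timed_out(self, now):
        """Abort the longest-waiting transaction whose wait exceeds the timeout."""
        expired = [(since, txn) for txn, (_, since) in self.waiting.items()
                   if now - since > self.timeout]
        if not expired:
            return None
        _, victim = min(expired)
        self.release_all(victim)
        return victim


table = LockTable(timeout=5)
table.acquire("T1", "A", now=0)
table.acquire("T2", "B", now=0)
table.acquire("T1", "B", now=1)            # T1 waits for T2
table.acquire("T2", "A", now=2)            # T2 waits for T1: deadlock
victim = table.abort_one_timed_out(now=7)  # only T1 has waited more than 5 ticks
retry = table.acquire("T2", "A", now=7)    # T2 can proceed once T1 is aborted
```

No wait-for graph is built here: a deadlock is never identified as such, and a transaction that simply waits too long is treated as a victim.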

Query Interface

SQL

Logically, users interact with C-Store in SQL, with standard SQL semantics.

Views

Materialized Views

The C-Store source code contains a materialized-view component for the writable store ("write store materialized view").

Logging

Logical Logging

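The paper states: "We use logical logging (as in ARIES), since physical logging would result in many log records, due to the nature of the data structures in WS." ARIES is usually described as combining page-oriented redo with the option of logical undo, so the reference most plausibly points to its logical undo records.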

Joins

Nested Loop Join, Hash Join, Sort-Merge Join

All three join algorithms appear in the C-Store source code.

Storage Model

Decomposition Storage Model (Columnar)

As the name suggests, C-Store is a column store. Notably, both the read-optimized store (RS) and the update/insert-oriented writable store (WS) adopt the column-store architecture.

Isolation Levels

Snapshot Isolation

The paper describes its support for snapshot isolation in detail but does not mention whether other isolation levels are supported.

Stored Procedures

Not Supported

The paper does not mention stored procedures, so they are presumed to be unsupported.

Indexes

B+Tree, BitMap

Every column has a B-tree index regardless of its encoding scheme (e.g. RLE, bitmap encoding, or block-oriented delta encoding). The system also stores join indices, which stitch together the records of a table from the projections that hold its columns. A column that is ordered by another column in the same projection and contains few distinct values is encoded with bitmaps compressed by RLE, which is why the paper also mentions extensive use of bitmap indexes.
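To make two of the encodings named above concrete, here is a minimal sketch of run-length encoding for a sorted column and bitmap encoding for an unsorted column with few distinct values. The function names, the 0-based positions, and the use of plain Python lists as bitmaps are assumptions for illustration; the sketch does not reproduce C-Store's on-disk formats or its B-tree indexes.

```python
from itertools import groupby


def rle_encode(column):
    """Encode a column as (value, first_position, run_length) triples."""
    triples, position = [], 0
    for value, run in groupby(column):
        length = sum(1 for _ in run)
        triples.append((value, position, length))
        position += length
    return triples


def bitmap_encode(column):
    """Map each distinct value to a bit list marking the positions where it occurs."""
    bitmaps = {value: [0] * len(column) for value in set(column)}
    for position, value in enumerate(column):
        bitmaps[value][position] = 1
    return bitmaps


sorted_column = ["east", "east", "east", "north", "west", "west"]
unsorted_column = [0, 0, 1, 1, 2, 1, 0, 2, 1]

runs = rle_encode(sorted_column)
# runs == [("east", 0, 3), ("north", 3, 1), ("west", 4, 2)]

bitmaps = bitmap_encode(unsorted_column)
# bitmaps[1] == [0, 0, 1, 1, 0, 1, 0, 0, 1]
```

RLE pays off when the column's own sort order produces long runs. When another column dictates the order, the bitmaps are sparse, and their long runs of zeros are what RLE then compresses.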

Storage Architecture

Disk-oriented

Each column is stored as a separate file consisting of a list of 64 KB blocks, each packed with as many values as possible.
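The block layout can be sketched as follows, under stated assumptions: the block size is read as 64 KiB, values are fixed-width 4-byte integers, and the helper names `pack_column` and `read_value` are hypothetical. Real C-Store blocks hold compressed data, so the number of values per block would vary with the encoding.

```python
import struct

BLOCK_SIZE = 64 * 1024   # assumed to mean 64 KiB per block
VALUE_FORMAT = "<i"      # hypothetical fixed-width 4-byte integer encoding
VALUE_SIZE = struct.calcsize(VALUE_FORMAT)
VALUES_PER_BLOCK = BLOCK_SIZE // VALUE_SIZE


def pack_column(values):
    """Split a column of ints into blocks, filling each block before starting the next."""
    blocks = []
    for start in range(0, len(values), VALUES_PER_BLOCK):
        chunk = values[start:start + VALUES_PER_BLOCK]
        blocks.append(struct.pack(f"<{len(chunk)}i", *chunk))
    return blocks


def read_value(blocks, position):
    """Fetch the value at a column position by locating its block and byte offset."""
    block_no, offset = divmod(position, VALUES_PER_BLOCK)
    return struct.unpack_from(VALUE_FORMAT, blocks[block_no], offset * VALUE_SIZE)[0]


blocks = pack_column(list(range(40_000)))
value = read_value(blocks, 20_000)
```

With fixed-width values a position maps directly to a block number and offset. With variable-length or compressed values an index over positions is needed instead.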

Data Model

Relational

Logically, C-Store supports the standard relational data model, where a database contains a collection of tables and a table contains a collection of attributes.

System Architecture

Shared-Nothing

The architecture was designed for an anticipated environment of grid computers: a large number of nodes, each with its own private disk and memory. Data is horizontally partitioned across the disks of the nodes.
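Horizontal partitioning can be illustrated with a small sketch that assigns whole rows to nodes by key range. Range partitioning, the `partition_rows` helper, and the sample boundaries are assumptions chosen for illustration; the text above says only that data is split horizontally across nodes.

```python
from bisect import bisect_right


def partition_rows(rows, key, boundaries):
    """Assign each row to a node by comparing its key to sorted range boundaries."""
    nodes = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        nodes[bisect_right(boundaries, row[key])].append(row)
    return nodes


rows = [
    {"id": 3, "region": "east"},
    {"id": 15, "region": "west"},
    {"id": 27, "region": "east"},
    {"id": 10, "region": "north"},
]

# Node 0 holds id < 10, node 1 holds 10 <= id < 20, node 2 holds id >= 20.
nodes = partition_rows(rows, key="id", boundaries=[10, 20])
```

Each node ends up with complete rows for its key range, so a scan restricted to one range touches only one node's private disk.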

Query Compilation

Not Supported

Website

http://db.csail.mit.edu/projects/cstore/

Developer

Samuel Madden, Michael Stonebraker, Daniel Abadi, Stan Zdonik, Mitch Cherniack, David DeWitt, Pat O'Neil, Betty O'Neil, Nga Tran, Tien Hoang, Alexander Rasin, Tingjian Ge, Xuedong Chen, Stavros Harizopoulos, Miguel Ferreira, Amerson Lin, Adam Batkin, Edmond Lau

End Year

2006

Project Type

Academic

Supported languages

C++

Operating Systems

Linux

Licenses

BSD