Cloud BigTable

Cloud BigTable is a distributed storage system used in Google. BigTable is designed mainly for scalability.

Google has now provided BigTable as its cloud NoSQL database service.

History

BigTable was among the early attempts Google made to manage big data. Jeffrey Dean and Sanjay Ghemawat were involved in it. It is one of the three components Google built for managing big data (the other two are Google File System and MapReduce).

These three components focus on different aspects of big data: Google File System is a reliable distributed file system that the other two build upon; MapReduce is a distributed data processing framework; BigTable is a distributed storage system.

Concurrency Control

Not Supported

BigTable only supports transactions on a single row[1]. It does not support transactions spanning multiple rows

Data Model

Column Family / Wide-Column

BigTable does not support relational data model. Instead, it provides users the ability to create column families in a table.

Each table usually contains a small number of column families, which should be rarely changed (because the change of them involves metadata change). Inside each column family, there can be unlimited number of columns. Users can freely add or delete columns in a column family. Deleting of an entire column family is also supported.

BigTable does not have any type information associated with a given column. It only treats data as strings of bytes.

BigTable uses physical logging. For performance consideration, all tablets on a tablet server write logs to the same log file[1].

Query Compilation

Not Supported

Query Interface

Custom API

BigTable provides clients with the following APIs: 1. Look Up (Read a Single Row) 2. Scan (Read a subset of rows) 3. Write 4. Delete 5. Customized Scripts (written in Sawzall language)

Storage Architecture

Disk-oriented

BigTable assumes an underlying reliable distributed file system (here is Google File System). The tablets are stored in Google File System, which is a disk-oriented file system. The most recently written records are stored in memtable, which is in memory. However, most of the data is stored on disk.

Storage Model

Custom

In BigTable, a table is split into multiple tablets, each of which is a subset of consecutive rows[1]. A tablet is a unit of data distribution and load balancing. Different tablets of a table may be assigned to different tablet servers. A tablet is stored in the form of a log-structured merge tree[2] (which they call memtable and SSTable).

Furthermore, BigTable allows clients to create locality group[3]. A locality group is a subset of columns in a table. BigTable will create a separate SSTable for each locality group, which will improve read performance of this locality group.

Stored Procedures

Not Supported

Views

Not Supported

Revision #7 | Updated 07/31/2022 10:33 p.m.

Cloud BigTable

History

Concurrency Control

Data Model

Indexes

Joins

Logging

Query Compilation

Query Interface

Storage Architecture

Storage Model

Stored Procedures

Views

Derivative Systems

Embeddings

People Also Viewed

Website

Developer

Country of Origin

Start Year

Former Name

Project Type

Supported languages

Operating Systems

Wikipedia

Derivative Systems

Embeddings

People Also Viewed