Cloud Spanner

Spanner is Google's globally distributed NewSQL database management system. It uses multi-version concurrency control along with synchronous data replication. Traditional database systems lack horizontal scalability, while NoSQL systems do not provide strong consistency and cannot be used where a high level of data consistency is required. NewSQL systems provide the best of both worlds: excellent scalability together with the ACID guarantees of a single-node RDBMS. Spanner is the first system to distribute data at global scale and support externally-consistent distributed transactions. This combination of availability and consistency over the wide area is generally considered impossible due to the CAP Theorem. Google's globally synchronized clock, TrueTime, is essential for consistent reads and especially for the snapshots that enable consistent and repeatable analytics. Spanner serves reads from the data center geographically closest to the client and distributes writes across multiple data centers. If the data center being read from fails, the read is completed from another data center that holds a replica of the data. Spanner assigns globally consistent real-time timestamps to every datum written to it, and clients can do globally consistent reads across the entire database without locking.
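
A toy simulation of that timestamping discipline, assuming a fixed 7 ms uncertainty bound (this illustrates TrueTime-style commit wait; it is not Google's implementation): TT.now() returns an interval guaranteed to contain the true time, and a commit only becomes visible once its timestamp is certainly in the past on every clock.

    import time

    EPSILON = 0.007   # assumed clock error bound (~7 ms)

    def tt_now():
        """TrueTime-style API: an interval [earliest, latest] guaranteed
        to contain the true absolute time."""
        now = time.time()
        return (now - EPSILON, now + EPSILON)

    def commit_timestamp():
        """Pick a commit timestamp and perform commit wait: only return
        once the timestamp is guaranteed to be in the past."""
        commit_ts = tt_now()[1]           # s = TT.now().latest
        while tt_now()[0] <= commit_ts:   # wait until TT.now().earliest > s
            time.sleep(0.001)
        return commit_ts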

History

Spanner grew out of numerous complaints from users of Bigtable, which can be difficult to use for some kinds of applications: those that have complex, evolving schemas, or those that want strong consistency in the presence of wide-area replication. Many applications at Google had chosen to use Megastore because of its semi-relational data model and support for synchronous replication, despite its relatively poor write throughput. As a consequence, Spanner evolved from a Bigtable-like versioned key-value store into a temporal multi-versioned database.

System Architecture

Shared-Nothing

Spanner has a shared-nothing architecture that provides high scalability. Data is automatically sharded and replicated across multiple data centers. The application developer can choose the number of replicas and their placement. Spanner relies on GPS receivers and atomic clocks to bound the uncertainty in time.
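
A minimal sketch of the sharding idea, with invented names and key-range splits (not Spanner's actual data structures): each shard owns one contiguous key range and shares nothing with the others.

    import bisect

    class ShardMap:
        """Maps keys to shards by contiguous key ranges; each shard is
        then replicated across the zones the application chose."""

        def __init__(self, split_points, replicas_per_shard=3):
            self.split_points = sorted(split_points)
            self.replicas_per_shard = replicas_per_shard

        def shard_for(self, key):
            # Keys below the first split point go to shard 0, and so on.
            return bisect.bisect_right(self.split_points, key)

    shards = ShardMap(["g", "p"])
    assert shards.shard_for("alice") == 0   # "alice" < "g"
    assert shards.shard_for("zoe") == 2     # "zoe" >= "p"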

Logging

Physical Logging

Every write to the log involves a cross-region Paxos agreement, so the latency of a two-phase commit in Spanner is at least three times the latency of a cross-region Paxos round. The log is analogous to a write-ahead log in a non-distributed setting.
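
The interaction can be pictured as a write-ahead log whose append only becomes durable after a majority of replicas acknowledge it. Below is a toy Python sketch under that assumption (the class and its behavior are illustrative, not Spanner's code):

    class ReplicatedLog:
        """A toy Paxos-replicated write-ahead log: an append is durable
        only once a majority of replicas have accepted the record."""

        def __init__(self, num_replicas=5):
            self.num_replicas = num_replicas
            self.replica_logs = [[] for _ in range(num_replicas)]

        def append(self, record):
            acks = 0
            for log in self.replica_logs:
                # In reality this is a cross-region Paxos round; here every
                # replica is reachable and simply accepts the record.
                log.append(record)
                acks += 1
            return acks > self.num_replicas // 2   # majority => durable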

Concurrency Control

Multi-version Concurrency Control (MVCC), Two-Phase Locking (Deadlock Prevention)

Spanner uses multi-version concurrency control with two-phase locking. Spanner's MVCC and 2PL implementation is unique in that it uses hardware devices (e.g., GPS receivers, atomic clocks) for high-precision clock synchronization. The DBMS uses these clocks to assign timestamps to transactions to enforce consistent views of its multi-version database over wide-area networks. The Paxos state machines are used to implement a consistently replicated bag of mappings, and the key-value mapping state of each replica is stored in its corresponding tablet. Writes must initiate the Paxos protocol at the leader; reads access state directly from the underlying tablet at any replica that is sufficiently up-to-date. At every replica that is a leader, each spanserver implements a lock table for concurrency control. The lock table contains the state for two-phase locking: it maps ranges of keys to lock states. Spanner provides three types of operations: read-write transactions, read-only transactions, and snapshot reads. A standalone write is executed as a read-write transaction, while a standalone non-snapshot read is executed as a read-only transaction. Operations that require synchronization, such as transactional reads, acquire locks in the lock table; other operations bypass it. In summary, Spanner supports lock-free read-only transactions and non-blocking reads in the past.
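
A minimal sketch of such a lock table, assuming a simple shared/exclusive scheme over half-open key ranges (the class and its methods are illustrative, not Spanner's actual interface; Spanner additionally uses wound-wait to prevent deadlocks):

    SHARED, EXCLUSIVE = "S", "X"

    class LockTable:
        """Maps half-open key ranges [start, end) to lock states for 2PL."""

        def __init__(self):
            self.locks = {}   # (start, end) -> (mode, set of txn ids)

        @staticmethod
        def _overlaps(a, b):
            return a[0] < b[1] and b[0] < a[1]

        def acquire(self, txn_id, key_range, mode):
            for held_range, (held_mode, holders) in self.locks.items():
                others = holders - {txn_id}
                if others and self._overlaps(key_range, held_range):
                    # Shared locks are mutually compatible; anything involving
                    # an exclusive lock conflicts, and the caller must wait or
                    # abort (wound-wait decides which).
                    if mode == EXCLUSIVE or held_mode == EXCLUSIVE:
                        return False
            held_mode, holders = self.locks.get(key_range, (SHARED, set()))
            new_mode = EXCLUSIVE if EXCLUSIVE in (mode, held_mode) else SHARED
            self.locks[key_range] = (new_mode, holders | {txn_id})
            return True

        def release_all(self, txn_id):
            # The shrinking phase of 2PL: drop everything at commit/abort.
            for key_range in list(self.locks):
                mode, holders = self.locks[key_range]
                holders.discard(txn_id)
                if not holders:
                    del self.locks[key_range]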

Storage Model

Custom

Spanner stores rows in sorted order by primary key values, with child rows inserted between parent rows that share the same primary key prefix. This insertion of child rows between parent rows along the primary key dimension is called interleaving, and child tables are also called interleaved tables. Client applications declare the hierarchies in database schemas via INTERLEAVE IN declarations. The table at the top of a hierarchy is a directory table. The row in a directory table with key "test", together with all of the rows in descendant tables whose keys start with "test" in lexicographic order, forms a directory. Primary keys in Spanner thus allow rows of related tables to be physically co-located.
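
As an illustration, a hypothetical schema (the instance, database, and table names are invented for this sketch) that declares such a hierarchy through the google-cloud-spanner Python client:

    from google.cloud import spanner

    client = spanner.Client()
    database = client.instance("test-instance").database("test-database")

    # Albums shares the SingerId key prefix with Singers, so each album
    # row is physically stored with its parent singer row.
    operation = database.update_ddl([
        """CREATE TABLE Singers (
             SingerId INT64 NOT NULL,
             Name     STRING(MAX),
           ) PRIMARY KEY (SingerId)""",
        """CREATE TABLE Albums (
             SingerId INT64 NOT NULL,
             AlbumId  INT64 NOT NULL,
             Title    STRING(MAX),
           ) PRIMARY KEY (SingerId, AlbumId),
           INTERLEAVE IN PARENT Singers ON DELETE CASCADE""",
    ])
    operation.result()   # block until the schema change is applied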

Query Compilation

Code Generation

Neither the Spanner paper nor Google's whitepapers discuss this.

Checkpoints

Fuzzy

There is not enough publicly available information to corroborate this.

Storage Architecture

Disk-oriented

A Spanner deployment is called a universe. A universe consists of multiple zones, where a zone is a unit that can run with physical independence. A data center may contain one or more zones; creating two or more zones in a single data center allows data to be stored separately in different server groups.

Views

Materialized Views

Materialized views are supported, but further information about them could not be found.

Isolation Levels

Snapshot Isolation

Thanks to TrueTime, Spanner provides globally consistent reads at a timestamp and externally consistent distributed transactions.
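
A sketch of the two read flavors through the google-cloud-spanner Python client (the instance, database, and table names are the same hypothetical ones as in the schema sketch above): a strong read executes at a current timestamp, while a stale read is served from a consistent snapshot in the past by any sufficiently up-to-date replica.

    import datetime
    from google.cloud import spanner

    client = spanner.Client()
    database = client.instance("test-instance").database("test-database")

    # Strong read: lock-free, sees every transaction committed before it.
    with database.snapshot() as snapshot:
        rows = list(snapshot.execute_sql("SELECT SingerId, Name FROM Singers"))

    # Snapshot read in the past: executes at a timestamp 15 seconds ago and
    # never blocks, since all versions at that timestamp are immutable.
    with database.snapshot(exact_staleness=datetime.timedelta(seconds=15)) as snapshot:
        stale_rows = list(snapshot.execute_sql("SELECT SingerId, Name FROM Singers"))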

Query Interface

SQL

A SQL-based query language was chosen because of SQL's familiarity within Google.

Joins

Sort-Merge Join

Exact information about Spanner's join algorithms could not be found.