Spanner is Google's globally distributed NewSQL database management system. It follows multi-versioning concurrency control, along with synchronous data replication. Traditional database systems lack horizontal scalability, and NoSQL systems don't provide strong consistency, and cannot be used where high level of data consistency is required. NewSQL provide the best of both worlds - excellect scalability, and ACID guarantees like a RDBMS which is performed on a single node. Spanner is the first system to distribute data at global scale and support externally-consistent distributed transactions. This combination of availability and consistency over the wide area is generally considered impossible due to the CAP Theorem. Google's globally synchronized clock - TrueTime, is essential for consistency in reads and especially for snapshots that enable consistent and repeatable analytics. Spanner connects you to the data center that is geographically closest to you for reads, and when you write data, it distributes and stores it to multiple data centers. If there is a failure at the data center you try to read from, read of the data is completed from another data center that has a replica of the data. Spanner assigns globally consistent real-time timestamps to every datum written to it, and clients can do globally consistent reads across the entire database without locking[03][04]
- Developer
- Country of Origin
- US
- Start Year
- 2007 [16]
- Former Name
- Spanner
- Project Type
- Commercial
- Supported Languages
- Go, Java, JavaScript, Python
- Derived From
- Cloud BigTable
- Operating System
- Hosted
- License
- Proprietary
Spanner is Google's globally distributed NewSQL database management system. It follows multi-versioning concurrency control, along with synchronous data replication. Traditional database systems lack horizontal scalability, and NoSQL systems don't provide strong consistency, and cannot be used where high level of data consistency is required. NewSQL provide the best of both worlds - excellect scalability, and ACID guarantees like a RDBMS which is performed on a single node. Spanner is the first system to distribute data at global scale and support externally-consistent distributed transactions. This combination of availability and consistency over the wide area is generally considered impossible due to the CAP Theorem. Google's globally synchronized clock - TrueTime, is essential for consistency in reads and especially for snapshots that enable consistent and repeatable analytics. Spanner connects you to the data center that is geographically closest to you for reads, and when you write data, it distributes and stores it to multiple data centers. If there is a failure at the data center you try to read from, read of the data is completed from another data center that has a replica of the data. Spanner assigns globally consistent real-time timestamps to every datum written to it, and clients can do globally consistent reads across the entire database without locking[03][04]
History[05]
Spanner originated from the numerous concerns users of Bigtable as it can be difficult to use for some kinds of applications: those that have complex, evolving schemas, or those that want strong consistency in the presence of wide-area replication. Many applications at Google had chosen to use Megastore because of its semi-relational data model and support for synchronous replication, despite its relatively poor write throughput. As a consequence, Spanner started as a fork of Bigtable and then became a versioned key-value store that then expanded into a temporal multi-versioned database
Concurrency Control[06][07]
Spanner uses multi-versioned concurrency control with 2-phase locking. Spanner's MVCC and 2PL implementation is unique in that it uses hardware devices (e.g., GPS, atomic clocks) for high-precision clock synchronization. The DBMS uses these clocks to assign timestamps to transactions to enforce consistent views of its multi-version database over wide-area networks. The Paxos state machines are used to implement a consistently replicated bag of mappings. The key-value mapping state of each replica is stored in its corresponding tablet. Writes must initiate the Paxos protocol at the leader; reads access state directly from the underlying tablet at any replica that is sufficiently up-to-date. At every replica that is a leader, each spanserver implements a lock table to implement concurrency control. The lock table contains the state for two-phase locking: it maps ranges of keys to lock states. Spanner provides three types of operations: read/write transaction, read transaction and snapshot read operation. A single write operation is performed through a read/write transaction, while a single read operation, not a snapshot read, is performed through a read transaction. Operations that require synchronization, such as transactional reads, acquire locks in the lock table; other operations bypass the lock table. In summary, lock-free read-only transactions, and non-blocking reads in the past are supported
Data Model[06]
Spanner has a semi-relational data model. It is called semi-relational because it has some features that differentiate it from normal relational databases. One major difference is that in Spanner, all tables should have at least one primary key. Data in Spanner is stored in a semi-relational table that has a schema. Each row is required to have a name, and a row has existence only if some value (even NULL) is defined for the row’s keys. Unlike an RDBMS table, each data also has version information.
Isolation Levels[06]
Spanner provides globally-consistent reads at a time-stamp due to TrueTime and externally-consistent distributed distributed transactions
Logging[09]
Every write to a log involves a cross-region Paxos agreement, so the latency of two-phase commit in Spanner is at least equal to three times the latency of cross-region Paxos. This is similar to a write ahead log in a non distributed setting.
Storage Architecture[07]
The universal set of Spanner is called universe. A universe consists of multiple zones. A zone means a unit that can be run with physical independence. A data center may have one or more zones. Data can be stored separately in different server groups by making two or more zones in a single data center
Storage Model[06]
Spanner stores rows in sorted order by primary key values, with child rows inserted between parent rows that share the same primary key prefix. This insertion of child rows between parent rows along the primary key dimension is called interleaving, and child tables are also called interleaved tables. Client applications declare the hierarchies in database schemas via the INTERLEAVE IN declarations. The table at the top of a hierarchy is a directory table. Each row in a directory table with key "test", together with all of the rows in descendant tables that start with "test" in lexicographic order, forms a directory. Primary keys in Spanner allow you to physically co-locate rows of related tables
System Architecture[13][14]
Spanner has a shared-nothing architecture that provides high scalability. Data is automatically sharded replicated across multiple data centers. The application developer can choose the number of replicas and their placement. Spanner relies on GPS and atomic clocks to represent the uncertainity in time.
Citations
16 sources- Spanner: Always-on, virtually unlimited scale database | Google Cloud google.com
- Spanner (database) - Wikipedia wikipedia.org
- Spanner, TrueTime and the CAP Theorem research.google
- Spanner: Google’s Globally-Distributed Database | USENIX usenix.org
- https://www.cubrid.org/blog/dev-platform/spanner-globally-distributed-database-by-google cubrid.org
- https://www.usenix.org/system/files/conference/osdi12/osdi12-final-16.pdf usenix.org
- What's Really New with NewSQL? cmu.edu
- https://opencredo.com/google-spanner-first-look opencredo.com
- https://fauna.com/blog/distributed-consistency-at-scale-spanner-vs-calvin fauna.com
- Life of a Spanner Query | Google Cloud Documentation google.com
- Query syntax in GoogleSQL | Spanner | Google Cloud Documentation google.com
- https://db-engines.com/en/system/Google+Cloud+Bigtable;Google+Cloud+Spanner;VoltDB db-engines.com
- Life of Spanner Reads & Writes | Google Cloud Documentation google.com
- Shared Nothing v.s. Shared Disk Architectures: An Independent View | Ben Stopford benstopford.com
- Optimizing Schema Design for Spanner | Google Cloud Documentation google.com
- https://static.googleusercontent.com/media/research.google.com/en/archive/spanner-osdi2012.pdf googleusercontent.com