Google F1

F1 is a relational distributed transactional database. And it's built on Google's Spanner so that it can reach strong consistency. It combines RDBMS features and NoSQL scalability. And it maintains ACID guarantees and provides a distributed scalable database system.

History

Google Research published at SIGMOD, 2012 and then published a paper about it at VLDB, 2013. Google used to use Sharded MySQL for their Ads system. But it had bad performance on availability, scalability and functionality. So Google decided to build a new database on Spanner to keep all RDBMS features but with scalability.

Concurrency Control

Timestamp Ordering

The **Spanner** provides a global timestamp order for the transactions to use. Thus **F1** can use timestamp to achieve concurrency control. **F1** actually gets three kinds of transaction control methods: 1. Snapshot transactions: which are read-only transactions and will read repeatable data using **Spanner** snapshot from some timestamp. 2. Pessimistic transactions: which directly use **Spanner**'s transaction and will require **F1** server to acquire and hold locks. 3. Optimistic transactions: which are like a **OCC** way that would do a confirmation in the final write phase and abort if found something updated. This kind of transactions would never acquire **Spanner** locks. And **F1** would use a column of lock to represent locks of different granularity.

Data Model

Relational

F1 supports traditional relational data model. The *Cluster Hierarchical Model* would use repeated data (repeated ancestor's primary key) to store the data in a clustered way for fast joining process. The *Cluster Hierarchical Model* stores data like a tree with a root row, and then followed by its child rows and grand child rows.

Foreign Keys

Supported

F1 supports foreign key as well as indexing.

Indexes

B+Tree

F1 directly uses a separated Spanner table to store index information. F1 has two kinds of indices which are *Local Index* and *Global Index*. *Local Index* requires to require root primary key as a prefix so that the index can be used to locate the row in the same **Spanner Directory** fast. *Global Index* is a costly index which can locate the data globally without prefix.

Isolation Levels

Snapshot Isolation

F1 provides strong consistency by using snapshot isolation.

Joins

Hash Join

F1 supports joining operation on not only the data source from Spanner but also from other remote data source such as CSV files and BigTable.

Logging

Physical Logging

F1 uses change history named *ChangeBatch* to maintain triggers and to log changes. Every transaction will have a *ChangeBatch* stored with root row that stores all changes of all children rows that belongs to it. The *ChangeBatch Table* can be queried to do trigger or update stuff by clients.

Query Compilation

Not Supported

Query Interface

SQL

F1 provides users with SQL interfaces and NoSQL Key-Value style interfaces. For SQL interfaces, besides traditional SQL queries, F1 also supports joining operations with outer data source like CVS files.

Storage Architecture

Disk-oriented

F1 is co-developed with Spanner and uses GFS as its lowest level storage system.

System Architecture

Shared-Nothing

The F1 architecture contains *F1 servers*, *F1 queriers pool*, *Spanner Storage servers* and *GFS*. **Spanner** relies on Paxos based quorum to gain failure tolerant replicas.

Website

https://ai.google/research/pubs/pub41344

Developer

Google Inc

Country of Origin

US

Start Year

2013

Project Type

Commercial

Written in

C++

Supported languages

C++