TiKV is an open source distributed key-value database based on the design of Google Spanner and HBase, but it is much simpler because it does not depend on any distributed file system. Its primary features include geo-replication, horizontal scalability, consistent distributed transactions, and coprocessor support.
TiKV is built on top of RocksDB. All data in a TiKV node shares two RocksDB instances: one for data and the other for the Raft log. There are some major components in TiKV:
Read Committed, Repeatable Read
TiDB/TiKV uses the Percolator transaction model. The default isolation level in TiKV is Repeatable Read. When a transaction starts, it obtains a global read timestamp; when it commits, it obtains a global commit timestamp. The execution order of transactions is determined by these timestamps. The underlying details can be found in the Concurrency Control section.
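To illustrate how a read timestamp can give repeatable reads, here is a minimal Python sketch. It is not TiKV code: `VersionedStore`, its methods, and the in-process counter standing in for a global timestamp source are all hypothetical names chosen for this example. A reader sees only the newest version whose commit timestamp is not later than its own read timestamp.

```python
import itertools


class VersionedStore:
    """Toy multi-version store: each key keeps a list of (commit_ts, value)."""

    def __init__(self):
        # Hypothetical stand-in for a global timestamp source.
        self._clock = itertools.count(1)
        self._versions = {}

    def next_ts(self):
        """Hand out a strictly increasing timestamp."""
        return next(self._clock)

    def write(self, key, value):
        """Store a new version of key, stamped with a fresh commit timestamp."""
        commit_ts = self.next_ts()
        self._versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts

    def read(self, key, read_ts):
        """Return the newest value committed at or before read_ts, else None."""
        visible = [(ts, v) for ts, v in self._versions.get(key, []) if ts <= read_ts]
        if not visible:
            return None
        return max(visible, key=lambda pair: pair[0])[1]


store = VersionedStore()
store.write("k", "v1")
read_ts = store.next_ts()           # a reader starts here
store.write("k", "v2")              # committed after the reader started
first = store.read("k", read_ts)
second = store.read("k", read_ts)   # same answer as the first read
latest = store.read("k", store.next_ts())
```

Both reads at `read_ts` return the older value, while a reader that starts later sees the newer one.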
Multi-version Concurrency Control (MVCC)
TiKV has a Timestamp Oracle (TSO) that provides globally unique timestamps. The core transaction model of TiKV is two-phase commit (2PC) powered by MVCC. There are two stages within each transaction:
- PreWrite:
  - Create a startTS timestamp. Select one row as the primary row and the others as secondary rows.
  - Check whether there are locks on the primary row or whether there are commits after the startTS. If a conflict exists, the transaction is rolled back. If not, lock the row.
  - Repeat the second step on the other rows.
- Commit:
  - Write to CF_WRITE with the current timestamp, commitTS.
  - Release all the locks.
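The two stages above can be sketched in Python as a single-process toy. This is an illustration under stated assumptions, not TiKV's implementation: `ToyKV`, `Conflict`, and the three dictionaries are hypothetical, with `writes` loosely playing the role of CF_WRITE and an in-memory counter standing in for the TSO. The read path is simplified and ignores locks.

```python
import itertools


class Conflict(Exception):
    """Raised when prewrite finds a lock or a commit after start_ts."""


class ToyKV:
    def __init__(self):
        self._tso = itertools.count(1)  # stand-in for the Timestamp Oracle
        self.locks = {}    # key -> (primary_key, start_ts)
        self.data = {}     # (key, start_ts) -> value
        self.writes = {}   # key -> list of (commit_ts, start_ts)

    def get_ts(self):
        return next(self._tso)

    def prewrite(self, mutations, start_ts):
        """Lock every key, primary first; undo everything on conflict."""
        keys = list(mutations)
        primary = keys[0]  # the first key is chosen as the primary row
        locked = []
        try:
            for key in keys:
                if key in self.locks:
                    raise Conflict(f"{key!r} is locked")
                commits = self.writes.get(key, [])
                if any(commit_ts > start_ts for commit_ts, _ in commits):
                    raise Conflict(f"{key!r} was committed after start_ts")
                self.locks[key] = (primary, start_ts)
                self.data[(key, start_ts)] = mutations[key]
                locked.append(key)
        except Conflict:
            # Roll back: drop the locks and data this attempt created.
            for key in locked:
                del self.locks[key]
                del self.data[(key, start_ts)]
            raise
        return primary

    def commit(self, keys, start_ts):
        """Record a commit entry for each key, then release its lock."""
        commit_ts = self.get_ts()
        for key in keys:
            self.writes.setdefault(key, []).append((commit_ts, start_ts))
            del self.locks[key]
        return commit_ts

    def get(self, key, read_ts):
        """Return the newest value committed at or before read_ts."""
        visible = [w for w in self.writes.get(key, []) if w[0] <= read_ts]
        if not visible:
            return None
        _, start_ts = max(visible)
        return self.data[(key, start_ts)]


kv = ToyKV()
t1, t2 = kv.get_ts(), kv.get_ts()
kv.prewrite({"a": 1, "b": 2}, t1)
try:
    kv.prewrite({"b": 20}, t2)      # "b" is still locked by t1
    lock_conflict = False
except Conflict:
    lock_conflict = True
c1 = kv.commit(["a", "b"], t1)
try:
    kv.prewrite({"b": 20}, t2)      # "b" now has a commit after t2
    commit_conflict = False
except Conflict:
    commit_conflict = True
```

In this run the second transaction conflicts twice: first on the lock held by the first transaction, then on the commit that landed after its own start timestamp. A real client would retry with a fresh timestamp.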
https://github.com/pingcap/tikv
PingCAP
2016