PolarDB is a commercial cloud based relational database product developed by the Alibaba. It is designed for clients with high read demand. PolarDB is compatible with two popular databases: MySQL, PostgreSQL. It has three layers. Users interact with database through computing layer. PolarFS is a distributed file system and polarStorage as storage level. PolarDB uses InnoDB as storage engine.
Multi-version Concurrency Control (MVCC)
Primary node (read-write) and replica nodes (read-only) communicate through message sender and ack receiver. Each of them have a buffer pool. Replica would update itself during runtime redo operation. It uses difference between written log sequence number and replica's applied log sequence number as replica lag to keep track of the version that replica is holding.
In PolarFS, it uses parallel raft protocol to coordinate multiple data chunk servers. ParallelRaft is a consensus protocol inherited from Raft but it allows out-of-order I/O completion tolerance capabilities of database.
Similar to MySQL, b+ tree is the default index data structure. B+Tree is ordered by primary key. One optimization related to B+Tree is that polarDB will record the location of last insertion to facilitate insertion next time. During parallel query execution which is a feature of polarDB, it will partition the B+tree to multiple workers. Each worker can only see its own partition. When one worker finished with one partition, it will automatically attach to a new partition.
PolarDB is disk-oriented as MySQL. Primary and replica nodes have buffer pool and they can access data and log in shared memory. Primary has the right to flush the page during normal operation. After redo in recovery, replicas will flush pages to disk. Primary node, after receiving the read view from new master, will also write pages to disk.
Each chunk server has a write ahead log (WAL). Any modification to a chunk server will be appended in log before updating the chunk. After primary node (read-write) modifies some pages, it will send logs to shared memory where replica could access. Replicas have log apply threads that modify their versions during redo.
https://www.alibabacloud.com/product/polardb
Alibaba
2017