In a distributed database, a table is divided into multiple partitions spread across different servers, and each server maintains its own version timestamp. Compared with a single-machine database, providing MVCC is much more complex; of course, if you have Spanner's atomic clocks, it becomes much simpler.
A feasible implementation scheme is described below. It provides the following guarantees:
1. A single-partition read (within a distributed transaction) guarantees repeatable read.
2. Distributed transactions on the same server guarantee repeatable read and, externally, a causal order.
3. Cross-partition reads cannot guarantee causality, but can still guarantee repeatable read.
Data model
- 1. Each server holds multiple partitions, each of which is an MVCC data structure that keeps multiple versions of each row.
- 2. Each server has a publish_id, indicating that data with a version less than this value can be read.
- 3. Each server has a trans_id, indicating that the versions of subsequently written data will be greater than this value (see the sketch after this list).
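A minimal sketch of this per-server state, written in Go under assumed names (Server, Partition, RowVersion) that are not part of the original design:

```go
package mvcc

import "sync"

// RowVersion is one MVCC version of a single row.
type RowVersion struct {
	Version uint64 // commit version that wrote this value
	Value   []byte
}

// Partition is a multi-version data structure for one partition.
type Partition struct {
	Rows map[string][]RowVersion // key -> versions, ascending by Version
}

// Server is the per-server MVCC state described above.
type Server struct {
	mu         sync.Mutex
	Partitions map[string]*Partition
	PublishID  uint64 // data with a version less than this value is readable
	TransID    uint64 // versions of subsequent writes will be greater than this
}
```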
Distributed Transaction Write
- 1. In the prepare phase of two-phase commit, each participant replies with a prepare ACK carrying a version number, which is the value of its local ++trans_id.
- 2. The coordinator takes the largest version number among the collected prepare ACKs as max_ver and sends it in the commit request.
- 3. When a participant receives the commit request, it writes the commit log and uses max_ver to update its trans_id.
- 4. After the commit log is persisted, the participant advances publish_id and releases the row locks (see the sketch after this list).
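Continuing the sketch above, a hedged illustration of the prepare and commit steps; PrepareAck, writeCommitLog, and releaseRowLocks are hypothetical names, and durability and recovery details are omitted.

```go
// PrepareAck carries the version number proposed by one participant.
type PrepareAck struct {
	TxID    string
	Version uint64
}

// Prepare reserves a version number by incrementing the local trans_id.
func (s *Server) Prepare(txID string) PrepareAck {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.TransID++ // local ++trans_id becomes the proposed version
	return PrepareAck{TxID: txID, Version: s.TransID}
}

// maxVersion is the coordinator-side step: pick the largest proposed version
// among the collected prepare ACKs as the commit version (max_ver).
func maxVersion(acks []PrepareAck) uint64 {
	var maxVer uint64
	for _, a := range acks {
		if a.Version > maxVer {
			maxVer = a.Version
		}
	}
	return maxVer
}

// Commit applies the coordinator's decision: persist the commit log at
// maxVer, raise trans_id, then advance publish_id and release row locks.
func (s *Server) Commit(txID string, maxVer uint64) {
	s.writeCommitLog(txID, maxVer) // step 3: durably record the commit at maxVer
	s.mu.Lock()
	if maxVer > s.TransID {
		s.TransID = maxVer // future prepares will propose versions > maxVer
	}
	if maxVer > s.PublishID {
		// Simplification: a real implementation must not advance publish_id
		// past a version that is still uncommitted on this server.
		s.PublishID = maxVer
	}
	s.mu.Unlock()
	s.releaseRowLocks(txID) // step 4: unblock readers waiting on these rows
}

// Stubs standing in for the commit log and row-lock machinery.
func (s *Server) writeCommitLog(txID string, ver uint64) {}
func (s *Server) releaseRowLocks(txID string)            {}
```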
Distributed Transaction Read
Since cross-server distributed reads make no guarantees, only distributed reads on the same server are discussed. In short, publish_id is used as the read version: if the target row holds a row lock whose associated version number is less than publish_id, the read blocks until the row lock is released; otherwise the row can be read directly.
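A sketch of this single-server read rule, continuing the types above; lockVersion and waitForUnlock are hypothetical helpers standing in for the row-lock table, since the original text only specifies the blocking condition.

```go
// Read returns the newest version of (part, key) visible under publish_id.
func (s *Server) Read(part, key string) []byte {
	s.mu.Lock()
	snapshot := s.PublishID
	s.mu.Unlock()

	// A row lock whose version is below the snapshot belongs to a commit that
	// must become visible first, so block until the lock is released.
	if ver, locked := s.lockVersion(part, key); locked && ver < snapshot {
		s.waitForUnlock(part, key)
	}

	// Otherwise read directly: newest version strictly less than publish_id.
	s.mu.Lock()
	defer s.mu.Unlock()
	versions := s.Partitions[part].Rows[key]
	for i := len(versions) - 1; i >= 0; i-- {
		if versions[i].Version < snapshot {
			return versions[i].Value
		}
	}
	return nil
}

// Stubs standing in for the row-lock table.
func (s *Server) lockVersion(part, key string) (uint64, bool) { return 0, false }
func (s *Server) waitForUnlock(part, key string)              {}
```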