Author: Liu Xuhui (Raymond). Please credit the source when reprinting.
Email: colorant at 163.com
Blog: http://blog.csdn.net/colorant/
More paper reading notes: http://blog.csdn.net/colorant/article/details/8256145
Keywords: Spanner, external consistency, cross-datacenter, TrueTime
= Target question =
Provide a high-performance, globally distributed, synchronously replicated database.
= Core idea =
Spanner is designed as a high-performance database that scales to hundreds of data centers and millions of servers. Its focus is high reliability and data consistency across data centers without giving up performance.
Spanner achieves globally consistent reads and writes by ordering them with timestamps. What makes this efficient in a global database is the underlying TrueTime API.
TrueTime API
Built on GPS and atomic clocks, TrueTime keeps the uncertainty of the absolute time reported by each server within roughly 1 to 7 milliseconds.
Using the precise timestamps provided by the TrueTime API, Spanner lets the leader elected by Paxos coordinate the absolute time and commit order of two-phase commits, which guarantees read/write consistency.
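To make the role of the bounded clock error concrete, here is a minimal sketch of a TrueTime-style interval and the commit-wait rule, as I understand it from the Spanner paper. The names `tt_now`, `tt_after`, `commit` and the fixed `EPSILON` are illustrative assumptions; the real system derives its error bound from GPS and atomic-clock time masters rather than from a constant.

```python
import time
from dataclasses import dataclass

# Assumed fixed uncertainty bound in seconds (the notes quote 1-7 ms).
EPSILON = 0.004


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


def tt_now(clock=time.time, epsilon=EPSILON):
    """Return an interval assumed to contain the true absolute time."""
    t = clock()
    return TTInterval(t - epsilon, t + epsilon)


def tt_after(ts, clock=time.time, epsilon=EPSILON):
    """True only once ts has definitely passed, whatever the clock error."""
    return tt_now(clock, epsilon).earliest > ts


def commit(apply_write, clock=time.time, sleep=time.sleep, epsilon=EPSILON):
    """Pick a commit timestamp, then wait out the uncertainty before
    making the write visible (commit wait)."""
    commit_ts = tt_now(clock, epsilon).latest
    while not tt_after(commit_ts, clock, epsilon):
        sleep(epsilon / 4)
    apply_write(commit_ts)
    return commit_ts
```

The wait costs about twice the uncertainty bound per commit, which is why keeping that bound down to a few milliseconds matters so much for write latency.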
= Implementation =
A Spanner deployment is called a universe. Each universe consists of multiple zones, and each zone is roughly analogous to a Bigtable cluster. A zone contains one zonemaster that manages data distribution, hundreds to thousands of spanservers that handle the actual data storage and queries, and several location proxies that route clients to the right spanserver. The universemaster only monitors status and performance data. Each zone is a unit of physical isolation, and the placement driver is responsible for data replication and migration between zones.
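The roles in this hierarchy can be sketched as a small data model. This is only an illustration: the class layout, the key-range dictionary, and the method names `locate` and `move` are my own assumptions, not structures taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    name: str
    # Zonemaster's view (hypothetical layout):
    # (start_key, end_key) -> name of the spanserver holding that range.
    assignments: dict = field(default_factory=dict)

    def locate(self, key):
        """Role of a location proxy: map a key to the spanserver serving it."""
        for (start, end), server in self.assignments.items():
            if start <= key < end:
                return server
        return None


@dataclass
class Universe:
    zones: dict = field(default_factory=dict)  # zone name -> Zone

    def move(self, key_range, src, dst, dst_server):
        """Role of the placement driver: move a key range between zones."""
        self.zones[src].assignments.pop(key_range)
        self.zones[dst].assignments[key_range] = dst_server
```

A client would first ask a zone's `locate` for the spanserver and then talk to that server directly, so the zonemaster and the universe-level components stay off the data path.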
Internally, a spanserver organizes its data much like Bigtable's tablets, although it does not seem to be built on Bigtable itself. Each spanserver manages hundreds to thousands of tablets, each holding multi-versioned mappings of the form (key, timestamp) -> value. On top of each tablet sits a Paxos state machine that coordinates concurrent operations. The underlying file system is Colossus (said to be the next generation of GFS; I could not find any published material on it ...)
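The multi-version mapping held by a tablet can be illustrated with a toy in-memory structure. This is a sketch under my own assumptions (the `Tablet` class and its `write`/`read` methods are hypothetical); the real tablets are persisted on Colossus and replicated through Paxos.

```python
import bisect
from collections import defaultdict


class Tablet:
    """Toy multi-version map: (key, timestamp) -> value."""

    def __init__(self):
        self._timestamps = defaultdict(list)  # key -> sorted commit timestamps
        self._values = defaultdict(list)      # key -> values, in the same order

    def write(self, key, timestamp, value):
        """Store a new version without overwriting older ones."""
        i = bisect.bisect_right(self._timestamps[key], timestamp)
        self._timestamps[key].insert(i, timestamp)
        self._values[key].insert(i, value)

    def read(self, key, timestamp):
        """Return the newest value written at or before timestamp, or None."""
        i = bisect.bisect_right(self._timestamps.get(key, []), timestamp)
        return self._values[key][i - 1] if i else None
```

Because old versions are kept, a read at a past timestamp sees a consistent snapshot even while newer writes are arriving.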
All writes must be initiated at the Paxos leader, while a read can be served directly by any replica whose data is sufficiently up to date for the read timestamp. For operations that span tablets, the leaders of the Paxos groups involved coordinate with one another.
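The routing rule for reads and writes can be sketched as follows. The `Replica` record, its `safe_time` field, and the preference for non-leader replicas are assumptions made for illustration, not details confirmed by these notes.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    is_leader: bool
    safe_time: int  # hypothetical: highest timestamp this replica has fully applied


def route_write(replicas):
    """Writes always go to the Paxos leader of the group."""
    return next(r for r in replicas if r.is_leader)


def route_read(replicas, read_ts):
    """Any replica caught up to read_ts may serve the read.

    Preferring non-leaders is an assumed policy to offload the leader.
    """
    candidates = [r for r in replicas if r.safe_time >= read_ts]
    if not candidates:
        return None  # caller would have to wait or retry
    return min(candidates, key=lambda r: r.is_leader)
```

For example, with a leader and one follower both at `safe_time` 100 and another follower at 90, a read at timestamp 95 goes to the caught-up follower, while a read at 101 finds no eligible replica yet.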
= Related research and projects =
Spanner's design goals are very similar to those of Megastore. The problem with Megastore is that its throughput under concurrent writes can be poor. There is no detailed comparison of test data for comparable workloads, so I can only take Spanner's claims at face value. Judging roughly from the design, the reasons Spanner can do better are:
- Finer-grained Paxos state machines (tablet vs. entity group) reduce the likelihood of conflicts.
- Megastore is layered on top of Bigtable, which adds communication overhead. Spanner manages tablets directly, which flattens the hierarchy.
- TrueTime's support for global consistency simplifies the logic of concurrent reads and writes (this one should be easy to see).