Interpreting the Google Distributed Lock Service

Background introduction
In April 2010, Google moved its web index to real-time updates, and at that year's OSDI conference it published its first paper describing the technology behind the change.
Before that, Google updated its index in batch mode (MapReduce): once the incremental data reached a certain size, it was joined against the full index library to produce the latest index. With the new system in place, the data life cycle is shortened by 50%. Here, the data life cycle means the interval between a page being crawled from the web and it appearing in search results. As Google emphasizes, though, the system is built only for incremental updates and does not replace the MapReduce batch-processing mode.
Architecture overview
Google's next-generation incremental indexing system, Percolator, is built on top of BigTable and provides an API that stays as close to BigTable's as possible, so the overall architecture is essentially a layer on top of BigTable.
Is there a difference between a transaction and a lock?
In the relational database world there is a very big difference, but in Percolator a transaction is built on locks: Transaction = lock. So discussing the distributed lock service is the same as discussing distributed transactions, and whenever a lock or a transaction is mentioned below, it refers to the same thing.
Percolator uses BigTable's built-in row locks, plus some ingenious techniques of its own, to implement a distributed lock service, which is what allows Google to update a PB-scale index library in real time. Google's search results have recently become very fresh: publish an article and Google can retrieve it a few minutes later. The reason is that when Google's crawler fetches a new page, it no longer waits for a batch index update but updates the index in real time, greatly shortening the data life cycle.
Percolator supports cross-row and cross-table transactions, taking full advantage of BigTable's existing row transactions and backup mechanisms.
A simple example
Before analyzing the details of Percolator, let's look at a simple example to get a general feel for how it works; that will make the rest easier to follow.
The following example deducts 10 popularity points from UserA and adds them to UserB. Key identifies each row; data, lock, and write are column names: data stores the data itself, lock stores the lock state, and write holds a reference to the location of the committed data.
Initial state: UserA has 100 popularity points, UserB has 50 popularity points
Final state: UserA has 90 popularity points, UserB has 60 popularity points
STEP0 (initial state)
| key | data | lock | write |
|-------|--------|------|-------|
| UserA | 100:t1 | | |
| UserB | 50:t2 | | |
STEP1 (deduct 10 popularity points from UserA)
| key | data | lock | write |
|-------|---------------|-----------------|-------|
| UserA | 90:t2, 100:t1 | primary_lock:t2 | t2 |
| UserB | 50:t2 | | |
STEP2 (add 10 popularity points to UserB)
| key | data | lock | write |
|-------|---------------|-------------------------|-------|
| UserA | 90:t2, 100:t1 | primary_lock:t2 | t2 |
| UserB | 60:t3, 50:t2 | primary_lock:UserA@data | t3 |
STEP3 (transaction commit)
A: Commit the primary first (write a reference to the new data into the write column under the new timestamp, then remove the lock)
| key | data | lock | write |
|-------|-------------------|-------------------------|----------------|
| UserA | t3, 90:t2, 100:t1 | | t3:data:t2, t2 |
| UserB | 60:t3, 50:t2 | primary_lock:UserA@data | t3 |
B: Then commit the non-primary (same two steps as above)
| key | data | lock | write |
|-------|-------------------|------|----------------|
| UserA | t3, 90:t2, 100:t1 | | t3:data:t2, t2 |
| UserB | t4, 60:t3, 50:t2 | | t4:data:t3, t3 |
At the end of the transaction, UserA has 90 popularity points with timestamp t3, and UserB has 60 popularity points with timestamp t4. (How the lock is written and what goes into the write column are explained in detail later.)
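To make the tables above easier to map onto BigTable's data model, here is a minimal sketch of my own (not from the paper) that models each cell as a dict keyed by timestamp, mirroring the final state of Step 3B; the integers 1-4 stand in for t1-t4:

```python
# Final state of the example (Step 3B), modeled as nested dicts.
# table[row][column] maps a timestamp to a value; None marks an empty entry.
table = {
    "UserA": {
        "data":  {3: None, 2: 90, 1: 100},    # empty entry at commit time t3
        "lock":  {},                           # lock cleared after commit
        "write": {3: ("data", 2), 2: None},    # t3:data:t2 -- commit points at data@t2
    },
    "UserB": {
        "data":  {4: None, 3: 60, 2: 50},
        "lock":  {},
        "write": {4: ("data", 3), 3: None},    # t4:data:t3
    },
}

# Reading UserA: follow the newest non-empty write entry to the data it references.
write_col = table["UserA"]["write"]
commit_ts = max(ts for ts, ref in write_col.items() if ref is not None)
_, data_ts = write_col[commit_ts]
print(table["UserA"]["data"][data_ts])   # -> 90
```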
The execution process of a transaction
Percolator locks come in two kinds: primary and non-primary. When a transaction is committed, the primary lock is committed first, and whether the transaction spans rows or tables makes no difference to how the primary is handled.
Commit a transaction
The commit is done in two steps; taking UserA as an example:
First, a reference to the new data is written into the write column. Note that it is a reference, not the data itself (thinking of it as a pointer makes this more vivid). In Step 3A above, t3:data:t2 means the data was committed at time t3 and the latest data sits in the data column at timestamp t2.
Then, remove the contents of the lock column.
Because BigTable supports row locks, both of these steps are performed within a single BigTable row transaction.
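As a rough illustration of these two steps, here is a minimal sketch assuming an in-memory dict per row and a per-row mutex standing in for BigTable's row transaction; the function name commit_primary and the data layout are my own, not Percolator's API:

```python
import threading

# One row of the example table, after STEP1 (UserA prewritten at t2).
rows = {"UserA": {"data": {2: 90, 1: 100},
                  "lock": {2: "primary"},
                  "write": {}}}
row_mutex = {"UserA": threading.Lock()}   # stand-in for a BigTable row transaction

def commit_primary(key, start_ts, commit_ts):
    """Add a write record pointing at the prewritten data, then erase the
    lock; both happen atomically for this row."""
    with row_mutex[key]:
        row = rows[key]
        if start_ts not in row["lock"]:       # lock already cleaned up by someone else
            return False                      # -> the transaction must abort
        row["write"][commit_ts] = ("data", start_ts)   # e.g. t3:data:t2
        del row["lock"][start_ts]             # release the primary lock
        return True

print(commit_primary("UserA", start_ts=2, commit_ts=3))   # True
print(rows["UserA"])   # write now holds {3: ('data', 2)}, lock is empty
```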
Read operation
When a client initiates a read, it first requests a timestamp from the timestamp oracle server. Percolator then checks the lock column: if it is not empty, the read either tries to remove (repair) the lock or waits; how the repair works is described under lock-conflict handling below.
Note: the timestamps issued by the oracle are strictly increasing, and they are handed out in batches rather than one at a time.
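A minimal sketch of this read path, using the same illustrative dict layout as above (the helper name get and the error handling are assumptions, not the real API):

```python
def get(rows, key, read_ts):
    """Read key as of read_ts: refuse if a pending lock is visible,
    otherwise follow the newest committed write record."""
    row = rows[key]
    # Any lock at or below our timestamp means an in-flight transaction;
    # the real system waits or tries to resolve it (see lock conflicts below).
    if any(lock_ts <= read_ts for lock_ts in row["lock"]):
        raise RuntimeError("pending lock: wait or resolve it first")
    visible = [ts for ts in row["write"] if ts <= read_ts]
    if not visible:
        return None                            # nothing committed yet
    _, data_ts = row["write"][max(visible)]    # newest commit record
    return row["data"][data_ts]

rows = {"UserA": {"data": {2: 90, 1: 100}, "lock": {}, "write": {3: ("data", 2)}}}
print(get(rows, "UserA", read_ts=4))   # -> 90
```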
Write operations
When a client initiates a write, it likewise first requests a timestamp from the timestamp oracle server. Percolator then checks the write column: if the write column contains a timestamp greater than the client's, the write fails, because newer committed data must not be overwritten (a write-write conflict). If the lock column is locked, the row is currently held by another client, and the client either fails the write or tries to repair the lock (a lock conflict).
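The two checks can be sketched like this, again on the illustrative dict layout; prewrite is my name for the locking write that precedes the commit, not a claim about the paper's exact code:

```python
def prewrite(rows, key, value, start_ts, lock_ref):
    """Write value at start_ts if there is no write-write or lock conflict."""
    row = rows[key]
    if any(commit_ts >= start_ts for commit_ts in row["write"]):
        return False        # write-write conflict: newer committed data exists
    if row["lock"]:
        return False        # lock conflict: another client holds this cell
    row["data"][start_ts] = value
    row["lock"][start_ts] = lock_ref     # "primary", or a pointer to the primary
    return True

# Step 2 of the example: UserB's lock points back at the primary on UserA.
rows = {"UserB": {"data": {2: 50}, "lock": {}, "write": {}}}
print(prewrite(rows, "UserB", 60, start_ts=3, lock_ref="UserA@data"))  # True
print(rows["UserB"]["lock"])   # {3: 'UserA@data'}
```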
Notify mechanism
Percolator defines a series of observers (similar to database triggers) on BigTable's tablet servers. An observer watches one or more columns; when the data in those columns changes, the observer is triggered. The output of one observer can in turn create data that triggers subsequent observers, producing a chain of notifications.
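A toy sketch of that cascading-notification idea (callbacks registered per column, fired on change); the names observe and write_cell are invented for illustration and are not Percolator's observer API:

```python
observers = {}     # column name -> list of callbacks

def observe(column, callback):
    observers.setdefault(column, []).append(callback)

def write_cell(rows, key, column, ts, value):
    rows.setdefault(key, {}).setdefault(column, {})[ts] = value
    for cb in observers.get(column, []):
        cb(key, ts, value)      # a change notifies every observer of this column

# Writing a raw page triggers parsing, which in turn triggers indexing.
rows = {}
observe("raw_page", lambda key, ts, v: write_cell(rows, key, "parsed", ts, v.upper()))
observe("parsed",   lambda key, ts, v: print("indexing", key, v))
write_cell(rows, "doc1", "raw_page", 1, "hello percolator")
```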
Handling of lock conflicts
If a client dies during the commit phase of a transaction, its lock is left behind and blocks subsequent clients; this is called a lock conflict. Percolator provides a simple mechanism to resolve it.
Each client periodically writes a token to the Chubby server to indicate that it is still alive. When a client discovers a lock conflict, it checks whether the client holding the lock is still alive: if that client is still working, it waits for the lock to be released; otherwise it erases the lock itself.
Roll forward & Roll Back:
The client first checks whether the primary lock exists. Because a commit always starts with the primary, if the primary lock is gone, the previous client has already committed its data, so the current client performs a roll forward: it commits the non-primary data and clears the non-primary lock. If the primary lock still exists, the previous client crashed before committing its data, so the current client performs a roll back: it removes the primary and non-primary data and clears the locks.
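Here is a sketch of that decision for the two-row example above, starting from the state of Step 3A (primary committed on UserA, UserB still locked); the function name resolve and passing the new commit timestamp as a parameter are simplifications of my own:

```python
def resolve(rows, primary_key, primary_ts, secondary_key, secondary_ts, new_commit_ts):
    """Clean up after a crashed client: roll forward if the primary committed,
    roll back if the primary lock is still in place."""
    primary, secondary = rows[primary_key], rows[secondary_key]
    if primary_ts in primary["lock"]:
        # Primary lock still there: nothing was committed -- roll back both rows.
        for row, ts in ((primary, primary_ts), (secondary, secondary_ts)):
            row["data"].pop(ts, None)
            row["lock"].pop(ts, None)
        return "rolled back"
    # Primary lock gone: the primary was committed -- roll the secondary forward.
    secondary["write"][new_commit_ts] = ("data", secondary_ts)
    secondary["lock"].pop(secondary_ts, None)
    return "rolled forward"

# State of Step 3A: UserA (primary) committed at t3, UserB still holds its lock.
rows = {
    "UserA": {"data": {2: 90, 1: 100}, "lock": {}, "write": {3: ("data", 2)}},
    "UserB": {"data": {3: 60, 2: 50}, "lock": {3: "UserA@data"}, "write": {}},
}
print(resolve(rows, "UserA", 2, "UserB", 3, new_commit_ts=4))   # rolled forward
print(rows["UserB"])   # now matches Step 3B: write {4: ('data', 3)}, lock empty
```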
Summary
Google's distributed lock service provides solid support for real-time incremental index updates, shortening the data life cycle. The notify mechanism is only briefly introduced in this article; interested readers should refer to the original paper.
Excerpt from: http://my.oschina.net/u/593721/blog/100389