Google's Distributed Relational Database F1

F1 is a distributed relational database developed by Google to serve its advertising system. The ad system originally ran on sharded MySQL. Because its users often need complex queries and joins, the shard rules had to be designed with great care so that related data landed on the same MySQL instance, and every reshard during capacity expansion had to preserve that property, which made the ad system hard to scale. Availability was also lacking: if an entire data center went down, some services became unavailable or even lost data, and for an advertising system even a short outage means significant losses. To solve the scaling and high-availability problems, Google built F1, a cross-data-center distributed relational database on top of Spanner that supports ACID transactions and global indexes. It was launched in early 2012.

Several features of F1

High availability

Almost all of this comes from Spanner. Through the TrueTime API, backed by atomic clocks and GPS receivers, Spanner bounds clock uncertainty across data centers, which solves the ordering problem for distributed transactions and yields external consistency. Consistency across the multiple replicas of each piece of data is maintained by Paxos.
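
To make the TrueTime idea concrete, here is a minimal sketch of Spanner-style commit-wait. The real TrueTime API and its error bounds are internal to Google; the Interval type, the now and commitWait functions, and the epsilon value below are illustrative assumptions, not Spanner's actual interface.

```go
// Minimal sketch of Spanner-style commit-wait over a TrueTime-like API.
package main

import (
	"fmt"
	"time"
)

// Interval is TrueTime's answer to "what time is it": the true time is
// guaranteed to lie within [Earliest, Latest].
type Interval struct {
	Earliest, Latest time.Time
}

// now simulates TT.now() with a fixed uncertainty bound epsilon.
func now(epsilon time.Duration) Interval {
	t := time.Now()
	return Interval{Earliest: t.Add(-epsilon), Latest: t.Add(epsilon)}
}

// commitWait assigns a commit timestamp no earlier than TT.now().Latest,
// then blocks until TT.now().Earliest has passed it. After the wait, every
// node's clock agrees the commit timestamp is in the past, which is what
// gives external consistency across data centers.
func commitWait(epsilon time.Duration) time.Time {
	ts := now(epsilon).Latest // commit timestamp s >= TT.now().latest
	for now(epsilon).Earliest.Before(ts) {
		time.Sleep(time.Millisecond) // wait out the clock uncertainty
	}
	return ts
}

func main() {
	start := time.Now()
	ts := commitWait(4 * time.Millisecond) // illustrative uncertainty bound
	fmt.Printf("commit ts %v, waited %v\n", ts.Format(time.RFC3339Nano), time.Since(start))
}
```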

Global index

F1 implements global indexes on top of the distributed read-write transactions provided by Spanner (strict two-phase locking plus two-phase commit). An index table and its data table are two separate tables that generally live on different Spanner machines, and consistency between the two is guaranteed by Spanner's distributed read-write transactions. Consequently, a single transaction should not touch too many global indexes: every additional global index is one more participant in the two-phase commit, and for a distributed transaction, the more participants there are, the worse the performance and the lower the probability that the transaction commits successfully, as sketched below.
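
The following sketch shows why each global index adds a two-phase-commit participant. The Participant type, the prepare/commit calls, and the probability arithmetic are illustrative assumptions, not F1's actual implementation.

```go
// Sketch: each global index touched by a transaction is one more 2PC participant.
package main

import "fmt"

// Participant stands in for one Spanner server holding either the data
// table shard or one global-index shard touched by the transaction.
type Participant struct{ name string }

func (p Participant) prepare() bool {
	fmt.Println("prepare:", p.name)
	return true
}

func (p Participant) commit() {
	fmt.Println("commit: ", p.name)
}

// twoPhaseCommit succeeds only if every participant prepares; the data
// table plus each global index is one participant, so more indexes mean
// more prepares that can fail and more cross-machine round trips.
func twoPhaseCommit(ps []Participant) bool {
	for _, p := range ps {
		if !p.prepare() {
			return false
		}
	}
	for _, p := range ps {
		p.commit()
	}
	return true
}

func main() {
	// Updating one row covered by two global indexes involves three
	// participants: the data shard and the two index shards.
	ps := []Participant{{"data:Advertiser"}, {"idx:by_country"}, {"idx:by_budget"}}
	twoPhaseCommit(ps)

	// If each participant independently prepares with probability p, the
	// transaction commits with probability p^n, shrinking as indexes grow.
	p, prob := 0.99, 1.0
	for range ps {
		prob *= p
	}
	fmt.Printf("commit probability with %d participants: %.4f\n", len(ps), prob)
}
```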

Hierarchical schema

The idea is similar to Megastore: tables form a hierarchy, and related rows from related tables are stored on a single machine. In the advertising system, for example, an advertiser and its campaigns are stored together. Advertiser is one table and Campaign is another; each row of the Advertiser table represents one advertiser. The Advertiser table is called the root table and the Campaign table a child table; a row in the Advertiser table is a root record and a row in the Campaign table is a child record, and all campaigns belonging to the same advertiser are stored on the same Spanner machine as that advertiser. The benefit is that one operation can fetch all the related data, so joins are fast and never cross machines, as the key-layout sketch below illustrates.
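
A minimal sketch of how a hierarchical schema clusters child rows under their root row: the child key is prefixed with the root key, so sorting by key places each advertiser's campaigns immediately after the advertiser row. The key encoding used here is an illustrative assumption.

```go
// Sketch: interleaving child rows under root rows via key prefixes.
package main

import (
	"fmt"
	"sort"
)

func advertiserKey(adID int) string {
	return fmt.Sprintf("Advertiser(%04d)", adID)
}

func campaignKey(adID, cID int) string {
	return fmt.Sprintf("Advertiser(%04d)/Campaign(%04d)", adID, cID)
}

func main() {
	rows := []string{
		advertiserKey(2),
		campaignKey(2, 7),
		advertiserKey(1),
		campaignKey(1, 3),
		campaignKey(1, 9),
	}
	sort.Strings(rows)
	// All rows sharing an Advertiser(...) prefix land in one contiguous
	// key range, hence on the same Spanner server, so joining an
	// advertiser with its campaigns never crosses machines.
	for _, r := range rows {
		fmt.Println(r)
	}
}
```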

Three types of transactions
    1. Snapshot reads. Uses the snapshot-read transactions provided by Spanner directly.
    2. Pessimistic transactions. Uses the read-write transactions provided by Spanner directly, with two-phase locking.
    3. Optimistic transactions. Implemented on top of Spanner's pessimistic transactions. Such a transaction has two phases: a read phase of unlimited duration, which takes no locks, followed by a write phase, the commit. The basic idea is to record the last-modified timestamp of every row accessed during the read phase on the F1 client; at commit, the client sends all these timestamps to the F1 server, which opens a Spanner read-write transaction and re-reads the last-modified timestamps of those rows to check them. If any has changed, a write conflict has been detected and the transaction aborts. A sketch of this check follows the list.
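
Here is a minimal sketch of the optimistic protocol as just described: the client collects the last-modified timestamp of every row it reads, and at commit the server re-checks those timestamps inside a single short read-write transaction. The Store and Row types and the Read/Commit method names are illustrative assumptions.

```go
// Sketch: F1-style optimistic transactions via per-row timestamps.
package main

import (
	"errors"
	"fmt"
)

type Row struct {
	Value        string
	LastModified int64 // per-row version timestamp
}

type Store struct {
	rows  map[string]Row
	clock int64
}

// Read phase: no locks taken, may last arbitrarily long; the client just
// remembers each row's last-modified timestamp.
func (s *Store) Read(key string) (Row, int64) {
	r := s.rows[key]
	return r, r.LastModified
}

// Write phase: one short server-side read-write transaction that checks
// the remembered timestamps and aborts on any conflict.
func (s *Store) Commit(reads map[string]int64, writes map[string]string) error {
	for key, seen := range reads {
		if s.rows[key].LastModified != seen {
			return errors.New("abort: write conflict on " + key)
		}
	}
	s.clock++
	for key, v := range writes {
		s.rows[key] = Row{Value: v, LastModified: s.clock}
	}
	return nil
}

func main() {
	s := &Store{rows: map[string]Row{"budget": {Value: "100", LastModified: 1}}, clock: 1}

	_, ts := s.Read("budget") // client A reads
	// Client B commits in between, bumping the row's timestamp.
	s.Commit(map[string]int64{}, map[string]string{"budget": "200"})
	// Client A's commit now fails the timestamp check and aborts.
	err := s.Commit(map[string]int64{"budget": ts}, map[string]string{"budget": "150"})
	fmt.Println(err)
}
```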

F1 uses optimistic transactions by default, mainly for the following reasons:

    1. Because the read phase takes no locks, F1 can tolerate some errors caused by misbehaving clients.
    2. For the same reason, it suits scenarios where F1 needs to interact with end users in the middle of a transaction.
    3. For some error cases, the F1 server can simply retry on its own, without involving the F1 client.
    4. Since all transaction state is kept on the F1 client, the request can be sent to another F1 server for processing after one F1 server goes down.

Of course, this brings two problems:

    1. Phantoms: rows that do not yet exist have no last-modified timestamp, so the same statement may return a different number of rows while other transactions execute, which is not allowed under the repeatable-read isolation level. The classic solution is a gap lock, or range lock. In F1, this lock can be a column on the root record in the root table that acts as a gap lock: only the transaction holding it may insert rows into a given range of the child table, as sketched below.
    2. Performance is poor under highly concurrent modification of the same row; the optimistic protocol is clearly unsuited to that scenario.
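
A minimal sketch of the root-record lock column used as a gap lock, built on the same illustrative optimistic-timestamp idea as the sketch above: before inserting child rows under an advertiser, a transaction reads the lock column on the root record, and at commit re-checks and bumps it, so two transactions inserting into the same child range conflict and one aborts. The types and method names here are hypothetical.

```go
// Sketch: a lock column on the root record serializing child-range inserts.
package main

import (
	"errors"
	"fmt"
)

// root holds the hypothetical lock column: its version changes whenever
// anyone inserts into the child range it guards.
type root struct{ lockVersion int64 }

type tree struct {
	roots    map[string]*root
	children map[string][]string
}

// insertChild is one optimistic transaction: it read the root's lock
// column at version seen, and at commit re-checks it before inserting,
// exactly like the per-row timestamp check, but covering the whole gap.
func (t *tree) insertChild(rootKey string, seen int64, child string) error {
	r := t.roots[rootKey]
	if r.lockVersion != seen {
		return errors.New("abort: concurrent insert under " + rootKey)
	}
	r.lockVersion++
	t.children[rootKey] = append(t.children[rootKey], child)
	return nil
}

func main() {
	t := &tree{roots: map[string]*root{"Advertiser(1)": {}}, children: map[string][]string{}}

	seenA := t.roots["Advertiser(1)"].lockVersion // txn A reads lock column
	seenB := t.roots["Advertiser(1)"].lockVersion // txn B reads lock column

	fmt.Println(t.insertChild("Advertiser(1)", seenA, "Campaign(3)")) // <nil>
	fmt.Println(t.insertChild("Advertiser(1)", seenB, "Campaign(4)")) // abort
}
```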
Deployment

Google deployed the F1 and Spanner clusters used by the advertising system across 5 data centers in the United States: two on the east coast, two on the west coast, and one in the middle. This amounts to 5 replicas of every piece of data, with one east-coast data center acting as the leader data center. In Spanner's Paxos implementation, one of the 5 replicas is the leader replica, through which all read-write transactions on that data pass, and leader replicas are generally placed in the leader data center. Since Paxos only requires a majority of responses, the latency of a Paxos operation depends mainly on the delays between the east-coast leader data center and the other east-coast data center and the central data center. It follows that, for write-heavy F1 clients, both the F1 clients and F1 servers are best deployed in the leader data center. In this configuration, F1 users see commit latencies of roughly 50 ms to 150 ms and read latencies of about 5 to 10 ms; the quorum arithmetic is sketched below.
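
A back-of-the-envelope sketch of why commit latency tracks the nearby data centers: a Paxos leader in a 5-replica group needs acks from 2 of the other 4 replicas to reach a majority of 3, so one round costs the 2nd-smallest round-trip time. The RTT numbers below are illustrative assumptions, not measurements.

```go
// Sketch: majority-quorum latency for a 5-replica Paxos group.
package main

import (
	"fmt"
	"sort"
)

func main() {
	// Hypothetical RTTs from the east-coast leader data center (ms).
	rtts := map[string]float64{
		"east-2":  5,  // other east-coast DC
		"central": 25, // middle of the country
		"west-1":  70,
		"west-2":  75,
	}
	var ds []float64
	for _, d := range rtts {
		ds = append(ds, d)
	}
	sort.Float64s(ds)
	// Majority of 5 = leader + 2 acks, so we wait for the 2nd-fastest peer;
	// the two west-coast replicas never sit on the critical path.
	fmt.Printf("one Paxos round: ~%.0f ms\n", ds[1])
}
```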
