Google F1: a NewSQL database system

Source: Internet
Author: User

Introduction

Google's data systems have very demanding performance requirements that a system such as MySQL struggles to meet, so Google designed the F1 database. Its goals are high scalability and high availability. In addition to the necessary SQL support, F1 also supports ad hoc queries.

Basic architecture

  
Users interact with F1 through the client library. A request is first sent to an F1 server, which is responsible for assigning the work and processing the data.
To reduce the latency of processing requests, F1 clients and load balancers prefer to connect to the nearest F1 server; if that server is busy or has failed, they fall back to servers farther away.
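
The server-selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: the `Server` class, its fields, and the latency figures are hypothetical and are not part of any F1 client API.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical view of an F1 server as seen by a client or load balancer."""
    name: str
    latency_ms: float  # assumed measured round-trip time to this server
    healthy: bool = True
    busy: bool = False


def pick_server(servers):
    """Return the lowest-latency server that is healthy and not busy."""
    candidates = [s for s in servers if s.healthy and not s.busy]
    if not candidates:
        raise RuntimeError("no F1 server available")
    return min(candidates, key=lambda s: s.latency_ms)


servers = [
    Server("local-dc", 2.0, busy=True),  # nearest, but overloaded
    Server("nearby-dc", 15.0),
    Server("far-dc", 90.0),
]
chosen = pick_server(servers)
print(chosen.name)  # nearby-dc
```

Because the nearest server is marked busy, the sketch falls back to the next closest one rather than failing the request.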
F1 servers are typically placed in the same data center as the Spanner servers they use, because this speeds up reads and writes, but F1 servers can also connect to Spanner servers in other data centers. Spanner reads its data from a file system called CFS (Colossus File System). A Spanner server and the CFS instance it uses are always in the same data center; that is, Spanner in one data center never connects to CFS in another.
Because F1 servers do not store data, F1 servers can be added or removed as traffic grows or shrinks.
F1 can process a request on a single node or distribute it over multiple nodes; the choice depends on which is expected to give the lower latency. When a request is handled in a distributed manner, F1 splits the work among multiple processes, each of which is run by a slave server. The slave servers are managed by a master server. F1 also supports MapReduce.

Spanner

F1 and Spanner were developed at the same time, but Spanner handles the lower-level concerns, such as caching, data sharding and movement, location lookups, and so on.
F1 itself is a relational database, so its data is organized into rows and tables. Spanner partitions all data rows into clusters called directories. Each directory has at least one fragment, and a large directory usually has several. Fragments are stored in groups, and each data center holds a replica of each group.
Spanner uses two-phase locking and two-phase commit (2PC) when processing transactions. 2PC adds one more network round trip, which increases latency. To ensure consistency, Spanner assigns each transaction a timestamp and uses these timestamps to order transactions, so that transactions execute in a consistent global order.

Data model

F1's data model is similar to that of a relational database: there is a schema, but with some extensions on top.
Tables in an F1 schema are hierarchical. The key of a child table must contain the key of its parent table as a prefix. Each row in a root table is called a root row, and all child-table rows that descend from a root row, together with the root row itself, form one Spanner directory.
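
The key-prefix rule and the grouping into directories can be illustrated with a small sketch. The table names, the tuple representation of keys, and the assumption that the root key is a single column are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical rows: each key is a tuple, and a child key starts with
# the key of its parent row.
rows = {
    "Customer": [(1,), (2,)],                    # root table
    "Campaign": [(1, 10), (1, 11), (2, 20)],     # child of Customer
    "AdGroup": [(1, 10, 100), (2, 20, 200)],     # child of Campaign
}


def is_child_key(child_key, parent_key):
    """True if child_key extends parent_key, i.e. has it as a proper prefix."""
    n = len(parent_key)
    return len(child_key) > n and child_key[:n] == parent_key


def group_into_directories(rows):
    """Cluster every row under its root row (assumed single-column root key)."""
    directories = defaultdict(list)
    for table, keys in rows.items():
        for key in keys:
            directories[key[:1]].append((table, key))
    return dict(directories)


directories = group_into_directories(rows)
for root_key, members in sorted(directories.items()):
    print(root_key, members)
```

Each printed group corresponds to what the text calls a directory: one root row plus every descendant row whose key begins with the root key.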

Protocol Buffers

A drawback of conventional databases is that structured data has to be converted to and from database rows by tedious code. F1 uses Google's Protocol Buffers library to support structured data types in table columns.

Data index

F1 has two kinds of index. The first is the local index, whose key must contain the key of the root row as a prefix. Local index entries are stored in the same Spanner directory as the root row they index, so updating a local index adds only a small cost to the transaction.
The second is the global index. Its key does not contain the key of the root row, and it is stored separately from the indexed data. A global index covers rows in many directories and is spread across multiple Spanner servers. Modifying an indexed row therefore requires updating the index with 2PC (two-phase commit). Global indexes do not scale well with transaction size. For example, a transaction that adds 1000 rows adds index entries that may be spread across hundreds of index directories, and a 2PC with that many participants becomes slow. Google is looking at ways to make global indexes more scalable.
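
To see why the two kinds of index differ in cost, the sketch below counts how many directories a single transaction touches in each case. The placement function, the number of index directories, and the sample data are assumptions made for illustration; they do not describe how Spanner actually places index entries.

```python
import zlib

NUM_INDEX_DIRECTORIES = 200  # hypothetical number of global-index directories


def local_index_directory(row_key):
    """A local index entry lives in the directory of the row's root row."""
    return ("data", row_key[0])


def global_index_directory(indexed_value):
    """A global index entry is placed by the indexed value, not the root key."""
    bucket = zlib.crc32(indexed_value.encode()) % NUM_INDEX_DIRECTORIES
    return ("index", bucket)


def participants(rows, use_global_index):
    """Return the set of directories touched when inserting the given rows."""
    touched = set()
    for row_key, value in rows:
        touched.add(("data", row_key[0]))  # the row itself
        if use_global_index:
            touched.add(global_index_directory(value))
        else:
            touched.add(local_index_directory(row_key))
    return touched


# One transaction inserting 1000 child rows under root row 1.
rows = [((1, i), f"keyword-{i}") for i in range(1000)]
local = participants(rows, use_global_index=False)
global_ = participants(rows, use_global_index=True)
print(len(local), len(global_))
```

With the local index everything stays in one directory, so no 2PC is needed. With the global index the same insert touches many index directories, each of which becomes a 2PC participant.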

Schema changes

F1 is a globally distributed database, which means that users all over the world work against the same schema. Google requires that a schema change cause neither downtime nor table locking.
So how does it work? Google's approach is to apply schema changes asynchronously, which means that at any given moment different servers may be operating with two different versions of the schema. Google designed its own algorithm to handle this.

Transaction execution

F1 has three types of transactions:
1. Read-only (snapshot) transactions
2. Transactions mapped directly to Spanner transactions (pessimistic): this type of transaction is handled directly by Spanner.
3. Read-write (optimistic) transactions: to detect whether data it has read was modified by another transaction, F1 stores a last-modified timestamp with each row, which is updated whenever the row is updated.
By default, the F1 client uses the read-write (optimistic) transaction mode. Its advantages are:
- Reads take no locks and never conflict with writes, so other transactions are not affected.
- Long-running transactions are not aborted for taking too long.
- A transaction can be re-executed after an error.
- A client can switch to another server after discovering that a remote server has failed.
- Data read outside the transaction can be used within the transaction.
The model also has a drawback: if a piece of data is modified frequently, throughput on that data is very low.
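
The last-modified check behind the read-write mode can be sketched in a few lines. This is a single-threaded toy, not F1's implementation: the `Store` class, its methods, and the integer counter standing in for commit timestamps are all assumptions. In a real system the check and the write would have to happen atomically.

```python
import itertools


class ConflictError(Exception):
    """Raised when a row changed after the transaction read it."""


class Store:
    """Hypothetical row store that keeps a last-modified timestamp per row."""

    def __init__(self):
        self._rows = {}                    # key -> (value, last_modified)
        self._clock = itertools.count(1)   # stand-in for commit timestamps

    def read(self, key):
        """Lock-free read returning (value, last_modified)."""
        return self._rows.get(key, (None, 0))

    def commit(self, read_set, writes):
        """read_set: key -> timestamp seen at read time; writes: key -> value."""
        for key, seen_ts in read_set.items():
            if self.read(key)[1] != seen_ts:
                raise ConflictError(key)
        ts = next(self._clock)
        for key, value in writes.items():
            self._rows[key] = (value, ts)
        return ts


store = Store()
store.commit({}, {"budget": 100})

# Two transactions read the same row without taking any lock.
value_a, ts_a = store.read("budget")
value_b, ts_b = store.read("budget")

store.commit({"budget": ts_a}, {"budget": value_a + 10})  # first writer wins
try:
    store.commit({"budget": ts_b}, {"budget": value_b - 5})
    conflict = False
except ConflictError:
    conflict = True  # second writer saw a stale timestamp and must retry

print(conflict, store.read("budget"))
```

The second commit fails because the row's timestamp moved on after it was read. This also shows the drawback noted above: on a frequently modified row, most commits would fail and have to be retried.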

Data lock

Each row in an F1 database has a lock column. Users can define additional lock columns, each of which covers a chosen set of columns in the row. This allows different columns of the same row to be read and modified by different transactions at the same time.
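
A minimal sketch of the idea, assuming each lock column simply stores the timestamp of the last write to the columns it covers. The `Row` class, the column names, and the lock names are hypothetical.

```python
class Row:
    """Hypothetical row whose columns are covered by named lock columns."""

    def __init__(self, values, lock_of):
        self.values = dict(values)
        self.lock_of = dict(lock_of)  # data column -> lock column covering it
        self.lock_ts = {lock: 0 for lock in set(self.lock_of.values())}

    def snapshot(self, column):
        """Return (value, timestamp of the lock column covering it)."""
        return self.values[column], self.lock_ts[self.lock_of[column]]

    def write(self, column, value, seen_ts, commit_ts):
        """Apply the write only if the covering lock column is unchanged."""
        lock = self.lock_of[column]
        if self.lock_ts[lock] != seen_ts:
            return False
        self.values[column] = value
        self.lock_ts[lock] = commit_ts
        return True


row = Row(
    {"status": "active", "bid": 5},
    {"status": "lock_status", "bid": "lock_bid"},
)
_, ts_status = row.snapshot("status")
_, ts_bid = row.snapshot("bid")

ok_status = row.write("status", "paused", ts_status, commit_ts=1)
ok_bid = row.write("bid", 7, ts_bid, commit_ts=2)  # different lock: no conflict
stale = row.write("bid", 9, ts_bid, commit_ts=3)   # same lock, stale timestamp

print(ok_status, ok_bid, stale)  # True True False
```

The two writes to different columns both succeed even though they target the same row, because each column is covered by its own lock column. With a single lock column for the whole row, the second write would have been rejected.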

Change history

F1 records every change made to the database. For each transaction it stores a change record containing the primary key and the before and after values of each modified row. The primary key of a change record consists of the root table key and the transaction's commit time, and change records are stored in a separate table. These records play an important role in applications built on F1.
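
The shape of such a change record might look like the sketch below. The class names, field names, and the dictionary standing in for the separate table are assumptions for illustration, not F1's actual schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnChange:
    """One modified column, with its values before and after the change."""
    table: str
    row_key: tuple
    column: str
    before: object
    after: object


@dataclass
class ChangeRecord:
    """All changes made by one transaction under one root row."""
    root_key: tuple
    commit_ts: int
    changes: list = field(default_factory=list)

    @property
    def primary_key(self):
        # Root table key followed by the commit timestamp.
        return self.root_key + (self.commit_ts,)


change_log = {}  # stand-in for the separate change-history table


def record_change(root_key, commit_ts, changes):
    rec = ChangeRecord(root_key, commit_ts, list(changes))
    change_log[rec.primary_key] = rec
    return rec


def history(root_key):
    """Return the change records of one root row in commit order."""
    n = len(root_key)
    return [change_log[k] for k in sorted(change_log) if k[:n] == root_key]


record_change((1,), 1002, [ColumnChange("Campaign", (1, 10), "budget", 110, 90)])
record_change((1,), 1001, [ColumnChange("Campaign", (1, 10), "budget", 100, 110)])

for rec in history((1,)):
    print(rec.primary_key, rec.changes[0].before, "->", rec.changes[0].after)
```

Because the commit timestamp is the last part of the key, sorting by key returns the records of a root row in commit order, regardless of the order in which they were inserted.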

Client

Before F1, many Google database applications used an ORM on top of MySQL. That ORM is not suitable for F1 because it does not scale well, so Google designed its own API. F1 supports both a NoSQL interface and SQL.

Query processing

F1 supports centralized and distributed query execution. Centralized execution, used for OLTP-style queries, is handled by a single F1 server; distributed execution, used for OLAP-style queries, is handled jointly by the slave servers under F1.

Working with Remote Data

In F1, a SQL join often requires reading data from multiple data centers, which incurs network latency. F1 mitigates this with batching and pipelining.
Computation is streamed: results are output as soon as part of the computation has finished, which reduces waiting time. The advantages are efficient parallelism and less buffering; the disadvantage is that the order of the data cannot be preserved.
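
Batching and streamed output can be sketched with Python generators. The in-memory dictionary standing in for remote storage, the batch size, and the function names are assumptions. The sketch shows batching and incremental output only; it does not overlap requests the way a truly pipelined executor would.

```python
from itertools import islice

REMOTE = {k: f"row-{k}" for k in range(10)}  # stand-in for remote storage
stats = {"round_trips": 0}


def batched(keys, size):
    """Yield lists of at most `size` keys."""
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def remote_lookup(batch):
    """Fetch a whole batch of keys in one simulated network round trip."""
    stats["round_trips"] += 1
    return [(k, REMOTE[k]) for k in batch if k in REMOTE]


def lookup_join(outer_keys, batch_size=4):
    """Join outer keys against remote rows, streaming results as they arrive."""
    for batch in batched(outer_keys, batch_size):
        for pair in remote_lookup(batch):
            yield pair  # handed downstream before later batches are fetched


results = list(lookup_join(range(10)))
print(stats["round_trips"], len(results))  # 3 10
```

Ten lookups cost three round trips instead of ten, and the consumer receives the first rows before the later batches have been requested.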

Distributed computing

Each distributed execution plan contains dozens of sub-plans, each of which is executed by several slave servers. Data partitioning is also used here.
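
As a rough illustration of data partitioning, the sketch below hash-partitions rows by key so that each worker can aggregate its share independently. The hash function, the number of workers, and the sample data are assumptions, and the workers are simulated by a plain loop.

```python
import zlib
from collections import Counter


def partition(rows, key_fn, num_workers):
    """Assign each row to a worker by hashing its partition key."""
    parts = [[] for _ in range(num_workers)]
    for row in rows:
        key = str(key_fn(row)).encode()
        parts[zlib.crc32(key) % num_workers].append(row)
    return parts


def aggregate(part):
    """Work done by one worker: sum clicks per customer in its partition."""
    counts = Counter()
    for customer_id, clicks in part:
        counts[customer_id] += clicks
    return counts


# Hypothetical (customer_id, clicks) rows.
rows = [(1, 5), (2, 3), (1, 2), (3, 7), (2, 1)]
parts = partition(rows, key_fn=lambda r: r[0], num_workers=3)

total = Counter()
for part in parts:  # each partition could run on a separate worker
    total.update(aggregate(part))

print(dict(total))
```

Since all rows with the same key land in the same partition, the per-worker results never overlap and can be merged without further coordination.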

More resources

F1 paper
Related handouts
