Google Distributed System

Source: Internet
Author: User

Google's search service has to process and store massive amounts of data and handle millions of search requests every day, so it is backed by a powerful distributed system. Let's take a look at how that system is built.

1. Distributed facilities

A distributed infrastructure needs three essential facilities: a distributed file system, a distributed lock mechanism, and a distributed communication mechanism. In Google's environment these are GFS, Chubby, and Protocol Buffer, respectively.

(1) GFS

GFS has two types of nodes. The first is the master node, which stores the metadata about data files rather than the chunks (data blocks) themselves. This metadata includes the table that maps each 64-bit chunk label to the chunk's location, the list of chunks that make up each file, the locations of chunk replicas, and which processes are reading or writing a particular chunk.

In addition, the master node periodically receives updates (heartbeats) from each chunk node, which keeps the metadata up to date.

The second is the chunk node, which stores the data itself. On each chunk node, data files are stored as chunks of 64 MB by default. Each chunk has a unique 64-bit label and is replicated several times across the distributed system; the default number of replicas is 3.

Figure: GFS architecture
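The split between master metadata and chunk data can be sketched in a few lines of Python. This is only an illustration of the bookkeeping described above, not GFS code: the class name `MasterMetadata`, its methods, and the node names are all hypothetical, and a simple counter stands in for the 64-bit chunk label.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # default chunk size of 64 MB, as described above
REPLICATION = 3                # default number of replicas per chunk


@dataclass
class MasterMetadata:
    """Hypothetical in-memory view of what a GFS-style master tracks."""
    file_chunks: dict = field(default_factory=dict)      # file name -> list of chunk labels
    chunk_locations: dict = field(default_factory=dict)  # chunk label -> list of chunk nodes
    next_label: int = 0                                  # stand-in for a unique 64-bit label

    def add_chunk(self, filename, nodes):
        """Append a new chunk to a file and record where its replicas live."""
        if len(nodes) != REPLICATION:
            raise ValueError(f"expected {REPLICATION} replica nodes")
        label = self.next_label
        self.next_label += 1
        self.file_chunks.setdefault(filename, []).append(label)
        self.chunk_locations[label] = list(nodes)
        return label

    def locate(self, filename, byte_offset):
        """Map (file, byte offset) to (chunk label, offset inside chunk, replica nodes)."""
        index, within = divmod(byte_offset, CHUNK_SIZE)
        label = self.file_chunks[filename][index]
        return label, within, self.chunk_locations[label]
```

A client would ask the master for `locate(...)` and then read the bytes directly from one of the returned chunk nodes, so the master never has to handle the file data itself.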

 

(2) Chubby

In short, Chubby is a distributed lock service. With Chubby, thousands of clients in a distributed system can lock or unlock a resource. It is often used for coordination inside systems such as Bigtable and MapReduce. In terms of implementation, it provides locking through file creation and relies on the well-known Paxos algorithm.

As for the mechanism, Chubby exposes a small distributed file system: the client can create files and perform a few basic file operations against the Chubby service.

So how does Chubby implement the lock function? A Chubby lock is a file. Creating the file is the locking operation, and the server that successfully creates the file is the one that holds the lock. Clients open, close, and read files to obtain shared or exclusive locks, and Chubby sends update notifications to clients through its communication mechanism.
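The idea of "creating a file is taking the lock" can be tried on an ordinary local file system. The sketch below is an analogy only and does not use any Chubby API: it relies on Python's `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists, and the helper names `try_acquire` and `release` are made up for this example.

```python
import os
import tempfile


def try_acquire(lock_path, owner):
    """Try to take the lock by creating the file exclusively.

    Returns True if this caller created the file (and so holds the lock),
    False if the file already existed (someone else holds the lock).
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    # Record who holds the lock inside the lock file itself.
    with os.fdopen(fd, "w") as f:
        f.write(owner)
    return True


def release(lock_path):
    """Give up the lock by deleting the lock file."""
    os.remove(lock_path)
```

Only one of several competing callers can succeed in `try_acquire` for the same path; the others see that the file exists and know the lock is taken. A real lock service additionally has to handle holders that crash without releasing, which this local sketch ignores.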

A Chubby cluster consists of five machines, each holding a replica, and one of them is elected as the master. The replicas are equivalent in structure and capability, and they use the Paxos protocol to keep their logs consistent. A replica may go offline and later come back; when it does, it must bring its data back in line with the other nodes. Clients access the service through the Chubby client library.
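One way to see why a cluster of five is a convenient size: Paxos-style protocols need a majority of replicas to agree, so the arithmetic below shows how many replicas can be offline at once. The function names are illustrative only.

```python
def majority(replicas):
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """How many replicas can be offline while a majority is still reachable."""
    return replicas - majority(replicas)
```

For five replicas, `majority(5)` is 3 and `tolerated_failures(5)` is 2, so the cluster can keep making decisions while two machines are offline or catching up.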

 

 

Why use a lock service to solve the consistency problem instead of having each service implement a Paxos-like protocol directly? This approach has the following five benefits.

A. Most developers do not think about consistency when they first build a service, so no consistency protocol is used at the start. Only as the service matures does the issue get taken seriously. With a lock service, the consistency problem can be solved by adding a few simple statements while keeping the original program structure and communication mechanism.

B. In many cases electing a master is not enough: the service also has to tell others the master's address or save some piece of information. A Chubby file provides the lock and can also record useful information (such as the master's address) in the file itself. As a result, many developers use Chubby to store metadata and configuration.

C. A lock-based interface is more familiar to developers. Not every developer understands consistency protocols, but most have worked with locks.

D. Common consistency protocols generally need several replicas to reach a quorum and ensure high availability; Paxos is the most obvious example. With Chubby, even a single client can make progress safely.

E. The last reason for a lock service is that Chubby is meant to do more than solve the consistency problem: it also provides other useful functions. In fact, many Google developers use Chubby as a name service, and it works very well for that.
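Points B and E can be illustrated together: the same file that acts as the lock also stores the master's address, so other processes can look the master up by reading it. The sketch below uses a hypothetical in-memory class, `TinyLockService`, as a stand-in; it is not the Chubby API, and the path and addresses are invented for the example.

```python
import threading


class TinyLockService:
    """Hypothetical in-memory stand-in for a lock service (not the Chubby API)."""

    def __init__(self):
        self._files = {}
        self._guard = threading.Lock()

    def create_if_absent(self, path, content):
        """Create the file with the given content; fail if it already exists."""
        with self._guard:
            if path in self._files:
                return False
            self._files[path] = content
            return True

    def read(self, path):
        """Return the content of the file, or None if it does not exist."""
        with self._guard:
            return self._files.get(path)


def elect_master(service, path, candidates):
    """Each candidate tries to create the lock file containing its own address.

    Only the first attempt succeeds; everyone then reads the file to learn
    who the master is.
    """
    for address in candidates:
        service.create_if_absent(path, address)
    return service.read(path)
```

Calling `elect_master(service, "/ls/cell/master", ["10.0.0.1:80", "10.0.0.2:80"])` returns the address of whichever candidate created the file first, and any later client can find the master with a single `read`, which is the name-service use mentioned in point E.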

 

(3) Protocol Buffer

Protocol Buffer is a language-neutral, platform-neutral, and extensible method used internally by Google to serialize structured data. It provides implementations for Java, C++, and Python (each implementation includes a compiler and library files for the corresponding language). Because it is a binary format, it is reported to be about 10 times faster than exchanging data as XML. One of its two main uses is RPC (Remote Procedure Call) communication, which covers communication between distributed applications or across heterogeneous environments.
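To give a feel for why a binary format is more compact than XML, the sketch below hand-encodes a single integer field the way the Protocol Buffers wire format does: a key byte that combines the field number with wire type 0, followed by the value as a base-128 varint. It is a minimal illustration written without the real protobuf library; the function names are made up for this example, and it covers only non-negative integers.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint.

    Each byte carries 7 bits of the value, lowest group first; the high bit
    of a byte is set when more bytes follow.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte
            return bytes(out)


def encode_int_field(field_number, value):
    """Encode one integer field: key varint (field number, wire type 0) then value varint."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)
```

Encoding field 1 with the value 150 gives the three bytes `08 96 01`, whereas a comparable XML fragment such as `<a>150</a>` takes ten characters before any parsing cost is counted.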
