GFS Architecture in Google Cloud Computing (Collation and Analysis)


Google has developed cloud computing technologies for many of its businesses, which handle enormous volumes of data. The Google File System (GFS) is one of the three key technologies in Google's cloud computing architecture. This article introduces Google's GFS architecture from the perspective of its use as a file system. It first introduces the Google cloud computing platform, then describes the GFS framework designed by Google: the characteristics of its three types of components, the interactions between them, and the features of the framework as a whole. Next, using the Google File System built on this framework, it explains how the components cooperate to carry out typical GFS operations. Finally, it analyzes GFS's quality attributes, showing how the GFS framework meets the file system requirements of Google's applications.

1 Google Cloud computing platform
1.1 Cloud computing
Cloud computing is a computing model in which shared pools of resources are accessed over the Internet, on demand and from anywhere. The "cloud" refers to self-maintained and self-managed virtual computing resources, usually large server clusters that include compute servers, storage servers, and network bandwidth. In a cloud computing environment, users do not need to understand the details of the infrastructure in the cloud, nor do they need the corresponding expertise; they only need to focus on the resources they actually require and on how to obtain those services over the network. From the user's point of view, the resources in the "cloud" appear infinitely extensible: as long as the data center can be reached over the network, computing and storage resources are readily available, usable in real time, and expandable on demand. Cloud computing is characterized by resource sharing, avoidance of redundant development, and service continuity.

1.2 Google cloud computing
Google, which operates the world's largest search engine, is also the largest user of cloud computing technology. To solve the massive data storage and rapid processing problems posed by search and its other large-scale businesses, Google has developed technologies that allow millions of computers to work together. The Google cloud computing platform comprises four systems that are independent yet tightly integrated: the Google File System, a distributed file system built on top of commodity clusters; the MapReduce programming model for Google applications; the distributed locking service Chubby; and BigTable, a simplified large-scale distributed database developed by Google. Google's published papers describe three of these as the key components of its cloud computing architecture: the Google File System (GFS), MapReduce, and BigTable. Among them, the Google File System is a distributed file system.

2 GFS Architecture
There are three types of roles in the GFS architecture: the client, the master server (master), and the chunk server. Nodes playing these three roles form a GFS cluster, which contains a single master node, multiple chunk servers, and multiple clients. The three kinds of roles are described below.


2.1 Introduction of each component
2.1.1 Client

The client is a set of API-style interface functions similar to those of a traditional file system. It is a dedicated interface that does not conform to the POSIX specification; applications call these interfaces to perform operations. The interface supports common operations such as creating, deleting, opening, closing, reading, and writing files, and additionally provides snapshot and record-append operations. The record-append operation reflects the characteristics of Google's applications: random writes to a file are almost nonexistent, reads are usually sequential, and most files are modified by appending data at the end. Record append allows multiple clients to append data to the same file concurrently, which is useful for merging results from multiple paths and for implementing "producer-consumer" queues. The record-append operation guarantees that each client's append is atomic, so multiple clients can append to a file without additional synchronization or locking. The snapshot operation creates a copy of a file or directory tree almost instantaneously and at very low cost; it can quickly create a branch copy of a huge data set, or back up the current state so that changes can later be committed or rolled back easily. The client code is packaged as a library and linked into the client program; this library implements the file system interface, communicates with the master node and the chunk servers, and reads and writes data on behalf of the application.
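As a rough illustration of the client interface described above, the following Python sketch lists the kinds of operations a GFS-style client library might expose. The names (GFSClient, record_append, snapshot, and so on) are illustrative assumptions, not the actual GFS library API.

```python
from typing import Protocol


class GFSClient(Protocol):
    """Hypothetical non-POSIX, library-linked file system interface."""

    def create(self, path: str) -> None: ...
    def delete(self, path: str) -> None: ...
    def open(self, path: str) -> int: ...          # returns a file handle
    def close(self, handle: int) -> None: ...
    def read(self, handle: int, offset: int, length: int) -> bytes: ...
    def write(self, handle: int, offset: int, data: bytes) -> None: ...

    # GFS-specific operations described in the text:
    def record_append(self, handle: int, data: bytes) -> int:
        """Atomically append data; returns the offset the system chose.
        Multiple clients may call this concurrently without extra locking."""
        ...

    def snapshot(self, src_path: str, dst_path: str) -> None:
        """Near-instant, low-cost copy of a file or directory tree."""
        ...
```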

2.1.2 Chunk Server

The chunk server is responsible for the actual storage work: it keeps chunks on its local hard disks and reads and writes chunk data. Stored files are divided into fixed-size chunks, and each chunk is assigned an immutable, globally unique 64-bit chunk handle when it is created. The chunk server reads and writes chunk data according to this handle and a byte range, and each chunk is saved as a local file. For reliability, each chunk is replicated on several machines, three by default. Replicas are created for three reasons: chunk creation, re-replication, and rebalancing, and replica placement is decided by the master.

2.1.3 Master node

The master node manages all file system metadata and coordinates system-wide activities. The master stores three categories of metadata: the namespaces of files and chunks, the mapping from files to chunks, and the locations of each chunk's replicas. This information is kept in the master's memory, which keeps master operations fast. The first two categories of metadata are also persisted as records in an operation log; the log file is stored on the master's local disk and replicated to remote machines. The third category, chunk replica locations, is not persisted: when the master starts, or when a new chunk server joins the cluster, the master polls each chunk server for the chunks it stores. The master also manages system-wide activities, such as managing all chunk replicas in the system, deciding chunk placement, creating new chunks and their replicas, coordinating various activities to ensure that every chunk is fully replicated, balancing load across all chunk servers, and reclaiming storage space that is no longer used.
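As a rough model of the three categories of metadata just described, the following Python sketch shows the master's in-memory tables and how the non-persisted replica locations could be rebuilt from chunk server reports. The names and structure are assumptions made for illustration, not Google's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

ChunkHandle = int  # immutable, globally unique 64-bit identifier


@dataclass
class MasterMetadata:
    # 1) File and chunk namespaces (persisted via the operation log).
    namespace: Set[str] = field(default_factory=set)
    # 2) Mapping from file name to the ordered list of its chunk handles
    #    (also persisted via the operation log).
    file_chunks: Dict[str, List[ChunkHandle]] = field(default_factory=dict)
    # 3) Chunk replica locations (NOT persisted; rebuilt by polling chunk
    #    servers at startup or when a new chunk server joins).
    chunk_locations: Dict[ChunkHandle, List[str]] = field(default_factory=dict)

    def register_chunkserver_report(self, server: str,
                                    handles: List[ChunkHandle]) -> None:
        """Record which chunks a chunk server reports holding."""
        for handle in handles:
            locations = self.chunk_locations.setdefault(handle, [])
            if server not in locations:
                locations.append(server)
```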

2.2 Interactions between components
2.2.1 Client and Master node

When accessing the file system, a client first contacts the master node to learn which chunk servers it needs to interact with, and then accesses those chunk servers directly to complete the data access. The client never reads or writes file data through the master node; this reduces the read and write traffic at the master and prevents a large number of read and write operations from turning the master into a system bottleneck. In addition, the client usually asks for the locations of multiple chunks in a single request, which avoids repeated round trips between client and master. The application on the client sends the file name and chunk index to the master to ask for the location of the specific chunk it wants to operate on; after receiving this request, the master returns the chunk handle and the locations of the chunk's replicas to the client.

2.2.2 Client and Chunk server

After obtaining the chunk handle and chunk locations from the master, the client can use this information to contact the specific chunk server. The client accesses the chunk server directly and retrieves chunk data from it.

2.2.3 Master and Chunk server

The master node communicates periodically with every chunk server using heartbeat messages: it sends instructions to each chunk server and receives status information from it, which also keeps the information held by the master up to date. When the master starts, or when a new chunk server joins, the master polls each chunk server for the chunks it stores. Between master and chunk server, the master sends instructions to the chunk server, and the chunk server reports its state to the master.
2.3 Features of the GFS architecture
(1) A single-master policy. Using a single master node simplifies the design of the distributed file system: with global information, the master can precisely determine chunk placement and make replication decisions. Because there is only one master, there is only one copy of the metadata, so there is no metadata consistency problem.

  (2) A central-server model. Some distributed file systems, such as xFS and GPFS, remove the central server and rely only on distributed algorithms to ensure consistency and coherence. Choosing a central server simplifies the design, increases reliability, and allows flexible expansion, since chunk servers can be added easily; and because the central server knows the status of all chunk servers, it can readily learn the load on each of them. The central-server model also has a serious drawback, however: the central server is a single point of failure, and when it fails the whole system becomes unavailable. In GFS, the performance of the distributed file system is nevertheless guaranteed through a number of specific design decisions:

(a) Keeping almost all chunk-related information on a single, central master server, which controls all chunk change records, greatly simplifies the otherwise very complex chunk allocation and replication strategies and makes load balancing easy. Load balancing here refers to the master balancing load across all chunk servers: the master periodically examines the distribution of replicas and prefers to remove replicas from chunk servers whose remaining disk space is below average, making better use of disk space and evening out disk utilization across the system.

(b) The amount of state kept by the master server is deliberately small, and the master's state is replicated to other nodes, which gives the system disaster-recovery capability. Changes to the master's state are made persistent in the form of a write-ahead (operation) log.

(c) Scalability and high availability for reads are ensured through the shadow master mechanism. A shadow master provides read-only access to the file system when the primary master is down. Shadow masters can also improve read efficiency for files that do not change often, or for applications that tolerate a small amount of stale data.

(3) Only control flow passes between client and master; there is no data flow. Because the control flow carries only instructions and state, the volume of data is small, which reduces the load on the master and keeps it responsive.

(4) The client and the chunk servers transfer data streams directly, and because a file is split into multiple chunks stored across servers, the client can access multiple chunk servers concurrently, improving the I/O parallelism of the system; a sketch of such a parallel read follows.
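A minimal sketch of that parallelism: a client fetches the chunks of a file concurrently from their respective chunk servers. Here fetch_chunk is a placeholder for the direct client-to-chunk-server read, not a real GFS call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def read_whole_file(chunk_plan: List[Tuple[str, int]],
                    fetch_chunk: Callable[[str, int], bytes]) -> bytes:
    """chunk_plan: ordered (chunk_server, chunk_handle) pairs for one file.
    Because consecutive chunks usually live on different chunk servers,
    the per-chunk reads can proceed in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(chunk_plan))) as pool:
        parts = list(pool.map(lambda sc: fetch_chunk(*sc), chunk_plan))
    return b"".join(parts)
```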

3 Applications of the GFS architecture
Google designed the GFS framework for its own applications, and the Google File System itself is built on this framework.
3.1 GFS
GFS is the cornerstone of Google's cloud storage; other storage systems, such as Google BigTable, Google Megastore, and Google Percolator, are built directly or indirectly on top of GFS. In addition, Google's large-scale batch processing system MapReduce uses GFS for the input and output of massive data sets. Beyond the scalability, reliability, and availability expected of earlier distributed file systems, the design of GFS is also shaped by Google's application workloads and technical environment, and it has the following characteristics:

• The failure of individual servers in a cluster is the norm rather than an exceptional event. Because the number of nodes in a Google cluster is very large, with thousands of servers used for ordinary computation, some servers are always in a failed state. GFS therefore builds continuous monitoring of the system's dynamic state, error detection, fault tolerance, and automatic recovery into the system through its software modules.

• Files in Google's system are usually measured in gigabytes, which differs from the notion of file size in a typical file system; a single large file may contain a large number of small files in the usual sense. This affects design assumptions and parameters such as the chunk size and the I/O operations.

• Most files are modified by appending data at the end of the file, rather than by overwriting existing data.

• The applications and the file system API are designed together, which increases the flexibility of the whole system.

3.2 GFS implementation based on the GFS framework
Google deploys multiple GFS clusters for different applications. A GFS cluster contains a single master node and multiple chunk servers, and is accessed by multiple clients. The machines acting as master, chunk server, and client are usually ordinary Linux machines.

All Google File System operations are carried out through interaction between the master, the clients, and the chunk servers, including creating, deleting, opening, closing, reading, and writing files, as well as record appends and snapshots. This section illustrates how the three types of components interact by walking through simple read and write operations.

3.2.1 Read operation

When a client issues a read operation, it first asks the master node which chunk servers to contact and obtains the relevant metadata, and then performs the data read directly against those chunk servers.

The specific interaction proceeds as follows. First, the client interacts with the master. Using the fixed chunk size, the client converts the file name and the byte offset specified by the program into a chunk index within the file, and then sends the file name and the computed chunk index to the master node. Based on the metadata held in its memory, the master returns the corresponding chunk handle and the locations of the replicas to the client. The client caches the chunk handle and replica locations, using the file name and chunk index as the key.

The client then interacts with a chunk server. It selects the nearest replica and sends it a read request; the request contains the chunk handle and a byte range.
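A rough Python sketch of this read path, assuming a 64 MB chunk size and simplified RPC stubs; ask_master and read_from are placeholders, not real GFS calls, and the read is assumed not to cross a chunk boundary.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # assumed fixed chunk size for this sketch

# Cache keyed by (file name, chunk index), as described above.
location_cache: dict[tuple[str, int], tuple[int, list[str]]] = {}


def read(path: str, offset: int, length: int, ask_master, read_from) -> bytes:
    """ask_master(path, chunk_index) -> (chunk_handle, replica_locations)
    read_from(server, handle, start, length) -> bytes
    Both are placeholder RPC stubs for illustration."""
    chunk_index = offset // CHUNK_SIZE            # file offset -> chunk index
    key = (path, chunk_index)
    if key not in location_cache:
        location_cache[key] = ask_master(path, chunk_index)   # consult the master once
    handle, replicas = location_cache[key]
    nearest = replicas[0]                         # pick the "closest" replica (simplified)
    start_in_chunk = offset % CHUNK_SIZE
    # Read directly from the chunk server: chunk handle + byte range.
    return read_from(nearest, handle, start_in_chunk, length)
```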

3.2.2 Write Operations

When a client issues a write operation, it likewise first asks the master node which chunk servers to contact and obtains the relevant metadata, and then writes the data directly to the chunk server holding the primary replica and to all the other replicas.

Before describing the interaction for a write, the lease mechanism in GFS should be introduced. The master node establishes a lease on one of a chunk's replicas; the replica holding the lease is called the primary chunk. The primary chunk serializes all mutations to the chunk, which guarantees that all replicas apply changes in the same order.
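A minimal sketch of lease granting on the master side, assuming a simple expiry-based lease; the class, field names, and the lease duration are illustrative assumptions.

```python
import time

LEASE_DURATION = 60.0  # seconds; an assumed value for this sketch


class LeaseTable:
    """Tracks which replica currently holds the lease (the primary) for each chunk."""

    def __init__(self) -> None:
        self._leases: dict[int, tuple[str, float]] = {}  # handle -> (primary, expiry)

    def primary_for(self, handle: int, replicas: list[str]) -> str:
        """Return the current primary, granting a new lease if none is valid."""
        now = time.time()
        current = self._leases.get(handle)
        if current and current[1] > now:
            return current[0]
        # No valid lease: grant one to some replica (here simply the first).
        primary = replicas[0]
        self._leases[handle] = (primary, now + LEASE_DURATION)
        return primary
```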

The specific interaction proceeds as follows. First, the client interacts with the master. The client asks the master which chunk server holds the lease for the chunk and where the other replicas are. If no replica currently holds a lease, the master grants one to a replica of its choosing. The master then returns the identity of the primary and the locations of the other replicas to the client, and the client caches this information.

Next comes the interaction between the client and the chunk servers. The client pushes the data to all replicas, in any order; each chunk server stores the received data in an internal LRU buffer cache. Once all replicas have acknowledged receiving the data, the client sends a write request to the primary chunk server. The request identifies the data that was pushed to all replicas earlier.

Finally, the primary chunk interacts with the other replicas. The primary assigns consecutive serial numbers to all the mutations it receives and applies them to its own local state in serial-number order. It then forwards the write request to all secondary replicas, and each replica applies the mutations in the same serial-number order. When they finish, all secondary replicas report back to the primary that the operation is complete.

The primary chunk server then replies to the client. Any errors produced by any replica during this process are reported back to the client.
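A simplified sketch of this write path. The RPC stubs push_data and commit_at are placeholders standing in for the real chunk server protocol, and a single serial counter stands in for per-chunk serialization at the primary.

```python
from itertools import count
from typing import Callable, List

_serials = count(1)  # mutation serial numbers (simplified to one global counter)


def write_chunk(data_id: str, data: bytes, primary: str, secondaries: List[str],
                push_data: Callable[[str, str, bytes], None],
                commit_at: Callable[[str, str, int], None]) -> None:
    """push_data(server, data_id, data): stage data in the server's LRU buffer.
    commit_at(server, data_id, serial): apply the staged mutation in order.
    Both are illustrative stubs, not real GFS RPCs."""
    # 1) Push the data to every replica, in any order.
    for server in [primary] + secondaries:
        push_data(server, data_id, data)

    # 2) Ask the primary to commit; the primary assigns a serial number so that
    #    every replica applies mutations in the same order.
    serial = next(_serials)
    commit_at(primary, data_id, serial)

    # 3) The primary forwards the request to all secondaries, which apply the
    #    mutation in serial-number order and then acknowledge.
    for server in secondaries:
        commit_at(server, data_id, serial)
```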
4 GFS quality attributes
4.1 Reliability
GFS is highly reliable. Although the nodes in a GFS cluster are ordinary PCs with relatively poor individual reliability, GFS handles node failures with master fault tolerance and master backups, and additionally provides data checksumming, load balancing, and a heartbeat messaging mechanism to ensure high reliability.

The master keeps its metadata in memory. Changes to the file and chunk namespaces and to the file-to-chunk mapping are recorded in the operation log, which provides fault tolerance for these two categories of metadata; chunk replica location information is held on the chunk servers themselves. As long as the disks remain intact, this metadata can be recovered quickly after a master failure.

The master's state, operation log, and checkpoints are replicated to backup machines. If the master or its disk fails, the monitoring infrastructure detects the failure and starts a replacement on a backup machine by switching the domain name over to it; because clients reach the master only by that name, they do not notice the change of master server.
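A toy sketch of recovering metadata from a checkpoint plus an operation log, assuming a JSON-lines log format invented here for illustration; GFS's actual checkpoint and log formats are not described in this article.

```python
import json
from pathlib import Path


def recover_namespace(checkpoint_path: Path, log_path: Path) -> dict:
    """Rebuild the file -> chunk-handles map by loading the latest checkpoint
    and replaying operation-log records written after it."""
    state = json.loads(checkpoint_path.read_text()) if checkpoint_path.exists() else {}
    if log_path.exists():
        for line in log_path.read_text().splitlines():
            record = json.loads(line)
            if record["op"] == "create":
                state.setdefault(record["path"], [])
            elif record["op"] == "add_chunk":
                state.setdefault(record["path"], []).append(record["handle"])
            elif record["op"] == "delete":
                state.pop(record["path"], None)
    return state
```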

When a chunk replica is read, the chunk server compares the data being read against its stored checksums; if they do not match, it returns an error, which causes the client to read a replica on another chunk server instead.
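A small sketch of such a per-block checksum check, using CRC32 over fixed-size blocks as an assumed checksum scheme for illustration.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # checksum granularity; an assumption for this sketch


def checksums(chunk: bytes) -> list[int]:
    """Compute one CRC32 per fixed-size block of the chunk."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk), BLOCK_SIZE)]


def verified_read(chunk: bytes, stored: list[int], start: int, length: int) -> bytes:
    """Verify the blocks covering [start, start + length) before returning them."""
    first, last = start // BLOCK_SIZE, (start + length - 1) // BLOCK_SIZE
    for i in range(first, last + 1):
        block = chunk[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != stored[i]:
            raise IOError(f"checksum mismatch in block {i}; try another replica")
    return chunk[start:start + length]
```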

The master periodically rebalances replicas for better disk and load distribution, without overloading any single chunk server.

Chunk servers periodically and proactively report their running state and the state of their data to the master. If the master does not receive this information within a certain time, it assumes the chunk server is down and re-replicates its chunks elsewhere to restore the configured number of replicas.
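A minimal sketch of that failure-detection rule on the master side; the timeout value and function names are assumptions for illustration.

```python
import time
from typing import Dict, List, Optional

HEARTBEAT_TIMEOUT = 30.0  # seconds without a heartbeat before a server is presumed down


def servers_presumed_down(last_heartbeat: Dict[str, float],
                          now: Optional[float] = None) -> List[str]:
    """Return chunk servers whose last heartbeat is older than the timeout;
    the master would then re-replicate their chunks to restore the replica count."""
    now = time.time() if now is None else now
    return [server for server, seen in last_heartbeat.items()
            if now - seen > HEARTBEAT_TIMEOUT]


# Example: cs2 has not reported within the timeout and would be re-replicated.
print(servers_presumed_down({"cs1": 100.0, "cs2": 40.0}, now=105.0))
```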
4.2 Availability
Availability describes the percentage of time the system operates normally. It concerns questions such as: how is a failure detected, what happens when a failure occurs, when can the system fail safely, how are failures prevented, and what notifications are required when a failure occurs.

GFS was designed on the assumption that the failure of individual servers is normal. To keep the system running, GFS builds continuous monitoring of the system's state, error detection, fault tolerance, and automatic recovery into its software. This is achieved mainly through two strategies: fast recovery and replication. Fast recovery means that when the master or a chunk server fails, it can restore its state and restart within a few seconds. Replication includes chunk replication and master replication.

When a chunk server goes offline or corrupted data is detected, the master node clones existing replicas to ensure that every chunk remains fully replicated; this replication tolerates chunk server failures. When the primary master goes down, a shadow master provides read-only access to the system.

In addition, for failure detection, GFS relies on continuous monitoring of the whole system.
4.3 Performance
Performance is usually expressed as the amount of work completed per unit of time, or as the time the system takes to complete an operation. GFS ensures system performance mainly in two ways: on one hand, it prevents the master from becoming a performance bottleneck, so the system can respond to and process requests quickly; on the other hand, it applies caching judiciously to improve access speed.

Preventing the master from becoming a bottleneck: GFS achieves this by keeping the metadata small, backing up the master remotely, keeping only control information at the master, and diverting the data flow away from it.

Applying caching judiciously: GFS provides no data cache on the chunk servers, while the metadata held by the master is kept in memory. Although caching is a common means of improving file system performance, in GFS's application scenario clients mostly perform streaming sequential reads and writes with little re-reading of the same data, so caching file data would contribute little to performance. The master, on the other hand, performs frequent operations on its metadata, so keeping the metadata in memory improves the efficiency of those operations and therefore helps overall system performance.
5 Conclusion
Based on the above analysis, the GFS framework, designed for Google's specific needs, divides a computing cluster into three categories of nodes: clients, a master server, and chunk servers. The client provides a set of operation interfaces, the chunk servers hold the file data chunks, and the master server manages all file system metadata and coordinates system-wide activities. The design targets not only scalability, reliability, and availability, but also Google's particular application workloads and technical environment. Some of these ideas can also be applied to the design of other distributed file systems.


