Google Cloud Platform Technology Architecture


Google Cloud design principles

1. Distributed file system: the Google File System (GFS)

To meet Google's rapidly growing data processing needs, Google designed and implemented the Google File System (GFS). GFS shares many of the design goals of earlier distributed file systems, such as performance, scalability, reliability, and availability. However, its design was also driven by observations of Google's own application workloads and technical environment, both current and anticipated, and its assumptions depart significantly from those of earlier file systems. The designers therefore re-examined the traditional trade-offs of file system design and arrived at radically different design points.

First, component failure is treated as the norm rather than the exception. A GFS cluster consists of hundreds or even thousands of inexpensive commodity storage machines and is accessed by a comparable number of clients. The quantity and quality of the components virtually guarantee that some are not functioning at any given time and that some will not recover from their current failure. Problems have been caused by application bugs, operating system bugs, human error, and failures of disks, memory, connectors, networks, and power supplies. Constant monitoring, error detection, fault tolerance, and automatic recovery must therefore be integral to GFS.

Second, files are huge by traditional standards. Multi-gigabyte files are common, and each file typically contains many application objects, such as Web documents. When routinely working with fast-growing data sets of many terabytes comprising billions of objects, it is unwieldy to manage billions of approximately KB-sized small files, even though some file systems support doing so. Design assumptions and parameters, such as I/O operation sizes and block sizes, have to be revisited.

Third, most files are modified by appending new data at the end rather than by overwriting existing data. Random writes within a file are practically nonexistent; once written, files are only read, usually sequentially. A wide variety of data shares these characteristics: large data sets scanned by analysis programs, continuous streams of data produced by running applications, archival data, and intermediate data produced on one machine and processed on another, whether concurrently or later. Given this access pattern on huge files, client-side block caching loses its appeal, while appending becomes the focus of performance optimization and atomicity guarantees.

Fourth, co-designing the applications and the file system API benefits the whole system by increasing its flexibility. For example, GFS relaxes its consistency model, which greatly simplifies the file system without imposing an onerous burden on applications. It also introduces an atomic record append operation so that multiple clients can append to a file concurrently without additional synchronization to keep the data consistent. These points are discussed in more detail later in this article.

Google has deployed multiple GFS clusters for different applications. The largest has more than 1,000 storage nodes and over 300 TB of disk space, and it is continuously accessed by hundreds of clients on different machines.
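To make the atomic record append concrete, here is a minimal single-process sketch of its semantics, with a Python lock standing in for the primary replica that assigns offsets in real GFS. The class and method names are illustrative assumptions, not the actual GFS client interface.

    import threading

    class ToyAppendOnlyFile:
        """Toy stand-in for a GFS file supporting atomic record append."""

        def __init__(self):
            self._lock = threading.Lock()  # plays the role of the primary replica
            self._records = []

        def record_append(self, data: bytes) -> int:
            # GFS appends the record atomically at an offset of its own
            # choosing and returns that offset to the client.
            with self._lock:
                offset = sum(len(r) for r in self._records)
                self._records.append(data)
                return offset

    log = ToyAppendOnlyFile()

    def producer(worker_id: int):
        # Many clients append concurrently without negotiating offsets.
        for i in range(3):
            log.record_append(f"worker={worker_id} event={i}\n".encode())

    threads = [threading.Thread(target=producer, args=(w,)) for w in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{len(log._records)} records appended with no client-side coordination")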
2. Parallel data processing: MapReduce

MapReduce is a programming model for parallel computation over large data sets (larger than 1 TB). Its central ideas, "map" and "reduce," are borrowed from functional programming languages, along with features of vector programming languages. It makes it easy for programmers to run their own programs on a distributed system without any experience in distributed parallel programming. In the current implementation, the user specifies a map function that transforms a set of key-value pairs into a set of intermediate key-value pairs, and a reduce function that merges all intermediate values associated with the same key (a minimal word-count sketch of this model follows the feature list below).

MapReduce provides the following key features:

1) Data partitioning and compute-task scheduling. The system automatically splits the big data handled by a job into blocks, with each block corresponding to one compute task, and automatically schedules compute nodes to process the corresponding blocks. The job and task scheduler allocates and dispatches compute nodes (map nodes or reduce nodes), monitors their execution status, and controls the synchronization of map-node execution.

2) Data/code co-location. To reduce communication, a basic principle is to process data locally: a compute node processes the data on its own local disks whenever possible, which amounts to migrating the code to the data. When such localized processing is impossible, the scheduler looks for other available nodes, preferring nodes on the rack where the data resides so as to keep communication latency low, and moves the data over the network to the chosen node (data-to-code migration).

3) System optimization. To reduce data-communication overhead, intermediate results are merged before being delivered to the reduce nodes. Because the data processed by one reduce node may come from many map nodes, the intermediate output of the map nodes is partitioned by an appropriate strategy to guarantee that related data is sent to the same reduce node, avoiding errors in the reduce phase. The system also performs computational optimizations, such as launching multiple backup executions of the slowest tasks and taking whichever result finishes first.

4) Error detection and recovery. In a large MapReduce cluster built from low-end commodity servers, hardware errors (host, disk, memory, and so on) and software errors are the norm, so MapReduce must detect and isolate faulty nodes and dispatch replacement nodes to take over their tasks. The system also keeps data storage reliable, improving it with multi-replica redundant storage, and detects and recovers erroneous data in time.
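The following sketch illustrates the programming model just described with the conventional word-count example, run in a single process; the function names and the in-memory shuffle are illustrative assumptions, not Google's implementation.

    from collections import defaultdict

    def map_fn(doc_name, contents):
        # Emit an intermediate (key, value) pair for each word occurrence.
        for word in contents.split():
            yield word, 1

    def reduce_fn(word, counts):
        # Merge all intermediate values that share the same key.
        return word, sum(counts)

    def run_mapreduce(inputs):
        # Shuffle phase: group intermediate pairs by key, as the
        # framework's partitioner does before the reduce phase.
        groups = defaultdict(list)
        for doc_name, contents in inputs:
            for key, value in map_fn(doc_name, contents):
                groups[key].append(value)
        return dict(reduce_fn(key, values) for key, values in groups.items())

    docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog and the fox")]
    print(run_mapreduce(docs))  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, ...}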
The MapReduce design has the following main technical characteristics:

1) Scale "out," not "up." MapReduce clusters are built with inexpensive, easily scaled, low-end commodity servers rather than expensive, hard-to-scale high-end servers. For large-scale data processing, with its heavy storage demands, clusters of low-end servers are clearly far more cost-effective than clusters built from high-end servers, which is why MapReduce parallel-computing clusters are implemented on low-end servers.

2) Failure is the norm. A MapReduce cluster uses large numbers of low-end servers, so node hardware failures and software errors are normal events. A well-designed, highly fault-tolerant parallel computing system must not let node failures degrade the quality of its computing service: the failure of any node must not lead to inconsistent or indeterminate results; when any node fails, the other nodes must be able to take over its computing tasks seamlessly; and when a failed node recovers, it should rejoin the cluster automatically, with no manual reconfiguration by an administrator. The MapReduce parallel-computing framework employs a range of effective error detection and recovery mechanisms, such as automatic node restart, so that the cluster and the computing framework remain robust to node failures and handle the detection and recovery of failed nodes efficiently.

3) Move the processing to the data. Traditional high-performance computing systems often connect many processor nodes to external storage nodes, such as disk arrays attached via a storage area network (SAN), so file I/O against external storage becomes the bottleneck during large-scale data processing. To reduce data-movement overhead in a large-scale data-parallel system, the processing should be moved toward the data rather than the data toward the processing (code-to-data rather than data-to-processor migration). MapReduce therefore uses data/code co-location: a compute node is first responsible for computing over the data it stores locally, exploiting data locality, and only when a node cannot process its local data does the scheduler apply a nearest-first rule to find another available compute node and transfer the data to it, as in the placement sketch below.
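A minimal sketch of that nearest-first placement rule, assuming a two-level host/rack topology; the Node model and place_task function are illustrative, not the real scheduler.

    from dataclasses import dataclass

    @dataclass
    class Node:
        host: str
        rack: str
        busy: bool = False

    def place_task(replica_hosts, nodes):
        """Pick a node for a task over one data block, preferring locality.

        Preference order: a node holding a replica of the block, then a
        node on the same rack as a replica, then any idle node (the data
        must then travel across the network).
        """
        idle = [n for n in nodes if not n.busy]
        replica_racks = {n.rack for n in nodes if n.host in replica_hosts}
        for n in idle:                      # 1) data-local
            if n.host in replica_hosts:
                return n, "local"
        for n in idle:                      # 2) rack-local
            if n.rack in replica_racks:
                return n, "rack-local"
        if idle:                            # 3) remote: data-to-code migration
            return idle[0], "remote"
        return None, "wait"

    nodes = [Node("h1", "r1", busy=True), Node("h2", "r1"), Node("h3", "r2")]
    print(place_task({"h1"}, nodes))  # h2: same rack as the busy replica holder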
4) Process data sequentially and avoid random access. The scale of the data means the records can rarely all be held in memory and usually must be processed from external storage. Since sequential disk access is much faster than random access, MapReduce is designed chiefly around sequential processing of large-scale data on disk. To deliver high-throughput parallel processing of large batch data sets, MapReduce reads data simultaneously from a large number of storage nodes, using the aggregate disks of the distributed cluster to provide high-bandwidth data access and transfer.

5) Hide system-level details from the application developer. Software engineering practice has long observed that writing programs is hard because the programmer must keep too many details in mind (from variable names to the boundary conditions of complex algorithms), a heavy cognitive burden demanding sustained concentration, and parallel programming is harder still, requiring attention to complexities such as synchronization among threads. Because concurrent execution is unpredictable, such programs are very difficult to debug, and in large-scale data processing the programmer would additionally have to manage details such as distributed data storage, data partitioning, data communication, synchronization, and result collection. MapReduce provides an abstraction that isolates the programmer from these system-level details: the programmer only describes what to compute, while how to compute it is handled by the system's execution framework. This frees programmers from the system layer and lets them devote themselves to the algorithmic design of their own problems.

6) Seamless scalability. The scalability meant here spans two dimensions: scaling with the data and scaling with the system. An ideal algorithm should remain effective as the data grows, with performance degrading no faster than the data expands, and its computing performance on a cluster should grow nearly linearly with the number of nodes. Most existing single-machine algorithms fall short of this ideal; single-machine algorithms that keep intermediate results in memory break down quickly in large-scale data processing, and moving from one machine to a large cluster demands radically different algorithm design. Remarkably, MapReduce achieves this ideal scalability in many situations: several studies have found that, for many computational problems, MapReduce performance grows approximately linearly with the number of nodes.

3. Distributed lock service: Chubby

First, what is Chubby? Chubby is primarily used to solve distributed consistency problems. In a distributed system there is a set of processes that need to agree on a value: each process proposes a value, and consistency means that exactly one of those values is chosen as the final one and that, once chosen, all the processes are notified of it. That is the consistency problem.

Second, Chubby is a coarse-grained distributed lock service. In essence, Chubby is a file system designed by Google to provide coarse-grained locking; it stores large numbers of small files, and each file represents a lock. Creating a file is the "acquire lock" operation: the server that succeeds in creating the file is the one that has grabbed the lock. A client acquires a shared or exclusive lock by opening, closing, and reading the file, and update information is delivered to clients through an event-notification mechanism. When a group of machines needs to elect a master, the machines all apply for the same lock file at once; the server that acquires the lock is chosen as the master and writes its own address into the file, and the other servers obtain the master's address by reading the file's contents. Other distributed systems can likewise use Chubby to synchronize access to shared resources. At the same time, this locking service is advisory rather than mandatory, which provides greater flexibility.
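The master-election flow just described can be sketched as follows; the ToyLockService and its create-if-absent primitive are stand-ins assumed for illustration, and the real Chubby API (open/acquire with sessions and event notifications) is considerably richer.

    class ToyLockService:
        """Toy stand-in for a Chubby cell: creating a file acquires the lock."""

        def __init__(self):
            self._files = {}

        def try_create(self, path, contents):
            # Atomic create-if-absent: only one caller can succeed.
            if path in self._files:
                return False
            self._files[path] = contents
            return True

        def read(self, path):
            return self._files[path]

    def elect_master(cell, my_address):
        if cell.try_create("/ls/cell/master-lock", my_address):
            return f"{my_address}: I am the master"
        # Lost the race: learn the master's address from the lock file.
        return f"{my_address}: master is {cell.read('/ls/cell/master-lock')}"

    cell = ToyLockService()
    print(elect_master(cell, "10.0.0.1:7000"))  # wins the lock
    print(elect_master(cell, "10.0.0.2:7000"))  # reads the winner's address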
Chubby's design goals rest on the following points: high availability; high reliability; support for coarse-grained, advisory lock services; and support for direct storage of small files, with performance and storage capacity traded off against these goals.

4. Structured data storage: BigTable

BigTable is a non-relational database: a sparse, distributed, persistent, multi-dimensional sorted map. BigTable is designed to handle petabytes of data quickly and reliably and can be deployed on thousands of machines. BigTable achieves several goals: broad applicability, scalability, high performance, and high availability. In many ways BigTable resembles a database, and it borrows many database implementation strategies. Parallel databases and in-memory databases already offer scalability and high performance, but BigTable provides an entirely different interface from those systems. BigTable does not support a complete relational data model; instead it gives clients a simple data model that lets them dynamically control the layout and format of the data (Alex's note: the data is not formatted by BigTable; in database terms, the data has no schema, and users define their own schema) and lets them reason about (Alex's note: "reason about") the locality of the data in the underlying storage (Alex's note: locality can be understood via, for example, a tree structure, where data with the same prefix is stored close together and can be read in one pass). Data is indexed by row name and column name, which can be arbitrary strings. BigTable treats stored data as uninterpreted strings: it does not parse them, and clients typically serialize their structured or semi-structured data into these strings. By choosing their data's schema carefully, clients can control the locality of the data. Finally, BigTable schema parameters control whether data is stored in memory or on disk.

Features: 1) suited to massive, petabyte-scale data; 2) distributed, concurrent data processing with high efficiency; 3) easy to extend, supporting dynamic scaling; 4) suited to low-cost hardware; 5) suited to read operations rather than write operations; 6) not a traditional relational database.
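As a final sketch, the data model described above can be modeled in a few lines as a sorted map from (row, column, timestamp) to an uninterpreted string. This toy captures only the lookup semantics and row-prefix locality, not tablets, compactions, or the real client API; the timestamp dimension comes from BigTable's published data model rather than from the text above.

    import time

    class ToyBigTable:
        """Toy sorted map from (row, column, timestamp) to an uninterpreted string."""

        def __init__(self):
            # (row, column) -> list of (timestamp, value), newest first
            self._cells = {}

        def put(self, row, column, value, ts=None):
            versions = self._cells.setdefault((row, column), [])
            versions.append((ts if ts is not None else time.time(), value))
            versions.sort(key=lambda tv: -tv[0])

        def get(self, row, column):
            # Return the most recent version of the cell.
            return self._cells[(row, column)][0][1]

        def scan_prefix(self, prefix):
            # Rows sort lexicographically, so rows sharing a prefix are
            # adjacent: the locality property clients can design for.
            for row, column in sorted(self._cells):
                if row.startswith(prefix):
                    yield row, column, self.get(row, column)

    t = ToyBigTable()
    t.put("com.cnn.www/index", "contents:", "<html>...</html>")
    t.put("com.cnn.www/sports", "anchor:ref", "CNN Sports")
    for row, column, value in t.scan_prefix("com.cnn.www"):
        print(row, column, value)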

  
