The basic structure of Google cloud computing

Source: Internet
Author: User
Keywords: cloud computing

GFS (Google File System) is Google's own distributed file system. It runs on a cluster built from a large number of ordinary PCs running the Linux operating system. A cluster consists of one Master (usually with several backups) and many chunkservers. Files in GFS are divided into fixed-size chunks, which are stored on different chunkservers; each chunk is replicated, and its copies are also stored on different chunkservers. The Master maintains the GFS metadata, such as file names and chunk information (which chunks make up each file and where they are stored). A client first obtains this metadata from the Master and then, based on where in the file the data it needs is located, communicates directly with the corresponding chunkservers. This article introduces the infrastructure that makes up Google's cloud computing.
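
The read path just described can be sketched as a toy, single-process Python model. Everything here is an assumption for illustration, not Google's code: the `Master` and `ChunkServer` classes, the `gfs_read` function, and the in-memory dictionaries are hypothetical. The 64 MB default is the size the GFS paper reports for the fixed-size pieces a file is split into (called chunks there); the demo shrinks it to 4 bytes so the data is easy to follow.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size given in the GFS paper


@dataclass
class ChunkServer:
    """Hypothetical chunkserver: stores chunk bytes keyed by chunk handle."""
    name: str
    chunks: dict = field(default_factory=dict)

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]


@dataclass
class Master:
    """Hypothetical master: holds metadata only, never file contents."""
    files: dict = field(default_factory=dict)      # file name -> [chunk handles]
    locations: dict = field(default_factory=dict)  # chunk handle -> [server names]

    def lookup(self, filename, chunk_index):
        handle = self.files[filename][chunk_index]
        return handle, self.locations[handle]


def gfs_read(master, servers, filename, offset, length, chunk_size=CHUNK_SIZE):
    """Read up to `length` bytes starting at `offset`, within a single chunk.
    A read spanning several chunks would be split into one request per chunk."""
    chunk_index = offset // chunk_size                       # which chunk holds the offset
    handle, replicas = master.lookup(filename, chunk_index)  # step 1: metadata from the master
    server = servers[replicas[0]]                            # any live replica would do
    return server.read(handle, offset % chunk_size, length)  # step 2: data from the chunkserver


# Demo with a 4-byte chunk size so the data is easy to follow.
servers = {name: ChunkServer(name) for name in ("cs1", "cs2", "cs3")}
master = Master()
data = b"hello, gfs!!"
handles = []
for handle, start in enumerate(range(0, len(data), 4)):
    handles.append(handle)
    master.locations[handle] = ["cs1", "cs2", "cs3"]  # three replicas per chunk
    for server in servers.values():
        server.chunks[handle] = data[start:start + 4]
master.files["/logs/a"] = handles

print(gfs_read(master, servers, "/logs/a", 5, 3, chunk_size=4))  # b', g'
```

The design point this illustrates is that the master answers only the small metadata query; the bulk data flows directly between the client and a chunkserver, so the master does not become a bandwidth bottleneck.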

According to reports, Google stores every piece of data on at least three different machines, so reliability is very high, and applications can use GFS as easily as a local disk.
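
Why three copies help can be shown with a small Python sketch. The uniform random choice of servers in `place_replicas` is an assumption made for simplicity; GFS's real placement policy also weighs factors such as disk utilization and spreading replicas across racks. Both function names are hypothetical.

```python
import random

REPLICATION = 3  # "at least three different machines", per the reports above


def place_replicas(servers, k=REPLICATION, rng=random):
    """Pick k distinct servers for a new chunk (hypothetical policy: uniform random)."""
    if len(servers) < k:
        raise ValueError("not enough servers for the requested replication")
    return rng.sample(servers, k)


def readable(replicas, failed):
    """A chunk stays readable while at least one replica is on a live server."""
    return any(server not in failed for server in replicas)


servers = [f"cs{i}" for i in range(10)]
replicas = place_replicas(servers, rng=random.Random(42))
print(replicas)
# Losing any two of the three machines still leaves one copy.
print(readable(replicas, failed=set(replicas[:2])))  # True
print(readable(replicas, failed=set(replicas)))      # False
```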

At the time of writing, Google had more than 200 GFS clusters, some with more than 5,000 machines. Pools of tens of thousands of clients retrieve data from these clusters; a cluster can store up to 5 PB of data, and read/write throughput within a cluster can reach 40 GB per second.

MapReduce is a programming model, and an associated implementation, for processing and generating massive data sets. The user specifies a map function that processes a key/value pair to produce a set of intermediate key/value pairs, and a reduce function that merges all the intermediate values associated with the same intermediate key. MapReduce's main contribution is a simple yet powerful interface that lets large-scale computations be parallelized and distributed automatically.
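
The model can be illustrated with a minimal, single-process Python sketch: a map phase, a shuffle that groups intermediate values by key, and a reduce phase, with word count supplied as the user functions. The `map_reduce` driver and its signature are hypothetical; a real implementation runs each phase on many machines and re-executes failed tasks, none of which is modeled here.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def map_reduce(inputs: Iterable[Pair],
               map_fn: Callable[[Any, Any], Iterable[Pair]],
               reduce_fn: Callable[[Any, List[Any]], Iterable[Any]]) -> Dict[Any, List[Any]]:
    # Map phase: each input (key, value) pair yields intermediate pairs.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            # Shuffle: group all intermediate values that share a key.
            intermediate[ikey].append(ivalue)
    # Reduce phase: keys are processed in sorted order, one reduce call per key.
    return {ikey: list(reduce_fn(ikey, values))
            for ikey, values in sorted(intermediate.items())}


# User-supplied functions for word count.
def wc_map(doc_name, contents):
    for word in contents.split():
        yield word, 1


def wc_reduce(word, counts):
    yield sum(counts)


docs = [("d1", "the cat sat"), ("d2", "the cat ran")]
print(map_reduce(docs, wc_map, wc_reduce))
# {'cat': [2], 'ran': [1], 'sat': [1], 'the': [2]}
```

Because the user writes only `wc_map` and `wc_reduce`, the same driver can run any job that fits the model; parallelism and distribution are the framework's concern.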

A typical application is counting how many times each word occurs in a large collection of documents. The map function scans each document and emits a pair (word, 1) for every word it sees, and the reduce function adds together all the counts emitted for a particular word. Other examples:

URL access frequency: The map function processes logs of web page requests and outputs (URL, 1) pairs. The reduce function adds together all the values for the same URL and outputs a (URL, total count) pair.

Reverse web-link graph: For each link to a target URL found in a source page, the map function outputs a (target, source) pair. The reduce function concatenates all the source URLs associated with the same target URL and outputs the pair (target, list(source)).

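Distributed sort: The map function extracts the key from each record and outputs a (key, record) pair. The reduce function outputs all the pairs unchanged; the sorting itself comes from the framework, which hands intermediate keys to each reduce task in sorted order.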
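
The three examples above can be written as map and reduce functions for a tiny sequential driver, sketched below in Python. The `run` helper, the log-line format `METHOD URL STATUS`, and the comma-separated sort records are assumptions for illustration. The sort job's reduce function is the identity: the ordering comes entirely from the driver processing keys in sorted order. Across many reduce tasks, keys would also have to be partitioned by range for the concatenated output to be globally sorted.

```python
from collections import defaultdict


def run(records, map_fn, reduce_fn):
    """Hypothetical sequential MapReduce: map, group by key, reduce keys in sorted order."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(reduce_fn(key, groups[key]))
    return out


# URL access frequency (assumed log format: "METHOD URL STATUS").
log = ["GET /a 200", "GET /b 200", "GET /a 404"]
def url_map(line):
    yield line.split()[1], 1
def url_reduce(url, counts):
    yield url, sum(counts)
print(run(log, url_map, url_reduce))  # [('/a', 2), ('/b', 1)]

# Reverse web-link graph: records are (source page, outgoing link targets).
pages = [("p1", ["t1", "t2"]), ("p2", ["t1"])]
def link_map(page):
    source, targets = page
    for target in targets:
        yield target, source
def link_reduce(target, sources):
    yield target, sources
print(run(pages, link_map, link_reduce))  # [('t1', ['p1', 'p2']), ('t2', ['p1'])]

# Distributed sort: key is the first comma-separated field (assumed format).
rows = ["b,2", "c,3", "a,1"]
def sort_map(record):
    yield record.split(",")[0], record
def sort_reduce(key, records):
    for record in records:
        yield key, record  # identity: emit pairs unchanged
print(run(rows, sort_map, sort_reduce))  # [('a', 'a,1'), ('b', 'b,2'), ('c', 'c,3')]
```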

BigTable: a distributed storage system for managing very large amounts of structured data, designed to hold petabytes of data spread across thousands of servers. The Bigtable API includes functions for creating and deleting tables and column families, and for changing cluster, table, and column family metadata. Instead of supporting a full relational data model, Bigtable gives clients a simple data model that lets them dynamically control the layout and format of their data. Bigtable does not support full SQL.
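
Bigtable's data model, as described in the Bigtable paper, is a sparse, sorted map indexed by a row key, a column key of the form `family:qualifier`, and a timestamp. The toy class below models only that mapping in memory; `ToyTable` and its `put`, `get`, and `scan` methods are hypothetical and are not the Bigtable client API. The reversed-domain row keys mirror the paper's Webtable example, where keeping rows sorted places pages from the same domain next to each other.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyTable:
    """In-memory model of (row, column, timestamp) -> value. Hypothetical, not Bigtable's API."""

    def __init__(self) -> None:
        # row key -> column key -> list of (timestamp, value), newest first
        self._rows: Dict[str, Dict[str, List[Tuple[int, str]]]] = {}
        self._sorted_keys: List[str] = []  # row keys kept in lexicographic order

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)
            self._rows[row] = {}
        cells = self._rows[row].setdefault(column, [])
        cells.append((ts, value))
        cells.sort(reverse=True)  # keep the newest version first

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the most recent version of a cell, or None if it is absent."""
        cells = self._rows.get(row, {}).get(column)
        return cells[0][1] if cells else None

    def scan(self, start: str, end: str):
        """Yield (row, columns) for start <= row < end, in row-key order."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, end)
        for row in self._sorted_keys[lo:hi]:
            yield row, self._rows[row]


t = ToyTable()
t.put("com.cnn.www", "contents:", "<html>v1", ts=3)
t.put("com.cnn.www", "contents:", "<html>v2", ts=5)
t.put("com.cnn.www", "anchor:cnnsi.com", "CNN", ts=9)
t.put("com.example", "contents:", "<html>ex", ts=1)
print(t.get("com.cnn.www", "contents:"))             # <html>v2
print([row for row, _ in t.scan("com.c", "com.d")])  # ['com.cnn.www']
```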

Google App Engine

Google App Engine is a platform for developing and hosting web applications in Google-managed data centers. Its first beta version was released in April 2008. App Engine uses cloud computing technology to virtualize applications across multiple servers and data centers. At the time of writing, the programming languages supported by Google App Engine were Python and Java (http://appengine.google.com).

Hadoop
Hadoop (http://hadoop.apache.org) is a distributed parallel computing framework from Apache that grew out of the Lucene project. Hadoop mainly consists of HDFS, MapReduce, and HBase:
◆ HDFS is an open-source implementation of the Google File System (GFS).
◆ Hadoop MapReduce is an open-source implementation of Google MapReduce.
◆ HBase is an open-source implementation of Google BigTable.
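
As one way to see Hadoop's MapReduce in use, word count can be written for Hadoop Streaming, which runs any executable that reads records from standard input and writes tab-separated key/value lines to standard output; the framework sorts the mapper output by key before it reaches the reducer. The script name `wc.py` and its `map`/`reduce` arguments are assumptions for this sketch.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts for each word. Input must already be sorted by key,
    which Hadoop guarantees between the map and reduce phases."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    # Invoked as: python3 wc.py map   or   python3 wc.py reduce
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Locally, the same pipeline can be tried with `cat input.txt | python3 wc.py map | sort | python3 wc.py reduce`, where `sort` stands in for Hadoop's shuffle. On a cluster, the two commands would be passed to the streaming jar's `-mapper` and `-reducer` options along with `-input` and `-output` (the jar's location depends on the installation).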

The Google distributed frameworks described above are highly innovative and scale extremely well, which makes Google very competitive in system throughput. However, they are written in C++ and are not publicly available, so the Apache Software Foundation implemented Hadoop, an open-source counterpart written in Java that runs on Linux distributions such as Fedora and Ubuntu. This concludes the introduction to Google's cloud computing infrastructure.
