[MapReduce] Google's Troika: GFS, MapReduce, and BigTable


  Disclaimer: This article is reproduced from a development team blog, with respect for the original work. It is suitable as background reading for the study of distributed systems.

When it comes to distributed systems, you have to mention Google's troika: Google FS [1], MapReduce [2], and BigTable [3].

Although Google has not released the source code for these three products, it has published detailed design papers for each of them. In addition, the Yahoo-funded Hadoop project [4] provides open-source Java implementations of these three papers: Hadoop MapReduce corresponds to MapReduce, the Hadoop Distributed File System (HDFS) corresponds to Google FS, and HBase [5] corresponds to BigTable. However, Hadoop lags well behind Google's implementations in performance; see Table 1.

Table 1: HBase vs. BigTable performance comparison (from http://wiki.apache.org/lucene-hadoop/Hbase/PerformanceEvaluation)

Experiment            HBase (2007-09-16)   BigTable
Random reads          272                  1212
Random reads (mem)    Not implemented      10811
Random writes         1460                 8850
Sequential reads      267                  4425
Sequential writes     1278                 8547
Scans                 3692                 15385

These three products are described below:

1. Google FS

GFS is a scalable, distributed file system for large distributed applications that access large amounts of data. It runs on inexpensive commodity hardware and provides fault tolerance.

  

Figure 1 GFS Architecture

(1) The structure of GFS

1. The GFS architecture is shown in Figure 1. It consists of a single master and a large number of chunkservers.

2. Unlike Amazon Dynamo's masterless design, GFS uses a single master to store directory and index information. This simplifies the system's structure and improves performance, but it can make the master a single point of failure or a bottleneck. To mitigate the bottleneck, Google chose a large chunk size (64 MB): clients cache chunk location metadata locally, so interaction between applications and the master is reduced, and the bulk of the data traffic flows directly between applications and chunkservers.

3. In addition, the master keeps all of its metadata in memory and rebuilds chunk location information from the chunkservers at startup. This improves the master's performance and throughput, and makes it easy to fail over to a backup machine if the master goes down.

4. Neither the client nor the chunkserver caches file data; they simply rely on the Linux file system's own buffer cache.

"The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas."

"Having a single master vastly simplifies our design and enables the master to make sophisticated chunk placement and replication decisions using global knowledge. However, we must minimize its involvement in reads and writes so that it does not become a bottleneck. Clients never read and write file data through the master. Instead, a client asks the master which chunkservers it should contact. It caches this information for a limited time and interacts with the chunkservers directly for many subsequent operations."

"Neither the client nor the chunkserver caches file data. Client caches offer little benefit because most applications stream through huge files or have working sets too large to be cached. Not having them simplifies the client and the overall system by eliminating cache coherence issues. (Clients do cache metadata, however.) Chunkservers need not cache file data because chunks are stored as local files and so Linux's buffer cache already keeps frequently accessed data in memory."
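The read path described in these excerpts can be sketched as follows. This is a toy model, not the real GFS API: `master.lookup` and the chunkserver `read` call are illustrative stand-ins for the actual RPCs, but the division of labor matches the quotes above (the master answers metadata lookups, the client caches them for a limited time, and file data flows only between client and chunkserver).

```python
import time

CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses a fixed 64 MB chunk size

class Client:
    """Toy sketch of the GFS client read path; all names are illustrative."""

    def __init__(self, master, cache_ttl=60.0):
        self.master = master
        self.cache_ttl = cache_ttl
        self.cache = {}  # (filename, chunk_index) -> (replica locations, expiry)

    def read(self, filename, offset, length):
        # 1. Translate the byte offset into a chunk index (fixed chunk size).
        chunk_index = offset // CHUNK_SIZE
        key = (filename, chunk_index)

        # 2. Ask the master only on a cache miss; cache the reply for a
        #    limited time so subsequent reads skip the master entirely.
        locations, expiry = self.cache.get(key, (None, 0.0))
        if time.time() >= expiry:
            locations = self.master.lookup(filename, chunk_index)
            self.cache[key] = (locations, time.time() + self.cache_ttl)

        # 3. Read file data directly from one of the chunkserver replicas;
        #    the master never sees the data itself.
        replica = locations[0]
        return replica.read(filename, chunk_index, offset % CHUNK_SIZE, length)
```

The key design point visible even in this sketch: the master is on the metadata path but never on the data path, which is what keeps a single master from becoming a bottleneck.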

(2) Replication in GFS

GFS typically replicates each chunk to three machines; see Figure 2.

Figure 2 Control flow and data flow for write operations

(3) External interface

Like an ordinary file system, GFS externally provides create, delete, open, close, read, and write operations. In addition, GFS adds two new interfaces: snapshot and record append. The paper explains them as follows:

"Moreover, GFS has snapshot and record append operations. Snapshot creates a copy of a file or a directory tree at low cost.

Record append allows multiple clients to append data to the same file concurrently while guaranteeing the atomicity of each individual client's append."
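The record-append semantics can be modeled in a few lines. This is only a sketch of the guarantee, not the real protocol (the primary replica, lease handling, and cross-replica replication are all omitted): the point is that the server, not the client, picks the offset, and each client's record lands in one contiguous, non-interleaved range.

```python
import threading

class AppendOnlyFile:
    """Toy model of GFS record append. Real GFS serializes concurrent
    appends at the chunk's primary replica; here a lock plays that role."""

    def __init__(self):
        self.data = bytearray()
        self.lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        # Choose the offset and write the record in one critical section,
        # so records from concurrent clients never interleave. The chosen
        # offset is returned to the client.
        with self.lock:
            offset = len(self.data)
            self.data += record
            return offset
```

This is why record append suits many-producers-one-file workloads (e.g. logs): clients need no coordination among themselves, because the file decides where each record goes.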

2. MapReduce

MapReduce is a programming model for distributed parallel computing.

Speaking of parallel computing, one cannot fail to mention the 2005 article by Microsoft's Herb Sutter, "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software" [6]. Its main point is that the era of improving program performance simply by raising CPU clock frequency will soon end; CPU design is turning toward concurrency through multicore, hyper-threading, and the like. Existing programs do not automatically benefit from multiple cores; only by writing concurrent programs can one truly reap the benefits of multicore. The same holds for distributed computing.

  

Figure 3 MapReduce Execution Overview

1) MapReduce consists of Map and Reduce, terms borrowed from Lisp. Map distributes tasks to multiple workers, and Reduce merges (reduces) the results computed by the Map workers. (See Figure 3)

2) Google's MapReduce implementation uses GFS to store data.

3) MapReduce can be used for distributed grep, counting URL access frequency, reverse web-link graphs, distributed sort, and inverted index construction.
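The shape of the model can be illustrated with the classic word-count example. This is a minimal in-process sketch under obvious simplifications: the real system shards the input across machines, shuffles intermediate pairs over the network, and handles worker failures; here a local sort stands in for the shuffle.

```python
from itertools import groupby

# User-supplied functions, in the style of the MapReduce paper [2].
def map_fn(_key, value):
    # Emit an intermediate (word, 1) pair for every word in the line.
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    # Sum the counts emitted for one word.
    return key, sum(values)

def mapreduce(inputs, map_fn, reduce_fn):
    # Map phase: run map_fn over every (key, value) input record.
    intermediate = []
    for k, v in inputs:
        intermediate.extend(map_fn(k, v))
    # Shuffle phase: group intermediate pairs by key.
    intermediate.sort(key=lambda kv: kv[0])
    # Reduce phase: run reduce_fn once per distinct intermediate key.
    return dict(
        reduce_fn(key, (v for _, v in group))
        for key, group in groupby(intermediate, key=lambda kv: kv[0])
    )

# mapreduce([("doc1", "a b a")], map_fn, reduce_fn) → {"a": 2, "b": 1}
```

Only `map_fn` and `reduce_fn` are application code; everything else (partitioning, shuffling, fault tolerance) belongs to the framework, which is what makes the model easy to parallelize.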

3. BigTable

Just as an ordinary file system needs a database on top of it to store structured data, GFS needs BigTable to store structured data.

1) BigTable is built on GFS, Scheduler, Lock Service and MapReduce.

2) Each table is a multidimensional sparse map.

3) To manage huge tables, a table is split by row ranges; the resulting segments are called tablets. Each tablet is about 100-200 MB, and each machine stores around 100 tablets. The underlying storage is GFS. Since GFS is a distributed file system, the tablet mechanism makes good load balancing possible: for example, a frequently accessed tablet can be moved to an idle machine and quickly rebuilt there.
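A toy sketch of how row-range partitioning maps a row key to a tablet. The real system locates tablets through a METADATA table and Chubby; the split keys and server names below are made up for illustration. Because tablets cover contiguous sorted row-key ranges, a lookup is just a binary search over the split points.

```python
import bisect

class TabletMap:
    """Toy row-range -> tablet-server lookup. Tablets cover contiguous,
    sorted row-key ranges; split_keys[i] is the first key of tablet i+1."""

    def __init__(self, split_keys, servers):
        # n split keys define n+1 tablets, one server each in this sketch.
        assert len(servers) == len(split_keys) + 1
        self.split_keys = sorted(split_keys)
        self.servers = servers

    def locate(self, row_key: str) -> str:
        # bisect_right finds which row range the key falls into.
        i = bisect.bisect_right(self.split_keys, row_key)
        return self.servers[i]
```

Rebalancing in this model is just reassigning a range's server entry, which mirrors why moving a hot tablet to an idle machine is cheap: only metadata changes, since the data itself already lives in GFS.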

Reference Documents

[1] The Google File System; http://labs.google.com/papers/gfs-sosp2003.pdf

[2] MapReduce: Simplified Data Processing on Large Clusters; http://labs.google.com/papers/mapreduce-osdi04.pdf

[3] Bigtable: A Distributed Storage System for Structured Data; http://labs.google.com/papers/bigtable-osdi06.pdf

[4] Hadoop; http://lucene.apache.org/hadoop/

[5] HBase: BigTable-like structured storage for Hadoop HDFS; http://wiki.apache.org/lucene-hadoop/hbase

[6] The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software; http://www.gotw.ca/publications/concurrency-ddj.htm
