Advantages and Disadvantages of the MapReduce Distributed Processing Framework

Last Update:2015-03-17 Source: Internet

Author: User

Keywords: distributed processing, pros and cons

Google's data centers must process enormous volumes of data, such as the many web pages fetched by its web crawlers. Because much of this data is at the petabyte (PB) scale, processing has to be parallelized as much as possible, and Google introduced the MapReduce distributed processing framework to address this problem.

Technology Overview

MapReduce draws its ideas from functional programming languages and processes large data sets in parallel through two main steps, "Map" and "Reduce". First, Map applies a specified operation to each element of a logical list made up of many independent elements. The original list is not modified; instead, new lists are created to hold the results of the Map step. Because no element depends on any other, the Map operation is highly parallelizable. When Map finishes, the system shuffles and sorts the newly generated lists, and Reduce then merges the elements of those lists according to their keys.

[Figure: the operating mechanism of MapReduce]
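
The Map, shuffle, and Reduce steps described above can be sketched in a few lines of Python. This is only a single-process illustration, not how Google's framework or Hadoop is implemented: the `map_reduce` helper, the word-count mapper and reducer, and the sample input are all assumptions made for the example.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Single-process sketch of the Map -> shuffle/sort -> Reduce flow."""
    # Map: apply the mapper to every record independently.
    # The input list is left untouched; results go into a new list.
    mapped = [pair for record in records for pair in mapper(record)]

    # Shuffle and sort: gather all values that share a key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce: merge the values of each key, visiting keys in sorted order.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def count_mapper(line):
    # Hypothetical mapper: emit (word, 1) for every word in a line.
    return [(word, 1) for word in line.lower().split()]


def count_reducer(word, counts):
    # Hypothetical reducer: add up the counts collected for one word.
    return sum(counts)


lines = ["map then reduce", "reduce after map", "shuffle between map and reduce"]
word_counts = map_reduce(lines, count_mapper, count_reducer)
```

Because each call to the mapper touches only its own record, a real framework is free to run those calls on different machines; only the shuffle step needs to move data between them.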

The following MapReduce example may help make this concrete. A search engine crawler (spider) fetches a massive number of web pages from the Internet into a local distributed file system. The indexing system then runs Map over these stored pages in parallel, generating many key-value pairs in which the key is the URL and the value is the HTML page. Next, the system shuffles the newly generated key-value pairs, and finally it uses the Reduce operation to merge the pairs that share the same key (that is, the same URL).
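
As a rough illustration of this crawl example, the sketch below keys each page by its URL and merges pairs that share a URL. The record layout, the `fetch_time` field, and the "keep the most recent fetch" merge policy are assumptions added for the example; the article does not say how pages with the same URL are merged.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical crawl output: (url, fetch_time, html). The same URL may
# appear more than once if the crawler visited it several times.
crawled = [
    ("http://example.com/a", 1, "<html>a v1</html>"),
    ("http://example.com/b", 1, "<html>b</html>"),
    ("http://example.com/a", 2, "<html>a v2</html>"),
]


def map_page(record):
    # Map: emit a key-value pair with the URL as key and the page as value.
    url, fetch_time, html = record
    return (url, (fetch_time, html))


def reduce_pages(url, versions):
    # Reduce: merge all values for one URL.
    # Assumed policy: keep the most recently fetched copy.
    fetch_time, html = max(versions, key=itemgetter(0))
    return html


pairs = [map_page(record) for record in crawled]

# Shuffle: sort by key so that pairs with the same URL become adjacent.
pairs.sort(key=itemgetter(0))

merged = {
    url: reduce_pages(url, [value for _, value in group])
    for url, group in groupby(pairs, key=itemgetter(0))
}
```

Here `merged` ends up with one entry per URL, which is the effect of the Reduce step the paragraph describes.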

Advantages and Disadvantages

MapReduce has two main advantages. First, as a distributed processing framework it can not only handle large-scale data but also hide many tedious details, such as automatic parallelization, load balancing, and disaster recovery management, which greatly simplifies the programmer's development work. Second, MapReduce scales very well: each additional server contributes nearly all of its computing power to the cluster, whereas most earlier distributed processing frameworks fell far short of MapReduce in scalability.

The biggest disadvantage of MapReduce is that it is poorly suited to real-time applications. In Caffeine, Google's newer real-time search system, MapReduce's dominant position has therefore been taken over by Percolator, a system capable of real-time processing. Its details will be covered in the next article in this series.

Related Products

Beyond Google's internal use of MapReduce, a Yahoo team led by Doug Cutting, the creator of Lucene, developed Hadoop, an open source version of MapReduce managed by Apache. Hadoop was warmly welcomed by the industry as soon as it launched, and it gave rise to related products such as HDFS, ZooKeeper, HBase, Hive, and Pig.

Practical Use Cases

In practice, the MapReduce framework is often used for large-scale data processing work such as distributed grep, distributed sorting, web access log analysis, inverted index construction, document clustering, machine learning, data analysis, statistical machine translation, and building the index for an entire search engine. It has also been widely adopted inside many well-known Chinese Internet companies, such as Baidu and Taobao.
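
To make one of these use cases concrete, here is a small sketch of distributed grep applied to web access logs. It is a toy under stated assumptions: threads stand in for cluster nodes, and the `grep_map` and `distributed_grep` names, the log file names, and the log lines are all hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def grep_map(split, pattern):
    """Map task: emit (file_name, line_number, line) for each matching line."""
    name, lines = split
    regex = re.compile(pattern)
    return [
        (name, number, line)
        for number, line in enumerate(lines, start=1)
        if regex.search(line)
    ]


def distributed_grep(splits, pattern, workers=4):
    # Each input split is searched independently, so the map tasks can run
    # in parallel. Threads play the role of worker machines here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(lambda s: grep_map(s, pattern), splits))
    # For grep, Reduce is essentially the identity: gather and order matches.
    return sorted(match for part in partial_results for match in part)


# Hypothetical input splits: (file name, lines of that file).
logs = [
    ("access-1.log", ["GET /index.html 200", "GET /missing 404"]),
    ("access-2.log", ["POST /login 500", "GET /about 200"]),
]
errors = distributed_grep(logs, r" (404|500)$")
```

The same shape (independent map tasks over input splits, followed by a merge) carries over to the other tasks in the list; what changes is the mapper and how much work the reducer does.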

Finally, if you are interested in MapReduce, you can download Hadoop from the official Hadoop site and try it out.

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
