MapReduce is a programming model introduced by Dean, Jeffrey & Ghemawat, Sanjay (2004), "MapReduce: Simplified Data Processing on Large Clusters". It is mainly used for parallel operations on large-scale datasets. It reduces parallel computing to two operations, map and reduce, which lets programmers run their programs on a distributed system without writing distributed parallel code themselves. The programmer only needs to specify a map function that maps a set of key/value pairs to a new set of intermediate key/value pairs, and a reduce function that merges all the intermediate values that share the same key.
MapReduce is rooted in the map and reduce functions of functional programming. A job consists of these two operations, each of which may run as many instances (many map tasks and many reduce tasks). The map function accepts a set of input data and converts it to a list of key/value pairs; each element in the input corresponds to one key/value pair. The reduce function accepts the lists generated by the map function and narrows them down by key, producing one key/value pair per key.
A typical map-reduce process is as follows:
Input -> map -> partition -> reduce -> output
Input phase
Input data must be delivered to the mapper in a specified format. Many input formats exist, and the data is generally distributed across multiple machines.
Map phase
Processes the input data and outputs a set of intermediate key/value pairs.
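As a sketch of the map phase (a hypothetical word-count example, not from the original text), a mapper might emit a (word, 1) pair for every word it sees in a line of input:

```python
def word_count_map(line):
    """Map phase: turn one line of input into a list of (key, value) pairs.

    Hypothetical word-count example: each word becomes a key,
    with a count of 1 as its value.
    """
    return [(word, 1) for word in line.split()]

pairs = word_count_map("the quick the")
# pairs == [("the", 1), ("quick", 1), ("the", 1)]
```

Note that the mapper does not aggregate anything itself; duplicate keys are expected and are merged later in the reduce phase.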
Partition phase
Divides the intermediate results output by the map tasks into r partitions by key (r is the number of pre-defined reduce tasks). The default partitioning algorithm is "(key.hashCode() & Integer.MAX_VALUE) % numPartitions", which ensures that all keys in a given range are processed by the same reducer.
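The default algorithm quoted above is Java (Hadoop's hash partitioner). A rough Python analogue, substituting Python's built-in hash() for Java's hashCode(), might look like this:

```python
def partition(key, num_partitions):
    """Assign a key to one of num_partitions reduce tasks.

    Mirrors the shape of Hadoop's default
    (key.hashCode() & Integer.MAX_VALUE) % numPartitions:
    the bit mask keeps the hash non-negative, so the modulo result
    is always a valid partition index. Python's hash() stands in for
    Java's hashCode() here, so the actual partition numbers will
    differ from Hadoop's.
    """
    INT_MAX = 0x7FFFFFFF  # Integer.MAX_VALUE
    return (hash(key) & INT_MAX) % num_partitions

p = partition("apple", 4)
assert 0 <= p < 4  # the same key always lands in the same partition
```

Because the partition depends only on the key, every occurrence of a given key, regardless of which mapper produced it, is routed to the same reducer.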
Reduce phase
The reducer obtains the intermediate results output by the mappers and processes one key range as its input.
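Continuing the hypothetical word-count sketch, a reducer receives one key together with all the values collected for it and combines them, here by summing:

```python
def word_count_reduce(key, values):
    """Reduce phase: combine all values observed for a single key.

    For word count, the reducer simply sums the 1s emitted by the
    mappers, yielding (word, total_count).
    """
    return (key, sum(values))

result = word_count_reduce("the", [1, 1, 1])
# result == ("the", 3)
```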
Output phase
The reducer's output format corresponds to the mapper's input format, so the output of one reducer can in turn be processed as the input of another mapper.
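The phases above can be tied together in a minimal single-process sketch (my own illustration, with hypothetical helper names; real frameworks distribute each phase across many machines):

```python
from collections import defaultdict

def run_mapreduce(lines, map_fn, reduce_fn):
    """Minimal single-process model of the map -> group -> reduce flow.

    map_fn:    line -> list of (key, value) pairs
    reduce_fn: (key, list_of_values) -> (key, result)
    This only models the data flow described above; it performs no
    distribution, partitioning across machines, or fault tolerance.
    """
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):   # map phase
            grouped[key].append(value)    # group values by key
    return dict(reduce_fn(k, vs) for k, vs in grouped.items())  # reduce phase

counts = run_mapreduce(
    ["the quick fox", "the fox"],
    lambda line: [(w, 1) for w in line.split()],
    lambda k, vs: (k, sum(vs)),
)
# counts == {"the": 2, "quick": 1, "fox": 2}
```

Because the output is again a set of key/value pairs, `counts` could itself serve as the input of a second MapReduce job, as noted above.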
Advantages and disadvantages of MapReduce:
The main advantages are twofold:
1. The MapReduce distributed processing framework can not only process large-scale data, but also hides many complicated details such as automatic parallelization, load balancing, and fault tolerance, which greatly simplifies the programmer's work;
2. MapReduce is highly scalable: each server added to the cluster contributes roughly the same additional computing capacity, whereas most earlier distributed processing frameworks fall far short of MapReduce in this respect. The biggest disadvantage of MapReduce is that it does not meet the needs of real-time applications.