Hadoop MapReduce Design Pattern Learning Notes


Before using MapReduce to solve a problem, we need to decide how to design the job. A job does not always need both a map phase and a reduce phase.

1 MapReduce Design Patterns
1.1 Input-map-reduce-output
1.2 Input-map-output
1.3 Input-multiple maps-reduce-output
1.4 Input-map-combiner-reduce-output

MapReduce Design Patterns

A MapReduce job can be structured in one of the following four basic ways:
1. Input-map-reduce-output
2. Input-map-output
3. Input-multiple maps-reduce-output
4. Input-map-combiner-reduce-output
Below, each pattern is illustrated with a scenario that shows when to use it.
Input-map-reduce-output

Input➜map➜reduce➜output

Use this pattern when you need to perform an aggregation.
Scenario: Calculate the total (or average) salary of employees of each gender
Map (key, value): key = gender; value = the employee's salary
Reduce: Group by gender and calculate the total salary for each gender (dividing by the number of employees gives the average)
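A minimal local simulation of this pattern in plain Python (not the Hadoop API) may help. The records below are hypothetical, and `map_phase`, `shuffle`, and `reduce_phase` are illustrative names; `shuffle` stands in for the grouping the framework performs between the map and reduce phases.

```python
from collections import defaultdict

# Hypothetical input records: (employee_id, gender, salary)
RECORDS = [
    (1, "F", 30),
    (2, "M", 25),
    (3, "F", 40),
    (4, "M", 35),
]

def map_phase(records):
    """Map: emit one (key=gender, value=salary) pair per record."""
    for _emp_id, gender, salary in records:
        yield gender, salary

def shuffle(pairs):
    """Simulate the framework's shuffle/sort: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    """Reduce: for each gender, compute the total and the average salary."""
    result = {}
    for gender, salaries in groups.items():
        total = sum(salaries)
        result[gender] = (total, total / len(salaries))
    return result

if __name__ == "__main__":
    print(reduce_phase(shuffle(map_phase(RECORDS))))
    # prints {'F': (70, 35.0), 'M': (60, 30.0)}
```

Computing the average in the reducer is safe here because the reducer sees every salary for its key at once.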
Input-map-output

Input➜map➜output

Use this pattern when you only need to change the format of the input data; no reduce step is required.
Scenario: Normalize the gender field
Map (key, value): key = employee ID; value = gender, normalized as follows:
  if the gender is Female/female/F/f/0, convert it to F
  else if the gender is Male/MALE/M/m/1, convert it to M
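A minimal sketch of this map-only normalization in plain Python, assuming records arrive as (employee ID, raw gender) tuples. The `GENDER_CODES` table and the choice to raise an error on unrecognized values are assumptions, not part of the original scenario.

```python
# Hypothetical lookup table: raw gender codes (lowercased) -> normalized code
GENDER_CODES = {
    "female": "F", "f": "F", "0": "F",
    "male": "M", "m": "M", "1": "M",
}

def normalize_gender(raw):
    """Return 'F' or 'M' for a known code; raise ValueError otherwise."""
    code = GENDER_CODES.get(raw.strip().lower())
    if code is None:
        raise ValueError(f"unrecognized gender value: {raw!r}")
    return code

def map_only(records):
    """Map-only job: emit (employee_id, normalized gender); no reduce step."""
    for emp_id, raw_gender in records:
        yield emp_id, normalize_gender(raw_gender)

if __name__ == "__main__":
    rows = [("E1", "Female"), ("E2", "MALE"), ("E3", "f"), ("E4", "1")]
    print(list(map_only(rows)))
    # prints [('E1', 'F'), ('E2', 'M'), ('E3', 'F'), ('E4', 'M')]
```

In Hadoop, a map-only job is typically configured by setting the number of reduce tasks to zero with `job.setNumReduceTasks(0)`, in which case the map output is written directly to the output without a shuffle.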
Input-multiple maps-reduce-output

Input1➜map1➘
            Reduce➜output
Input2➜map2➚

In this design pattern, we have two input files in different formats.
In file 1, the gender is given as a prefix to the name, for example: Ms. Shital Katkar or Mr. Krishna Katkar.
In file 2, the gender has its own field, but its values use inconsistent formats, such as Female/Male, 0/1, or F/M.
Scenario: Calculate the total salary for each gender across both files
Map (key, value):
  Map 1 (for input 1): split the prefix from the name, determine the gender from the prefix, and emit (gender, salary) key-value pairs.
  Map 2 (for input 2): this mapper is more straightforward; it normalizes the inconsistent gender values and emits (gender, salary) key-value pairs.
Reduce: Group by gender and calculate the total salary for each gender
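A plain-Python sketch of two mappers feeding one reducer. The prefix and field-value tables, the tuple record layout, and the function names are hypothetical; the key point is that both mappers emit the same (gender, salary) pair type, so the single reducer can treat their output uniformly.

```python
from collections import defaultdict

# Hypothetical name prefixes used by input file 1 (an assumption; extend as needed)
PREFIX_TO_GENDER = {"mr.": "M", "ms.": "F", "mrs.": "F", "miss": "F"}
# Hypothetical field values used by input file 2
FIELD_TO_GENDER = {"male": "M", "m": "M", "1": "M",
                   "female": "F", "f": "F", "0": "F"}

def map1(records):
    """Mapper for input 1: derive the gender from the name prefix."""
    for name, salary in records:
        prefix = name.split()[0].lower()
        yield PREFIX_TO_GENDER[prefix], salary

def map2(records):
    """Mapper for input 2: normalize the dedicated gender field."""
    for gender_field, salary in records:
        yield FIELD_TO_GENDER[gender_field.strip().lower()], salary

def reduce_totals(pairs):
    """Single reducer: group all (gender, salary) pairs and sum per gender."""
    totals = defaultdict(int)
    for gender, salary in pairs:
        totals[gender] += salary
    return dict(totals)

if __name__ == "__main__":
    input1 = [("Ms. Shital Katkar", 50), ("Mr. Krishna Katkar", 40)]
    input2 = [("Female", 30), ("0", 20), ("m", 10)]
    # Both mappers' outputs feed the same reducer.
    combined = list(map1(input1)) + list(map2(input2))
    print(reduce_totals(combined))
    # prints {'F': 100, 'M': 50}
```

In Hadoop, a separate mapper per input path is usually wired up with `MultipleInputs.addInputPath`, which assigns an input format and a mapper class to each path.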
Input-map-combiner-reduce-output

Input➜map➜combiner➜reduce➜output

In MapReduce, a combiner (sometimes called a mini-reducer; in Hadoop it is written as a Reducer class) receives the map output as its input and emits key-value pairs that become the input to the reducer. Its purpose is to cut down the amount of data passed to the reducer, and with it the reducer's load.
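The sketch below, again plain Python with hypothetical data and names, simulates two map tasks and shows how a summing combiner shrinks the number of pairs sent to the reducer without changing the result. Summation is a safe combiner operation because it is associative and commutative, which matters because Hadoop may run a combiner zero, one, or several times.

```python
from collections import Counter

def mapper(split):
    """Map: emit one (gender, salary) pair per employee in this split."""
    for gender, salary in split:
        yield gender, salary

def combine(pairs):
    """Combiner: pre-sum values per key locally, on the map side."""
    partial = Counter()
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())

def reduce_sum(pairs):
    """Reducer: sum whatever values arrive for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Two hypothetical input splits, each processed by its own map task.
SPLITS = [
    [("M", 10), ("F", 20), ("M", 30)],
    [("F", 5), ("F", 15), ("M", 25)],
]

without_combiner = [p for split in SPLITS for p in mapper(split)]
with_combiner = [p for split in SPLITS for p in combine(mapper(split))]

if __name__ == "__main__":
    # Number of pairs sent to the reducer: prints 6 4
    print(len(without_combiner), len(with_combiner))
    # Same final result either way: prints True
    print(reduce_sum(without_combiner) == reduce_sum(with_combiner))
```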

In a MapReduce program, roughly 20% of the work is done in the map phase, the data-preparation phase, which runs fully in parallel.

The remaining 80% or so is done in the reduce phase, the computation phase, which is much less parallel. As a result, the reduce phase is generally slower than the map phase. To save time, some of the work normally done in the reduce phase can be moved into the combiner phase.

Suppose we have 5 departments and need to calculate the total salary for each gender. The rules for calculating the totals are a little unusual: within each department, if the total salary for a gender is greater than 200k, add 20k to it; otherwise, if it is greater than 100k, add 10k. The data flows as follows (salaries in thousands):
Map phase:
Dept 1: male <10,20,25,45,15,45,25,20>, female <10,30,20,25,35>
Dept 2: male <15,30,40,25,45>, female <20,35,25,35,40>
Dept 3: male <10,20,20,40>, female <10,30,25,70>
Dept 4: male <45,25,20>, female <30,20,25,35>
Dept 5: male <10,20>, female <10,30,20,25,35>
 
Combiner phase (per gender: <total salary, bonus>):
Dept 1: male <205,20>, female <120,10>
Dept 2: male <155,10>, female <155,10>
Dept 3: male <90,0>, female <135,10>
Dept 4: male <90,0>, female <110,10>
Dept 5: male <30,0>, female <120,10>

Reduce phase:
male <205,20,155,10,90,90,30>, female <120,10,155,10,135,10,110,10,120,10>

Output:
male <600>, female <690>
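The department example can be checked with a short Python sketch that computes each combiner output directly from the map-phase lists (salaries in thousands). Like the example, it assumes exactly one combiner call per department; the dictionary layout and function names are hypothetical. Because Hadoop does not guarantee how many times a combiner runs, a non-idempotent step such as adding this bonus is only safe in a combiner under that assumption.

```python
# Map-phase output from the example: department -> gender -> salaries (in k)
DEPARTMENTS = {
    1: {"male": [10, 20, 25, 45, 15, 45, 25, 20], "female": [10, 30, 20, 25, 35]},
    2: {"male": [15, 30, 40, 25, 45], "female": [20, 35, 25, 35, 40]},
    3: {"male": [10, 20, 20, 40], "female": [10, 30, 25, 70]},
    4: {"male": [45, 25, 20], "female": [30, 20, 25, 35]},
    5: {"male": [10, 20], "female": [10, 30, 20, 25, 35]},
}

def bonus(total):
    """Bonus rule: +20k above 200k, otherwise +10k above 100k, else 0."""
    if total > 200:
        return 20
    if total > 100:
        return 10
    return 0

def combiner(dept):
    """Per-department combiner (assumed to run exactly once per department):
    emit gender -> [total, bonus]."""
    result = {}
    for gender, salaries in dept.items():
        total = sum(salaries)
        result[gender] = [total, bonus(total)]
    return result

def reducer(combined_outputs):
    """Reducer: add every total and bonus for each gender."""
    totals = {}
    for out in combined_outputs:
        for gender, values in out.items():
            totals[gender] = totals.get(gender, 0) + sum(values)
    return totals

if __name__ == "__main__":
    combined = [combiner(d) for d in DEPARTMENTS.values()]
    for dept_id, out in zip(DEPARTMENTS, combined):
        print(dept_id, out)
    print(reducer(combined))  # prints {'male': 600, 'female': 690}
```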

These four patterns are only the basics; you can design other patterns to suit your own problem.
