MapReduce Execution Process

Last Update:2015-04-23 Source: Internet

Author: User


Execution process of a Mapper task:

In the first stage, the input files are divided into input splits (InputSplit) according to a fixed rule. By default, the size of an input split equals the size of a data block (block), so every split has the same size except possibly the last split of a file, which holds the remainder. Suppose the block size is the default 64 MB and there are two input files, one of 32 MB and one of 72 MB. The small file becomes one input split and the large file is divided into two, so three input splits are produced in total. Each input split is processed by one Mapper process, so here three Mapper processes are started.
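The split arithmetic in this example can be sketched in a few lines of plain Python. This is only an illustration, not Hadoop code: the function `compute_splits` and the file names are hypothetical, and the sketch assumes a file is simply cut every `split_size` bytes.

```python
MB = 1024 * 1024

def compute_splits(file_sizes, split_size):
    """Return (file_name, offset, length) tuples, one per input split."""
    splits = []
    for name, size in file_sizes.items():
        offset = 0
        while offset < size:
            # The last split of a file holds whatever remains.
            length = min(split_size, size - offset)
            splits.append((name, offset, length))
            offset += length
    return splits

# Hypothetical file names; the sizes are the ones used in the text.
files = {"small.txt": 32 * MB, "large.txt": 72 * MB}
splits = compute_splits(files, split_size=64 * MB)

for name, offset, length in splits:
    print(name, offset // MB, length // MB)
# small.txt 0 32
# large.txt 0 64
# large.txt 64 8
```

Three splits come out, so three Mapper processes would be needed.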
In the second stage, the records in each input split are parsed into key-value pairs according to a fixed rule. The default rule parses each line of text into one key-value pair: the key is the starting position of the line (its byte offset), and the value is the text content of that line.
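A minimal sketch of this default rule, assuming UTF-8 text and newline-terminated lines. The helper `read_records` is a hypothetical name for illustration, not part of any Hadoop API.

```python
def read_records(data: bytes):
    """Yield (byte_offset, line_text) pairs, one per line of input."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # The key is where the line starts; the value is the line without its terminator.
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)

data = b"hello world\nhello hadoop\n"
records = list(read_records(data))
print(records)  # [(0, 'hello world'), (12, 'hello hadoop')]
```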
In the third stage, the map method of the Mapper class is called once for every key-value pair parsed in the second stage. If there are 1000 key-value pairs, the map method is called 1000 times. Each call to the map method outputs zero or more key-value pairs.
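The "one call per pair, zero or more outputs per call" behaviour can be illustrated with a word-count style map function. The word-count logic is an assumption chosen for the example; the text does not prescribe what the map method does.

```python
def word_count_map(key, value):
    """Emit (word, 1) for every word in the line; an empty line emits nothing."""
    for word in value.split():
        yield word, 1

# (byte offset, line) pairs as produced by the previous stage.
records = [(0, "hello world"), (12, ""), (13, "hello hadoop")]

calls = 0
map_output = []
for key, value in records:
    calls += 1  # one map call per input pair
    map_output.extend(word_count_map(key, value))

print(calls)       # 3
print(map_output)  # [('hello', 1), ('world', 1), ('hello', 1), ('hadoop', 1)]
```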
In the fourth stage, the key-value pairs output by the third stage are partitioned according to a fixed rule. Partitioning is based on the key. For example, if the keys are provinces (such as Beijing, Shanghai, Shandong, etc.), the pairs can be divided by province, so that all key-value pairs for the same province go into one partition. The number of partitions equals the number of Reducer tasks that run. By default there is only one Reducer task, and therefore only one partition.
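One common way to partition by key is to hash the key and take the remainder modulo the number of Reducer tasks. The sketch below assumes that scheme and uses `zlib.crc32` as a deterministic stand-in for the key hash; both function names are hypothetical.

```python
import zlib

def partition(key, num_reducers):
    """Map a key to a partition number in [0, num_reducers)."""
    # crc32 is only a stand-in hash chosen so the result is repeatable.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def partition_pairs(pairs, num_reducers):
    """Distribute key-value pairs into one list per partition."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)].append((key, value))
    return partitions

pairs = [("Beijing", 1), ("Shanghai", 2), ("Beijing", 3), ("Shandong", 4)]

single = partition_pairs(pairs, 1)  # one Reducer: everything lands in partition 0
three = partition_pairs(pairs, 3)   # three Reducers: equal keys still stay together
```

Whatever the number of partitions, pairs with the same key always end up in the same partition, which is what lets a single Reducer see every value for a key.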
In the fifth stage, the key-value pairs in each partition are sorted. Sorting is by key first; in this description, pairs with equal keys are then ordered by value. For example, given the three pairs <2,2>, <1,3>, <2,1>, where keys and values are integers, the sorted result is <1,3>, <2,1>, <2,2>. (Hadoop itself compares only keys during this sort, so ordering values within a key generally requires a secondary-sort setup.) If there is a sixth stage, processing continues with it; if not, the output is written directly to a file on the local Linux file system.
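The ordering in this example is what Python's tuple comparison produces, which makes for a quick check. For contrast, the sketch also shows a sort that compares the key only; Python's sort is stable, so equal-key pairs then keep their arrival order.

```python
pairs = [(2, 2), (1, 3), (2, 1)]

# Compare by key, then by value (tuples compare element by element).
by_key_then_value = sorted(pairs)
print(by_key_then_value)  # [(1, 3), (2, 1), (2, 2)]

# Compare by key only: pairs with equal keys stay in their original order.
by_key_only = sorted(pairs, key=lambda kv: kv[0])
print(by_key_only)  # [(1, 3), (2, 2), (2, 1)]
```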
The sixth stage is a local reduce step on the Mapper side (known in Hadoop as a combiner). The reduce method is called once for each group of key-value pairs with equal keys, which reduces the amount of data. The result is written to a file on the local Linux file system. This stage does not run by default; the user has to add the code for it.
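To see why this stage shrinks the data, consider a summing step applied to one Mapper's sorted output. The function `combine` and the choice of summation are assumptions made for illustration.

```python
def combine(pairs):
    """Collapse pairs with equal keys into one pair holding the sum of their values."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())

mapper_output = [("hadoop", 1), ("hello", 1), ("hello", 1), ("hello", 1), ("world", 1)]
combined = combine(mapper_output)

print(len(mapper_output), "->", len(combined))  # 5 -> 3
print(combined)  # [('hadoop', 1), ('hello', 3), ('world', 1)]
```

Fewer pairs have to be written locally and later copied to the Reducer, while the per-key totals stay the same.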

Execution process of a Reducer task:

In the first stage, the Reducer task actively copies the key-value pairs output by the Mapper tasks. There can be many Mapper tasks, so a Reducer copies the output of multiple Mappers.
In the second stage, all the data copied to the Reducer is merged, combining the scattered pieces into one large data set. The merged data is then sorted.
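The copy and merge stages can be sketched as follows, assuming each Mapper's output is already sorted by key within each partition. The data layout (a dict from partition number to a list of pairs) and the helper names are hypothetical.

```python
import heapq

# Hypothetical output of three Mapper tasks: partition number -> key-sorted pairs.
mapper_outputs = [
    {0: [("hello", 1), ("world", 1)], 1: [("hadoop", 1)]},
    {0: [("hello", 2)], 1: [("spark", 1)]},
    {0: [("apple", 1), ("world", 3)], 1: []},
]

def fetch_partition(mapper_outputs, partition_id):
    """Copy this Reducer's partition from every Mapper's output."""
    return [output.get(partition_id, []) for output in mapper_outputs]

def merge_sorted_runs(runs):
    """Merge several key-sorted runs into one key-sorted list."""
    return list(heapq.merge(*runs, key=lambda kv: kv[0]))

runs = fetch_partition(mapper_outputs, 0)
merged = merge_sorted_runs(runs)
print(merged)
# [('apple', 1), ('hello', 1), ('hello', 2), ('world', 1), ('world', 3)]
```

Because every run is already sorted, a merge is enough to produce sorted output; no full re-sort of the combined data is needed in this sketch.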
In the third stage, the reduce method is called on the sorted key-value pairs. It is called once for each group of pairs with equal keys, and each call produces zero or more key-value pairs. Finally, the output key-value pairs are written to a file in HDFS.
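A sketch of this final stage, assuming the input is already sorted by key. To show that a call may emit nothing, the hypothetical `reduce_fn` below outputs a total only when it is greater than 1; the result is kept in memory rather than written to HDFS.

```python
from itertools import groupby

def reduce_fn(key, values):
    """Emit the key's total only if it exceeds 1; otherwise emit nothing."""
    total = sum(values)
    if total > 1:
        yield key, total

def run_reducer(sorted_pairs, reduce_fn):
    """Call reduce_fn once per distinct key and collect everything it emits."""
    calls = 0
    output = []
    for key, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        calls += 1
        output.extend(reduce_fn(key, (value for _, value in group)))
    return calls, output

merged = [("apple", 1), ("hello", 1), ("hello", 2), ("world", 1), ("world", 3)]
calls, output = run_reducer(merged, reduce_fn)

print(calls)   # 3
print(output)  # [('hello', 3), ('world', 4)]
```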
When developing a MapReduce program, most of the effort goes into overriding the map function and the reduce function.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
