Hadoop source code analysis (MapReduce introduction)


From: http://caibinbupt.iteye.com/blog/336467


Everyone is familiar with file systems, so before analyzing HDFS we did not spend much time introducing its background: you already have some understanding of file systems, and good documentation is available. Before analyzing Hadoop MapReduce, though, we should first understand how the system works as a whole, and then move on to the analysis itself. The figure below illustrates this with an example.

[Figure: WordCount data flow. Input splits split1 through split5 feed map tasks M1, M2, and M3; their partitioned outputs are shuffled to reduce tasks R1 and R2.]

Take the WordCount program that ships with Hadoop as an example (below is the command line that starts it):

hadoop jar hadoop-0.19.0-examples.jar wordcount /usr/input /usr/output
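For concreteness, here is a minimal sketch of what the driver for such a job could look like against the old (0.19-era) org.apache.hadoop.mapred API; the class names WordCountDriver, WordCountMapper, and WordCountReducer are illustrative, not taken from the shipped example.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");

        // Types of the final <K, V> pairs written by the reducers.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(WordCountMapper.class);     // sketched further below
        conf.setCombinerClass(WordCountReducer.class);  // combiner reuses the reducer logic
        conf.setReducerClass(WordCountReducer.class);   // sketched further below

        // Input and output are HDFS directories, as in the command line above.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Submits the job to the JobTracker and blocks until it finishes.
        JobClient.runJob(conf);
    }
}
```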

After the user submits a job, the job is coordinated by the JobTracker. The map stage (M1, M2, and M3 in the figure) is executed first, followed by the reduce stage (R1 and R2 in the figure). The map and reduce tasks are monitored by TaskTrackers and run in Java virtual machines separate from the TaskTracker itself.

Both the input and the output are directories on HDFS (as shown in the figure). The input is described by the InputFormat interface; its implementations include ones for plain-text files and for JDBC databases, each handling its own kind of data source and exposing some characteristics of the data. From an InputFormat implementation you can obtain an implementation of the InputSplit interface, which is used to divide the data (split1 through split5 in the figure are the result of this division); you can also obtain a RecordReader implementation from the InputFormat, which generates <K, V> pairs from the input. Once you have <K, V> pairs, the map operation can begin.
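As a rough sketch of what happens under the hood, the loop below drives a TextInputFormat by hand using the old mapred API. The class name InputWalkthrough and the hard-coded /usr/input path are assumptions for illustration, and the real framework performs these steps across many TaskTrackers rather than in one local loop.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class InputWalkthrough {
    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(InputWalkthrough.class);
        FileInputFormat.setInputPaths(job, new Path("/usr/input")); // path from the example
        TextInputFormat format = new TextInputFormat();
        format.configure(job);

        // Divide the input; asking for five splits mirrors split1..split5 in the figure.
        InputSplit[] splits = format.getSplits(job, 5);
        for (InputSplit split : splits) {
            RecordReader<LongWritable, Text> reader =
                format.getRecordReader(split, job, Reporter.NULL);
            LongWritable key = reader.createKey(); // byte offset of the line in the file
            Text value = reader.createValue();     // the line of text itself
            while (reader.next(key, value)) {
                // Each <K, V> pair produced here is what one map() call receives.
            }
            reader.close();
        }
    }
}
```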

The map operation writes its results through OutputCollector.collect(). As mapper outputs are collected, the Partitioner class determines which output partition each pair is written to. We can also provide a Combiner for the Mapper: when the mapper emits its <K, V> pairs, they are not written to the output immediately, but are collected in buffers (one list of values per key); once a certain number of key-value pairs has accumulated, that part of the buffer is merged by the Combiner and then passed on to the Partitioner (the yellow parts of M1 in the figure correspond to the Combiner and the Partitioner).
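A minimal mapper matching this description might look like the following (old mapred API; WordCountMapper is the illustrative name referenced in the driver sketch above):

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Split the input line into words and emit <word, 1> for each.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, ONE); // buffered, combined, then partitioned
        }
    }
}
```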

After the map tasks complete, execution enters the reduce stage. This stage involves three steps: shuffle, sort, and reduce.

In the shuffle step, the Hadoop MapReduce framework routes map results by key, transmitting all results for a given key to a single reducer (the intermediate results for the same key, produced by multiple mappers, start out distributed across different machines; after this step, they have all been transferred to the reducer machine that processes that key). File transfer in this step uses the HTTP protocol.
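The routing decision itself is made on the map side by the Partitioner. The sketch below mirrors the hash-based logic of Hadoop's default HashPartitioner; the class name WordPartitioner is illustrative:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class WordPartitioner implements Partitioner<Text, IntWritable> {
    public void configure(JobConf job) { }

    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // The same key always maps to the same reducer (R1 or R2 in the figure),
        // no matter which mapper produced it.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```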

Sort is performed together with shuffle. In this phase, the <key, value> pairs with the same key, coming from different mappers, are merged together.
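The merge relies on the ordering defined by the key type (keys are WritableComparable). As a sketch against the old JobConf API, the hooks that control this ordering can be set explicitly; using Text.Comparator here is purely illustrative, since it is already the default for Text keys:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;

public class SortHooks {
    public static void configure(JobConf conf) {
        // Ordering applied while map outputs are merged into sorted runs:
        conf.setOutputKeyComparatorClass(Text.Comparator.class);
        // Decides which consecutive keys are grouped into one reduce() call:
        conf.setOutputValueGroupingComparator(Text.Comparator.class);
    }
}
```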

In the reduce step, the <key, (list of values)> pairs obtained from shuffle and sort are passed to the Reducer's reduce method for processing, and the results are written out to HDFS through an OutputFormat.
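A minimal reducer for this step might look like the following (old mapred API; WordCountReducer is the illustrative name reused as the combiner in the driver sketch above):

```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Fold the list of values for this key into a single count.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum)); // written to HDFS by the OutputFormat
    }
}
```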

 
