Introduction to the Hadoop MapReduce Job Process

Source: Internet
Author: User
Tags: hadoop, mapreduce

What does a complete MapReduce job look like from start to finish? Beginners who are new to Hadoop and MapReduce often find this confusing. The figure below illustrates the overall process.

The WordCount example that ships with Hadoop illustrates the process (the launch command is shown below):

hadoop jar hadoop-0.19.0-examples.jar wordcount /usr/input /usr/output

After you submit a job, the JobTracker coordinates its execution: the map phase runs first (M1, M2, and M3 in the figure), followed by the reduce phase (R1 and R2 in the figure).
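The two phases can be sketched as a single-process simulation in plain Java. This is a toy illustration of the data flow only; the class and method names are my own, not part of the Hadoop API:

```java
import java.util.*;

// Toy single-process simulation of WordCount's two phases.
// Illustrative only: real Hadoop runs these on distributed tasks.
public class WordCountPhases {

    // Map phase: emit a (word, 1) pair for every word in every input line.
    static List<Map.Entry<String, Integer>> mapPhase(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
                }
            }
        }
        return pairs;
    }

    // Reduce phase: sum the values collected for each key.
    static Map<String, Integer> reducePhase(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }
}
```

The sections below walk through what happens between these two phases in a real cluster.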

The map and reduce tasks are monitored by a TaskTracker and run in Java virtual machines separate from the TaskTracker's own process.

Both the input and the output are directories on HDFS, as shown in the figure. The input is described by the InputFormat interface; its implementations include ones for plain-text files and JDBC databases, each handling its own kind of data source and exposing some of the data's characteristics. From an InputFormat you can obtain an implementation of the InputSplit interface, which divides the input data (Split1 through Split5 in the figure are the result of this division). You can also obtain a RecordReader implementation from the InputFormat, which turns the input into <key, value> pairs. With these pairs, the map operation can begin.
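For text input, Hadoop's default behavior is to produce one record per line, with the line's byte offset as the key and the line's text as the value. The sketch below mimics that record shape in plain Java; it is a simplified stand-in, not the actual Hadoop record-reader classes:

```java
import java.util.*;

// Toy record reader: one record per line, key = byte offset of the line,
// value = the line's text. Mimics the shape of Hadoop's default text
// input records; not the real implementation.
public class LineRecords {
    static LinkedHashMap<Long, String> read(String content) {
        LinkedHashMap<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.put(offset, line);
            offset += line.length() + 1; // +1 for the newline separator
        }
        return records;
    }
}
```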

The map operation writes its results through context.write() (OutputCollector.collect() in the older API). As Mapper outputs are collected, the Partitioner class determines which output partition each pair goes to. We can also provide a Combiner for the Mapper: when the Mapper emits its <K, V> pairs, they are not written to the output immediately but are buffered in a list. Once enough pairs have accumulated, the buffer is merged by the Combiner and then passed to the Partitioner (in the figure, the yellow part of M1 corresponds to the Combiner and the Partitioner).
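These two per-mapper steps can be sketched as follows. The partition rule below mirrors the key-hashing scheme of Hadoop's default HashPartitioner; the combiner and the surrounding class are illustrative names of my own:

```java
import java.util.*;

// Sketch of the per-mapper combine and partition steps.
public class CombineAndPartition {

    // Combiner (local reduce): merge a mapper's buffered (word, 1)
    // emissions into local sums before they leave the mapper.
    static Map<String, Integer> combine(List<String> emittedKeys) {
        Map<String, Integer> local = new HashMap<>();
        for (String k : emittedKeys) {
            local.merge(k, 1, Integer::sum);
        }
        return local;
    }

    // Partitioner: pick which reducer receives this key. The masking with
    // Integer.MAX_VALUE keeps the result non-negative for negative hashes.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

Because the partition function depends only on the key, every mapper sends a given key to the same reducer, which is what makes the later merge by key possible.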

After the map phase completes, the job enters the reduce stage, which involves three steps: shuffle, sort, and reduce.

In the shuffle step, the Hadoop MapReduce framework routes map outputs to reducers based on their keys: the intermediate results for the same key, produced by multiple mappers on different machines, are all transferred to the reducer machine responsible for that key. The file transfers in this step use the HTTP protocol.

Sorting is performed together with the shuffle. In this step, the <key, value> pairs with the same key, coming from different mappers, are merged together.
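The combined effect of shuffle and sort can be sketched like this: pairs from several mappers are merged so that each key appears once, with all of its values grouped together and the keys in sorted order. The names here are illustrative, not Hadoop API:

```java
import java.util.*;

// Toy shuffle + sort: merge the outputs of several mappers into a
// sorted map of key -> all values emitted for that key.
public class ShuffleSort {
    static TreeMap<String, List<Integer>> group(
            List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>(); // keys sorted
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            for (Map.Entry<String, Integer> pair : output) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        return grouped;
    }
}
```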

In the reduce step, each merged <key, list-of-values> pair is passed to the Reducer's reduce method, and the results are written out to HDFS through the OutputFormat.
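For text output, Hadoop's default TextOutputFormat writes each reduce result as one "key<TAB>value" line. The toy formatter below mimics that line format only; it is not the actual Hadoop class:

```java
import java.util.*;

// Toy stand-in for tab-separated text output: one "key\tvalue" line
// per reduce result, in the order the reduced map provides them.
public class TextOutputLines {
    static List<String> format(Map<String, Integer> reduced) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : reduced.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        return lines;
    }
}
```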
