Hadoop MapReduce Overview ("Road to Big Data Engineer")

Source: Internet
Author: User

I. Overview
MapReduce is a programming model for data processing. Hadoop can run MapReduce programs written in various languages. A MapReduce program is divided into two parts: the map part and the reduce part.
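As a rough illustration of the two parts, here is a minimal Python sketch that counts words in memory. It is not Hadoop code: the names `map_fn`, `reduce_fn`, and `word_count`, and the sample input, are assumptions made for this example only.

```python
from collections import defaultdict

def map_fn(line):
    """Map part: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce part: combine all counts emitted for one word."""
    return word, sum(counts)

def word_count(lines):
    # Collect the values emitted by map_fn under their key.
    grouped = defaultdict(list)
    for line in lines:
        for word, one in map_fn(line):
            grouped[word].append(one)
    # Call reduce_fn once per distinct key.
    return dict(reduce_fn(word, counts) for word, counts in grouped.items())

# Hypothetical sample input: two short lines of text.
result = word_count(["big data", "Big Hadoop"])
print(result)
```

The programmer only writes the two functions; on a real cluster the framework takes care of the grouping that `word_count` does here by hand.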
II. Mechanisms of MapReduce
A MapReduce job passes through five main phases: input, map, shuffle, reduce, and output.
1. In the input phase, the source files are copied into HDFS, from where the job reads them.
2. The mapper turns the source data into the key-value pairs the job needs, discarding the excess data; the pairs are then sorted by key. In effect, map collates the raw data into the material the target result is built from. Its main function is to decompose the work: a large, complex job is divided into many small tasks that are assigned to the nodes and computed in parallel.
3. The shuffle phase preprocesses the map output for the reducers: it transfers the sorted pairs and merges them so that all values for one key reach the same reducer.
4. The reduce phase takes the output of multiple map tasks and merges and sorts it as required. It processes each input key with its values and emits the desired data.
5. The output phase stores the results of the reduce phase in HDFS.
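The five phases above can be sketched as a small single-process simulation. This is only an illustration under stated assumptions, not how Hadoop is implemented: the record format ("year,temperature"), the sample data, and every function name are hypothetical, and plain Python lists stand in for HDFS and for the cluster nodes.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input records in the form "year,temperature".
records = ["1950,0", "1950,22", "1949,111", "1950,-11", "1949,78", "bad line"]

def split_input(lines, size):
    """Input: cut the data into splits, one per map task."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def mapper(split):
    """Map: keep well-formed records and emit (year, temperature) pairs."""
    pairs = []
    for line in split:
        parts = line.split(",")
        if len(parts) != 2:
            continue  # discard data the job does not need
        year, temp = parts
        pairs.append((year, int(temp)))
    return pairs

def shuffle(map_outputs):
    """Shuffle: merge all map outputs, sort by key, group values per key."""
    merged = sorted(
        (pair for out in map_outputs for pair in out), key=itemgetter(0)
    )
    return [
        (key, [value for _, value in group])
        for key, group in groupby(merged, key=itemgetter(0))
    ]

def reducer(year, temps):
    """Reduce: combine all values for one key into a single result."""
    return year, max(temps)

def run_job(lines, split_size=2):
    splits = split_input(lines, split_size)
    # On a cluster, each split would be mapped in parallel on a different node.
    map_outputs = [mapper(s) for s in splits]
    grouped = shuffle(map_outputs)
    # Output: a real job would write these results back to HDFS.
    return [reducer(year, temps) for year, temps in grouped]

output = run_job(records)
print(output)
```

Here the job finds the highest temperature per year. The malformed record is dropped in the map step, which corresponds to removing excess data, and the reducer sees each year exactly once with all of its values.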


III. Summary

MapReduce plays a role comparable to an ETL tool that converts source data into target data: the data is extracted from the source, processed, and then loaded into the target store as the target data.

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
