I. Overview
MapReduce is a programming model for data processing. Hadoop can run MapReduce programs written in various languages. A MapReduce job is divided into a map phase and a reduce phase.
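As a concrete illustration (not from the original article), the map and reduce halves of a job can be written as ordinary functions; the following minimal word-count sketch in Python is an assumption of mine, wiring the two phases together by hand the way the framework would:

```python
def mapper(lines):
    # Map: emit a (word, 1) pair for every word in the input lines.
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    # Reduce: pairs arrive sorted by key (the framework's shuffle
    # guarantees this); sum the counts for each run of the same word.
    current, total = None, 0
    for word, count in pairs:
        if word != current:
            if current is not None:
                yield current, total
            current, total = word, 0
        total += count
    if current is not None:
        yield current, total

if __name__ == "__main__":
    # In a real job Hadoop feeds the data between phases; here we
    # sort the map output ourselves to stand in for the shuffle.
    mapped = sorted(mapper(["to be or not to be"]))
    for word, count in reducer(mapped):
        print(f"{word}\t{count}")
```

The same split (stateless per-record map, grouped per-key reduce) is what lets Hadoop distribute the two phases across many nodes.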
II. Mechanisms of MapReduce
MapReduce is divided into several major phases: input, map, shuffle, reduce, and output.
1. The input phase copies the original files into HDFS.
2. The map phase processes the input into the desired key-value form and sorts it; map is essentially the step that turns the source data into the material the target data requires, discarding the excess. Its main function is task decomposition: a large, complex job is split into many small tasks that are assigned to the nodes for parallel computation.
3. The shuffle phase preprocesses the map output, grouping and sorting it by key before it reaches the reducers.
4. The reduce phase takes the output of multiple maps and merges and sorts it as needed; the input key-value pairs are processed and the desired data is emitted.
5. The output phase stores the results of the reduce operation back in HDFS.
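The five phases above can be sketched end to end in plain Python. This is a simulation under my own assumptions (hypothetical sample records, not the Hadoop API), using the classic max-temperature-per-year example:

```python
from collections import defaultdict

# 1. Input: records as they might arrive from files stored in HDFS
#    (hypothetical "year,temperature" sample data).
records = ["1950,0", "1950,22", "1949,111", "1949,78", "1950,-11"]

# 2. Map: parse each record into a (year, temperature) pair,
#    keeping only the fields the target data needs.
def map_phase(record):
    year, temp = record.split(",")
    yield year, int(temp)

# 3. Shuffle: group every value emitted for the same key and
#    sort the groups by key.
def shuffle_phase(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

# 4. Reduce: merge each key's values into a single result.
def reduce_phase(key, values):
    yield key, max(values)

# 5. Output: in a real job this would be written back to HDFS.
mapped = [pair for r in records for pair in map_phase(r)]
for key, values in shuffle_phase(mapped):
    for year, top in reduce_phase(key, values):
        print(f"{year}\t{top}")
```

In a real cluster the map calls run in parallel on many nodes and the shuffle moves data across the network; the data flow, however, is exactly this pipeline.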
III. Summary
MapReduce plays a role similar to an ETL tool converting raw data into target data: data is extracted from the source, processed, and loaded into the target store as the target data.
Hadoop - MapReduce Overview ("Big Data Engineer Road" series)