Wang Jialin's illustrated Hadoop training course, Lecture 11: An analysis of the principles, mechanisms, and flowcharts of MapReduce, from "The Path to Becoming a Practical Master of Cloud Computing, Distributed Big Data, and Hadoop: From Scratch"

This lecture analyzes the principles and execution flow of MapReduce.


At a minimum, you should know the following points about MapReduce:

1. MapReduce runs on top of a distributed file system. In Hadoop, MapReduce runs on HDFS;

2. MapReduce is mainly used for parallel computation over large-scale data. Here, "large-scale" means data sets of 1 TB or more;

3. The principle of MapReduce is to split one large task into many small tasks that run in parallel, and then to combine the results of those small tasks into the final result.
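The split, run in parallel, and combine idea from point 3 can be sketched as a small word count in plain Python. This is only an illustration running in a single process, not Hadoop's API: the function names (`split_input`, `map_task`, `shuffle`, `reduce_task`, `word_count`) and the `num_splits` parameter are hypothetical, and a thread pool stands in for the cluster nodes that would run the small tasks on blocks of data stored in HDFS.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_input(lines, num_splits):
    """Cut the large input into smaller chunks, one per map task."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_task(chunk):
    """Map: emit a (word, 1) pair for every word in this chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_outputs):
    """Group the emitted values by key across all map task outputs."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines, num_splits=3):
    """Run the split -> map -> shuffle -> reduce pipeline on a list of lines."""
    chunks = split_input(lines, num_splits)
    # The small map tasks run concurrently; threads are an assumption
    # standing in for separate worker machines.
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        mapped = list(pool.map(map_task, chunks))
    groups = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in groups.items())


if __name__ == "__main__":
    sample = ["hadoop runs mapreduce", "mapreduce runs on hdfs", "hdfs stores data"]
    print(word_count(sample))
```

Each map task sees only its own chunk, so the chunks can be processed independently; the final result only emerges once the per-key values from every task have been grouped and reduced.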

The following is a diagram of the MapReduce execution process:

