MapReduce is Hadoop's core framework for carrying out data computation tasks.
1. MapReduce Constituent Entities
(1) Client node: runs the MapReduce program and the JobClient instance object, and submits the MapReduce job.
(2) JobTracker: the master node that coordinates and schedules jobs; a Hadoop cluster has only one JobTracker node (a minimal configuration sketch follows this list).
(3) Map TaskTracker: executes Map tasks; a Hadoop cluster has multiple TaskTracker nodes.
(4) Reduce TaskTracker: executes Reduce tasks; a Hadoop cluster has multiple TaskTracker nodes.
(5) HDFS: stores the data files and configuration files for the job.
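To make the client/master wiring concrete, the sketch below builds a job configuration that points at the cluster's single JobTracker. This is only an illustration: the host name "master" and the ports are assumptions, not values from the text.

import org.apache.hadoop.mapred.JobConf;

public class ClusterConf {
    // Builds a JobConf wired to the cluster's single JobTracker.
    public static JobConf create() {
        JobConf conf = new JobConf();
        // mapred.job.tracker names the one JobTracker (master) node;
        // JobClient submits jobs to this address. The special value
        // "local" would instead run the job in-process. Host "master"
        // and port 9001 are assumed values for this sketch.
        conf.set("mapred.job.tracker", "master:9001");
        // fs.default.name points at HDFS, where the job resources live.
        conf.set("fs.default.name", "hdfs://master:9000");
        return conf;
    }
}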
2. MapReduce Job Flow
(1) Job start
(2) Job initialization
(3) Job/task scheduling
(4) Map execution
(5) Shuffle
(6) Reduce execution (a Mapper/Reducer sketch follows this list)
(7) Job completion
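Steps (4) and (6) are carried out by user-supplied Mapper and Reducer classes, with the shuffle (5) grouping the Map output by key in between. As an illustrative sketch only, here is a minimal word-count pair written against the classic org.apache.hadoop.mapred API that JobClient and JobTracker belong to; the class names are assumptions.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Map execution (step 4): emit (word, 1) for each word in a line.
class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                output.collect(word, ONE);
            }
        }
    }
}

// Reduce execution (step 6): the shuffle (step 5) has already grouped
// the map output by key, so each call sums the counts for one word.
class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text word, Iterator<IntWritable> counts,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (counts.hasNext()) {
            sum += counts.next().get();
        }
        output.collect(word, new IntWritable(sum));
    }
}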
3. Job Flow Explained in Detail
(1) Job start: the client node runs a MapReduce program, which creates a JobClient instance
↓
JobClient sends a request to the JobTracker and obtains a job ID that identifies this MapReduce job
↓
JobClient copies the resources required to run the job (the job configuration file, the computed input data splits, and the JAR file containing the Mapper and Reducer classes) into a job-specific HDFS directory, and calculates the number of input splits and therefore the number of Map tasks
↓
JobClient submits the job to the JobTracker and obtains a status object handle for the job (a driver sketch follows)
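In code, the steps above amount to a driver program like the sketch below, reusing the WordCountMapper and WordCountReducer classes sketched earlier; the input/output paths and the job name are assumptions. JobClient.runJob() performs the submission just described and returns a RunningJob, the status object handle for the job.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class WordCountDriver {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(WordCountDriver.class); // job configuration
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(WordCountMapper.class);
        conf.setReducerClass(WordCountReducer.class);

        // Input splits are computed from these paths (assumed for this sketch).
        FileInputFormat.setInputPaths(conf, new Path("/input/books"));
        FileOutputFormat.setOutputPath(conf, new Path("/output/wordcount"));

        // runJob() creates a JobClient internally: it requests a job ID from
        // the JobTracker, copies the job resources (configuration, input
        // splits, JAR) to a job-specific HDFS directory, submits the job,
        // and blocks until it finishes.
        RunningJob job = JobClient.runJob(conf);

        // RunningJob is the status object handle for the submitted job.
        System.out.println("Job " + job.getID() + " complete: " + job.isComplete());
    }
}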