Hadoop MapReduce Fundamentals

Source: Internet
Author: User

MapReduce is Hadoop's core framework for running data computation jobs.

1. MapReduce Constituent Entities

(1) Client node: Runs the MapReduce program and the JobClient instance, and submits the MapReduce job.

(2) JobTracker: The master node, which coordinates and schedules jobs. A Hadoop cluster has only one JobTracker node.

(3) Map TaskTracker: Runs map tasks. A Hadoop cluster has multiple TaskTracker nodes.

(4) Reduce TaskTracker: Runs reduce tasks. A Hadoop cluster has multiple TaskTracker nodes.

(5) HDFS: Stores data files and configuration files.

2. MapReduce Job Flow

(1) Job start

(2) Job initialization

(3) Job/task scheduling

(4) Map execution

(5) Shuffle

(6) Reduce execution

(7) Job completion
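The map, shuffle, and reduce stages in steps (4) to (6) above can be illustrated with a small in-memory word count. This is a toy sketch in plain Python, not Hadoop code: the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, and everything runs in one process rather than across TaskTracker nodes.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    pairs = []
    for line in lines:
        for word in line.split():
            pairs.append((word.lower(), 1))
    return pairs


def shuffle_phase(pairs):
    """Shuffle: group all values by key, so each key is reduced exactly once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Sort by key to mimic keys reaching the reducer in sorted order.
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three stages in the order the job flow lists them."""
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

Calling `word_count(["a b a", "B c"])` passes two input lines through all three stages. In a real cluster the map and reduce stages would run as separate tasks on different nodes, with the shuffle moving data between them.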

3. Job Flow Step by Step

(1) Job start: The client node runs a MapReduce program, which creates a JobClient instance.

JobClient sends a request to JobTracker to obtain a job ID that identifies this MapReduce job.

JobClient copies the resources required to run the job (the configuration file, the input split information, and the JAR file containing the Mapper and Reducer classes) into the job's directory in HDFS, and calculates the number of input splits and the number of map tasks.

JobClient then submits the job to JobTracker and obtains a status object handle for the job.
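The job start sequence described above can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the Hadoop API: `FakeJobTracker`, `count_splits`, and `submit` are hypothetical names, a plain dictionary stands in for the HDFS job directory, and the split count assumes a fixed split size with one map task per split (the real calculation has more detail).

```python
import math


class FakeJobTracker:
    """Hypothetical stand-in for JobTracker: hands out IDs and records jobs."""

    def __init__(self):
        self.jobs = {}
        self._next_id = 1

    def get_new_job_id(self):
        job_id = f"job_{self._next_id:04d}"
        self._next_id += 1
        return job_id

    def submit_job(self, job_id, num_map_tasks):
        # The returned dict plays the role of the job's status handle.
        self.jobs[job_id] = {"state": "SUBMITTED", "map_tasks": num_map_tasks}
        return self.jobs[job_id]


def count_splits(file_sizes, split_size):
    """Assumed rule: each non-empty file yields ceil(size / split_size) splits."""
    return sum(math.ceil(size / split_size) for size in file_sizes if size > 0)


def submit(tracker, staging, resources, input_sizes, split_size):
    # Step 1: ask the tracker for an ID that identifies this job.
    job_id = tracker.get_new_job_id()
    # Step 2: copy job resources into the job's directory (a dict stands in for HDFS).
    staging[job_id] = list(resources)
    # Step 3: compute the number of splits, taken here as the number of map tasks.
    num_map_tasks = count_splits(input_sizes, split_size)
    # Step 4: submit the job and keep the status handle.
    status = tracker.submit_job(job_id, num_map_tasks)
    return job_id, status
```

For example, `submit(FakeJobTracker(), {}, ["job.xml", "job.jar"], [300, 128, 0], 128)` stages two resource files and requests four map tasks: three for the 300-unit file, one for the 128-unit file, and none for the empty file.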
