Hadoop Learning Note Four -- JobTracker Execution Process


Hadoop's MapReduce implementation follows a master/slave architecture: the JobTracker plays the master role, and the TaskTrackers play the slave role. The master accepts jobs submitted by clients, schedules each of a job's constituent tasks to run on the slaves, and monitors them, re-running any task that fails. The slaves simply execute the tasks they are assigned.

When Hadoop starts, the JobTracker runs in its own JVM. It waits for a JobClient to submit a job via RPC, schedules each task of the submitted job, and monitors their execution; when it finds a failed task, the JobTracker re-executes it. Meanwhile, each TaskTracker periodically sends a heartbeat to the JobTracker, reporting its status and asking whether there are tasks for it to run.
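For reference, the heartbeat is a single RPC defined by the InterTrackerProtocol interface. The sketch below paraphrases the Hadoop 1.x version of that interface from memory; parameter names and details may differ slightly between releases.

// Paraphrase of org.apache.hadoop.mapred.InterTrackerProtocol (Hadoop 1.x).
// One heartbeat both reports the TaskTracker's status and asks for work.
public interface InterTrackerProtocol {
  HeartbeatResponse heartbeat(TaskTrackerStatus status,  // slots, running tasks, health
                              boolean restarted,         // did the TaskTracker restart?
                              boolean initialContact,    // first contact since (re)start?
                              boolean acceptNewTasks,    // does it have free slots?
                              short responseId) throws IOException;
}

The returned HeartbeatResponse carries directives such as launch-task or kill-task actions, which the TaskTracker then carries out before its next heartbeat.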

When a JobClient submits a job to the JobTracker, the JobTracker processes it as follows:
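At a high level, JobTracker.submitJob() strings these steps together. The following is a condensed sketch of the Hadoop 1.x method, with synchronization details, recovery, and most error handling elided; the exact access-check call varies between releases.

// Condensed sketch of JobTracker.submitJob() (Hadoop 1.x); details vary by release.
public synchronized JobStatus submitJob(JobID jobId) throws IOException {
  if (jobs.containsKey(jobId)) {
    return jobs.get(jobId).getStatus();  // duplicate submission: return the existing status
  }
  JobInProgress job = new JobInProgress(jobId, this, this.conf); // 1. create the JobInProgress
  checkAccess(job, QueueManager.QueueOperation.SUBMIT_JOB);      // 2. queue-level ACL check
  checkMemoryRequirements(job);                                  // 3. memory-limit check
  return addJob(jobId, job);  // register the job and notify the scheduler's listeners
}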

I'll briefly cover a few of these steps:

1. Create a JobInProgress for the job

The JobInProgress object records the job's configuration information and everything about how it executes, in particular the map and reduce tasks the job is decomposed into. Creating the JobInProgress object does two main things: first, it copies the job's job.xml and job.jar files from the job directory to the JobTracker's local file system (job.xml -> */jobtracker/jobid.xml, job.jar -> */jobtracker/jobid.jar); second, it creates the JobStatus and the queues of the job's MapTasks and ReduceTasks used to track the job's status information.
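To make those two pieces of work concrete, here is an illustrative sketch. It is schematic rather than the verbatim Hadoop source: the createJobInProgress helper and its jobSubmitDir parameter (standing in for the job's submit directory) are my own names for illustration.

// Schematic sketch of the JobInProgress constructor's two duties.
void createJobInProgress(JobID jobId, JobConf conf, Path jobSubmitDir)
    throws IOException {
  // 1. Localize job.xml and job.jar onto the JobTracker's file system:
  //    job.xml -> */jobtracker/<jobid>.xml, job.jar -> */jobtracker/<jobid>.jar
  FileSystem fs = jobSubmitDir.getFileSystem(conf);
  Path localJobFile = conf.getLocalPath("jobTracker/" + jobId + ".xml");
  Path localJarFile = conf.getLocalPath("jobTracker/" + jobId + ".jar");
  fs.copyToLocalFile(new Path(jobSubmitDir, "job.xml"), localJobFile);
  fs.copyToLocalFile(new Path(jobSubmitDir, "job.jar"), localJarFile);

  // 2. Create the JobStatus, initially in the PREP state; the per-task
  //    map/reduce bookkeeping is populated when the job is later initialized.
  JobStatus status = new JobStatus(jobId, 0.0f, 0.0f, JobStatus.PREP);
}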

2. Check whether the client has permission to submit the job

The JobTracker actually delegates the verification of the client's permission to submit the job to the QueueManager. I will write a separate post detailing how the QueueManager validates a client's operations on a job.
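As a rough sketch of that delegation (the exact QueueManager method signature has changed between Hadoop releases, so treat the call below as an approximation rather than the definitive API):

// Approximate sketch of how the JobTracker defers the check to QueueManager
// (Hadoop 0.20/1.x era; exact signatures vary between releases).
private void checkQueueAccess(JobInProgress job,
                              UserGroupInformation ugi) throws IOException {
  String queue = job.getProfile().getQueueName();
  if (!queueManager.hasAccess(queue, job,
                              QueueManager.QueueOperation.SUBMIT_JOB, ugi)) {
    throw new IOException("User " + ugi.getShortUserName()
        + " is not allowed to perform SUBMIT_JOB on queue " + queue);
  }
}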

3. Check that the current MapReduce cluster meets the job's memory requirements

Before a client submits a job, the memory requirements of the job's tasks are configured according to the actual application. The JobTracker, in order to increase job throughput, limits how much memory a job's tasks may request, so when a job is submitted the JobTracker must check whether the job's memory requirements fall within the limits it has configured.
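On the client side, the per-task memory requirement is typically declared through the JobConf API before submission; the 512/1024 MB values and the WordCount job class below are only illustrative.

// Client-side: declare per-task memory needs before submitting the job.
// (Hadoop 1.x JobConf API; WordCount.class is a placeholder job class.)
JobConf conf = new JobConf(WordCount.class);
conf.setMemoryForMapTask(512L);      // backs mapred.job.map.memory.mb
conf.setMemoryForReduceTask(1024L);  // backs mapred.job.reduce.memory.mb
JobClient.runJob(conf);              // the JobTracker validates these on submission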

The JobTracker-side check, checkMemoryRequirements(), is as follows:

private void checkMemoryRequirements(JobInProgress job) throws IOException {
  if (!perTaskMemoryConfigurationSetOnJT()) {
    LOG.debug("Per-task memory configuration is not set on JT. "
        + "Not checking the job for invalid memory requirements.");
    return;
  }

  boolean invalidJob = false;
  String msg = "";
  // Get the job's memory requirements for its map and reduce tasks
  long maxMemForMapTask = job.getJobConf().getMemoryForMapTask();
  long maxMemForReduceTask = job.getJobConf().getMemoryForReduceTask();

  // Reject the job if per-task memory was left unset (disabled)
  if (maxMemForMapTask == JobConf.DISABLED_MEMORY_LIMIT
      || maxMemForReduceTask == JobConf.DISABLED_MEMORY_LIMIT) {
    invalidJob = true;
    msg = "Invalid job requirements.";
  }

  // Reject the job if it asks for more memory than the cluster-wide limits
  if (maxMemForMapTask > limitMaxMemForMapTasks
      || maxMemForReduceTask > limitMaxMemForReduceTasks) {
    invalidJob = true;
    msg = "Exceeds the cluster's max-memory-limit.";
  }

  if (invalidJob) {
    StringBuilder jobStr = new StringBuilder().append(job.getJobID().toString())
        .append(" (").append(maxMemForMapTask).append(" memForMapTasks ")
        .append(maxMemForReduceTask).append(" memForReduceTasks): ");
    LOG.warn(jobStr.toString() + msg);
    throw new IOException(jobStr.toString() + msg);
  }
}

The JobTracker's global variables limitMaxMemForMapTasks and limitMaxMemForReduceTasks can be set through the configuration file; they correspond to the mapred.cluster.max.map.memory.mb and mapred.cluster.max.reduce.memory.mb configuration properties, respectively.
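For example, on the JobTracker node the limits could be set in mapred-site.xml like this (the 2048 and 4096 MB values are illustrative, not defaults):

<!-- mapred-site.xml on the JobTracker (illustrative values) -->
<property>
  <name>mapred.cluster.max.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapred.cluster.max.reduce.memory.mb</name>
  <value>4096</value>
</property>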

Reference: http://www.linuxidc.com/Linux/2012-01/50860.htm
