Source code analysis of FairScheduler job initialization process

Source: Internet
Author: User

The previous article covered the submitJob() method in JobTracker. That method eventually calls listener.jobAdded(job), which registers the job with the TaskScheduler for scheduling. Today I will continue from there. In Hadoop, the default TaskScheduler is JobQueueTaskScheduler, which schedules on a FIFO (first-in-first-out) basis. Hadoop also ships two alternative schedulers, FairScheduler and CapacityTaskScheduler, as library jars: the compiled classes can be found under the lib directory of the Hadoop distribution, and their source code under src/contrib. This article focuses on FairScheduler.

As mentioned above, JobTracker registers the job with the jobListener. Let's take a look at FairScheduler's JobListener.

1. FairScheduler.JobListener.jobAdded(): This method is relatively simple. Two JobSchedulable objects, mapSched and redSched, are created via reflection: ReflectionUtils.newInstance(conf.getClass("mapred.jobtracker.jobSchedulable", JobSchedulable.class, JobSchedulable.class), conf) instantiates the class configured under mapred.jobtracker.jobSchedulable, falling back to the default FairScheduler JobSchedulable, and each object is then initialized with init(). Next, infos.put(job, info) adds the job to infos, the map that holds all JobInProgress objects, and poolMgr.addJob(job) adds the job to the right PoolSchedulable, looked up by the configured pool name. Finally update() is called; that method is the focus below.

public void jobAdded(JobInProgress job) {
  synchronized (FairScheduler.this) {
    eventLog.log("JOB_ADDED", job.getJobID());
    JobSchedulable mapSched = ReflectionUtils.newInstance(
        conf.getClass("mapred.jobtracker.jobSchedulable", JobSchedulable.class,
            JobSchedulable.class), conf);
    mapSched.init(FairScheduler.this, job, TaskType.MAP);

    JobSchedulable redSched = ReflectionUtils.newInstance(
        conf.getClass("mapred.jobtracker.jobSchedulable", JobSchedulable.class,
            JobSchedulable.class), conf);
    redSched.init(FairScheduler.this, job, TaskType.REDUCE);

    JobInfo info = new JobInfo(mapSched, redSched);
    infos.put(job, info);
    poolMgr.addJob(job); // Also adds job into the right PoolSchedulable
    update();
  }
}
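The reflective instantiation above can look opaque at first. The following is a minimal, self-contained sketch of what conf.getClass(key, defaultClass, baseClass) plus ReflectionUtils.newInstance boil down to; the class and method names here are illustrative stand-ins, not Hadoop's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: look up a class name in the configuration, fall back to a
// default class, and instantiate the result reflectively.
public class ReflectiveFactory {
    private final Map<String, String> conf = new HashMap<>();

    public void set(String key, String className) {
        conf.put(key, className);
    }

    // Resolve the configured class (or the default) and create an instance.
    public <T> T newInstance(String key, Class<? extends T> defaultClass,
                             Class<T> baseClass) {
        try {
            String name = conf.get(key);
            Class<? extends T> clazz = (name == null)
                    ? defaultClass
                    : Class.forName(name).asSubclass(baseClass);
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        ReflectiveFactory f = new ReflectiveFactory();
        // No override configured: the default class is used.
        Object o = f.newInstance("mapred.jobtracker.jobSchedulable",
                java.util.ArrayList.class, java.util.List.class);
        System.out.println(o.getClass().getSimpleName()); // prints "ArrayList"
    }
}
```

This is why jobAdded() normally produces FairScheduler's own JobSchedulable: unless the mapred.jobtracker.jobSchedulable key is set, the default class passed as the second argument wins.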

2. FairScheduler.update(): Look at poolMgr.reloadAllocsIfNecessary() first. It reads the FairScheduler configuration file (fair-scheduler.xml), whose location is set by the mapred.fairscheduler.allocation.file parameter. The file is reloaded only when its last-modified time has changed and at least ALLOC_RELOAD_INTERVAL has elapsed since the previous check; the loading itself is a straightforward XML parse. Back in update(), after the configuration is (re)loaded, the method traverses infos (which holds all of FairScheduler's JobInProgress objects) and removes succeeded, failed, and killed jobs from their pools. The next step is updateRunnability(), which decides whether a job may run based on the per-user limit (userMaxJobs) and the per-pool limit (poolMaxJobs).

List<JobInProgress> toRemove = new ArrayList<JobInProgress>();
for (JobInProgress job : infos.keySet()) {
  int runState = job.getStatus().getRunState();
  if (runState == JobStatus.SUCCEEDED || runState == JobStatus.FAILED
      || runState == JobStatus.KILLED) {
    toRemove.add(job);
  }
}
for (JobInProgress job : toRemove) {
  jobNoLongerRunning(job);
}
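For reference, the allocation file that reloadAllocsIfNecessary() parses is a small XML document. A minimal example in the classic (MR1) Fair Scheduler format is sketched below; the pool and user names are made up for illustration:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: path configured via mapred.fairscheduler.allocation.file -->
<allocations>
  <pool name="analytics">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <maxRunningJobs>20</maxRunningJobs>
    <weight>2.0</weight>
  </pool>
  <user name="alice">
    <maxRunningJobs>5</maxRunningJobs>
  </user>
  <userMaxJobsDefault>10</userMaxJobsDefault>
</allocations>
```

The maxRunningJobs limits here are exactly the per-pool and per-user caps that updateRunnability() checks below.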

3. FairScheduler.updateRunnability(): First, all jobs remaining in infos (succeeded and failed jobs were already cleared in update()) are marked not runnable. Next the jobs are sorted with Collections.sort(jobs, new FifoJobComparator()); the ordering is FIFO, i.e. by submission order. The method then traverses the sorted jobs and, based on the submitting user's current job count and the pool's maximum job count, decides whether each job may be added to the runnable lists (there are two of them). If the job's state is RUNNING, jobInfo.runnable is set to true; if the state is PREP (preparing), the job is handed off for initialization (note that only jobs in the RUNNING or PREP states are considered at all). jobInitializer.initJob(jobInfo, job) performs the initialization using a JDK thread pool: the task is queued, the pool decides when a worker thread actually executes it, and at that point the task's run() method is invoked. Inside run(), ttm.initJob(job) is called, where ttm is the JobTracker, so control returns to the JobTracker.

if (userCount < poolMgr.getUserMaxJobs(user) &&
    poolCount < poolMgr.getPoolMaxJobs(pool)) {
  if (job.getStatus().getRunState() == JobStatus.RUNNING ||
      job.getStatus().getRunState() == JobStatus.PREP) {
    userJobs.put(user, userCount + 1);
    poolJobs.put(pool, poolCount + 1);
    JobInfo jobInfo = infos.get(job);
    if (job.getStatus().getRunState() == JobStatus.RUNNING) {
      jobInfo.runnable = true;
    } else {
      // The job is in the PREP state. Give it to the job initializer
      // for initialization if we have not already done it.
      if (jobInfo.needsInitializing) {
        jobInfo.needsInitializing = false;
        jobInitializer.initJob(jobInfo, job);
      }
    }
  }
}
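The hand-off from jobInitializer.initJob() to the JDK thread pool, ending in a callback to ttm.initJob(job), follows a simple pattern. Here is a minimal, self-contained sketch of that pattern; the class and interface names are illustrative stand-ins, not Hadoop's actual types:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: job initialization is queued on a JDK thread pool; each task's
// run() eventually calls back into the task tracker manager (the JobTracker).
public class JobInitializerSketch {
    // Stand-in for the TaskTrackerManager role played by the JobTracker.
    interface TaskTrackerManager {
        void initJob(String jobId);
    }

    private final ExecutorService threadPool = Executors.newFixedThreadPool(4);
    private final TaskTrackerManager ttm;

    JobInitializerSketch(TaskTrackerManager ttm) {
        this.ttm = ttm;
    }

    // Queue the job; the pool decides when a worker thread runs it.
    void initJob(String jobId) {
        threadPool.execute(() -> ttm.initJob(jobId));
    }

    void shutdown() throws InterruptedException {
        threadPool.shutdown();
        threadPool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        JobInitializerSketch init =
                new JobInitializerSketch(id -> System.out.println("initJob " + id));
        init.initJob("job_201311_0001");
        init.shutdown();
    }
}
```

The key point the article makes holds here too: execute() only enqueues the work, so the caller cannot know exactly when the run() method (and hence the JobTracker callback) fires; it only knows it will happen on some pool thread.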

