Source code analysis of FairScheduler job initialization process

Source: Internet
Author: User

The previous article covered the submitJob() method in JobTracker. That method eventually calls listener.jobAdded(job), which registers the job with the TaskScheduler for scheduling. Today I will continue from there. In Hadoop, the default TaskScheduler is JobQueueTaskScheduler, which schedules on a FIFO (first-in-first-out) basis. Hadoop also ships two alternative schedulers, FairScheduler and CapacityTaskScheduler, as library jars: the compiled classes can be found under the lib directory of the Hadoop distribution, and their source code under src/contrib. This article focuses on FairScheduler.

As mentioned above, JobTracker registers the job with the jobListener. Let's take a look at FairScheduler's JobListener.

1. FairScheduler.JobListener.jobAdded(): This method is relatively simple. Two JobSchedulable objects, mapSched and redSched, are created via reflection: ReflectionUtils.newInstance(conf.getClass("mapred.jobtracker.jobSchedulable", JobSchedulable.class, JobSchedulable.class), conf) instantiates the class configured under mapred.jobtracker.jobSchedulable, falling back to the default FairScheduler JobSchedulable, and each object is then initialized with init(). Next, infos.put(job, info) adds the job to infos, the map that holds all JobInProgress objects, and poolMgr.addJob(job) adds the job to the right PoolSchedulable, looked up by the configured pool name. Finally update() is called; that method is the focus below.

public void jobAdded(JobInProgress job) {
  synchronized (FairScheduler.this) {
    eventLog.log("JOB_ADDED", job.getJobID());
    JobSchedulable mapSched = ReflectionUtils.newInstance(
        conf.getClass("mapred.jobtracker.jobSchedulable", JobSchedulable.class,
            JobSchedulable.class), conf);
    mapSched.init(FairScheduler.this, job, TaskType.MAP);

    JobSchedulable redSched = ReflectionUtils.newInstance(
        conf.getClass("mapred.jobtracker.jobSchedulable", JobSchedulable.class,
            JobSchedulable.class), conf);
    redSched.init(FairScheduler.this, job, TaskType.REDUCE);

    JobInfo info = new JobInfo(mapSched, redSched);
    infos.put(job, info);
    poolMgr.addJob(job); // Also adds job into the right PoolSchedulable
    update();
  }
}
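The reflective instantiation above can look opaque at first. The following is a minimal, self-contained sketch of what conf.getClass(key, defaultClass, baseClass) plus ReflectionUtils.newInstance boil down to; the class and method names here are illustrative stand-ins, not Hadoop's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: look up a class name in the configuration, fall back to a
// default class, and instantiate the result reflectively.
public class ReflectiveFactory {
    private final Map<String, String> conf = new HashMap<>();

    public void set(String key, String className) {
        conf.put(key, className);
    }

    // Resolve the configured class (or the default) and create an instance.
    public <T> T newInstance(String key, Class<? extends T> defaultClass,
                             Class<T> baseClass) {
        try {
            String name = conf.get(key);
            Class<? extends T> clazz = (name == null)
                    ? defaultClass
                    : Class.forName(name).asSubclass(baseClass);
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        ReflectiveFactory f = new ReflectiveFactory();
        // No override configured: the default class is used.
        Object o = f.newInstance("mapred.jobtracker.jobSchedulable",
                java.util.ArrayList.class, java.util.List.class);
        System.out.println(o.getClass().getSimpleName()); // prints "ArrayList"
    }
}
```

This is why jobAdded() normally produces FairScheduler's own JobSchedulable: unless the mapred.jobtracker.jobSchedulable key is set, the default class passed as the second argument wins.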

2. FairScheduler.update(): Look at poolMgr.reloadAllocsIfNecessary() first. It reads the FairScheduler configuration file (fair-scheduler.xml), whose location is set by the mapred.fairscheduler.allocation.file parameter. The file is reloaded only when its last-modified time has changed and at least ALLOC_RELOAD_INTERVAL has elapsed since the previous check; the loading itself is a straightforward XML parse. Back in update(), after the configuration is (re)loaded, the method traverses infos (which holds all of FairScheduler's JobInProgress objects) and removes succeeded, failed, and killed jobs from their pools. The next step is updateRunnability(), which decides whether a job may run based on the per-user limit (userMaxJobs) and the per-pool limit (poolMaxJobs).

List<JobInProgress> toRemove = new ArrayList<JobInProgress>();
for (JobInProgress job : infos.keySet()) {
  int runState = job.getStatus().getRunState();
  if (runState == JobStatus.SUCCEEDED || runState == JobStatus.FAILED
      || runState == JobStatus.KILLED) {
    toRemove.add(job);
  }
}
for (JobInProgress job : toRemove) {
  jobNoLongerRunning(job);
}
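For reference, the allocation file that reloadAllocsIfNecessary() parses is a small XML document. A minimal example in the classic (MR1) Fair Scheduler format is sketched below; the pool and user names are made up for illustration:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: path configured via mapred.fairscheduler.allocation.file -->
<allocations>
  <pool name="analytics">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <maxRunningJobs>20</maxRunningJobs>
    <weight>2.0</weight>
  </pool>
  <user name="alice">
    <maxRunningJobs>5</maxRunningJobs>
  </user>
  <userMaxJobsDefault>10</userMaxJobsDefault>
</allocations>
```

The maxRunningJobs limits here are exactly the per-pool and per-user caps that updateRunnability() checks below.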

3. FairScheduler.updateRunnability(): First, all jobs remaining in infos (succeeded and failed jobs were already cleared in update()) are marked not runnable. Next the jobs are sorted with Collections.sort(jobs, new FifoJobComparator()); the ordering is FIFO, i.e. by submission order. The method then traverses the sorted jobs and, based on the submitting user's current job count and the pool's maximum job count, decides whether each job may be added to the runnable lists (there are two of them). If the job's state is RUNNING, jobInfo.runnable is set to true; if the state is PREP (preparing), the job is handed off for initialization (note that only jobs in the RUNNING or PREP states are considered at all). jobInitializer.initJob(jobInfo, job) performs the initialization using a JDK thread pool: the task is queued, the pool decides when a worker thread actually executes it, and at that point the task's run() method is invoked. Inside run(), ttm.initJob(job) is called, where ttm is the JobTracker, so control returns to the JobTracker.

if (userCount < poolMgr.getUserMaxJobs(user) &&
    poolCount < poolMgr.getPoolMaxJobs(pool)) {
  if (job.getStatus().getRunState() == JobStatus.RUNNING ||
      job.getStatus().getRunState() == JobStatus.PREP) {
    userJobs.put(user, userCount + 1);
    poolJobs.put(pool, poolCount + 1);
    JobInfo jobInfo = infos.get(job);
    if (job.getStatus().getRunState() == JobStatus.RUNNING) {
      jobInfo.runnable = true;
    } else {
      // The job is in the PREP state. Give it to the job initializer
      // for initialization if we have not already done it.
      if (jobInfo.needsInitializing) {
        jobInfo.needsInitializing = false;
        jobInitializer.initJob(jobInfo, job);
      }
    }
  }
}
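The hand-off from jobInitializer.initJob() to the JDK thread pool, ending in a callback to ttm.initJob(job), follows a simple pattern. Here is a minimal, self-contained sketch of that pattern; the class and interface names are illustrative stand-ins, not Hadoop's actual types:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: job initialization is queued on a JDK thread pool; each task's
// run() eventually calls back into the task tracker manager (the JobTracker).
public class JobInitializerSketch {
    // Stand-in for the TaskTrackerManager role played by the JobTracker.
    interface TaskTrackerManager {
        void initJob(String jobId);
    }

    private final ExecutorService threadPool = Executors.newFixedThreadPool(4);
    private final TaskTrackerManager ttm;

    JobInitializerSketch(TaskTrackerManager ttm) {
        this.ttm = ttm;
    }

    // Queue the job; the pool decides when a worker thread runs it.
    void initJob(String jobId) {
        threadPool.execute(() -> ttm.initJob(jobId));
    }

    void shutdown() throws InterruptedException {
        threadPool.shutdown();
        threadPool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        JobInitializerSketch init =
                new JobInitializerSketch(id -> System.out.println("initJob " + id));
        init.initJob("job_201311_0001");
        init.shutdown();
    }
}
```

The key point the article makes holds here too: execute() only enqueues the work, so the caller cannot know exactly when the run() method (and hence the JobTracker callback) fires; it only knows it will happen on some pool thread.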

