Hadoop uses JobControl to set dependencies between jobs


JobControl programming example:


A complete Naive Bayes classification algorithm may require four mutually dependent MapReduce jobs. The traditional approach is to create a JobConf object for each job and submit the jobs one by one (serially), as follows:
// Create a JobConf object for each of the 4 jobs
JobConf extractJobConf = new JobConf(ExtractJob.class);
JobConf classPriorJobConf = new JobConf(ClassPriorJob.class);
JobConf conditionalProbabilityJobConf = new JobConf(ConditionalProbabilityJob.class);
JobConf predictJobConf = new JobConf(PredictJob.class);
... // Configure each JobConf
// Submit the jobs one after another, in dependency order
JobClient.runJob(extractJobConf);
JobClient.runJob(classPriorJobConf);
JobClient.runJob(conditionalProbabilityJobConf);
JobClient.runJob(predictJobConf);


If you use JobControl instead, the user simply declares each job's dependencies with the addDependingJob() method, and JobControl dispatches the individual jobs according to the dependency graph, as follows:
// Create a JobConf object for each of the 4 jobs
JobConf extractJobConf = new JobConf(ExtractJob.class);
JobConf classPriorJobConf = new JobConf(ClassPriorJob.class);
JobConf conditionalProbabilityJobConf = new JobConf(ConditionalProbabilityJob.class);
JobConf predictJobConf = new JobConf(PredictJob.class);
... // Configure each JobConf
// Create the Job objects. Note that JobControl requires each job to be
// wrapped in an org.apache.hadoop.mapred.jobcontrol.Job object
Job extractJob = new Job(extractJobConf);
Job classPriorJob = new Job(classPriorJobConf);
Job conditionalProbabilityJob = new Job(conditionalProbabilityJobConf);
Job predictJob = new Job(predictJobConf);
// Set up the dependencies to form a DAG of jobs
classPriorJob.addDependingJob(extractJob);
conditionalProbabilityJob.addDependingJob(extractJob);
predictJob.addDependingJob(classPriorJob);
predictJob.addDependingJob(conditionalProbabilityJob);
// Create a JobControl object to monitor and dispatch the jobs
JobControl jc = new JobControl("Naive Bayes");
jc.addJob(extractJob); // Add the 4 jobs to the JobControl
jc.addJob(classPriorJob);
jc.addJob(conditionalProbabilityJob);
jc.addJob(predictJob);
jc.run(); // Submit the DAG of jobs; run() loops until all jobs finish

When this actually runs, extractJob, which depends on no other job, is scheduled first. Once it completes, classPriorJob and conditionalProbabilityJob are dispatched simultaneously; after both of them complete, predictJob is dispatched.
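This dispatch order can be sketched with a small, Hadoop-free simulation (the job names below mirror the example; the class and method are illustrative, not part of the JobControl API). A job becomes runnable only once every job it depends on has completed, so the jobs start in "waves": extract first, then classPrior and conditionalProbability in parallel, then predict.

```java
import java.util.*;

// Minimal sketch of dependency-driven dispatch: repeatedly start every
// not-yet-run job whose dependencies have all completed. Jobs started
// in the same wave are the ones JobControl could run in parallel.
public class DagDispatch {
    public static List<List<String>> waves(Map<String, List<String>> deps) {
        List<List<String>> waves = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (done.size() < deps.size()) {
            List<String> wave = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                if (!done.contains(e.getKey()) && done.containsAll(e.getValue())) {
                    wave.add(e.getKey());
                }
            }
            if (wave.isEmpty()) {
                throw new IllegalStateException("cycle or missing dependency");
            }
            Collections.sort(wave);  // deterministic ordering within a wave
            waves.add(wave);
            done.addAll(wave);       // the whole wave runs in parallel
        }
        return waves;
    }

    public static void main(String[] args) {
        // The dependency graph from the Naive Bayes example above
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("extract", List.of());
        deps.put("classPrior", List.of("extract"));
        deps.put("conditionalProbability", List.of("extract"));
        deps.put("predict", List.of("classPrior", "conditionalProbability"));
        System.out.println(waves(deps));
        // Prints: [[extract], [classPrior, conditionalProbability], [predict]]
    }
}
```

The simulation reproduces the schedule described above: three waves, with the two middle jobs dispatched together.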


Comparing the two approaches leads to a simple conclusion: JobControl makes DAG-structured jobs easier to write, and it lets jobs with no dependency between them run in parallel.

From Hadoop Technology Insider: In-depth Analysis of the MapReduce Framework's Design and Implementation Principles
