JobControl Programming Example:
A complete Bayesian classification algorithm may require 4 MapReduce jobs that depend on one another. The traditional practice is to create a JobConf object for each job and submit the jobs one after another (serially), as follows:
// Create a JobConf object for each of the 4 jobs
JobConf extractJobConf = new JobConf(ExtractJob.class);
JobConf classPriorJobConf = new JobConf(ClassPriorJob.class);
JobConf conditionalProbilityJobConf = new JobConf(ConditionalProbilityJob.class);
JobConf predictJobConf = new JobConf(PredictJob.class);
... // Configure each JobConf
// Submit the jobs one after another, in dependency order
JobClient.runJob(extractJobConf);
JobClient.runJob(classPriorJobConf);
JobClient.runJob(conditionalProbilityJobConf);
JobClient.runJob(predictJobConf);
If you use JobControl, you only need to declare the dependencies between jobs with the addDependingJob() function; JobControl then dispatches the individual jobs according to those dependencies, as follows:
// The jobcontrol Job class of the old (mapred) API wraps a JobConf
JobConf extractJobConf = new JobConf();
JobConf classPriorJobConf = new JobConf();
JobConf conditionalProbilityJobConf = new JobConf();
JobConf predictJobConf = new JobConf();
... // Set up each configuration
// Create the Job objects. Note that JobControl requires every job to be
// wrapped in a Job object (org.apache.hadoop.mapred.jobcontrol.Job)
Job extractJob = new Job(extractJobConf);
Job classPriorJob = new Job(classPriorJobConf);
Job conditionalProbilityJob = new Job(conditionalProbilityJobConf);
Job predictJob = new Job(predictJobConf);
// Set up the dependencies to construct a DAG job
classPriorJob.addDependingJob(extractJob);
conditionalProbilityJob.addDependingJob(extractJob);
predictJob.addDependingJob(classPriorJob);
predictJob.addDependingJob(conditionalProbilityJob);
// Create a JobControl object, which monitors and dispatches the jobs
JobControl jc = new JobControl("Naive Bayes");
jc.addJob(extractJob); // Add the 4 jobs to JobControl
jc.addJob(classPriorJob);
jc.addJob(conditionalProbilityJob);
jc.addJob(predictJob);
// Submit the DAG job. run() keeps looping until stop() is called,
// so JobControl is usually started in a thread of its own
jc.run();
At run time, extractJob, which does not depend on any other job, is dispatched first. Once it has completed, classPriorJob and conditionalProbilityJob are dispatched at the same time, and after both of them have finished, predictJob is dispatched.
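The dispatch order described above can be reproduced with a small, self-contained sketch. This is a toy model, not Hadoop code: the class DagDispatchSketch and its methods are hypothetical names chosen for illustration, and the sketch assumes that every job in a "wave" finishes before the next wave starts, which is a simplification of what a real scheduler does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of dependency-ordered dispatch (hypothetical, not part of Hadoop). */
class DagDispatchSketch {

    /** Maps each job name to the names of the jobs it depends on. */
    private final Map<String, Set<String>> dependencies = new LinkedHashMap<>();

    void addJob(String job) {
        dependencies.putIfAbsent(job, new LinkedHashSet<>());
    }

    void addDependingJob(String job, String dependsOn) {
        addJob(job);
        addJob(dependsOn);
        dependencies.get(job).add(dependsOn);
    }

    /** Groups the jobs into waves; jobs within one wave have no unmet dependencies. */
    List<List<String>> dispatchWaves() {
        List<List<String>> waves = new ArrayList<>();
        Set<String> finished = new LinkedHashSet<>();
        while (finished.size() < dependencies.size()) {
            List<String> ready = new ArrayList<>();
            for (Map.Entry<String, Set<String>> entry : dependencies.entrySet()) {
                boolean notYetRun = !finished.contains(entry.getKey());
                if (notYetRun && finished.containsAll(entry.getValue())) {
                    ready.add(entry.getKey());
                }
            }
            if (ready.isEmpty()) {
                throw new IllegalStateException("Dependency cycle detected");
            }
            waves.add(ready);
            finished.addAll(ready); // assumption: the whole wave completes together
        }
        return waves;
    }

    /** Builds the 4-job dependency graph from the example. */
    static DagDispatchSketch bayesExample() {
        DagDispatchSketch dag = new DagDispatchSketch();
        dag.addJob("extractJob");
        dag.addDependingJob("classPriorJob", "extractJob");
        dag.addDependingJob("conditionalProbilityJob", "extractJob");
        dag.addDependingJob("predictJob", "classPriorJob");
        dag.addDependingJob("predictJob", "conditionalProbilityJob");
        return dag;
    }

    public static void main(String[] args) {
        // prints [[extractJob], [classPriorJob, conditionalProbilityJob], [predictJob]]
        System.out.println(bayesExample().dispatchWaves());
    }
}
```

The second wave contains two jobs, which is exactly the point at which jobs can run side by side.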
Comparing the two approaches leads to a simple conclusion: writing DAG jobs with JobControl is easier, and it allows jobs that do not depend on each other to run in parallel.
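To put a rough number on the benefit of parallelism, the sketch below compares the elapsed time of serial submission with that of dependency-based dispatch. Everything in it is an assumption made for illustration: the class MakespanSketch is hypothetical, the run times are made-up values in minutes, and the cluster is assumed to have enough capacity for independent jobs to overlap fully.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Back-of-the-envelope comparison; all durations are made-up numbers. */
class MakespanSketch {

    /** Hypothetical run time of each job, in minutes. */
    static Map<String, Integer> sampleMinutes() {
        Map<String, Integer> minutes = new LinkedHashMap<>();
        minutes.put("extractJob", 10);
        minutes.put("classPriorJob", 4);
        minutes.put("conditionalProbilityJob", 6);
        minutes.put("predictJob", 5);
        return minutes;
    }

    /** Dependencies from the example: job -> jobs it has to wait for. */
    static Map<String, List<String>> sampleDependencies() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("classPriorJob", Arrays.asList("extractJob"));
        deps.put("conditionalProbilityJob", Arrays.asList("extractJob"));
        deps.put("predictJob", Arrays.asList("classPriorJob", "conditionalProbilityJob"));
        return deps;
    }

    /** Serial submission: the elapsed time is the sum of all run times. */
    static int serialMinutes(Map<String, Integer> minutes) {
        int total = 0;
        for (int m : minutes.values()) {
            total += m;
        }
        return total;
    }

    /** DAG dispatch: a job starts once its slowest dependency has finished. */
    static int finishMinute(String job, Map<String, Integer> minutes,
                            Map<String, List<String>> deps) {
        int start = 0;
        for (String dep : deps.getOrDefault(job, Collections.<String>emptyList())) {
            start = Math.max(start, finishMinute(dep, minutes, deps));
        }
        return start + minutes.get(job);
    }

    public static void main(String[] args) {
        Map<String, Integer> minutes = sampleMinutes();
        // prints "serial: 25 min"
        System.out.println("serial: " + serialMinutes(minutes) + " min");
        // prints "DAG: 21 min"
        System.out.println("DAG: "
                + finishMinute("predictJob", minutes, sampleDependencies()) + " min");
    }
}
```

With these numbers the saving equals the run time of the shorter of the two parallel jobs (4 minutes), since only classPriorJob and conditionalProbilityJob can overlap.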
From "Inside Hadoop Technology: In-Depth Analysis of MapReduce Framework Design and Implementation Principles"