First, an introduction
Oozie is a workflow scheduler for Hadoop. Through the Oozie client, different types of jobs, such as MapReduce jobs and Spark jobs, can be submitted programmatically to an underlying computing platform such as Cloudera Hadoop (CDH).
Quartz is an open-source scheduling library that provides a variety of triggers and listeners for scheduling the execution of tasks.
The following shows how to use Quartz together with Oozie to submit a MapReduce program to CDH for execution.
Second, the scheduling approach
① Why use Quartz? Mainly to take advantage of its powerful triggers, which can satisfy different scheduling requirements, such as running a job once a week or running a job repeatedly. An important point here: suppose a job needs to run repeatedly. After the job has been submitted to CDH for the first time, subsequent runs should not upload the job to CDH again; instead, the first submission is recorded, and the next time the job needs to run, CDH simply executes the recorded job again.
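The "record once, re-run later" idea can be sketched with a small registry. This is a minimal illustration, not the post's actual code: the class, the job-id format, and the submit/rerun logic are assumptions standing in for real OozieClient calls.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: remember each job's Oozie id on first submission so that later
// runs re-trigger the recorded workflow instead of re-uploading the job.
public class JobRegistry {
    private final Map<String, String> submitted = new HashMap<>();

    // Hypothetical stand-in for OozieClient submit/rerun calls.
    public String runOrRerun(String jobName) {
        String jobId = submitted.get(jobName);
        if (jobId == null) {
            jobId = "oozie-" + jobName;     // first run: upload and submit
            submitted.put(jobName, jobId);  // record the submission
            return "submitted " + jobId;
        }
        return "rerun " + jobId;            // later runs: no re-upload
    }

    public static void main(String[] args) {
        JobRegistry r = new JobRegistry();
        System.out.println(r.runOrRerun("wordcount")); // first execution
        System.out.println(r.runOrRerun("wordcount")); // repeat execution
    }
}
```

In a real system the map would be backed by persistent storage, and the recorded id would be passed to the Oozie client's re-run facility.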
② Another advantage of using Quartz is that some control can be applied when the job is submitted. For example, if jobs of a certain type are submitted at a very high frequency, or if a job's run time is short (judged from its previous execution), then it can be given a higher priority the next time it runs.
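One way to realize this (a sketch with assumed thresholds, not the post's code) is to map the previous run's duration to a priority value, which could then be fed to Quartz's `TriggerBuilder.withPriority(int)`:

```java
// Sketch: derive a Quartz trigger priority from the job's last run time.
// The thresholds and priority values below are illustrative assumptions.
public class PrioritySketch {
    static int priorityFor(long lastRunMillis) {
        if (lastRunMillis < 60_000L) return 10;   // short jobs jump the queue
        if (lastRunMillis < 600_000L) return 5;   // Quartz's default priority
        return 1;                                 // long jobs yield to others
    }

    public static void main(String[] args) {
        System.out.println(priorityFor(30_000L));    // a 30 s job -> 10
        System.out.println(priorityFor(1_200_000L)); // a 20 min job -> 1
    }
}
```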
③ The purpose of using Oozie is to deliver jobs to the underlying computing platform, such as CDH, for execution.
Third, setting up the Eclipse development environment
The main requirement is the Quartz and Oozie dependency packages. Specifically:
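The original dependency list is not shown here; as a hedged sketch, the Maven coordinates would look roughly like the following (the version numbers are assumptions, not taken from the post):

```xml
<!-- Assumed coordinates; versions are illustrative only -->
<dependency>
    <groupId>org.quartz-scheduler</groupId>
    <artifactId>quartz</artifactId>
    <version>2.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.oozie</groupId>
    <artifactId>oozie-client</artifactId>
    <version>4.1.0</version>
</dependency>
```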
Fourth, the implementation approach
a) The scheduling system currently considers only two types of jobs: MapReduce jobs and Spark jobs. Both kinds of job are passed through Quartz to Oozie, which then submits them to the CDH computing platform for execution.
b) Quartz provides a common Job interface with a single execute() method, which is responsible for carrying out the concrete work of the scheduled Quartz job: passing the job to Oozie.
c) An abstract class BaseJob defines two methods. These methods mainly do preparation work: when a job is handed to Oozie via Quartz, the directory where the job is stored in HDFS must be located and its contents copied to the execution directory.
d) Finally, there are two concrete implementation classes, MRJob and SparkJob, representing the MapReduce job and the Spark job respectively. Each implementation class completes the job's configuration and then submits the job to the CDH computing platform for execution.
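The BaseJob abstraction can be sketched as follows. This is an illustration of the shape described above, not the post's actual source; the method names and paths are assumptions, and the toy subclass only builds path strings instead of touching HDFS.

```java
// Hedged sketch of the BaseJob abstraction; names and paths are assumed.
abstract class BaseJob {
    // locate the directory in HDFS where the job's files are stored
    abstract String locateJobDir(String jobName);

    // copy the job's files to the execution directory; returns its path
    abstract String copyToExecDir(String jobDir);

    // preparation work done before handing the job to Oozie
    final String prepare(String jobName) {
        return copyToExecDir(locateJobDir(jobName));
    }
}

public class BaseJobSketch {
    // a toy concrete job that only composes paths, for demonstration
    static String demo(String jobName) {
        BaseJob job = new BaseJob() {
            String locateJobDir(String n) { return "/user/demo/apps/" + n; }
            String copyToExecDir(String d) { return d + "/exec"; }
        };
        return job.prepare(jobName);
    }

    public static void main(String[] args) {
        System.out.println(demo("wordcount")); // /user/demo/apps/wordcount/exec
    }
}
```

In the real classes, the two abstract methods would be implemented with HDFS file-system calls rather than string concatenation.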
The related class diagram is as follows:
Fifth, specific code analysis
MRJob.java
It implements execute() of the org.quartz.Job interface; this method is invoked automatically by the Quartz scheduler when its trigger fires. Triggers can therefore be defined as needed to control when the job is submitted to Oozie.
@Override
public void execute(JobExecutionContext arg0) throws JobExecutionException {
    try {
        // submit the job to Oozie and get back its jobId
        String jobId = wc.run(conf);
        System.out.println("Workflow job submitted");

        // wait until the workflow job finishes
        while (wc.getJobInfo(jobId).getStatus() == Status.RUNNING) {
            System.out.println("Workflow job running ...");
            try {
                Thread.sleep(10 * 1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        System.out.println("Workflow job completed!");
        System.out.println(wc.getJobInfo(jobId));
    } catch (OozieClientException e) {
        e.printStackTrace();
    }
}
The main function of the test is as follows. As you can see, from the client's point of view, debugging a MapReduce job only requires writing an ordinary Quartz job. To run the program, of course, the job's runtime environment must be prepared on the cluster in advance.
import static org.quartz.JobBuilder.newJob;
import static org.quartz.TriggerBuilder.newTrigger;

import java.util.Date;

import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.SchedulerFactory;
import org.quartz.SimpleTrigger;
import org.quartz.impl.StdSchedulerFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.quartz.job.MRJob;

public class QuartzOozieJobTest {
    public static void main(String[] args) throws Exception {
        QuartzOozieJobTest test = new QuartzOozieJobTest();
        test.run();
    }

    public void run() throws Exception {
        Logger log = LoggerFactory.getLogger(QuartzOozieJobTest.class);

        log.info("------- Initializing ----------------------");

        SchedulerFactory sf = new StdSchedulerFactory();
        Scheduler sched = sf.getScheduler();

        // fire the trigger 20 seconds from now
        long startTime = System.currentTimeMillis() + 20000L;
        Date startTriggerTime = new Date(startTime);

        JobDetail jobDetail = newJob(MRJob.class).withIdentity("job", "group1").build();
        SimpleTrigger trigger = (SimpleTrigger) newTrigger()
                .withIdentity("trigger", "group1")
                .startAt(startTriggerTime)
                .build();

        Date ft = sched.scheduleJob(jobDetail, trigger);

        log.info(jobDetail.getKey() + " will submit at " + ft + ", only once.");

        sched.start();
        // sched.shutdown(true);
    }
}
Source code download for the entire project
A simple example of using Quartz and Oozie to schedule jobs for execution on a big data computing platform