A simple use of Quartz and Oozie to schedule jobs for execution on a big data computing platform


First, introduction

Oozie is a Hadoop-based workflow scheduler. Through the Oozie client, different types of jobs, such as MapReduce and Spark jobs, can be submitted programmatically to the underlying computing platform, for example Cloudera Hadoop (CDH).
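For orientation, here is a minimal sketch of submitting a workflow through the Oozie Java client; the Oozie URL and HDFS path are placeholders rather than values from this article.

import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;

// Minimal Oozie client submission: point the client at the Oozie server,
// tell it where the workflow application lives on HDFS, and run it.
public class OozieSubmitSketch {
    public static void main(String[] args) throws OozieClientException {
        OozieClient wc = new OozieClient("http://oozie-host:11000/oozie");
        Properties conf = wc.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/demo/apps/my-workflow");
        String jobId = wc.run(conf); // returns the workflow job id
        System.out.println("Submitted workflow job: " + jobId);
    }
}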

Quartz is an open-source job scheduling library that provides a variety of triggers and listeners for scheduling the execution of tasks.

The following uses Quartz + Oozie to submit a MapReduce program to Cloudera Hadoop for execution.

Second, the scheduling approach

① Why use Quartz? Mainly to take advantage of Quartz's powerful trigger functionality, which makes it easy to meet different scheduling requirements, such as running a job once a week or running a job repeatedly a given number of times. One important point: suppose a job needs to be executed repeatedly. Once the job has been submitted to CDH for the first time, there is no need to upload it to CDH again for each run; instead, the submitted job is recorded, and the next time it is due to run, CDH is simply asked to run that job again (see the sketch after this list).

② Another advantage of using Quartz is that some control can be applied when jobs are submitted. For example, if a certain type of job is submitted at a very high frequency, or its run time is short (judged from what it did last time), it can be given a higher priority the next time it runs (also shown in the sketch after this list).

③ Oozie is used to deliver jobs to the underlying computing platform, such as CDH, and have them executed there.
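The original article does not show code for points ① and ②, so the following is only a rough sketch of how they could look with the Quartz and Oozie client APIs; the identities, interval, priority value, and rerun handling are illustrative assumptions.

import static org.quartz.SimpleScheduleBuilder.simpleSchedule;
import static org.quartz.TriggerBuilder.newTrigger;

import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;
import org.quartz.Trigger;

public class SchedulingSketch {

    // ① a repeating trigger: Quartz fires the job every hour, forever;
    // ② withPriority() lets frequent or short-running jobs win when several triggers fire at once
    public static Trigger hourlyTrigger(int priority) {
        return newTrigger()
                .withIdentity("mr-job-trigger", "group1")
                .withSchedule(simpleSchedule().withIntervalInHours(1).repeatForever())
                .withPriority(priority)
                .build();
    }

    // ① record the Oozie job id after the first submission and re-run it on later firings,
    // instead of uploading the job to CDH again each time
    public static String submitOrRerun(OozieClient wc, Properties conf, String recordedJobId)
            throws OozieClientException {
        if (recordedJobId == null) {
            return wc.run(conf);                                  // first firing: submit and remember the id
        }
        conf.setProperty(OozieClient.RERUN_FAIL_NODES, "false");  // re-run all nodes, not only failed ones
        wc.reRun(recordedJobId, conf);                            // later firings: re-run the recorded workflow
        return recordedJobId;
    }
}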

Third, setting up the Eclipse development environment

The main requirements are the Quartz and Oozie client dependency JARs, typically the org.quartz-scheduler:quartz and org.apache.oozie:oozie-client Maven artifacts.

Fourth, the implementation approach

a) The scheduling system currently considers only two types of jobs: MapReduce jobs and Spark jobs. Both kinds of jobs are first handed to Oozie via Quartz, and Oozie then submits them to the CDH computing platform for execution.

b) Quartz provides a common Job interface with a single execute() method, which is responsible for the actual work of the Quartz-scheduled job: handing the job over to Oozie.

c) An abstract class BaseJob is defined with two methods. These two methods do the preparatory work: when Quartz hands the job to Oozie, the directory where the job is stored on HDFS must be located and copied to the execution directory.
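The article does not list BaseJob's source, so the following is only a sketch of what such an abstract class might look like, assuming the two methods locate the job's application directory on HDFS and copy it to the execution directory; the method names and path layout are hypothetical.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Illustrative sketch of a BaseJob abstract class: subclasses (MRJob, SparkJob)
// reuse these two preparation methods before submitting the workflow to Oozie.
public abstract class BaseJob {

    // Method 1: find the directory on HDFS where the job's workflow.xml and JARs are stored
    protected Path locateJobDir(Configuration conf, String jobName) throws IOException {
        // hypothetical layout: /user/jobs/<jobName>
        return new Path("/user/jobs/" + jobName);
    }

    // Method 2: copy the stored job directory to the execution directory on HDFS
    protected void copyToExecutionDir(Configuration conf, Path jobDir, Path execDir) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        // copy within HDFS, keeping the original
        FileUtil.copy(fs, jobDir, fs, execDir, false, conf);
    }
}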

d) Finally, there are two concrete implementation classes, MRJob and SparkJob, which represent the MapReduce job and the Spark job respectively. Each implementation class completes the job's configuration and then submits the job to the CDH computing platform for execution.
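The MRJob code appears later in the article; for context, the kind of configuration such a class fills in before calling wc.run(conf) could look roughly like this. The property names follow the standard Oozie map-reduce example, and the values are placeholders, not taken from the project.

import java.util.Properties;

import org.apache.oozie.client.OozieClient;

// Rough sketch of the configuration an MRJob-style class completes before submission.
public class MRJobConfigSketch {

    public static Properties buildConfiguration(OozieClient wc) {
        Properties conf = wc.createConfiguration();
        // workflow application directory on HDFS (contains workflow.xml for the map-reduce action)
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/demo/apps/map-reduce");
        // cluster endpoints referenced from workflow.xml
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "resourcemanager:8032");
        conf.setProperty("queueName", "default");
        // input and output directories referenced from workflow.xml
        conf.setProperty("inputDir", "/user/demo/input");
        conf.setProperty("outputDir", "/user/demo/output");
        return conf;
    }
}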

The related class diagram is as follows:

Fifth, code analysis

MRJob.java

execute() of the org.quartz.Job interface is implemented here; the method is invoked automatically by the Quartz scheduler when the trigger fires. This makes it possible to define triggers as needed to control when jobs are submitted to Oozie.

@Override
public void execute(JobExecutionContext arg0) throws JobExecutionException {
    // wc is the OozieClient and conf the workflow Properties, both prepared as fields of MRJob
    try {
        // submit the job to Oozie and get back the jobId
        String jobId = wc.run(conf);
        System.out.println("Workflow job submitted");

        // wait until the workflow job finishes, polling every 10 seconds
        while (wc.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            System.out.println("Workflow job running ...");
            try {
                Thread.sleep(10 * 1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        System.out.println("Workflow job completed!");
        System.out.println(wc.getJobInfo(jobId));
    } catch (OozieClientException e) {
        e.printStackTrace();
    }
}

The test's main function is as follows. As you can see, from the client's point of view, scheduling a MapReduce job is just a matter of writing an ordinary Quartz job. To actually run the program, of course, the job's execution environment on the cluster has to be prepared in advance.

import static org.quartz.JobBuilder.newJob;
import static org.quartz.TriggerBuilder.newTrigger;

import java.util.Date;

import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.SchedulerFactory;
import org.quartz.SimpleTrigger;
import org.quartz.impl.StdSchedulerFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.quartz.job.MRJob;

public class QuartzOozieJobTest {

    public static void main(String[] args) throws Exception {
        QuartzOozieJobTest test = new QuartzOozieJobTest();
        test.run();
    }

    public void run() throws Exception {
        Logger log = LoggerFactory.getLogger(QuartzOozieJobTest.class);

        log.info("------- Initializing ----------------------");

        // obtain a scheduler from the default factory
        SchedulerFactory sf = new StdSchedulerFactory();
        Scheduler sched = sf.getScheduler();

        // fire the trigger 20 seconds from now
        long startTime = System.currentTimeMillis() + 20000L;
        Date startTriggerTime = new Date(startTime);

        // the job class is MRJob, which submits the workflow to Oozie in its execute() method
        JobDetail jobDetail = newJob(MRJob.class).withIdentity("job", "group1").build();
        SimpleTrigger trigger = (SimpleTrigger) newTrigger()
                .withIdentity("trigger", "group1")
                .startAt(startTriggerTime)
                .build();

        Date ft = sched.scheduleJob(jobDetail, trigger);

        log.info(jobDetail.getKey() + " will submit at " + ft + ", only once.");

        sched.start();
        // sched.shutdown(true);
    }
}

Source code download for the entire project

