First, an introduction
Oozie is a workflow scheduler for Hadoop. Through the Oozie client, different types of jobs, such as MapReduce jobs and Spark jobs, can be submitted programmatically to an underlying computing platform such as Cloudera Hadoop (CDH).
Quartz is an open-source scheduling library that provides a variety of triggers and listeners for scheduling the execution of tasks.
The following shows how to use Quartz together with Oozie to submit a MapReduce program to CDH for execution.
Second, the scheduling approach
① Why use Quartz? Mainly to take advantage of its powerful triggers, which can satisfy different scheduling requirements, such as running a job once a week or running a job repeatedly. An important point here: suppose a job needs to run repeatedly. After the job has been submitted to CDH for the first time, subsequent runs should not upload the job to CDH again; instead, the first submission is recorded, and the next time the job needs to run, CDH simply executes the recorded job again.
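The "record once, re-run later" idea can be sketched with a small registry. This is a minimal illustration, not the post's actual code: the class, the job-id format, and the submit/rerun logic are assumptions standing in for real OozieClient calls.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: remember each job's Oozie id on first submission so that later
// runs re-trigger the recorded workflow instead of re-uploading the job.
public class JobRegistry {
    private final Map<String, String> submitted = new HashMap<>();

    // Hypothetical stand-in for OozieClient submit/rerun calls.
    public String runOrRerun(String jobName) {
        String jobId = submitted.get(jobName);
        if (jobId == null) {
            jobId = "oozie-" + jobName;     // first run: upload and submit
            submitted.put(jobName, jobId);  // record the submission
            return "submitted " + jobId;
        }
        return "rerun " + jobId;            // later runs: no re-upload
    }

    public static void main(String[] args) {
        JobRegistry r = new JobRegistry();
        System.out.println(r.runOrRerun("wordcount")); // first execution
        System.out.println(r.runOrRerun("wordcount")); // repeat execution
    }
}
```

In a real system the map would be backed by persistent storage, and the recorded id would be passed to the Oozie client's re-run facility.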
② Another advantage of using Quartz is that some control can be applied when the job is submitted. For example, if jobs of a certain type are submitted at a very high frequency, or if a job's run time is short (judged from its previous execution), then it can be given a higher priority the next time it runs.
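One way to realize this (a sketch with assumed thresholds, not the post's code) is to map the previous run's duration to a priority value, which could then be fed to Quartz's `TriggerBuilder.withPriority(int)`:

```java
// Sketch: derive a Quartz trigger priority from the job's last run time.
// The thresholds and priority values below are illustrative assumptions.
public class PrioritySketch {
    static int priorityFor(long lastRunMillis) {
        if (lastRunMillis < 60_000L) return 10;   // short jobs jump the queue
        if (lastRunMillis < 600_000L) return 5;   // Quartz's default priority
        return 1;                                 // long jobs yield to others
    }

    public static void main(String[] args) {
        System.out.println(priorityFor(30_000L));    // a 30 s job -> 10
        System.out.println(priorityFor(1_200_000L)); // a 20 min job -> 1
    }
}
```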
③ The purpose of using Oozie is to deliver jobs to the underlying computing platform, such as CDH, for execution.
Third, setting up the Eclipse development environment
The main requirement is the Quartz and Oozie dependency packages. Specifically:
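The original dependency list is not shown here; as a hedged sketch, the Maven coordinates would look roughly like the following (the version numbers are assumptions, not taken from the post):

```xml
<!-- Assumed coordinates; versions are illustrative only -->
<dependency>
    <groupId>org.quartz-scheduler</groupId>
    <artifactId>quartz</artifactId>
    <version>2.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.oozie</groupId>
    <artifactId>oozie-client</artifactId>
    <version>4.1.0</version>
</dependency>
```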
Fourth, the implementation approach
a) The scheduling system currently considers only two types of jobs: MapReduce jobs and Spark jobs. Both kinds of job are passed through Quartz to Oozie, which then submits them to the CDH computing platform for execution.
b) Quartz provides a common Job interface with a single execute() method, which is responsible for carrying out the concrete work of the scheduled Quartz job: passing the job to Oozie.
c) An abstract class BaseJob defines two methods. These methods mainly do preparation work: when a job is handed to Oozie via Quartz, the directory where the job is stored in HDFS must be located and its contents copied to the execution directory.
d) Finally, there are two concrete implementation classes, MRJob and SparkJob, representing the MapReduce job and the Spark job respectively. Each implementation class completes the job's configuration and then submits the job to the CDH computing platform for execution.
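The BaseJob abstraction can be sketched as follows. This is an illustration of the shape described above, not the post's actual source; the method names and paths are assumptions, and the toy subclass only builds path strings instead of touching HDFS.

```java
// Hedged sketch of the BaseJob abstraction; names and paths are assumed.
abstract class BaseJob {
    // locate the directory in HDFS where the job's files are stored
    abstract String locateJobDir(String jobName);

    // copy the job's files to the execution directory; returns its path
    abstract String copyToExecDir(String jobDir);

    // preparation work done before handing the job to Oozie
    final String prepare(String jobName) {
        return copyToExecDir(locateJobDir(jobName));
    }
}

public class BaseJobSketch {
    // a toy concrete job that only composes paths, for demonstration
    static String demo(String jobName) {
        BaseJob job = new BaseJob() {
            String locateJobDir(String n) { return "/user/demo/apps/" + n; }
            String copyToExecDir(String d) { return d + "/exec"; }
        };
        return job.prepare(jobName);
    }

    public static void main(String[] args) {
        System.out.println(demo("wordcount")); // /user/demo/apps/wordcount/exec
    }
}
```

In the real classes, the two abstract methods would be implemented with HDFS file-system calls rather than string concatenation.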
The related class diagram is as follows:
Fifth, specific code analysis
MRJob.java
It implements execute() of the org.quartz.Job interface; this method is invoked automatically by the Quartz scheduler when its trigger fires. Triggers can therefore be defined as needed to control when the job is submitted to Oozie.
@Override
public void execute(JobExecutionContext arg0) throws JobExecutionException {
    try {
        // submit the job to Oozie and get back its jobId
        String jobId = wc.run(conf);
        System.out.println("Workflow job submitted");

        // wait until the workflow job finishes
        while (wc.getJobInfo(jobId).getStatus() == Status.RUNNING) {
            System.out.println("Workflow job running ...");
            try {
                Thread.sleep(10 * 1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        System.out.println("Workflow job completed!");
        System.out.println(wc.getJobInfo(jobId));
    } catch (OozieClientException e) {
        e.printStackTrace();
    }
}
The main function of the test is as follows. As you can see, from the client's point of view, debugging a MapReduce job only requires writing an ordinary Quartz job. To run the program, of course, the job's runtime environment must be prepared on the cluster in advance.
import static org.quartz.JobBuilder.newJob;
import static org.quartz.TriggerBuilder.newTrigger;

import java.util.Date;

import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.SchedulerFactory;
import org.quartz.SimpleTrigger;
import org.quartz.impl.StdSchedulerFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.quartz.job.MRJob;

public class QuartzOozieJobTest {
    public static void main(String[] args) throws Exception {
        QuartzOozieJobTest test = new QuartzOozieJobTest();
        test.run();
    }

    public void run() throws Exception {
        Logger log = LoggerFactory.getLogger(QuartzOozieJobTest.class);

        log.info("------- Initializing ----------------------");

        SchedulerFactory sf = new StdSchedulerFactory();
        Scheduler sched = sf.getScheduler();

        // fire the trigger 20 seconds from now
        long startTime = System.currentTimeMillis() + 20000L;
        Date startTriggerTime = new Date(startTime);

        JobDetail jobDetail = newJob(MRJob.class).withIdentity("job", "group1").build();
        SimpleTrigger trigger = (SimpleTrigger) newTrigger()
                .withIdentity("trigger", "group1")
                .startAt(startTriggerTime)
                .build();

        Date ft = sched.scheduleJob(jobDetail, trigger);

        log.info(jobDetail.getKey() + " will submit at " + ft + ", only once.");

        sched.start();
        // sched.shutdown(true);
    }
}
Source code download for the entire project
A simple example of using Quartz and Oozie to schedule jobs for execution on a big data computing platform