Elastic-job Development Guide



Original address: http://dangdangdotcom.github.io/elastic-job/post/1.x/user_guide/

Code Development

Job Type

Currently, there are 3 types of jobs available, namely simple, dataflow and script.

The dataflow type is used to process data flows and provides two kinds of jobs, ThroughputDataflow and SequenceDataflow. The corresponding abstract class needs to be extended.

The script type is used for processing scripts and can be used directly without coding.

The method parameter ShardingContext contains the job configuration, sharding, and runtime information. Methods such as getShardingTotalCount() and getShardingItems() return the total number of shards and the sharding item numbers assigned to the running job server.

Simple Type Job

A simple job requires no special encapsulation. You need to extend AbstractSimpleElasticJob, which provides a single method to override; this method is executed on a schedule. It is used for ordinary timed tasks, similar to the native Quartz interface, with the addition of elastic scaling and sharding.

public class MyElasticJob extends AbstractSimpleElasticJob {

    @Override
    public void process(final JobExecutionMultipleShardingContext context) {
        // do something by sharding items
    }
}
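The sharding items carried in the context determine which slice of the work each server instance handles. The following is a minimal sketch of that idea, using the getShardingItems() and getShardingTotalCount() accessors described above; the processByItem helper is a hypothetical application method, not part of the framework:

public class MyShardingAwareJob extends AbstractSimpleElasticJob {

    @Override
    public void process(final JobExecutionMultipleShardingContext context) {
        // each server only receives the sharding item numbers assigned to it
        for (int item : context.getShardingItems()) {
            // hypothetical application method handling one slice of the data,
            // for example rows whose key maps to this sharding item
            processByItem(item, context.getShardingTotalCount());
        }
    }

    private void processByItem(final int item, final int totalCount) {
        // do the real work for one sharding item out of totalCount shards
    }
}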

ThroughputDataflow Type Job

A ThroughputDataflow job is a high-throughput data flow job. You need to extend AbstractIndividualThroughputDataflowElasticJob and may specify a return-value generic type. It provides three methods to override: one to fetch data, one to process data, and one to specify whether the data is streamed. Auxiliary monitoring information, such as the number of successfully and unsuccessfully processed records, can also be obtained. If the data is streamed, the job stops only when the fetchData method returns null or an empty collection; otherwise the job keeps running. If the data is not streamed, fetchData and processData are each executed only once per job execution, after which the job completes. The streaming data processing design references TBSchedule and is suitable for uninterrupted data processing.

When the job executes, the data returned by fetchData is passed to processData, where it is split across multiple threads (the thread pool size is configurable). For streaming jobs, it is recommended that processData update the state of the data it processes so that fetchData does not fetch it again; otherwise the job will never stop. The return value of processData indicates whether the data was processed successfully: throwing an exception or returning false adds to the failure count, while returning true adds to the success count.

public class MyElasticJob extends AbstractIndividualThroughputDataflowElasticJob<Foo> {

    @Override
    public List<Foo> fetchData(final JobExecutionMultipleShardingContext context) {
        Map<Integer, String> offset = context.getOffsets();
        List<Foo> result = // get data from the database by sharding items and by offset
        return result;
    }

    @Override
    public boolean processData(final JobExecutionMultipleShardingContext context, final Foo data) {
        // process data
        // ...

        // store offset
        for (int each : context.getShardingItems()) {
            updateOffset(each, "your offset, maybe id");
        }
        return true;
    }
}

SequenceDataflow Type Job

A SequenceDataflow job is very similar to a ThroughputDataflow job. The difference is that a ThroughputDataflow job may process the fetched data with multiple threads without preserving the order of the data across shards. For example, if 100 records are fetched from 2 shards (40 from shard 1 and 60 from shard 2) and two processing threads are configured, the first thread may process the first 50 records and the second thread the remaining 50, ignoring the sharding items. A SequenceDataflow job, by contrast, uses as many threads as there are sharding items allocated to the current server, and each sharding item is processed by the same thread, which prevents the ordering problems caused by processing one shard's data on multiple threads. For example, if 100 records are fetched from 2 shards (40 from shard 1 and 60 from shard 2), the system automatically allocates two threads: the first processes the 40 records of shard 1 and the second processes the 60 records of shard 2. Because a ThroughputDataflow job can use any number of threads beyond the number of sharding items, its performance can often be tuned to exceed that of a SequenceDataflow job.

public class MyElasticJob extends AbstractIndividualSequenceDataflowElasticJob<Foo> {

    @Override
    public List<Foo> fetchData(final JobExecutionSingleShardingContext context) {
        int offset = context.getOffset();
        List<Foo> result = // get data from the database by sharding item and by offset
        return result;
    }

    @Override
    public boolean processData(final JobExecutionSingleShardingContext context, final Foo data) {
        // process data
        // ...

        // store offset
        updateOffset(context.getShardingItem(), "your offset, maybe id");
        return true;
    }
}

Script Type Job

A script job runs a script and supports any script type, such as shell, Python, or Perl. Simply configure scriptCommandLine via the console or in code; no further coding is needed. The script command line can contain parameters, and the job runtime information is appended as the last parameter.

#!/bin/bash
echo sharding execution context is $*

Job run-time output

Sharding execution context is {"Shardingitems": [0,1,2,3,4,5,6,7,8,9], "Shardingitemparameters": {}, "offsets": {}, " JobName ":" Scriptelasticdemojob "," Shardingtotalcount ": Ten," Jobparameter ":" "," monitorexecution ": true," Fetchdatacount ": 1} Batch processing

To improve data processing efficiency, dataflow jobs also provide batch processing. The two abstract classes described above, AbstractIndividualThroughputDataflowElasticJob and AbstractIndividualSequenceDataflowElasticJob, process data one record at a time. Batch processing instead uses the other two abstract classes, AbstractBatchThroughputDataflowElasticJob and AbstractBatchSequenceDataflowElasticJob. The differences are that the return value of the processData method changes from boolean to int, representing the number of records in the batch that were processed successfully, and that the second parameter becomes a List of data.
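As a rough sketch of the batch variant, assuming the batch abstract classes keep the fetchData/processData method names with the changes just described (an int return value and a List second parameter); loadPendingFoos and save are hypothetical application methods:

public class MyBatchElasticJob extends AbstractBatchThroughputDataflowElasticJob<Foo> {

    @Override
    public List<Foo> fetchData(final JobExecutionMultipleShardingContext context) {
        // hypothetical application call: load unprocessed records for the assigned sharding items
        return loadPendingFoos(context.getShardingItems());
    }

    @Override
    public int processData(final JobExecutionMultipleShardingContext context, final List<Foo> data) {
        int successCount = 0;
        for (Foo each : data) {
            if (save(each)) { // hypothetical persistence call returning true on success
                successCount++;
            }
        }
        // the return value is the number of records in this batch that were processed successfully
        return successCount;
    }
}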

Exception Handling

Elastic-Job provides a handleJobExecutionException method on its top-level interface, which a job can override. It uses the JobExecutionException provided by Quartz to control the lifecycle of the job after an exception occurs. The default implementation rethrows the exception directly.
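A minimal sketch of overriding this hook in a job class follows; the exact method signature (shown here taking and rethrowing the Quartz JobExecutionException) is an assumption and should be checked against the ElasticJob interface of the version in use:

public class MyElasticJob extends AbstractSimpleElasticJob {

    @Override
    public void process(final JobExecutionMultipleShardingContext context) {
        // do something by sharding items
    }

    @Override
    public void handleJobExecutionException(final JobExecutionException jobExecutionException) throws JobExecutionException {
        // a custom policy could log or send an alert here; rethrowing, as the
        // default implementation does, hands control back to Quartz
        throw jobExecutionException;
    }
}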

Task Listener Configuration

You can configure multiple task listeners, which execute listener methods before and after a task runs. Listeners are divided into two kinds: those executed on every job node, and those executed only once on a single node in a distributed scenario.

Listener Executed on Each Job Node

If the job processes files on each job server and deletes them after processing is complete, consider having every node perform the cleanup task. Such tasks are simple to implement, and since there is no need to consider whether the overall distributed task has completed, this type of listener should be used whenever possible.

Step: Define the Listener

import com.dangdang.ddframe.job.api.JobExecutionMultipleShardingContext;
import com.dangdang.ddframe.job.api.listener.ElasticJobListener;

public class MyElasticJobListener implements ElasticJobListener {

    @Override
    public void beforeJobExecuted(final JobExecutionMultipleShardingContext shardingContext) {
        // do something ...
    }

    @Override
    public void afterJobExecuted(final JobExecutionMultipleShardingContext shardingContext) {
        // do something ...
    }
}
Step: Pass the listener as a parameter into the JobScheduler
public class JobMain {

    public static void main(final String[] args) {
        new JobScheduler(regCenter, jobConfig, new MyElasticJobListener()).init();
    }
}
Listener Executed Only Once on a Single Node in a Distributed Scenario

If the job processes data in a database, only one node needs to perform the data cleanup task after processing is complete. This kind of task is more complex to handle: it synchronizes job state in a distributed environment and provides timeout settings to avoid deadlocks caused by that synchronization, so use it with caution.

Step: Define the Listener

import com.dangdang.ddframe.job.api.JobExecutionMultipleShardingContext;
import com.dangdang.ddframe.job.api.listener.AbstractDistributeOnceElasticJobListener;

public final class TestDistributeOnceElasticJobListener extends AbstractDistributeOnceElasticJobListener {

    public TestDistributeOnceElasticJobListener(final long startTimeoutMills, final long completeTimeoutMills) {
        super(startTimeoutMills, completeTimeoutMills);
    }

    @Override
    public void doBeforeJobExecutedAtLastStarted(final JobExecutionMultipleShardingContext shardingContext) {
        // do something ...
    }

    @Override
    public void doAfterJobExecutedAtLastCompleted(final JobExecutionMultipleShardingContext shardingContext) {
        // do something ...
    }
}
Step: Pass the listener as a parameter into the JobScheduler
public class JobMain {

    public static void main(final String[] args) {
        long startTimeoutMills = 5000L;
        long completeTimeoutMills = 10000L;
        new JobScheduler(regCenter, jobConfig, new TestDistributeOnceElasticJobListener(startTimeoutMills, completeTimeoutMills)).init();
    }
}
Job Configuration

When used with the Spring container, the job can be configured as a Spring bean, and objects managed by the Spring container, such as data sources, can be used in the job through dependency injection. Placeholders can also be used to take values from property files.

Spring Namespace Configuration

<?xml version= "1.0" encoding= "UTF-8"?> <beans xmlns= "Http://www.springframework.org/schema/beans" xmlns: Xsi= "Http://www.w3.org/2001/XMLSchema-instance" xmlns:reg= "Http://www.dangdang.com/schema/ddframe/reg" Xmlns:job
                        = "Http://www.dangdang.com/schema/ddframe/job" xsi:schemalocation= "Http://www.springframework.org/schema/beans Http://www.springframework.org/schema/beans/spring-beans.xsd http://www.dangdang.c Om/schema/ddframe/reg http://www.dangdang.com/schema/ddframe/reg/reg.xsd H
                        Ttp://www.dangdang.com/schema/ddframe/job http://www.dangdang.com/schema/ddframe/job/job.xsd > <!--Configure job Registration center-<reg:zookeeper id= "Regcenter" server-lists= "yourhost:2181" nam Espace= "Dd-job" base-sleep-time-milliseconds= "max-sleep-time-milliseconds=" max-retries= "3"/> <!- -Configure simple jobs--<job:simple Id= "simpleelasticjob" class= "xxx.   Mysimpleelasticjob "registry-center-ref=" Regcenter "cron=" 0/10 * * * *? " Sharding-total-count= "3" sharding-item-parameters= "0=a,1=b,2=c"/> <!--configuring Data Flow Jobs--<job:dataflow id= " Throughputdataflow "class=" xxx. Mythroughputdataflowelasticjob "registry-center-ref=" Regcenter "cron=" 0/10 * * * *? "sharding-total-count=" 3 " sharding-item-parameters= "0=a,1=b,2=c" process-count-interval-seconds= "ten" concurrent-data-process-thread-count=  "Ten"/> <!--configuration Script job--<job:script id= "Scriptelasticjob" registry-center-ref= "Regcenter" cron= "0/10 * * * * *? "sharding-total-count=" 3 "sharding-item-parameters=" 0=a,1=b,2=c "script-command-line="/your/file/path/ Demo.sh "/> <!--Configure simple jobs with monitoring--<job:simple id=" listenerelasticjob "class=" xxx.   Mysimplelistenerelasticjob "registry-center-ref=" Regcenter "cron=" 0/10 * * * *? " Sharding-total-count= "3" sharding-item-parameters= "0=a,1=b,2=c" > <job:listener class= "XX." Mysimplejoblistener "/> <job:listener class=" xx. Myoncesimplejoblistener "started-timeout-milliseconds=" "completed-timeout-milliseconds=" "/> </job: Simple> </beans>
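To illustrate the dependency-injection point above, a job configured as a Spring bean might use an injected data source as in the minimal sketch below; the @Resource-injected fooDataSource field is an assumption about the application's own Spring context, not something provided by Elastic-Job:

import javax.annotation.Resource;
import javax.sql.DataSource;

public class MySimpleElasticJob extends AbstractSimpleElasticJob {

    // injected by the Spring container; assumes a DataSource bean named fooDataSource is defined elsewhere
    @Resource
    private DataSource fooDataSource;

    @Override
    public void process(final JobExecutionMultipleShardingContext context) {
        // use the injected data source to read and write the data for the assigned sharding items
    }
}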


job:simple Namespace Properties

Property name        Type     Required   Default   Description
id                   String   Yes                  Job name
class                String   No                   Job implementation class; must implement the ElasticJob interface (not required for script jobs)
registry-center-ref  String   Yes                  Reference to the registry center bean; must reference a reg:zookeeper declaration
cron                 String
