This article analyzes the sharding mechanism of ElasticJob.
How ElasticJob sharding works:
1. At startup, ElasticJob first starts the listeners that decide whether re-sharding is needed.
See ListenerManager#startAllListeners { ...; shardingListenerManager.start(); ... }.
2. Before each execution, a task must obtain its sharding information; if re-sharding is needed, the master server runs the sharding algorithm while the other servers wait until sharding completes.
See AbstractElasticJobExecutor#execute { ...; jobFacade.getShardingContexts(); ...; }
1. The sharding listener managers in detail
The base class of ElasticJob's event listener managers is AbstractListenerManager.
Its class diagram is shown in the figure:
JobNodeStorage jobNodeStorage: the API for operating on job nodes.
Its core methods:
public abstract void start(): starts the listener manager; implemented by each subclass.
protected void addDataListener(TreeCacheListener listener): registers an event listener.
ElasticJob's leader-election listener manager, sharding listener manager, failover listener manager, and so on are all subclasses of AbstractListenerManager.
The class diagram of the sharding-related listener managers is shown in the figure:
ShardingListenerManager: the sharding listener manager.
ShardingTotalCountChangedJobListener: listens for changes to the total shard count; it is a subclass of TreeCacheListener (Curator's event listener).
ListenServersChangedJobListener: listens for changes to the set of job servers (running instances).
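ShardingListenerManager does little more than register these two listeners when it starts. A minimal sketch of its start() method, consistent with the elastic-job-lite 2.x source:

class ShardingListenerManager extends AbstractListenerManager {

    @Override
    public void start() {
        // register both sharding-related listeners on the job's TreeCache
        addDataListener(new ShardingTotalCountChangedJobListener());
        addDataListener(new ListenServersChangedJobListener());
    }
}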
1.1 Source analysis of the ShardingTotalCountChangedJobListener listener
class ShardingTotalCountChangedJobListener extends AbstractJobListener {

    @Override
    protected void dataChanged(final String path, final Type eventType, final String data) {
        if (configNode.isConfigPath(path) && 0 != JobRegistry.getInstance().getCurrentShardingTotalCount(jobName)) {
            int newShardingTotalCount = LiteJobConfigurationGsonFactory.fromJson(data).getTypeConfig().getCoreConfig().getShardingTotalCount();
            if (newShardingTotalCount != JobRegistry.getInstance().getCurrentShardingTotalCount(jobName)) {
                shardingService.setReshardingFlag();
                JobRegistry.getInstance().setCurrentShardingTotalCount(jobName, newShardingTotalCount);
            }
        }
    }
}
This listener fires when the total shard count in the job configuration changes (ElasticJob allows the shard total of each job to be modified through the web console). The job configuration is stored as JSON on the ${namespace}/jobname/config node.
When the content of ${namespace}/jobname/config changes, ZooKeeper triggers a node-data-changed event for that node. If the shard total stored in ZooKeeper differs from the shard total held in memory (JobRegistry.getInstance()), the listener calls ShardingService to set the re-sharding flag (creating the persistent node ${namespace}/jobname/leader/sharding/necessary) and updates the in-memory shard total.
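Setting the flag amounts to creating a marker node. A minimal sketch of ShardingService#setReshardingFlag, assuming the 2.x ShardingNode constants:

public void setReshardingFlag() {
    // creates the persistent node ${namespace}/jobname/leader/sharding/necessary if absent
    jobNodeStorage.createJobNodeIfNeeded(ShardingNode.NECESSARY);
}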
1.2 Source analysis of the ListenServersChangedJobListener listener
class ListenServersChangedJobListener extends AbstractJobListener {

    @Override
    protected void dataChanged(final String path, final Type eventType, final String data) {
        if (!JobRegistry.getInstance().isShutdown(jobName) && (isInstanceChange(eventType, path) || isServerChange(path))) {
            shardingService.setReshardingFlag();
        }
    }

    private boolean isInstanceChange(final Type eventType, final String path) {
        return instanceNode.isInstancePath(path) && Type.NODE_UPDATED != eventType;
    }

    private boolean isServerChange(final String path) {
        return serverNode.isServerPath(path);
    }
}
This listener reacts to changes in the set of job servers (running instances): re-sharding is needed when a new instance joins or an existing instance goes down.
If the nodes under the ${namespace}/jobname/servers or ${namespace}/jobname/instances path change, the listener sets the re-sharding flag.
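For reference, the ZooKeeper node layout that this article keeps referring to is summarized below. This is an informal sketch reconstructed from the paths mentioned in this analysis, not the exhaustive layout:

// ${namespace}/${jobName}/config                        job configuration (JSON)
// ${namespace}/${jobName}/servers/${ip}                 server state (empty, or DISABLED)
// ${namespace}/${jobName}/instances/${jobInstanceId}    one ephemeral node per running instance
// ${namespace}/${jobName}/leader/sharding/necessary     re-sharding flag (persistent)
// ${namespace}/${jobName}/leader/sharding/processing    sharding in progress (ephemeral)
// ${namespace}/${jobName}/sharding/${item}/instance     jobInstanceId that owns the shard item
// ${namespace}/${jobName}/sharding/${item}/disabled     marker: the shard item is disabled
// ${namespace}/${jobName}/sharding/${item}/running      marker: the shard item is executing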
2. The sharding logic itself
The sections above analyzed the sharding listener managers, whose responsibility is to watch specific ZooKeeper directories and, when a change occurs, decide whether to set the re-sharding flag. Once the flag is set, when is re-sharding actually triggered?
Before each scheduled execution, a task first obtains its sharding information (the sharding context) and then pulls its own slice of data to process. The source entry point is the jobFacade.getShardingContexts() call in com.dangdang.ddframe.job.executor.AbstractElasticJobExecutor#execute.
The concrete implementation is LiteJobFacade#getShardingContexts:
public ShardingContexts getShardingContexts() {
    boolean isFailover = configService.load(true).isFailover();    // @1
    if (isFailover) {
        List<Integer> failoverShardingItems = failoverService.getLocalFailoverItems();
        if (!failoverShardingItems.isEmpty()) {
            return executionContextService.getJobShardingContext(failoverShardingItems);
        }
    }
    shardingService.shardingIfNecessary();    // @2
    List<Integer> shardingItems = shardingService.getLocalShardingItems();    // @3
    if (isFailover) {
        shardingItems.removeAll(failoverService.getLocalTakeOffItems());
    }
    shardingItems.removeAll(executionService.getDisabledItems(shardingItems));    // @4
    return executionContextService.getJobShardingContext(shardingItems);    // @5
}
Code @1: checks whether failover is enabled. This article focuses on the sharding mechanism; failover is covered in detail in the next article, so assume here that failover is disabled.
Code @2: performs sharding if necessary. If no sharding information exists yet (first sharding) or re-sharding is flagged, the sharding algorithm is executed; this logic is analyzed in detail below.
Code @3: gets the shard items assigned to the local instance. It walks the shard items under ${namespace}/jobname/sharding/{item} and checks whether each item's instance value equals the current jobInstanceId; if so, the item belongs to this node, as sketched below.
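A minimal sketch of the lookup in ShardingService, assuming the 2.x helper names (serverService, configService, jobNodeStorage):

public List<Integer> getShardingItems(final String jobInstanceId) {
    JobInstance jobInstance = new JobInstance(jobInstanceId);
    if (!serverService.isAvailableServer(jobInstance.getIp())) {
        return Collections.emptyList();
    }
    List<Integer> result = new LinkedList<>();
    int shardingTotalCount = configService.load(true).getTypeConfig().getCoreConfig().getShardingTotalCount();
    for (int i = 0; i < shardingTotalCount; i++) {
        // the item belongs to this instance if sharding/{i}/instance stores its jobInstanceId
        if (jobInstance.getJobInstanceId().equals(jobNodeStorage.getJobNodeData(ShardingNode.getInstanceNode(i)))) {
            result.add(i);
        }
    }
    return result;
}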
Code @4: removes the locally disabled shard items; a disabled item is marked by the node ${namespace}/jobname/sharding/{item}/disabled.
Code @5: returns the sharding context of the current node, building a ShardingContexts object mainly from the job configuration (including the shard parameters) and the shard items assigned to this instance.
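From the job author's point of view, this context is what drives data partitioning. A small illustrative job (the class name and the data-access logic in the comments are hypothetical) showing how a shard item is typically consumed:

import com.dangdang.ddframe.job.api.ShardingContext;
import com.dangdang.ddframe.job.api.simple.SimpleJob;

public class MyDataSyncJob implements SimpleJob {

    @Override
    public void execute(final ShardingContext shardingContext) {
        int item = shardingContext.getShardingItem();             // the shard item for this call
        String param = shardingContext.getShardingParameter();    // the parameter configured for this item
        // pull and process only the slice of data mapped to this item,
        // e.g. SELECT ... WHERE MOD(id, shardingTotalCount) = item
    }
}

ElasticJob calls execute once for each shard item held by the local instance, so each instance processes only its own slice of the data.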
2.1 ShardingService#shardingIfNecessary in detail (the sharding logic)
/**
 * Shard the job if sharding is needed and the current node is the master.
 *
 * <p>
 * If no node is currently available, do not shard.
 * </p>
 */
public void shardingIfNecessary() {
    List<JobInstance> availableJobInstances = instanceService.getAvailableJobInstances();    // @1
    if (!isNeedSharding() || availableJobInstances.isEmpty()) {    // @2
        return;
    }
    if (!leaderService.isLeaderUntilBlock()) {    // @3
        blockUntilShardingCompleted();    // @4
        return;
    }
    waitingOtherJobCompleted();    // @5
    LiteJobConfiguration liteJobConfig = configService.load(false);
    int shardingTotalCount = liteJobConfig.getTypeConfig().getCoreConfig().getShardingTotalCount();
    log.debug("Job '{}' sharding begin.", jobName);
    jobNodeStorage.fillEphemeralJobNode(ShardingNode.PROCESSING, "");    // @6
    resetShardingInfo(shardingTotalCount);    // @7
    JobShardingStrategy jobShardingStrategy = JobShardingStrategyFactory.getStrategy(liteJobConfig.getJobShardingStrategyClass());    // @8
    jobNodeStorage.executeInTransaction(new PersistShardingInfoTransactionExecutionCallback(jobShardingStrategy.sharding(availableJobInstances, jobName, shardingTotalCount)));    // @9
    log.debug("Job '{}' sharding complete.", jobName);
}
Code @1: gets the currently available instances. It first fetches all children of the ${namespace}/jobname/instances directory and then checks whether the server (IP) of each instance is usable: an instance is considered available if the value stored at ${namespace}/jobname/servers/{ip} is not DISABLED.
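A minimal sketch of InstanceService#getAvailableJobInstances, assuming the 2.x helper names:

public List<JobInstance> getAvailableJobInstances() {
    List<JobInstance> result = new LinkedList<>();
    for (String each : jobNodeStorage.getJobNodeChildrenKeys(InstanceNode.ROOT)) {
        JobInstance jobInstance = new JobInstance(each);
        // keep the instance only if its server (IP) is not disabled
        if (serverService.isEnableServer(jobInstance.getIp())) {
            result.add(jobInstance);
        }
    }
    return result;
}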
Code @2: returns immediately if re-sharding is not needed (the ${namespace}/jobname/leader/sharding/necessary node does not exist) or no instance is currently available.
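The check itself is just an existence test on the flag node. A sketch, assuming the 2.x ShardingNode constants:

public boolean isNeedSharding() {
    // true if ${namespace}/jobname/leader/sharding/necessary exists
    return jobNodeStorage.isJobNodeExisted(ShardingNode.NECESSARY);
}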
Code @3: determines whether this node is the master. If a master election is in progress, block until the election completes; the blocking code (LeaderService#isLeaderUntilBlock) is as follows:
public boolean isLeaderUntilBlock() {
    // no master yet but available servers exist: sleep briefly, then trigger an election
    while (!hasLeader() && serverService.hasAvailableServers()) {
        log.info("Leader is electing, waiting for {} ms", 100);
        BlockUtils.waitingShortTime();
        if (!JobRegistry.getInstance().isShutdown(jobName)
                && serverService.isAvailableServer(JobRegistry.getInstance().getJobInstance(jobName).getIp())) {
            electLeader();
        }
    }
    return isLeader();
}
Code @4: if the current node is not the master, wait until sharding ends. Whether sharding has ended is judged by the existence of the ${namespace}/jobname/leader/sharding/necessary node or of the ${namespace}/jobname/leader/sharding/processing node (which indicates a sharding operation is in progress); while sharding is unfinished, the node blocks for about 100 milliseconds and tries again.
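A sketch of ShardingService#blockUntilShardingCompleted, under the same 2.x assumptions (BlockUtils.waitingShortTime() sleeps about 100 ms):

private void blockUntilShardingCompleted() {
    // loop while this node is still not the master and sharding is flagged or in progress
    while (!leaderService.isLeaderUntilBlock()
            && (jobNodeStorage.isJobNodeExisted(ShardingNode.NECESSARY)
                || jobNodeStorage.isJobNodeExisted(ShardingNode.PROCESSING))) {
        log.debug("Job '{}' sleep short time until sharding completed.", jobName);
        BlockUtils.waitingShortTime();
    }
}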
Code @5: reaching this point means the current node is the master. The master waits for in-flight executions to finish before sharding; whether any task is still running is determined by the existence of ${namespace}/jobname/sharding/{item}/running nodes. If any exist, sleep for about 100 ms and check again.
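A sketch of ShardingService#waitingOtherJobCompleted under the same assumptions:

private void waitingOtherJobCompleted() {
    // block while any sharding/{item}/running node still exists
    while (executionService.hasRunningItems()) {
        log.debug("Job '{}' sleep short time until other job completed.", jobName);
        BlockUtils.waitingShortTime();
    }
}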
Code @6: creates the ephemeral node ${namespace}/jobname/leader/sharding/processing, signaling that sharding is in progress.
Code @7: resets the sharding information. It first deletes the ${namespace}/jobname/sharding/{item}/instance nodes and creates the ${namespace}/jobname/sharding/{item} nodes where necessary; then, based on the configured shard total, if the current number of children under ${namespace}/jobname/sharding exceeds the configured total, the surplus nodes are removed (from the highest item number down).
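A sketch of ShardingService#resetShardingInfo, assuming the 2.x node helpers:

private void resetShardingInfo(final int shardingTotalCount) {
    for (int i = 0; i < shardingTotalCount; i++) {
        jobNodeStorage.removeJobNodeIfExisted(ShardingNode.getInstanceNode(i));    // delete sharding/{i}/instance
        jobNodeStorage.createJobNodeIfNeeded(ShardingNode.ROOT + "/" + i);         // ensure sharding/{i} exists
    }
    int actualShardingTotalCount = jobNodeStorage.getJobNodeChildrenKeys(ShardingNode.ROOT).size();
    if (actualShardingTotalCount > shardingTotalCount) {
        // the shard total was reduced: drop the surplus items, highest numbers first in path order
        for (int i = shardingTotalCount; i < actualShardingTotalCount; i++) {
            jobNodeStorage.removeJobNodeIfExisted(ShardingNode.ROOT + "/" + i);
        }
    }
}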
Code @8: gets the configured sharding strategy class; the most commonly used strategy is average allocation (AverageAllocationJobShardingStrategy).
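To make the strategy concrete, a small usage example. The instance IDs and the wrapping method are made up; the expected results follow the documented behavior of average allocation, where items that do not divide evenly are appended, one each, to the lower-numbered instances:

// Hypothetical demonstration of AverageAllocationJobShardingStrategy
Map<JobInstance, List<Integer>> demoSharding() {
    JobShardingStrategy strategy = new AverageAllocationJobShardingStrategy();
    List<JobInstance> instances = Arrays.asList(
            new JobInstance("192.168.1.1@-@0001"),
            new JobInstance("192.168.1.2@-@0002"),
            new JobInstance("192.168.1.3@-@0003"));
    // 9 items over 3 instances -> [0,1,2], [3,4,5], [6,7,8]
    // 8 items over 3 instances -> [0,1,6], [2,3,7], [4,5]
    return strategy.sharding(instances, "myJob", 8);
}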
Code @9: within a transaction, creates the ${namespace}/jobname/sharding/{item}/instance nodes; each node holds the ID of the JobInstance the item is assigned to.
Transactional operations in ZooKeeper are performed by JobNodeStorage#executeInTransaction:
/**
 * Execute operations in a transaction.
 *
 * @param callback the callback that performs the operations
 */
public void executeInTransaction(final TransactionExecutionCallback callback) {
    try {
        CuratorTransactionFinal curatorTransactionFinal = getClient().inTransaction().check().forPath("/").and();    // @1
        callback.execute(curatorTransactionFinal);    // @2
        curatorTransactionFinal.commit();    // @3
    //CHECKSTYLE:OFF
    } catch (final Exception ex) {
    //CHECKSTYLE:ON
        RegExceptionHandler.handleException(ex);
    }
}
Code @1: calls inTransaction() on the CuratorFramework client, chains a check() operation, and obtains a CuratorTransactionFinal instance through the and() method. All node update commands in the transaction are issued through this instance and are committed together by commit(), which guarantees that either all of them succeed or all of them fail.
Code @2: executes the concrete logic through the callback, here PersistShardingInfoTransactionExecutionCallback.
Code @3: commits the transaction.
See ShardingService$PersistShardingInfoTransactionExecutionCallback:
class PersistShardingInfoTransactionExecutionCallback implements TransactionExecutionCallback {

    private final Map<JobInstance, List<Integer>> shardingResults;

    @Override
    public void execute(final CuratorTransactionFinal curatorTransactionFinal) throws Exception {
        for (Map.Entry<JobInstance, List<Integer>> entry : shardingResults.entrySet()) {
            for (int shardingItem : entry.getValue()) {
                curatorTransactionFinal.create().forPath(jobNodePath.getFullPath(ShardingNode.getInstanceNode(shardingItem)),
                        entry.getKey().getJobInstanceId().getBytes()).and();    // @1
            }
        }
        curatorTransactionFinal.delete().forPath(jobNodePath.getFullPath(ShardingNode.NECESSARY)).and();    // @2
        curatorTransactionFinal.delete().forPath(jobNodePath.getFullPath(ShardingNode.PROCESSING)).and();    // @3
    }
}
Code @1: sharding, in essence, means creating the ${namespace}/jobname/sharding/{item}/instance nodes, whose content is the JobInstance ID.
Code @2: deletes the ${namespace}/jobname/leader/sharding/necessary node.
Code @3: deletes the ${namespace}/jobname/leader/sharding/processing node, marking the end of sharding.
A sharding flowchart follows to conclude this section: