This article analyzes the sharding mechanism of ElasticJob.
How ElasticJob sharding works:
1. At startup, ElasticJob first starts the listeners that decide whether re-sharding is needed.
See ListenerManager#startAllListeners { ...; shardingListenerManager.start(); ... }.
2. Before each execution, a task must obtain its sharding information; if re-sharding is needed, the master server runs the sharding algorithm while the other servers wait until sharding completes.
See AbstractElasticJobExecutor#execute { ...; jobFacade.getShardingContexts(); ...; }
1. The sharding listener managers in detail
The base class of ElasticJob's event listener managers is AbstractListenerManager.
Its class diagram is shown in the figure:
JobNodeStorage jobNodeStorage: the API for operating on job nodes.
Its core methods:
public abstract void start(): starts the listener manager; implemented by each subclass.
protected void addDataListener(TreeCacheListener listener): registers an event listener.
ElasticJob's leader-election listener manager, sharding listener manager, failover listener manager, and so on are all subclasses of AbstractListenerManager.
The class diagram of the sharding-related listener managers is shown in the figure:
ShardingListenerManager: the sharding listener manager.
ShardingTotalCountChangedJobListener: listens for changes to the total shard count; it is a subclass of TreeCacheListener (Curator's event listener).
ListenServersChangedJobListener: listens for changes to the set of job servers (running instances).
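ShardingListenerManager does little more than register these two listeners when it starts. A minimal sketch of its start() method, consistent with the elastic-job-lite 2.x source:

class ShardingListenerManager extends AbstractListenerManager {

    @Override
    public void start() {
        // register both sharding-related listeners on the job's TreeCache
        addDataListener(new ShardingTotalCountChangedJobListener());
        addDataListener(new ListenServersChangedJobListener());
    }
}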
1.1 Source analysis of the ShardingTotalCountChangedJobListener listener
class ShardingTotalCountChangedJobListener extends AbstractJobListener {

    @Override
    protected void dataChanged(final String path, final Type eventType, final String data) {
        if (configNode.isConfigPath(path) && 0 != JobRegistry.getInstance().getCurrentShardingTotalCount(jobName)) {
            int newShardingTotalCount = LiteJobConfigurationGsonFactory.fromJson(data).getTypeConfig().getCoreConfig().getShardingTotalCount();
            if (newShardingTotalCount != JobRegistry.getInstance().getCurrentShardingTotalCount(jobName)) {
                shardingService.setReshardingFlag();
                JobRegistry.getInstance().setCurrentShardingTotalCount(jobName, newShardingTotalCount);
            }
        }
    }
}
This listener fires when the total shard count in the job configuration changes (ElasticJob allows the shard total of each job to be modified through the web console). The job configuration is stored as JSON on the ${namespace}/jobname/config node.
When the content of ${namespace}/jobname/config changes, ZooKeeper triggers a node-data-changed event for that node. If the shard total stored in ZooKeeper differs from the shard total held in memory (JobRegistry.getInstance()), the listener calls ShardingService to set the re-sharding flag (creating the persistent node ${namespace}/jobname/leader/sharding/necessary) and updates the in-memory shard total.
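Setting the flag amounts to creating a marker node. A minimal sketch of ShardingService#setReshardingFlag, assuming the 2.x ShardingNode constants:

public void setReshardingFlag() {
    // creates the persistent node ${namespace}/jobname/leader/sharding/necessary if absent
    jobNodeStorage.createJobNodeIfNeeded(ShardingNode.NECESSARY);
}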
1.2 Source analysis of the ListenServersChangedJobListener listener
class ListenServersChangedJobListener extends AbstractJobListener {

    @Override
    protected void dataChanged(final String path, final Type eventType, final String data) {
        if (!JobRegistry.getInstance().isShutdown(jobName) && (isInstanceChange(eventType, path) || isServerChange(path))) {
            shardingService.setReshardingFlag();
        }
    }

    private boolean isInstanceChange(final Type eventType, final String path) {
        return instanceNode.isInstancePath(path) && Type.NODE_UPDATED != eventType;
    }

    private boolean isServerChange(final String path) {
        return serverNode.isServerPath(path);
    }
}
This listener reacts to changes in the set of job servers (running instances): re-sharding is needed when a new instance joins or an existing instance goes down.
If the nodes under the ${namespace}/jobname/servers or ${namespace}/jobname/instances path change, the listener sets the re-sharding flag.
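For reference, the ZooKeeper node layout that this article keeps referring to is summarized below. This is an informal sketch reconstructed from the paths mentioned in this analysis, not the exhaustive layout:

// ${namespace}/${jobName}/config                        job configuration (JSON)
// ${namespace}/${jobName}/servers/${ip}                 server state (empty, or DISABLED)
// ${namespace}/${jobName}/instances/${jobInstanceId}    one ephemeral node per running instance
// ${namespace}/${jobName}/leader/sharding/necessary     re-sharding flag (persistent)
// ${namespace}/${jobName}/leader/sharding/processing    sharding in progress (ephemeral)
// ${namespace}/${jobName}/sharding/${item}/instance     jobInstanceId that owns the shard item
// ${namespace}/${jobName}/sharding/${item}/disabled     marker: the shard item is disabled
// ${namespace}/${jobName}/sharding/${item}/running      marker: the shard item is executing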
2. The sharding logic itself
The sections above analyzed the sharding listener managers, whose responsibility is to watch specific ZooKeeper directories and, when a change occurs, decide whether to set the re-sharding flag. Once the flag is set, when is re-sharding actually triggered?
Before each scheduled execution, a task first obtains its sharding information (the sharding context) and then pulls its own slice of data to process. The source entry point is the jobFacade.getShardingContexts() call in com.dangdang.ddframe.job.executor.AbstractElasticJobExecutor#execute.
The concrete implementation is LiteJobFacade#getShardingContexts:
public ShardingContexts getShardingContexts() {
    boolean isFailover = configService.load(true).isFailover();    // @1
    if (isFailover) {
        List<Integer> failoverShardingItems = failoverService.getLocalFailoverItems();
        if (!failoverShardingItems.isEmpty()) {
            return executionContextService.getJobShardingContext(failoverShardingItems);
        }
    }
    shardingService.shardingIfNecessary();    // @2
    List<Integer> shardingItems = shardingService.getLocalShardingItems();    // @3
    if (isFailover) {
        shardingItems.removeAll(failoverService.getLocalTakeOffItems());
    }
    shardingItems.removeAll(executionService.getDisabledItems(shardingItems));    // @4
    return executionContextService.getJobShardingContext(shardingItems);    // @5
}
Code @1: checks whether failover is enabled. This article focuses on the sharding mechanism; failover is covered in detail in the next article, so assume here that failover is disabled.
Code @2: performs sharding if necessary. If no sharding information exists yet (first sharding) or re-sharding is flagged, the sharding algorithm is executed; this logic is analyzed in detail below.
Code @3: gets the shard items assigned to the local instance. It walks the shard items under ${namespace}/jobname/sharding/{item} and checks whether each item's instance value equals the current jobInstanceId; if so, the item belongs to this node, as sketched below.
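A minimal sketch of the lookup in ShardingService, assuming the 2.x helper names (serverService, configService, jobNodeStorage):

public List<Integer> getShardingItems(final String jobInstanceId) {
    JobInstance jobInstance = new JobInstance(jobInstanceId);
    if (!serverService.isAvailableServer(jobInstance.getIp())) {
        return Collections.emptyList();
    }
    List<Integer> result = new LinkedList<>();
    int shardingTotalCount = configService.load(true).getTypeConfig().getCoreConfig().getShardingTotalCount();
    for (int i = 0; i < shardingTotalCount; i++) {
        // the item belongs to this instance if sharding/{i}/instance stores its jobInstanceId
        if (jobInstance.getJobInstanceId().equals(jobNodeStorage.getJobNodeData(ShardingNode.getInstanceNode(i)))) {
            result.add(i);
        }
    }
    return result;
}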
Code @4: removes the locally disabled shard items; a disabled item is marked by the node ${namespace}/jobname/sharding/{item}/disabled.
Code @5: returns the sharding context of the current node, building a ShardingContexts object mainly from the job configuration (including the shard parameters) and the shard items assigned to this instance.
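From the job author's point of view, this context is what drives data partitioning. A small illustrative job (the class name and the data-access logic in the comments are hypothetical) showing how a shard item is typically consumed:

import com.dangdang.ddframe.job.api.ShardingContext;
import com.dangdang.ddframe.job.api.simple.SimpleJob;

public class MyDataSyncJob implements SimpleJob {

    @Override
    public void execute(final ShardingContext shardingContext) {
        int item = shardingContext.getShardingItem();             // the shard item for this call
        String param = shardingContext.getShardingParameter();    // the parameter configured for this item
        // pull and process only the slice of data mapped to this item,
        // e.g. SELECT ... WHERE MOD(id, shardingTotalCount) = item
    }
}

ElasticJob calls execute once for each shard item held by the local instance, so each instance processes only its own slice of the data.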
2.1 ShardingService#shardingIfNecessary in detail (the sharding logic)
/**
 * Shard the job if sharding is needed and the current node is the master.
 *
 * <p>
 * If no node is currently available, do not shard.
 * </p>
 */
public void shardingIfNecessary() {
    List<JobInstance> availableJobInstances = instanceService.getAvailableJobInstances();    // @1
    if (!isNeedSharding() || availableJobInstances.isEmpty()) {    // @2
        return;
    }
    if (!leaderService.isLeaderUntilBlock()) {    // @3
        blockUntilShardingCompleted();    // @4
        return;
    }
    waitingOtherJobCompleted();    // @5
    LiteJobConfiguration liteJobConfig = configService.load(false);
    int shardingTotalCount = liteJobConfig.getTypeConfig().getCoreConfig().getShardingTotalCount();
    log.debug("Job '{}' sharding begin.", jobName);
    jobNodeStorage.fillEphemeralJobNode(ShardingNode.PROCESSING, "");    // @6
    resetShardingInfo(shardingTotalCount);    // @7
    JobShardingStrategy jobShardingStrategy = JobShardingStrategyFactory.getStrategy(liteJobConfig.getJobShardingStrategyClass());    // @8
    jobNodeStorage.executeInTransaction(new PersistShardingInfoTransactionExecutionCallback(jobShardingStrategy.sharding(availableJobInstances, jobName, shardingTotalCount)));    // @9
    log.debug("Job '{}' sharding complete.", jobName);
}
Code @1: gets the currently available instances. It first fetches all children of the ${namespace}/jobname/instances directory and then checks whether the server (IP) of each instance is usable: an instance is considered available if the value stored at ${namespace}/jobname/servers/{ip} is not DISABLED.
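A minimal sketch of InstanceService#getAvailableJobInstances, assuming the 2.x helper names:

public List<JobInstance> getAvailableJobInstances() {
    List<JobInstance> result = new LinkedList<>();
    for (String each : jobNodeStorage.getJobNodeChildrenKeys(InstanceNode.ROOT)) {
        JobInstance jobInstance = new JobInstance(each);
        // keep the instance only if its server (IP) is not disabled
        if (serverService.isEnableServer(jobInstance.getIp())) {
            result.add(jobInstance);
        }
    }
    return result;
}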
Code @2: returns immediately if re-sharding is not needed (the ${namespace}/jobname/leader/sharding/necessary node does not exist) or no instance is currently available.
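The check itself is just an existence test on the flag node. A sketch, assuming the 2.x ShardingNode constants:

public boolean isNeedSharding() {
    // true if ${namespace}/jobname/leader/sharding/necessary exists
    return jobNodeStorage.isJobNodeExisted(ShardingNode.NECESSARY);
}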
Code @3: determines whether this node is the master. If a master election is in progress, block until the election completes; the blocking code (LeaderService#isLeaderUntilBlock) is as follows:
public boolean isLeaderUntilBlock() {
    // no master yet but available servers exist: sleep briefly, then trigger an election
    while (!hasLeader() && serverService.hasAvailableServers()) {
        log.info("Leader is electing, waiting for {} ms", 100);
        BlockUtils.waitingShortTime();
        if (!JobRegistry.getInstance().isShutdown(jobName)
                && serverService.isAvailableServer(JobRegistry.getInstance().getJobInstance(jobName).getIp())) {
            electLeader();
        }
    }
    return isLeader();
}
Code @4: if the current node is not the master, wait until sharding ends. Whether sharding has ended is judged by the existence of the ${namespace}/jobname/leader/sharding/necessary node or of the ${namespace}/jobname/leader/sharding/processing node (which indicates a sharding operation is in progress); while sharding is unfinished, the node blocks for about 100 milliseconds and tries again.
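A sketch of ShardingService#blockUntilShardingCompleted, under the same 2.x assumptions (BlockUtils.waitingShortTime() sleeps about 100 ms):

private void blockUntilShardingCompleted() {
    // loop while this node is still not the master and sharding is flagged or in progress
    while (!leaderService.isLeaderUntilBlock()
            && (jobNodeStorage.isJobNodeExisted(ShardingNode.NECESSARY)
                || jobNodeStorage.isJobNodeExisted(ShardingNode.PROCESSING))) {
        log.debug("Job '{}' sleep short time until sharding completed.", jobName);
        BlockUtils.waitingShortTime();
    }
}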
Code @5: reaching this point means the current node is the master. The master waits for in-flight executions to finish before sharding; whether any task is still running is determined by the existence of ${namespace}/jobname/sharding/{item}/running nodes. If any exist, sleep for about 100 ms and check again.
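A sketch of ShardingService#waitingOtherJobCompleted under the same assumptions:

private void waitingOtherJobCompleted() {
    // block while any sharding/{item}/running node still exists
    while (executionService.hasRunningItems()) {
        log.debug("Job '{}' sleep short time until other job completed.", jobName);
        BlockUtils.waitingShortTime();
    }
}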
Code @6: creates the ephemeral node ${namespace}/jobname/leader/sharding/processing, signaling that sharding is in progress.
Code @7: resets the sharding information. It first deletes the ${namespace}/jobname/sharding/{item}/instance nodes and creates the ${namespace}/jobname/sharding/{item} nodes where necessary; then, based on the configured shard total, if the current number of children under ${namespace}/jobname/sharding exceeds the configured total, the surplus nodes are removed (from the highest item number down).
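A sketch of ShardingService#resetShardingInfo, assuming the 2.x node helpers:

private void resetShardingInfo(final int shardingTotalCount) {
    for (int i = 0; i < shardingTotalCount; i++) {
        jobNodeStorage.removeJobNodeIfExisted(ShardingNode.getInstanceNode(i));    // delete sharding/{i}/instance
        jobNodeStorage.createJobNodeIfNeeded(ShardingNode.ROOT + "/" + i);         // ensure sharding/{i} exists
    }
    int actualShardingTotalCount = jobNodeStorage.getJobNodeChildrenKeys(ShardingNode.ROOT).size();
    if (actualShardingTotalCount > shardingTotalCount) {
        // the shard total was reduced: drop the surplus items, highest numbers first in path order
        for (int i = shardingTotalCount; i < actualShardingTotalCount; i++) {
            jobNodeStorage.removeJobNodeIfExisted(ShardingNode.ROOT + "/" + i);
        }
    }
}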
Code @8: gets the configured sharding strategy class; the most commonly used strategy is average allocation (AverageAllocationJobShardingStrategy).
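To make the strategy concrete, a small usage example. The instance IDs and the wrapping method are made up; the expected results follow the documented behavior of average allocation, where items that do not divide evenly are appended, one each, to the lower-numbered instances:

// Hypothetical demonstration of AverageAllocationJobShardingStrategy
Map<JobInstance, List<Integer>> demoSharding() {
    JobShardingStrategy strategy = new AverageAllocationJobShardingStrategy();
    List<JobInstance> instances = Arrays.asList(
            new JobInstance("192.168.1.1@-@0001"),
            new JobInstance("192.168.1.2@-@0002"),
            new JobInstance("192.168.1.3@-@0003"));
    // 9 items over 3 instances -> [0,1,2], [3,4,5], [6,7,8]
    // 8 items over 3 instances -> [0,1,6], [2,3,7], [4,5]
    return strategy.sharding(instances, "myJob", 8);
}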
Code @9: within a transaction, creates the ${namespace}/jobname/sharding/{item}/instance nodes; each node holds the ID of the JobInstance the item is assigned to.
Transactional operations in ZooKeeper are performed by JobNodeStorage#executeInTransaction:
/**
 * Execute operations in a transaction.
 *
 * @param callback the callback that performs the operations
 */
public void executeInTransaction(final TransactionExecutionCallback callback) {
    try {
        CuratorTransactionFinal curatorTransactionFinal = getClient().inTransaction().check().forPath("/").and();    // @1
        callback.execute(curatorTransactionFinal);    // @2
        curatorTransactionFinal.commit();    // @3
    //CHECKSTYLE:OFF
    } catch (final Exception ex) {
    //CHECKSTYLE:ON
        RegExceptionHandler.handleException(ex);
    }
}
Code @1: calls inTransaction() on the CuratorFramework client, chains a check() operation, and obtains a CuratorTransactionFinal instance through the and() method. All node update commands in the transaction are issued through this instance and are committed together by commit(), which guarantees that either all of them succeed or all of them fail.
Code @2: executes the concrete logic through the callback, here PersistShardingInfoTransactionExecutionCallback.
Code @3: commits the transaction.
See ShardingService$PersistShardingInfoTransactionExecutionCallback:
class PersistShardingInfoTransactionExecutionCallback implements TransactionExecutionCallback {

    private final Map<JobInstance, List<Integer>> shardingResults;

    @Override
    public void execute(final CuratorTransactionFinal curatorTransactionFinal) throws Exception {
        for (Map.Entry<JobInstance, List<Integer>> entry : shardingResults.entrySet()) {
            for (int shardingItem : entry.getValue()) {
                curatorTransactionFinal.create().forPath(jobNodePath.getFullPath(ShardingNode.getInstanceNode(shardingItem)),
                        entry.getKey().getJobInstanceId().getBytes()).and();    // @1
            }
        }
        curatorTransactionFinal.delete().forPath(jobNodePath.getFullPath(ShardingNode.NECESSARY)).and();    // @2
        curatorTransactionFinal.delete().forPath(jobNodePath.getFullPath(ShardingNode.PROCESSING)).and();    // @3
    }
}
Code @1: sharding, in essence, means creating the ${namespace}/jobname/sharding/{item}/instance nodes, whose content is the JobInstance ID.
Code @2: deletes the ${namespace}/jobname/leader/sharding/necessary node.
Code @3: deletes the ${namespace}/jobname/leader/sharding/processing node, marking the end of sharding.
A sharding flowchart follows to conclude this section: