Once the ShardCoordinator (playing a role similar to ZooKeeper in a distributed system) starts, it schedules a timer that periodically tries to balance the load across the nodes in the cluster; rebalancing means moving work from heavily loaded nodes to less-loaded ones. Here my earlier understanding turned out to be wrong: I had thought the ShardRegion was the smallest unit of movement, but it is actually the Shard.
val rebalanceTask = context.system.scheduler.schedule(rebalanceInterval, rebalanceInterval, self, RebalanceTick)
When the coordinator receives a RebalanceTick, it starts trying to rebalance the load in the system:
case RebalanceTick ⇒
  if (persistentState.regions.nonEmpty) {
    val shardsFuture = allocationStrategy.rebalance(persistentState.regions, rebalanceInProgress)
    shardsFuture.value match {
      case Some(Success(shards)) ⇒
        continueRebalance(shards)
      case _ ⇒
        // continue when the future is completed
        shardsFuture.map { shards ⇒ RebalanceResult(shards) }
          .recover { case _ ⇒ RebalanceResult(Set.empty) }
          .pipeTo(self)
    }
  }
I can follow the overall logic, but the handling of the Future here is not entirely clear to me. Intuitively, when shardsFuture has already completed with a Failure, RebalanceResult(Set.empty) could be produced directly; I do not understand why, after the failure, the code still goes through map/recover and pipeTo(self) as if it had to wait for the future.
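My reading (and it is only my reading) is that the catch-all branch covers two different situations at once: a future that has not completed yet, and one that has already failed; the map/recover/pipeTo chain is a single code path that works for both. A minimal sketch of how Future.value distinguishes the two states, using only standard library calls (the object name FutureValueDemo and the sample values are made up for illustration):

import scala.concurrent.{ Future, Promise }
import scala.concurrent.ExecutionContext.Implicits.global

object FutureValueDemo extends App {
  // Already-failed future: value is Some(Failure(...)), so in principle a
  // result could be produced immediately without registering a callback.
  val failed = Future.failed[Set[String]](new RuntimeException("boom"))
  println(failed.value) // Some(Failure(java.lang.RuntimeException: boom))

  // Not-yet-completed future: value is None, so the only option is to
  // register callbacks (map / recover / pipeTo) and react later.
  val pending = Promise[Set[String]]().future
  println(pending.value) // None

  // One code path that handles both cases, mirroring the coordinator:
  val result = pending
    .map(shards ⇒ s"RebalanceResult($shards)")
    .recover { case _ ⇒ "RebalanceResult(Set.empty)" }
}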
AllocationStrategy has a default implementation, and you can also plug in a custom load-balancing policy. Its rebalance function returns a Set[ShardId], the shards that should be moved.
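As an illustration, here is a minimal sketch of what a custom strategy could look like. I am assuming the Akka 2.4-style ShardCoordinator.ShardAllocationStrategy interface, where both methods return Futures; the class name SimpleRebalanceStrategy and the threshold parameter are invented for the example, and older akka-contrib versions have slightly different signatures:

import scala.collection.immutable
import scala.concurrent.Future
import akka.actor.ActorRef
import akka.cluster.sharding.ShardCoordinator

// Hypothetical strategy: drain the busiest region when it hosts more than
// `threshold` shards above the least busy one.
class SimpleRebalanceStrategy(threshold: Int)
  extends ShardCoordinator.ShardAllocationStrategy {

  // New shards go to the region that currently hosts the fewest shards.
  override def allocateShard(
      requester: ActorRef, shardId: String,
      currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[String]]): Future[ActorRef] =
    Future.successful(currentShardAllocations.minBy(_._2.size)._1)

  // Returns the Set of ShardIds that should be handed off in this round.
  override def rebalance(
      currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[String]],
      rebalanceInProgress: Set[String]): Future[Set[String]] = {
    val mostLoaded  = currentShardAllocations.values.maxBy(_.size)
    val leastLoaded = currentShardAllocations.values.minBy(_.size)
    val diff = mostLoaded.size - leastLoaded.size
    val toMove =
      if (diff > threshold) mostLoaded.take(diff - threshold).toSet -- rebalanceInProgress
      else Set.empty[String]
    Future.successful(toMove)
  }
}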
When the coordinator receives the RebalanceResult, it kicks off the actual rebalance logic:
def continueRebalance(shards: Set[ShardId]): Unit =
  shards.foreach { shard ⇒
    if (!rebalanceInProgress(shard)) {
      persistentState.shards.get(shard) match {
        case Some(rebalanceFromRegion) ⇒
          rebalanceInProgress += shard
          log.debug("Rebalance shard [{}] from [{}]", shard, rebalanceFromRegion)
          context.actorOf(rebalanceWorkerProps(shard, rebalanceFromRegion, handOffTimeout,
            persistentState.regions.keySet ++ persistentState.regionProxies)
            .withDispatcher(context.props.dispatcher))
        case None ⇒
          log.debug("Rebalance of non-existing shard [{}] is ignored", shard)
      }
    }
  }
rebalanceInProgress is a Set recording the shards that are currently being moved. As I see it, when a new round of rebalancing starts, a shard can already be in rebalanceInProgress only if the previous round has not finished yet. I am not sure whether it would be better to treat that as an error or to simply keep rebalancing, since the allocation strategy should arguably not have to account for the possibility that the previous round is still in progress.
The coordinator then starts a RebalanceWorker, the "avatar" actor mentioned in the previous article.
private[akka] class RebalanceWorker(shard: String, from: ActorRef, handOffTimeout: FiniteDuration,
                                    regions: Set[ActorRef]) extends Actor {
  import Internal._
  regions.foreach(_ ! BeginHandOff(shard))
  var remaining = regions

  import context.dispatcher
  context.system.scheduler.scheduleOnce(handOffTimeout, self, ReceiveTimeout)

  def receive = {
    case BeginHandOffAck(`shard`) ⇒
      remaining -= sender()
      if (remaining.isEmpty) {
        from ! HandOff(shard)
        context.become(stoppingShard, discardOld = true)
      }
    case ReceiveTimeout ⇒ done(ok = false)
  }

  def stoppingShard: Receive = {
    case ShardStopped(shard) ⇒ done(ok = true)
    case ReceiveTimeout      ⇒ done(ok = false)
  }

  def done(ok: Boolean): Unit = {
    context.parent ! RebalanceDone(shard, ok)
    context.stop(self)
  }
}
Akka's logic is all message passing, which honestly makes it hard to read. A lot of actors take part while the RebalanceWorker runs: first the coordinator, then the ShardRegion hosting the Shard actor that is about to be migrated, then the Shard actor itself, and finally every other ShardRegion in the system gets involved as well. Writing this down, I can't help wanting to flip my screen over.
1. The RebalanceWorker first sends a BeginHandOff message to every ShardRegion, telling everyone that the hand-off is starting, and then waits for all of their replies.
2. When a ShardRegion receives BeginHandOff, it updates its own knowledge, erasing what it remembers about the host ShardRegion and the Shard actor:
case BeginHandOff(shard) ⇒
  log.debug("BeginHandOff shard [{}]", shard)
  if (regionByShard.contains(shard)) {
    val regionRef = regionByShard(shard)
    val updatedShards = regions(regionRef) - shard
    if (updatedShards.isEmpty) regions -= regionRef
    else regions = regions.updated(regionRef, updatedShards)
    regionByShard -= shard
  }
  sender() ! BeginHandOffAck(shard)
Finally, it replies with a BeginHandOffAck message to tell the RebalanceWorker that it has prepared itself (after this, these ShardRegions have nothing left to do).
3. Back in the RebalanceWorker: once every ack has arrived, it sends HandOff to the ShardRegion hosting the Shard actor, telling it to do its own cleanup. It then switches its behavior to stoppingShard and waits for a ShardStopped message, which can come from one of two places: the host ShardRegion or the Shard actor itself.
4. When the host ShardRegion receives the HandOff message:
case msg @ HandOff(shard) ⇒
  log.debug("HandOff shard [{}]", shard)

  // must drop requests that came in between the BeginHandOff and now,
  // because they might be forwarded from other regions and there
  // is a risk of message re-ordering otherwise
  if (shardBuffers.contains(shard)) {
    shardBuffers -= shard
    loggedFullBufferWarning = false
  }

  if (shards.contains(shard)) {
    handingOff += shards(shard)
    shards(shard) forward msg
  } else
    sender() ! ShardStopped(shard)
If the host ShardRegion no longer contains the Shard actor, it replies with ShardStopped directly; otherwise it adds the Shard actor to the handingOff set and forwards the HandOff message on to the Shard actor.
5. Reading the code again, I found that the Shard actor and the entity actor are two different things; the Shard actor sits between the entity actor and the ShardRegion.
At this point I did not yet know the relationship between the entity actor and the ShardRegion.
def getEntity(id: EntityId): ActorRef = {
  val name = URLEncoder.encode(id, "utf-8")
  context.child(name).getOrElse {
    log.debug("Starting entity [{}] in shard [{}]", id, shardId)
    val a = context.watch(context.actorOf(entityProps, name))
    idByRef = idByRef.updated(a, id)
    refById = refById.updated(id, a)
    state = state.copy(state.entities + id)
    a
  }
}
Judging from this code, a Shard actor has a one-to-many relationship with entity actors.
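That matches how sharding is usually configured: many entity ids map onto the same shard id. A minimal sketch of the extractors supplied when starting sharding, assuming the Akka 2.4-style API (the Ping message, the Counter entity and the shard count are made up; only ClusterSharding.start and the ShardRegion extractor types are from the library):

import akka.actor.{ Actor, ActorSystem, Props }
import akka.cluster.sharding.{ ClusterSharding, ClusterShardingSettings, ShardRegion }

// Hypothetical entity and message, just for the example.
final case class Ping(counterId: Long)

class Counter extends Actor {
  def receive = {
    case Ping(_) ⇒ () // a real entity would do some work here
  }
}

object ShardingSetup {
  val numberOfShards = 10 // made-up value

  // Many different entity ids hash to the same shard id, which is why one
  // Shard actor ends up supervising many entity actors.
  val extractEntityId: ShardRegion.ExtractEntityId = {
    case msg @ Ping(id) ⇒ (id.toString, msg)
  }
  val extractShardId: ShardRegion.ExtractShardId = {
    case Ping(id) ⇒ (id % numberOfShards).toString
  }

  def start(system: ActorSystem) =
    ClusterSharding(system).start(
      typeName = "Counter",
      entityProps = Props[Counter],
      settings = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId = extractShardId)
}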
def receiveCoordinatorMessage(msg: CoordinatorMessage): Unit = msg match {
  case HandOff(`shardId`) ⇒ handOff(sender())
  case HandOff(shard)     ⇒ log.warning("Shard [{}] can not hand off for another Shard [{}]", shardId, shard)
  case _                  ⇒ unhandled(msg)
}

def handOff(replyTo: ActorRef): Unit = handOffStopper match {
  case Some(_) ⇒ log.warning("HandOff shard [{}] received during existing handOff", shardId)
  case None ⇒
    log.debug("HandOff shard [{}]", shardId)

    if (state.entities.nonEmpty) {
      handOffStopper = Some(context.watch(context.actorOf(
        handOffStopperProps(shardId, replyTo, idByRef.keySet, handOffStopMessage))))

      // during hand off we only care about watching for termination of the hand off stopper
      context become {
        case Terminated(ref) ⇒ receiveTerminated(ref)
      }
    } else {
      replyTo ! ShardStopped(shardId)
      context stop self
    }
}
def receiveTerminated(ref: ActorRef): Unit = {
  if (handOffStopper.exists(_ == ref))
    context stop self
  else if (idByRef.contains(ref) && handOffStopper.isEmpty)
    entityTerminated(ref)
}
From this code, though, it looks to me as if the Shard actor is one-to-one with the entity actor, because when the entity stops itself, the Shard actor ends up stopping itself as well. It reminds me of the last assignment of the Coursera Reactive Programming course, which similarly had one shard actor corresponding to each entity.
Akka Cluster Sharding source code learning (1/5): hand off