Akka Cluster Sharding source code learning (1/5): hand off


Once the ShardCoordinator (roughly the ZooKeeper of this distributed system) starts, it schedules a timer that periodically tries to balance the load across the nodes of the cluster, moving the most heavily loaded shards to less-loaded nodes. My previous understanding here was wrong: I had thought the ShardRegion was the smallest unit of movement, but it is actually the shard.

val rebalanceTask = context.system.scheduler.schedule(rebalanceInterval, rebalanceInterval, self, RebalanceTick)

When the coordinator receives a RebalanceTick, it begins trying to balance the system load:

case RebalanceTick ⇒
  if (persistentState.regions.nonEmpty) {
    val shardsFuture = allocationStrategy.rebalance(persistentState.regions, rebalanceInProgress)
    shardsFuture.value match {
      case Some(Success(shards)) ⇒
        continueRebalance(shards)
      case _ ⇒
        // continue when the future is completed
        shardsFuture.map { shards ⇒ RebalanceResult(shards)
        }.recover {
          case _ ⇒ RebalanceResult(Set.empty)
        }.pipeTo(self)
    }
  }

I follow the logic above, but the use of the Future was not clear to me at first. Intuitively, when shardsFuture fails, RebalanceResult(Set.empty) could be sent to self directly; why go through recover and pipeTo? The answer is that shardsFuture.value is None for any future that has not completed yet, not just a failed one, so the fallback branch cannot know the outcome synchronously: it has to attach callbacks and wait, with recover turning an eventual failure into RebalanceResult(Set.empty).
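A quick plain-Scala illustration of the distinction the coordinator exploits: Future.value is Some(...) only for an already-completed future (e.g. a strategy that computed its answer synchronously), and None for a pending one, which forces the callback route.

```scala
import scala.concurrent.{Future, Promise}
import scala.util.Success

// A synchronously computed result arrives as an already-completed future,
// so .value can be inspected immediately without waiting.
val completed = Future.successful(Set("shard-1"))
assert(completed.value == Some(Success(Set("shard-1"))))

// A still-running computation yields a pending future: .value is None, and
// the only option is to attach callbacks (map/recover) on it.
val pending = Promise[Set[String]]().future
assert(pending.value == None)
```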

AllocationStrategy comes with a default implementation (LeastShardAllocationStrategy), or you can plug in a custom load-balancing policy. The rebalance method returns a Future[Set[ShardId]]: the shards that should be moved.
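A minimal, Akka-free sketch in the spirit of the default strategy: if the most loaded region holds more shards than the least loaded one by more than some threshold, propose moving a shard off the most loaded region. Names, the threshold rule, and the use of plain strings for regions are all illustrative, not Akka's actual API.

```scala
// Sketch only: regions are strings, allocations map each region to the
// shards it currently hosts, and inProgress mirrors rebalanceInProgress.
def rebalance(allocations: Map[String, Vector[String]],
              inProgress: Set[String],
              threshold: Int = 1): Set[String] = {
  if (allocations.isEmpty) Set.empty
  else {
    val mostLoaded  = allocations.maxBy(_._2.size)
    val leastLoaded = allocations.minBy(_._2.size)
    if (mostLoaded._2.size - leastLoaded._2.size > threshold)
      // never propose a shard that is already being moved
      mostLoaded._2.filterNot(inProgress).take(1).toSet
    else Set.empty
  }
}

// Region "a" is two shards heavier than "b", so one of its shards moves.
assert(rebalance(Map("a" -> Vector("s1", "s2", "s3"), "b" -> Vector("s4")), Set.empty) == Set("s1"))
```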

When the coordinator receives RebalanceResult (or the future was already complete), it starts the actual balancing logic:

def continueRebalance(shards: Set[ShardId]): Unit =
  shards.foreach { shard ⇒
    if (!rebalanceInProgress(shard)) {
      persistentState.shards.get(shard) match {
        case Some(rebalanceFromRegion) ⇒
          rebalanceInProgress += shard
          log.debug("Rebalance shard [{}] from [{}]", shard, rebalanceFromRegion)
          context.actorOf(rebalanceWorkerProps(shard, rebalanceFromRegion, handOffTimeout,
            persistentState.regions.keySet ++ persistentState.regionProxies)
            .withDispatcher(context.props.dispatcher))
        case None ⇒
          log.debug("Rebalance of non-existing shard [{}] is ignored", shard)
      }
    }
  }

rebalanceInProgress is a Set recording the shards currently being moved. A shard can already be in this set at the start of a new round only when the previous round has not finished. I initially wondered whether that should be an error or whether continuing is better, on the assumption that the AllocationStrategy should not have to account for an unfinished previous round; in fact the coordinator passes rebalanceInProgress into allocationStrategy.rebalance precisely so that a strategy can avoid proposing shards that are still in flight.

The coordinator then starts a RebalanceWorker, the "avatar" actor mentioned in the previous article.

private[akka] class RebalanceWorker(shard: String, from: ActorRef, handOffTimeout: FiniteDuration,
                                    regions: Set[ActorRef]) extends Actor {
  import Internal._
  regions.foreach(_ ! BeginHandOff(shard))
  var remaining = regions

  import context.dispatcher
  context.system.scheduler.scheduleOnce(handOffTimeout, self, ReceiveTimeout)

  def receive = {
    case BeginHandOffAck(`shard`) ⇒
      remaining -= sender()
      if (remaining.isEmpty) {
        from ! HandOff(shard)
        context.become(stoppingShard, discardOld = true)
      }
    case ReceiveTimeout ⇒ done(ok = false)
  }

  def stoppingShard: Receive = {
    case ShardStopped(shard) ⇒ done(ok = true)
    case ReceiveTimeout      ⇒ done(ok = false)
  }

  def done(ok: Boolean): Unit = {
    context.parent ! RebalanceDone(shard, ok)
    context.stop(self)
  }
}

Akka's logic is built on message passing, which honestly makes it hard to read. A lot of actors take part in one RebalanceWorker run: first the coordinator; then the ShardRegion hosting the Shard actor to be migrated; then the Shard actor itself; and finally every other ShardRegion in the system is involved too. Writing this down, I can't help wanting to flip my screen over.

1. The RebalanceWorker first sends BeginHandOff to every ShardRegion, announcing that the hand off has started, and then waits for everyone's reply.

2. When a ShardRegion receives BeginHandOff, it updates its local knowledge, erasing the hosting ShardRegion and the Shard actor from its own routing tables:

case BeginHandOff(shard) ⇒
  log.debug("BeginHandOff shard [{}]", shard)
  if (regionByShard.contains(shard)) {
    val regionRef = regionByShard(shard)
    val updatedShards = regions(regionRef) - shard
    if (updatedShards.isEmpty) regions -= regionRef
    else regions = regions.updated(regionRef, updatedShards)
    regionByShard -= shard
  }
  sender() ! BeginHandOffAck(shard)

Finally it replies with a BeginHandOffAck message, telling the RebalanceWorker that it is ready (after this point these ShardRegions have nothing further to do in the hand off).

3. Back to the RebalanceWorker: once all acks are in, it sends HandOff to the ShardRegion hosting the Shard actor, telling it to do its own cleanup. It then switches its state to stoppingShard and waits for a ShardStopped message, which can come from two sources: the hosting ShardRegion, or the Shard actor itself.
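Steps 1 to 3 can be sketched without Akka as the RebalanceWorker's ack-counting phase: track the regions whose BeginHandOffAck is still outstanding, and only when that set is empty move on to sending HandOff. Region names here are made up for illustration.

```scala
// Outstanding acks, one entry per ShardRegion that was sent BeginHandOff.
var remaining = Set("regionA", "regionB", "regionC")

// Returns true when the last ack arrives, i.e. when it is time to send
// HandOff to the hosting region and become stoppingShard.
def beginHandOffAck(region: String): Boolean = {
  remaining -= region
  remaining.isEmpty
}

assert(!beginHandOffAck("regionA"))
assert(!beginHandOffAck("regionB"))
assert(beginHandOffAck("regionC")) // all acks in: send HandOff
```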

4. When the hosting ShardRegion receives the HandOff message:

case msg @ HandOff(shard) ⇒
  log.debug("HandOff shard [{}]", shard)

  // must drop requests that came in between the BeginHandOff and now,
  // because they might be forwarded from other regions and there
  // is a risk of message re-ordering otherwise
  if (shardBuffers.contains(shard)) {
    shardBuffers -= shard
    loggedFullBufferWarning = false
  }

  if (shards.contains(shard)) {
    handingOff += shards(shard)
    shards(shard) forward msg
  } else
    sender() ! ShardStopped(shard)

If the hosting ShardRegion no longer contains the Shard actor, it replies ShardStopped directly; otherwise it adds the Shard actor to the handingOff set and forwards HandOff to it. Note that any messages buffered for the shard between BeginHandOff and now are dropped first, because they might be forwarded from other regions and could otherwise be re-ordered.
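Step 4's decision can be modeled without actors as a pure function over the region's state; the names and the string stand-ins for actor refs are made up for this sketch.

```scala
sealed trait Reply
case object ForwardToShard extends Reply
case object ReplyShardStopped extends Reply

// Drops any buffered requests for the shard (to avoid re-ordering), then
// either forwards HandOff to a still-hosted shard or answers ShardStopped.
def handOff(shard: String,
            shards: Set[String],
            buffers: Map[String, Vector[String]]): (Reply, Map[String, Vector[String]]) = {
  val cleaned = buffers - shard
  if (shards.contains(shard)) (ForwardToShard, cleaned)
  else (ReplyShardStopped, cleaned)
}

assert(handOff("s1", Set("s1"), Map("s1" -> Vector("m")))._1 == ForwardToShard)
assert(handOff("s2", Set("s1"), Map.empty)._1 == ReplyShardStopped)
```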

5. Reading the code again, I found that the Shard actor and the entity actor are two different things: the Shard actor sits between the entity actors and the ShardRegion.

At this point the relationship between the entity actor and the ShardRegion was still unclear to me.

def getEntity(id: EntityId): ActorRef = {
  val name = URLEncoder.encode(id, "utf-8")
  context.child(name).getOrElse {
    log.debug("Starting entity [{}] in shard [{}]", id, shardId)
    val a = context.watch(context.actorOf(entityProps, name))
    idByRef = idByRef.updated(a, id)
    refById = refById.updated(id, a)
    state = state.copy(state.entities + id)
    a
  }
}

Judging from this code, the Shard actor has a one-to-many relationship with entity actors.
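A minimal model of the bookkeeping in getEntity: the entity id is URL-encoded because arbitrary ids may contain characters (like '/' or spaces) that are not legal in actor names, and two maps track the one-to-many shard-to-entity relationship. Plain strings stand in for ActorRefs here; everything else is illustrative.

```scala
import java.net.URLEncoder

var idByRef = Map.empty[String, String] // "ref" -> entity id
var refById = Map.empty[String, String] // entity id -> "ref"

def getEntity(id: String): String = {
  val name = URLEncoder.encode(id, "utf-8") // make the id a legal actor name
  refById.getOrElse(id, {
    val ref = s"ref:$name" // stand-in for context.actorOf(entityProps, name)
    idByRef = idByRef.updated(ref, id)
    refById = refById.updated(id, ref)
    ref
  })
}

assert(getEntity("user/42") == "ref:user%2F42") // '/' encoded
assert(getEntity("user/42") == "ref:user%2F42") // second call reuses the child
assert(getEntity("user 43") == "ref:user+43")   // ' ' encoded
assert(refById.size == 2)                        // one shard hosts many entities
```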

def receiveCoordinatorMessage(msg: CoordinatorMessage): Unit = msg match {
  case HandOff(`shardId`) ⇒ handOff(sender())
  case HandOff(shard)     ⇒ log.warning("Shard [{}] can not hand off for another Shard [{}]", shardId, shard)
  case _                  ⇒ unhandled(msg)
}

def handOff(replyTo: ActorRef): Unit = handOffStopper match {
  case Some(_) ⇒ log.warning("HandOff shard [{}] received during existing handOff", shardId)
  case None ⇒
    log.debug("HandOff shard [{}]", shardId)

    if (state.entities.nonEmpty) {
      handOffStopper = Some(context.watch(context.actorOf(
        handOffStopperProps(shardId, replyTo, idByRef.keySet, handOffStopMessage))))

      // during hand off we only care about watching for termination of the hand off stopper
      context become {
        case Terminated(ref) ⇒ receiveTerminated(ref)
      }
    } else {
      replyTo ! ShardStopped(shardId)
      context stop self
    }
}

def receiveTerminated(ref: ActorRef): Unit = {
  if (handOffStopper.exists(_ == ref))
    context stop self
  else if (idByRef.contains(ref) && handOffStopper.isEmpty)
    entityTerminated(ref)
}

My first reading of this code was that the Shard actor stops itself whenever an entity stops, which reminded me of the last homework of Coursera's Reactive Programming course, where each entity has a corresponding actor. Looking closer, though, the shard stops itself only when the HandOffStopper it watches terminates, i.e. after all its entities have stopped during hand off; an entity's own termination in normal operation is routed to entityTerminated instead. So the shard-to-entity relationship remains one-to-many.
