Preface
Article title: The Neophyte's Guide to Scala Part 15: Dealing with Failure in Actor Systems. Author: Daniel Westheide. Original link: http://danielwestheide.com/blog/2013/03/20/the-neophytes-guide-to-scala-part-15-dealing-with-failure-in-actor-systems.html. This is the 15th article in the author's series "The Neophyte's Guide to Scala" and is an introduction to Scala's actor programming model and Akka. With the author's permission, I have published this Chinese translation on my technical blog. Link to this article: http://blog.csdn.net/bluishglc/article/details/53195602. Please credit the source when reposting.
Related reading: Actor-based concurrency Scenarios | A preliminary, trivial note about Akka.
Contents
The Actor Hierarchy
The Guardian Actor
Actor Paths
An Example Actor Hierarchy
When an Actor Crashes
Supervisor Strategies
Directives
Default Strategy
The Actor Lifecycle
Death of an Actor
Self-Healing System
Error Kernel Pattern
Timeouts
Death Watch
Summary
Body
In the previous article, I introduced the second cornerstone of concurrency in Scala: the actor model, which complements the model based on futures and promises. You learned how to define and create actors, how to send messages to them, and how an actor processes these messages, for example by modifying its internal state or by asynchronously sending a reply to the sender.
To get you interested in the actor model, I left out some important concepts that you need to understand before you can start developing complex actor-based applications.
The actor model is meant to help you achieve a high level of fault tolerance. In this article, we will look at how to deal with failure in an actor-based system, which works very differently from error handling in a traditional layered server architecture.
Error handling is closely related to some core concepts of Akka, so this article first introduces those concepts and components.
The Actor Hierarchy
To understand what happens when an error occurs inside an actor, you need to know about a crucial aspect of Akka's approach to concurrency, one that is the foundation for building fault-tolerant concurrent applications: actors are organized in a hierarchy.
What does this mean? First, every actor has a parent actor, and it can create child actors of its own. You can think of an actor system as a pyramid of actors in which parent actors supervise their children, much like parents in real life who help their toddlers get back on their feet. You will see shortly how this works.
The Guardian Actor
In the previous article, we had only two actors, a barista and a customer. I will not repeat their implementation details, but focus on how we instantiated them:
import akka.actor.{ActorSystem, Props}
val system = ActorSystem("Coffeehouse")
val barista = system.actorOf(Props[Barista], "Barista")
val customer = system.actorOf(Props(classOf[Customer], barista), "Customer")
As you can see, we create these two actors by calling the actorOf method defined on ActorSystem. So what is the parent of these two actors? Is it the actor system? Not quite, but close. The actor system is not an actor itself, but it has a so-called guardian actor, which serves as the parent of all root-level, user-defined actors, that is, all actors we create by calling actorOf on the actor system, such as our barista and customer here.
The guardian actor should not be the direct parent of every actor in the system. Instead, root-level actors should create child actors and delegate work to them.
Actor Paths
The hierarchical structure of an actor system becomes apparent when you look at actor paths. These URLs clearly describe the location of an actor. You can get the path of an actor by calling path on its ActorRef:
barista.path // => akka.actor.ActorPath = akka://Coffeehouse/user/Barista
customer.path // => akka.actor.ActorPath = akka://Coffeehouse/user/Customer
An actor path follows a format defined by Akka: it starts with the akka protocol, followed by the name of your actor system, then the name of the guardian actor (that is, user), and finally the name you gave the actor when creating it with actorOf. For remote actors, which run on other machines, the path also contains the host address and port.
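To make the format concrete, here is a small sketch that assembles path strings by hand. The helper object ActorPathFormat and its methods are hypothetical, written only for illustration; in a real program you read paths from ActorRef.path. The remote variant assumes the akka.tcp scheme of Akka's classic remoting, and the host 10.0.0.5 is a made-up example; other transports or Akka versions may use a different scheme.

```scala
// Hypothetical helper, not part of Akka: it only spells out the path format.
object ActorPathFormat {
  // Local actor: akka://<system>/user/<name>/<child>/...
  def local(system: String, names: String*): String =
    s"akka://$system/user/${names.mkString("/")}"

  // Remote actor: the address part also carries host and port
  // (assumes the akka.tcp scheme used by classic remoting).
  def remote(system: String, host: String, port: Int, names: String*): String =
    s"akka.tcp://$system@$host:$port/user/${names.mkString("/")}"
}

println(ActorPathFormat.local("Coffeehouse", "Barista"))
// akka://Coffeehouse/user/Barista
println(ActorPathFormat.local("Coffeehouse", "Barista", "Register"))
// akka://Coffeehouse/user/Barista/Register
println(ActorPathFormat.remote("Coffeehouse", "10.0.0.5", 2552, "Barista"))
// akka.tcp://Coffeehouse@10.0.0.5:2552/user/Barista
```

The second call shows how a child created by the barista would appear one level below it in the path.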
You can use actor paths to look up other actors. For example, instead of passing a reference to the barista to its constructor, the customer actor could call the actorSelection method on its ActorContext with a relative path to look up the barista:
context.actorSelection("../Barista")
Being able to look up an actor by its path can come in handy, but it is usually a better idea to pass dependencies to the constructor, as we did before. The more your actors know about where their dependencies live in the actor system, the more error-prone your system becomes, and the harder it will be to refactor later on.
An Example Actor Hierarchy
To illustrate how parents supervise their children and how this helps keep your system fault-tolerant, let's return to our coffeehouse. We will give the barista a child actor to which it can delegate some of the work of running the coffeehouse.
If we were really modeling a barista's work, we would probably have a finer-grained division of roles, but to stay focused, we keep the example as simple as possible.
Suppose the barista now has a register that processes transactions, prints receipts, and keeps track of the revenue. Here is its first version:
import akka.actor._
object Register {
  sealed trait Article
  case object Espresso extends Article
  case object Cappuccino extends Article
  case class Transaction(article: Article)
}
class Register extends Actor {
  import Register._
  import Barista._
  var revenue = 0
  // prices in cents (example values)
  val prices = Map[Article, Int](Espresso -> 150, Cappuccino -> 250)
  def receive = {
    case Transaction(article) =>
      val price = prices(article)
      sender() ! createReceipt(price)
      revenue += price
  }
  def createReceipt(price: Int): Receipt = Receipt(price)
}
It has an immutable map of prices for the different articles and an integer variable holding the revenue. Whenever it receives a Transaction message, it adds the price of the ordered coffee to the revenue and sends a receipt back to the sender of the message, which here is the barista.
The register is a child of the barista, which means that we do not create it via the actor system, but from within the barista:
object Barista {
  case object EspressoRequest
  case object ClosingTime
  case class EspressoCup(state: EspressoCup.State)
  object EspressoCup {
    sealed trait State
    case object Clean extends State
    case object Filled extends State
    case object Dirty extends State
  }
  case class Receipt(amount: Int)
}
class Barista extends Actor {
  import Barista._
  import Register._
  import EspressoCup._
  import context.dispatcher
  import akka.util.Timeout
  import akka.pattern.ask
  import akka.pattern.pipe
  import scala.concurrent.duration._
  implicit val timeout: Timeout = Timeout(4.seconds)
  val register = context.actorOf(Props[Register], "Register")
  def receive = {
    case EspressoRequest =>
      val receipt = register ? Transaction(Espresso)
      // pair the register's reply with a filled cup and send both to the customer
      receipt.map((EspressoCup(Filled), _)).pipeTo(sender())
    case ClosingTime => context.stop(self)
  }
}
First, we define the types of messages our barista can handle. We also introduce an EspressoCup, which can only be in one of a fixed set of three states, hence the sealed State trait.
The more interesting part is the implementation of the Barista class. The imports of the dispatcher, ask, and pipe, as well as the implicit timeout, are needed because we use Akka's ask syntax in the receive partial function: when the barista receives an EspressoRequest, it sends a Transaction to the register and asks it for a Receipt. The barista then pipes this receipt, together with a filled espresso cup, to the customer, which receives a tuple (EspressoCup, Receipt). Delegating subtasks to child actors and then aggregating or amending their results like this is typical of actor-based applications.
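The map step can be tried out without any actors. In this sketch, an already completed Future stands in for the reply to register ? Transaction(Espresso), and the message types are simplified copies redeclared so the snippet runs on its own. Like the result of ask, the Future is typed Future[Any].

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Simplified copies of the message types from the article
sealed trait State
case object Filled extends State
case class EspressoCup(state: State)
case class Receipt(amount: Int)

// Stand-in for `register ? Transaction(Espresso)`: ask always yields Future[Any]
val receipt: Future[Any] = Future.successful(Receipt(150))

// Pair the filled cup with whatever the register replied, as the barista does
val forCustomer: Future[(EspressoCup, Any)] = receipt.map((EspressoCup(Filled), _))

println(Await.result(forCustomer, 1.second))
// (EspressoCup(Filled),Receipt(150))
```

In the actor, pipeTo takes the place of Await: instead of blocking, it sends the tuple to the customer once the Future completes.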
Also note that we create the register by calling actorOf on the ActorContext, not on the ActorSystem. Only this way does the new actor become a child of the actor calling the method, as opposed to a root-level actor whose parent is the guardian actor.
Finally, here is our customer, which, being a root-level actor, is also a child of the guardian actor:
object Customer {
  case object CaffeineWithdrawalWarning
}
class Customer(coffeeSource: ActorRef) extends Actor with ActorLogging {
  import Customer._
  import Barista._
  import EspressoCup._
  def receive = {
    case CaffeineWithdrawalWarning => coffeeSource ! EspressoRequest
    case (EspressoCup(Filled), Receipt(amount)) =>
      log.info(s"yay, caffeine for ${self}!")
  }
}
Apart from the barista's new actor hierarchy, note the first appearance of the ActorLogging trait, which lets us write to a log instead of just printing messages to the console. Now let's populate our actor system with a barista and two customers, which should get along just fine:
import Customer._
val system = ActorSystem("Coffeehouse")
val barista = system.actorOf(Props[Barista], "Barista")
val customerJohnny = system.actorOf(Props(classOf[Customer], barista), "Johnny")
val customerAlina = system.actorOf(Props(classOf[Customer], barista), "Alina")
customerJohnny ! CaffeineWithdrawalWarning
customerAlina ! CaffeineWithdrawalWarning
If you try this out, you should see two log messages from satisfied customers.
When an Actor Crashes
Of course, what we are really interested in is not whether our customers are happy, but what happens when things go wrong.
Our register is a fragile device: its printer is unreliable, and the paper often jams so that no receipt can be printed. To model this, we add a PaperJamException to the register's companion object:
class PaperJamException(msg: String) extends Exception(msg)
Then we change the createReceipt method of our Register accordingly:
def createReceipt(price: Int): Receipt = {
  import scala.util.Random
  // fails about half of the time
  if (Random.nextBoolean())
    throw new PaperJamException("OMG, not again!")
  Receipt(price)
}
Now, when processing a transaction, the register throws a PaperJamException about half of the time. What effect does this have on our actor system, or on our whole application? Fortunately, Akka is very robust: the exception does not bring down the system. Instead, the failing actor notifies its parent of the failure. Remember that a parent always supervises its children, so this is the moment where the parent has to decide what to do about the problem.
Supervisor Strategies
Reacting to failures of its children is not the job of the parent's receive partial function, since that would mix supervision into the parent's own business logic; the two concerns are clearly separated. Instead, every actor defines its own supervisor strategy, which tells Akka how it wants to react to certain types of errors occurring in its children.
There are two basic types of supervisor strategy: OneForOneStrategy and AllForOneStrategy. With the former, the way you deal with an error in one of your children affects only the child in which the error occurred; with the latter, it affects all of your children. Which strategy is better depends a lot on your application.
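The two strategies differ only in which children a directive is applied to. The dependency-free sketch below models just that; Strategy, OneForOne, AllForOne, affected, and the extra children Grinder and MilkFrother are hypothetical names for illustration, not Akka's API (in Akka you construct a OneForOneStrategy or AllForOneStrategy).

```scala
// Toy model of the scope of a supervisor strategy (not Akka's API)
sealed trait Strategy
case object OneForOne extends Strategy
case object AllForOne extends Strategy

// Which children does the decider's directive apply to?
def affected(strategy: Strategy, failed: String, children: List[String]): List[String] =
  strategy match {
    case OneForOne => List(failed)  // only the failing child
    case AllForOne => children      // every child of this supervisor
  }

val children = List("Register", "Grinder", "MilkFrother")
println(affected(OneForOne, "Register", children)) // List(Register)
println(affected(AllForOne, "Register", children)) // List(Register, Grinder, MilkFrother)
```

All-for-one tends to fit children that depend so closely on each other that one failing leaves the others in an inconsistent state.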
Regardless of which type of SupervisorStrategy you choose for your actor, you have to specify a Decider, which is a PartialFunction[Throwable, Directive]. The decider lets you match against certain subtypes of Throwable and decide what should happen to the problematic child actor (or to all of your children, if you chose the all-for-one strategy).
Directives
Here are the directives Akka provides for dealing with a failure:
sealed trait Directive
case object Resume extends Directive
case object Restart extends Directive
case object Stop extends Directive
case object Escalate extends Directive
Resume: Choosing Resume means that you consider your child actor to be fine and its exception harmless. The child simply resumes processing messages, keeping its state.
Restart: The Restart directive causes Akka to create a new instance of your child actor. You choose it when you assume that the failure has corrupted the child's state in a way that prevents it from processing further messages. Restarting the actor takes it back to a clean initial state.
Stop: The child actor is stopped for good and will not be restarted.
Escalate: You choose Escalate if you don't know how to deal with the failure at hand, so you delegate the decision to your own parent. An actor that escalates must be prepared to be restarted by its parent.
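To see what each directive means for a child's state, here is a small simulation without Akka. The Counter class is a hypothetical stand-in for a stateful child such as the register, and handle is a hypothetical function that mimics, in simplified form, what a supervisor does with the child for each directive; Escalate is modeled by rethrowing the failure to the caller, standing in for the parent's own supervisor.

```scala
// Simplified directive ADT mirroring the one above
sealed trait Directive
case object Resume extends Directive
case object Restart extends Directive
case object Stop extends Directive
case object Escalate extends Directive

// A stateful "child": revenue lives in a mutable field, as in Register
class Counter { var revenue = 0 }

// Returns the child instance that keeps running, if any
def handle(child: Counter, directive: Directive, cause: Throwable): Option[Counter] =
  directive match {
    case Resume   => Some(child)        // same instance, state kept
    case Restart  => Some(new Counter)  // fresh instance, state lost
    case Stop     => None               // child is gone
    case Escalate => throw cause        // let our own parent decide
  }

val child = new Counter
child.revenue = 300
val boom = new RuntimeException("paper jam")
println(handle(child, Resume, boom).map(_.revenue))  // Some(300)
println(handle(child, Restart, boom).map(_.revenue)) // Some(0)
println(handle(child, Stop, boom).map(_.revenue))    // None
```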
Default Strategy
You don't need to specify a supervisor strategy for each of your actors. In fact, we haven't done so yet, because a default strategy is always in effect. It looks like this:
final val defaultStrategy: SupervisorStrategy = {
  def defaultDecider: Decider = {
    case _: ActorInitializationException => Stop
    case _: ActorKilledException         => Stop
    case _: Exception                    => Restart
  }
  OneForOneStrategy()(defaultDecider)
}
This means that for any Exception other than ActorInitializationException and ActorKilledException, the failing child actor is restarted. So when our register throws a PaperJamException, the supervisor strategy of its parent, the barista, causes the register to be restarted, since we have not overridden the default strategy. If you try this out, you will see an exception stack trace in the log, but nothing about the register being restarted. To verify that the register really is restarted, we first need to learn about the actor lifecycle.
The Actor Lifecycle
To understand what the directives of a supervisor strategy actually do, you need to know a little about an actor's lifecycle. It boils down to this: an actor is started when it is created via actorOf; if something goes wrong, it can be restarted an arbitrary number of times; finally, it is stopped and dies. There are several lifecycle hook methods you can override, and it is important to know their default implementations:
preStart: Called when an actor is started, allowing you to do some initialization. The default implementation is empty.
postStop: Called after the actor has been stopped, allowing you to release resources. The default implementation is empty.
preRestart: Called on a crashed actor right before it is restarted. By default, it stops all of the actor's children and then calls postStop.
postRestart: Called on the new actor instance immediately after the restart. By default, it calls preStart.
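Because of these default implementations, the hooks run in a fixed order during a restart. The plain-Scala sketch below records that order; LifecycleModel and startAndRestart are hypothetical stand-ins for what Akka does internally, child actors are left out, and only the hook names follow Akka's.

```scala
import scala.collection.mutable.ListBuffer

// Shared log of hook invocations
val calls = ListBuffer.empty[String]

// Mirrors the default hook implementations described above (children omitted)
class LifecycleModel(id: Int) {
  def preStart(): Unit = calls += s"preStart#$id"
  def postStop(): Unit = calls += s"postStop#$id"
  def preRestart(reason: Throwable): Unit = { calls += s"preRestart#$id"; postStop() }
  def postRestart(reason: Throwable): Unit = { calls += s"postRestart#$id"; preStart() }
}

// Start an instance, then simulate one restart after a failure
def startAndRestart(): LifecycleModel = {
  val first = new LifecycleModel(1)
  first.preStart()
  val reason = new RuntimeException("paper jam")
  first.preRestart(reason)            // called on the failed instance
  val second = new LifecycleModel(2)  // a new instance replaces it
  second.postRestart(reason)          // called on the new instance
  second
}

startAndRestart()
println(calls.mkString(" -> "))
// preStart#1 -> preRestart#1 -> postStop#1 -> postRestart#2 -> preStart#2
```

The instance numbers make visible that preRestart runs on the old instance while postRestart runs on the new one.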
Let's add some logging to the register's postRestart method to see whether it really is restarted after a failure. To do so, we mix the ActorLogging trait into Register and add the following method:
override def postRestart(reason: Throwable): Unit = {
  super.postRestart(reason)
  log.info(s"Restarted because of ${reason.getMessage}")
}
Now, if you send a bunch of CaffeineWithdrawalWarning messages to the two customers, you will see one or another of these log messages, confirming that our register has indeed been restarted.
Death of an Actor
Often, it makes no sense to restart an actor over and over again. Think of an actor that talks to some other service over the network, where that service may be unavailable for a while. In such cases, a good approach is to tell Akka how often it may restart the actor within a given period of time. If that limit is exceeded, the actor is stopped instead and hence dies. Such a limit can be configured in the constructor of the supervisor strategy:
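import scala.concurrent.duration._
import akka.actor.OneForOneStrategy
import akka.actor.SupervisorStrategy.Restart
// at most 10 restarts within 2 minutes (example values); beyond that, the child is stopped
OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 2.minutes) {
  case _ => Restart
}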
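The sketch below models the idea of such a restart budget: at most maxRetries restarts within a time window, after which the child is stopped. RestartBudget is a hypothetical class and its sliding window is a simplification; Akka's own bookkeeping behind maxNrOfRetries and withinTimeRange differs in detail.

```scala
import scala.collection.mutable

// Hypothetical restart budget: at most maxRetries restarts per time window
class RestartBudget(maxRetries: Int, windowMillis: Long) {
  private val restarts = mutable.Queue.empty[Long]

  // true: the child may be restarted at time `now`; false: it should be stopped
  def requestRestart(now: Long): Boolean = {
    // forget restarts that are older than the window
    while (restarts.nonEmpty && now - restarts.head > windowMillis) restarts.dequeue()
    if (restarts.size < maxRetries) { restarts.enqueue(now); true }
    else false
  }
}

val budget = new RestartBudget(maxRetries = 2, windowMillis = 120000L) // 2 minutes
println(budget.requestRestart(0L))    // true
println(budget.requestRestart(1000L)) // true
println(budget.requestRestart(2000L)) // false: third failure within 2 minutes, so stop
```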
"Self-healing" system
So is our system running smoothly now? If the paper jam occurs, will the system heal itself? Let's change our logging output:
override def postRestart(reason: Throwable): Unit = {
  super.postRestart(reason)
  log.info(s"Restarted, and revenue is $revenue cents")
}
We also add some logging to our receive partial function, making it look like this:
def receive = {
  case Transaction(article) =>
    val price = prices(article)
    sender() ! createReceipt(price)
    revenue += price
    log.info(s"Revenue incremented to $revenue cents")
}
The log reveals a problem: the revenue keeps increasing, but as soon as there is a paper jam and the register is restarted, the revenue is reset to 0. This is because restarting means that the old instance is discarded and a new one is created in its place. Of course, we could change the barista's supervisor strategy so that it simply resumes the register when a PaperJamException occurs:
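import akka.actor.{OneForOneStrategy, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Directive, Resume}
// inside class Barista (PaperJamException comes from the Register companion)
val decider: PartialFunction[Throwable, Directive] = {
  case _: PaperJamException => Resume
}
override def supervisorStrategy: SupervisorStrategy =
  OneForOneStrategy()(decider.orElse(SupervisorStrategy.defaultStrategy.decider))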
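Because a decider is just a PartialFunction, orElse here is ordinary Scala function composition: the custom case is tried first, and everything else falls through to the default. The dependency-free sketch below shows this; the Directive objects, defaultDecider, and KilledException are hypothetical stand-ins that mimic the default strategy shown earlier rather than importing Akka.

```scala
// Simplified stand-ins for Akka's directives (not imported from Akka)
sealed trait Directive
case object Resume extends Directive
case object Restart extends Directive
case object Stop extends Directive

class PaperJamException(msg: String) extends Exception(msg)
// Hypothetical stand-in for Akka's ActorKilledException
class KilledException(msg: String) extends Exception(msg)

// Roughly the shape of the default decider shown earlier
val defaultDecider: PartialFunction[Throwable, Directive] = {
  case _: KilledException => Stop
  case _: Exception       => Restart
}

// Our special case is tried first; anything it does not match falls through
val decider: PartialFunction[Throwable, Directive] = {
  case _: PaperJamException => Resume
}
val combined: PartialFunction[Throwable, Directive] = decider.orElse(defaultDecider)

println(combined(new PaperJamException("OMG, not again!"))) // Resume
println(combined(new IllegalStateException("boom")))        // Restart
println(combined(new KilledException("Kill")))              // Stop
```

The order matters: defaultDecider.orElse(decider) would return Restart for a PaperJamException, because the generic Exception case matches first.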
Now the register is no longer restarted when a PaperJamException occurs, so its state is not reset.
Error Kernel Pattern
Sometimes simply resuming an actor may be the best thing to do, but what if the problem cannot be solved without a restart? Suppose we cannot clear the paper jam without restarting the register. To simulate this, we need a boolean flag that marks whether the register is jammed. (Translator's note: the code below introduces a variable paperJam. Once a paper jam occurs, it is set to true and never reset to false, so the register stays jammed until the actor is restarted.) Let's change the Register like this:
class Register extends Actor with ActorLogging {
  import Register._
  import Barista._
  var revenue = 0
  val prices = Map[Article, Int](Espresso -> 150, Cappuccino -> 250)
  var paperJam = false
  override def postRestart(reason: Throwable): Unit = {
    super.postRestart(reason)
    log.info(s"Restarted, and revenue is $revenue cents")
  }
  def receive = {
    case Transaction(article) =>
      val price = prices(article)
      sender() ! createReceipt(price)
      revenue += price
      log.info(s"Revenue incremented to $revenue cents")
  }
  def createReceipt(price: Int): Receipt = {
    import scala.util.Random
    if (Random.nextBoolean()) paperJam = true  // once jammed, stays jammed
    if (paperJam) throw new PaperJamException("OMG, not again!")
    Receipt(price)
  }
}
We also remove the custom supervisor strategy we assigned to the barista, so the default strategy applies again. Now the paper jam persists unless we restart the register, but restarting it means losing important revenue data. This is where the error kernel pattern comes in handy. It is basically a simple guideline you should always try to follow: if an actor carries important internal state, it should delegate dangerous tasks to its child actors, so that its state is not lost when one of them crashes.
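The guideline can be illustrated without actors. In the sketch below, which uses the hypothetical names Printer and Kernel, the long-lived state (revenue) stays in the parent, while the risky printing happens in a child object that the parent simply replaces when it fails, loosely analogous to restarting a child actor. As in the article's simulation, a jammed printer keeps failing until it is replaced; the jamNow flag replaces randomness so the run is deterministic.

```scala
// Risky worker: once jammed, it stays jammed until it is replaced
class Printer {
  var paperJam = false
  def print(amount: Int, jamNow: Boolean): String = {
    if (jamNow) paperJam = true
    if (paperJam) throw new RuntimeException("OMG, not again!")
    s"Receipt($amount)"
  }
}

// Error kernel: keeps the important state, delegates the dangerous work
class Kernel {
  var revenue = 0
  private var printer = new Printer

  def sell(price: Int, jamNow: Boolean = false): Option[String] =
    try {
      val receipt = printer.print(price, jamNow)
      revenue += price              // book only after a successful print
      Some(receipt)
    } catch {
      case _: RuntimeException =>
        printer = new Printer       // "restart" the child, keep our own state
        None
    }
}

val register = new Kernel
register.sell(150)
register.sell(250, jamNow = true)   // printer jams and is replaced
register.sell(150)
println(register.revenue)           // 300
```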
Sometimes it makes sense to spawn a new child actor for each such task, but that is not strictly necessary. The essence of the pattern is to keep important state as close to the top of the actor hierarchy as possible, and to push error-prone tasks as far down the hierarchy as possible. Let's apply this pattern to our register: the register keeps the revenue, but delegates receipt printing to a new child actor, which we call ReceiptPrinter. Here is its code:
object ReceiptPrinter {
  case class PrintJob(amount: Int)
  class PaperJamException(msg: String) extends Exception(msg)
}
class ReceiptPrinter extends Actor with ActorLogging {
  import ReceiptPrinter._
  import Barista.Receipt
  import scala.util.Random
  var paperJam = false
  override def postRestart(reason: Throwable): Unit = {
    super.postRestart(reason)
    log.info(s"Restarted, paper jam == $paperJam")
  }
  def receive = {
    case PrintJob(amount) => sender() ! createReceipt(amount)
  }
  def createReceipt(price: Int): Receipt = {
    if (Random.nextBoolean()) paperJam = true
    if (paperJam) throw new PaperJamException("OMG, not again!")
    Receipt(price)
  }
}
Again, we simulate the paper jam with a boolean flag, and we have introduced a new message type, PrintJob. Essentially, the whole ReceiptPrinter was extracted from the register. One advantage is that the dangerous, failure-prone operation is moved out of the stateful register. Another is that the code becomes leaner and easier to read: ReceiptPrinter is responsible for a single task, and the register becomes simpler as well, since it only manages the books and delegates the rest to its child actor:
class Register extends Actor with ActorLogging {
  import Register._
  import Barista._
  import ReceiptPrinter._
  import akka.pattern.ask
  import akka.pattern.pipe
  import akka.util.Timeout
  import context.dispatcher
  import scala.concurrent.duration._
  implicit val timeout: Timeout = Timeout(4.seconds)
  var revenue = 0
  val prices = Map[Article, Int](Espresso -> 150, Cappuccino -> 250)
  val printer = context.actorOf(Props[ReceiptPrinter], "Printer")
  override def postRestart(reason: Throwable): Unit = {
    super.postRestart(reason)
    log.info(s"Restarted, and revenue is $revenue cents")
  }
  def receive = {
    case Transaction(article) =>
      val price = prices(article)
      val requester = sender()
      // pair the printed receipt with the requester and send the tuple to ourselves
      (printer ? PrintJob(price)).map((requester, _)).pipeTo(self)
    case (requester: ActorRef, receipt: Receipt) =>
      revenue += receipt.amount
      log.info(s"Revenue is $revenue cents")
      requester ! receipt
  }
}
We do not spawn a new ReceiptPrinter for each Transaction message. Instead, we rely on the default supervisor strategy to restart the printer whenever it fails. The seemingly odd way of booking the revenue needs some explanation: first, we ask the printer for a receipt. We map the future result to a tuple containing the original requester of the transaction and the receipt, and pipe this tuple to ourselves. (Translator's note: map on a Future takes a function that transforms the eventual result and returns a new Future. In (printer ? PrintJob(price)).map((requester, _)).pipeTo(self), the ask returns a Future that will contain the receipt; map pairs that receipt with the sender of the Transaction message, the barista; pipeTo then sends the resulting tuple to the register itself, where the second case clause handles it.) When processing this message sent to ourselves, we finally add the amount to the revenue and send the receipt to the requester.
The reason for this two-step process is that we only want to book revenue once a receipt has been printed successfully. A vital rule is never to modify an actor's internal state inside a future. We have to use this level of indirection to make sure that we only change the revenue within the actor itself, and not on some other thread. (Translator's note: the main danger of modifying an actor's state from a future is that the future's callback may run on a different thread, concurrently with the actor's own message processing, which can leave the state inconsistent.)
Assigning the sender to a val is necessary for similar reasons: when mapping over a future, we are no longer in the context of our actor, and since sender is a method, by then it may return the reference of some other actor that has sent us a message in the meantime.
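The following plain-Scala sketch shows why the val matters. FakeActor is a hypothetical stand-in for an actor: its currentSender field changes with every message, roughly as what sender returns changes in Akka, and a queue of deferred callbacks stands in for Future callbacks that run later.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an actor: `sender` changes with every message
class FakeActor {
  private var currentSender = ""
  def sender: String = currentSender
  val deferred = mutable.Queue.empty[() => Unit] // callbacks that run "later"
  var repliesTo = List.empty[String]

  def receive(msg: String, from: String): Unit = {
    currentSender = from
    msg match {
      case "captured" =>
        val requester = sender                         // evaluated now
        deferred.enqueue(() => repliesTo :+= requester)
      case "uncaptured" =>
        deferred.enqueue(() => repliesTo :+= sender)    // evaluated later
    }
  }
}

val a = new FakeActor
a.receive("captured", from = "Barista")
a.receive("uncaptured", from = "Barista")
a.receive("captured", from = "Customer") // a new message arrives first
while (a.deferred.nonEmpty) a.deferred.dequeue()()
println(a.repliesTo) // List(Barista, Customer, Customer)
```

The second reply, which read sender lazily, goes to Customer even though Barista sent that message.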
Now, our register can be restarted safely.
Of course, keeping receipt printing and bookkeeping together in one actor was questionable to begin with; we did it only to demonstrate the error kernel pattern. It would be better to separate the two, since they are unrelated concerns.
Timeouts
Another thing we may want to improve is the handling of timeouts. Right now, when an exception occurs in our ReceiptPrinter, this leads to an AskTimeoutException, which, since we are using the ask syntax, reaches the barista as a failed Future. Since the barista simply maps over that Future and pipes the result to the customer, the customer receives a failure message containing the AskTimeoutException. The customer did not ask for anything, though, so it does not expect such a message and does not handle it. Let's be friendly and send the customer a ComebackLater message instead, telling them that their coffee will take a while. That is much better, because otherwise they would never find out that they might not get their coffee at all. To achieve this, we recover from AskTimeoutException failures and map them to ComebackLater messages. The barista's receive partial function now looks like this:
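// requires import akka.pattern.AskTimeoutException; ComebackLater is a case object
// added to the Barista companion object
def receive = {
  case EspressoRequest =>
    val receipt = register ? Transaction(Espresso)
    receipt.map((EspressoCup(Filled), _)).recover {
      case _: AskTimeoutException => ComebackLater
    }.pipeTo(sender())
  case ClosingTime => context.system.terminate() // shutdown() before Akka 2.4
}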
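recover behaves the same way on any Future. In this Akka-free sketch, a failed Future holding scala.concurrent.TimeoutException stands in for the ask failing with AskTimeoutException, ComebackLater and Receipt are redeclared locally, and the string "cup" is a hypothetical stand-in for EspressoCup(Filled).

```scala
import scala.concurrent.{Await, Future, TimeoutException}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

case object ComebackLater
case class Receipt(amount: Int)

// Stand-ins for the two possible outcomes of `register ? Transaction(Espresso)`
val timedOut: Future[Any] = Future.failed(new TimeoutException("ask timed out"))
val answered: Future[Any] = Future.successful(Receipt(150))

// Turn a timeout into a polite message instead of a failed Future
def forCustomer(reply: Future[Any]): Future[Any] =
  reply.map(("cup", _)).recover { case _: TimeoutException => ComebackLater }

println(Await.result(forCustomer(timedOut), 1.second)) // ComebackLater
println(Await.result(forCustomer(answered), 1.second)) // (cup,Receipt(150))
```

Successful replies pass through recover untouched; only the matching failure is converted into a regular value.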
Now the customer knows they can try their luck later, and if they try often enough, they will eventually get their coffee.
Death Watch
Another principle that helps keep your system fault-tolerant is to keep a close eye on your important dependencies, as opposed to your children. Sometimes, you have actors that depend on other actors which are not their children, which means they do not supervise them. It is then important to watch the state of those actors and to be notified when something bad happens. Think of an actor responsible for database access: you will want the actors that depend on it to know whether it is still alive, and if it is not, you might want to switch your system into a maintenance mode. In other cases, simply using some kind of backup actor as a replacement for the dead one may be a viable solution. In any case, you need to watch the actors you depend on so that you are notified when they fail. This is done by calling the watch method defined on ActorContext. To demonstrate this, we let our customers watch the barista; as caffeine addicts, they clearly depend on it:
class Customer(coffeeSource: ActorRef) extends Actor with ActorLogging {
  import Customer._
  import Barista._
  import EspressoCup._
  import context.dispatcher
  import scala.concurrent.duration._
  context.watch(coffeeSource)
  def receive = {
    case CaffeineWithdrawalWarning => coffeeSource ! EspressoRequest
    case (EspressoCup(Filled), Receipt(amount)) =>
      log.info(s"yay, caffeine for ${self}!")
    case ComebackLater =>
      log.info("grumble, grumble")
      context.system.scheduler.scheduleOnce(300.millis) {
        coffeeSource ! EspressoRequest
      }
    case Terminated(barista)