Reliability in Storm

Source: Internet
Author: User
Tags: emit

An important feature of Storm is that its API guarantees that every tuple emitted by a spout can be fully processed. This guarantee is implemented jointly by the spout and bolt components. This article describes Storm's reliability first from the spout side and then from the bolt side, and ends with an example that implements it.

1. Reliability in the spout

In Storm, reliable message processing starts with the spout. To make sure data is processed correctly, Storm can track every tuple that a spout emits. Tracking is built on ack/fail: if a tuple is processed successfully, Storm calls the spout's ack method; if processing fails, it calls the spout's fail method. Each bolt in the topology that handles a tuple tells Storm through its OutputCollector whether its own processing succeeded or failed.

A spout must be able to track every tuple it emits, together with the tuples derived from it, and must be able to re-emit a tuple whose processing failed. So how is a tuple tracked? Storm uses a simple anchoring mechanism (described under bolt reliability below).

In the figure, the solid line represents the root tuple emitted by the spout, and the dashed lines represent the tuples derived from the root tuple. Together they form a tuple tree. Every bolt in the tree either acks or fails the tuples it receives. If every tuple in the tree is acked, the spout's ack method is called, indicating that the whole message has been processed. If any bolt in the tree fails a tuple, or the whole process times out, the spout's fail method is called.
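The ack/fail rules for a tuple tree can be sketched as a small model in plain Java. This is only an illustration of the rules described above, not Storm's internal tracking code; the class TupleTree, its methods, and the tuple IDs are hypothetical names chosen for this sketch, and time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of one tuple tree; NOT Storm's actual implementation. */
class TupleTree {
    enum Status { PENDING, ACKED, FAILED }

    private final Set<String> outstanding = new HashSet<>();
    private final long deadlineMillis;
    private Status status = Status.PENDING;

    TupleTree(String rootId, long startMillis, long timeoutMillis) {
        outstanding.add(rootId);
        deadlineMillis = startMillis + timeoutMillis;
    }

    /** A bolt emits a derived tuple that joins this tree. */
    void anchor(String childId) {
        if (status == Status.PENDING) outstanding.add(childId);
    }

    /** A bolt acks a tuple; the tree is done once nothing is outstanding. */
    void ack(String tupleId) {
        if (status != Status.PENDING) return;
        outstanding.remove(tupleId);
        if (outstanding.isEmpty()) status = Status.ACKED;
    }

    /** Any single failure fails the whole tree. */
    void fail(String tupleId) {
        if (status == Status.PENDING) status = Status.FAILED;
    }

    /** A tree that is still pending at its deadline counts as failed. */
    void checkTimeout(long nowMillis) {
        if (status == Status.PENDING && nowMillis >= deadlineMillis) status = Status.FAILED;
    }

    Status status() { return status; }
}

class TupleTreeDemo {
    public static void main(String[] args) {
        TupleTree tree = new TupleTree("sentence-1", 0L, 30_000L);
        tree.anchor("word-1");
        tree.anchor("word-2");
        tree.ack("sentence-1");
        tree.ack("word-1");
        System.out.println(tree.status()); // PENDING: word-2 is still outstanding
        tree.ack("word-2");
        System.out.println(tree.status()); // ACKED: the spout's ack would be called
    }
}
```

In this model the spout's ack corresponds to the tree reaching ACKED, and the spout's fail corresponds to FAILED, whether that comes from an explicit fail or from the timeout check.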
One more point: the ack/fail mechanism only tells the application how processing in the downstream bolts turned out. What to do on success or failure is up to the application, because Storm itself does not store the data that failed. The application can still find out which record failed, because the spout's ack and fail methods receive a msgId object. The spout can attach a msgId to each tuple when it first emits it, and then use that msgId in ack/fail to look the tuple up and handle it.

There is one pitfall: after a bolt finishes with a tuple it must explicitly call ack or fail. Otherwise the tuple is never released from tracking, which can eventually cause an out-of-memory error. I wonder why Storm's original design did not make the bolt ack by default.

Storm's ISpout interface defines three reliability-related methods: nextTuple, ack, and fail.
public interface ISpout extends Serializable {
    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
    void close();
    void nextTuple();
    void ack(Object msgId);
    void fail(Object msgId);
}
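Because ack and fail receive only the msgId, the application has to remember what each msgId refers to if it wants to replay a failed message. Below is a minimal sketch of such bookkeeping in plain Java, with no Storm dependency. The class PendingMessages, its method names, and the retry limit are assumptions made for this illustration; they are not part of the Storm API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: remembers emitted payloads by msgId so failures can be replayed. */
class PendingMessages<T> {
    private static final class Entry<T> {
        final T payload;
        int failures;
        Entry(T payload) { this.payload = payload; }
    }

    private final Map<Object, Entry<T>> pending = new ConcurrentHashMap<>();
    private final int maxRetries;

    PendingMessages(int maxRetries) { this.maxRetries = maxRetries; }

    /** Call when the spout emits a tuple with this msgId. */
    void track(Object msgId, T payload) { pending.put(msgId, new Entry<>(payload)); }

    /** Call from the spout's ack(msgId): the message is done, so forget it. */
    void onAck(Object msgId) { pending.remove(msgId); }

    /**
     * Call from the spout's fail(msgId). Returns the payload to re-emit,
     * or null when the msgId is unknown or its retries are used up.
     */
    T onFail(Object msgId) {
        Entry<T> e = pending.get(msgId);
        if (e == null) return null;
        e.failures++;
        if (e.failures > maxRetries) {
            pending.remove(msgId); // give up; an application might log the record here
            return null;
        }
        return e.payload;
    }

    int size() { return pending.size(); }
}

class PendingMessagesDemo {
    public static void main(String[] args) {
        PendingMessages<String> p = new PendingMessages<>(2);
        p.track("id-1", "my dog has fleas");
        System.out.println(p.onFail("id-1")); // my dog has fleas (first retry)
        System.out.println(p.onFail("id-1")); // my dog has fleas (second retry)
        System.out.println(p.onFail("id-1")); // null: retries exhausted, entry dropped
    }
}
```

Removing entries on ack, and on giving up, keeps the map from growing without bound. Limiting retries is one possible policy; an application could equally choose to replay forever or to drop failures immediately.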
When Storm wants the spout to emit a tuple, it calls the spout's nextTuple() method. The first step in making processing reliable is to assign each emitted tuple a unique ID inside this method and pass that ID to emit():
collector.emit(new Values("value1", "value2"), msgId);
Assigning a unique ID to a tuple tells Storm that the spout wants to be notified when the tuple tree produced by this tuple either completes or fails. If the tuple is processed successfully, the spout's ack() method is called. If processing fails, the spout's fail() method is called. The tuple's ID is passed to both methods.

Note that although the spout supports this reliability mechanism, whether it is enabled is up to us: a tuple emitted without a msgId is not tracked. An IBasicBolt automatically anchors the tuples it emits to the input tuple and acks the input after execute() returns, which suits relatively simple computations. With an IRichBolt, you must anchor the emitted tuples and call ack yourself.

2. Reliability in the bolt

Reliability in a bolt comes down to two steps:
      1. Anchor to the input tuple when emitting a derived tuple
      2. Ack or fail every tuple that is processed
Anchoring a tuple means establishing a link between the input tuple and the tuples derived from it, so that the derived tuples join the tuple tree. A tuple is anchored like this:
collector.emit(tuple, new Values(word));
If we emit a new tuple without passing the input tuple as an anchor, the new tuple does not take part in the reliability mechanism, and its failure does not cause the root tuple to be re-emitted. Such a tuple is called unanchored:
collector.emit(new Values(word));
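The difference between the two emit forms can be illustrated with a toy model that records which root tuple, if any, each emitted tuple is linked to. This is a plain-Java sketch of the idea only; AnchorModel and its methods are hypothetical names and do not correspond to Storm classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model: records which root (if any) each emitted tuple is anchored to. */
class AnchorModel {
    private final Map<String, String> rootOf = new HashMap<>();

    /** The spout emits a root tuple; it is its own root. */
    void emitRoot(String rootId) { rootOf.put(rootId, rootId); }

    /** Anchored emit: the child joins the tree of its input tuple, if that tuple is tracked. */
    void emitAnchored(String inputId, String childId) {
        String root = rootOf.get(inputId);
        if (root != null) rootOf.put(childId, root);
    }

    /** Unanchored emit: the child is not linked to any tree, so nothing is recorded. */
    void emitUnanchored(String childId) { }

    /** Root whose spout would hear about a failure of tupleId, or null if none. */
    String rootNotifiedOnFail(String tupleId) { return rootOf.get(tupleId); }
}

class AnchorModelDemo {
    public static void main(String[] args) {
        AnchorModel m = new AnchorModel();
        m.emitRoot("sentence-1");
        m.emitAnchored("sentence-1", "word-1");
        m.emitUnanchored("word-2");
        System.out.println(m.rootNotifiedOnFail("word-1")); // sentence-1
        System.out.println(m.rootNotifiedOnFail("word-2")); // null
    }
}
```

In the model, a failure of word-1 reaches the spout that emitted sentence-1, while a failure of word-2 reaches no one, which is why an unanchored tuple is never replayed.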
To ack or fail a tuple:
this.collector.ack(tuple);
this.collector.fail(tuple);
As mentioned above, an IBasicBolt implementation does not handle ack/fail itself. The BasicOutputCollector parameter of its execute method does not even provide ack or fail methods to call, so the bolt's own code contains no ack/fail lines.

In an IRichBolt implementation, if a tuple is emitted with outputCollector.emit(oldTuple, newTuple) (anchored), then the ack/fail of the downstream bolts affects the spout's ack/fail. If a tuple is emitted with collector.emit(newTuple) (called unanchored in Storm), the ack/fail of the downstream bolts no longer has any effect on the spout. The spout's ack/fail is then decided only by the acks and fails up to and including the current bolt. So if you do not care whether the bolts after a given bolt succeed, you can ignore them in this way.

If a bolt in the middle fails a tuple, the downstream bolts still process any tuples that were already emitted, but the spout's fail method is triggered immediately. This acts like a short circuit: the later bolts still run, but their ack/fail no longer means anything to the spout. In other words, a fail from any bolt immediately triggers the spout's fail method, whereas the spout's ack method is triggered only after every bolt has acked. For this reason IBasicBolt is best suited to filters and simple computations.

3. Summary

Storm's reliability is provided jointly by the spout and the bolts, and Storm uses the anchoring mechanism to guarantee reliable processing. If a tuple emitted by the spout is fully processed, the spout's ack method is called; if processing fails, its fail method is called. In a bolt, a tuple is anchored with emit(oldTuple, newTuple). If processing succeeds, the bolt must call ack on its collector; if it fails, the bolt must call fail.
A tuple and the tuples derived from it form a tuple tree. The spout's ack method is called when every tuple in the tree completes within the specified time. If any tuple in the tree fails, the spout's fail method is called.

IBasicBolt calls ack/fail automatically, whereas IRichBolt requires us to call ack/fail manually. The TOPOLOGY_MESSAGE_TIMEOUT_SECS parameter sets the time allowed for a tuple to be fully processed. If processing does not finish within that time, the spout's fail method is also called.

4. A reliable WordCount example

A spout that implements reliability:
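// Package names below are for Storm 1.0 and later; older releases use backtype.storm instead of org.apache.storm.
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class ReliableSentenceSpout extends BaseRichSpout {
    private static final long serialVersionUID = 1L;

    // Tuples that have been emitted but not yet acked, keyed by message ID.
    private ConcurrentHashMap<UUID, Values> pending;
    private SpoutOutputCollector collector;
    private String[] sentences = {
        "My dog has fleas",
        "I like cold beverages",
        "the dog ate my homework",
        "don't have a cow man",
        "I don't think I like fleas"
    };
    private int index = 0;

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }

    public void open(Map config, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.pending = new ConcurrentHashMap<UUID, Values>();
    }

    public void nextTuple() {
        Values values = new Values(sentences[index]);
        UUID msgId = UUID.randomUUID();
        // Remember the tuple so it can be re-emitted if it fails.
        this.pending.put(msgId, values);
        this.collector.emit(values, msgId);
        index++;
        if (index >= sentences.length) {
            index = 0;
        }
        // Utils.waitForMillis(1);
    }

    public void ack(Object msgId) {
        // Fully processed: stop tracking it.
        this.pending.remove(msgId);
    }

    public void fail(Object msgId) {
        // Re-emit the failed tuple with the same message ID.
        this.collector.emit(this.pending.get(msgId), msgId);
    }
}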
A bolt that implements reliability:
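// Package names below are for Storm 1.0 and later; older releases use backtype.storm instead of org.apache.storm.
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class ReliableSplitSentenceBolt extends BaseRichBolt {
    private OutputCollector collector;

    public void prepare(Map config, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    public void execute(Tuple tuple) {
        String sentence = tuple.getStringByField("sentence");
        String[] words = sentence.split(" ");
        for (String word : words) {
            // Anchor each word tuple to the input sentence tuple.
            this.collector.emit(tuple, new Values(word));
        }
        // Ack the input once all derived tuples have been emitted.
        this.collector.ack(tuple);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}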



 
