The ACK mechanism of Storm

Source: Internet
Author: User
Tags: ack

To the big brothers out there learning Storm who think they already know how to use the ACK mechanism: I have come to clear up some doubts. Well then, let me start slapping faces.

First, what the ACK mechanism is:

To guarantee that data is processed correctly, Storm tracks every tuple a spout emits.

This involves ack/fail handling: a tuple is processed successfully only when that tuple and every tuple derived from it are processed successfully, in which case Storm calls the spout's ack method;

failure means that the tuple itself, or some tuple derived from it, failed, in which case the spout's fail method is called.

Each bolt that processes a tuple tells Storm, through its OutputCollector, whether the current bolt handled it successfully.

It is also important to note that when the spout's fail action is triggered, the failed tuple is not automatically re-sent; we have to keep the sent data in the spout ourselves, retrieve the failed data there, and resend it manually.

ACK principle
Storm has a special task called the acker, which tracks the tuple tree of every tuple the spout emits (a spout tuple forms a tree because each bolt that processes it may generate new tuples). When the acker (a task the framework starts by itself) discovers that a tuple tree has been fully processed, it sends a message to the task that produced the root tuple.
The acker's tracking algorithm is one of Storm's major breakthroughs: for a tuple tree of any size, it needs only a constant 20 bytes to track it.
The principle of the tracking algorithm: for each spout tuple, the acker keeps an ack-val checksum, initially 0. Every time a tuple is emitted or acked, that tuple's ID is XORed into the checksum, and the result becomes the new ack-val. Since every tuple ID is XORed in exactly twice, once on emit and once on ack, if every emitted tuple is acked the final ack-val must be 0. The acker therefore judges whether a spout tuple is fully processed by whether its ack-val is 0.
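The XOR bookkeeping can be sketched in plain Java (a simplified model of the algorithm described above, not Storm's actual acker code; the random longs stand in for Storm's tuple anchor IDs):

```java
import java.util.Random;

// Simplified model of the acker's ack-val bookkeeping: every tuple ID is
// XORed into the checksum once when the tuple is emitted (anchored) and
// once when it is acked, so a fully processed tree cancels back to zero.
public class AckValDemo {
    public static void main(String[] args) {
        Random rnd = new Random();
        long ackVal = 0;                       // initial checksum for the spout tuple

        long rootId = rnd.nextLong();
        ackVal ^= rootId;                      // spout emits the root tuple

        long childA = rnd.nextLong();
        long childB = rnd.nextLong();
        ackVal ^= rootId ^ childA ^ childB;    // a bolt acks root and emits two children

        ackVal ^= childA;                      // downstream bolt acks childA
        ackVal ^= childB;                      // downstream bolt acks childB

        // every ID was XORed in exactly twice, so the checksum returns to 0
        System.out.println(ackVal == 0 ? "tree fully processed" : "still pending");
    }
}
```

Note that the checksum never grows with the size of the tree, which is why tracking any tuple tree costs a constant amount of memory.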

To implement the ACK mechanism:
1. The spout specifies a messageId when emitting a tuple.
2. The spout overrides the ack and fail methods of BaseRichSpout.
3. The spout caches the emitted tuples (otherwise, when the spout's fail method receives the messageId sent by the acker, the spout has no way to recover the failed data for resending). Looking at the interface the system provides, only the msgId parameter is passed; the design here is questionable, since the system internally caches the whole message but hands the user only a messageId. To recover the original message, the user apparently has to keep their own cache and look it up by that msgId. A real pit.
4. For an acked tuple, the spout removes it from the cache queue by messageId; for a failed tuple, it can choose to resend.
5. Set the number of ackers to at least 1: Config.setNumAckers(conf, ackerParal);
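Step 5 might look like this in the topology's main method (a sketch; the classic backtype.storm package is assumed, and the parallelism value is illustrative):

```java
import backtype.storm.Config;

public class AckerConfigSketch {
    public static void main(String[] args) {
        Config conf = new Config();
        int ackerParal = 2;  // illustrative acker parallelism
        // must be at least 1, otherwise the ack/fail mechanism is disabled
        Config.setNumAckers(conf, ackerParal);
    }
}
```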

Off topic:
Alibaba's own JStorm provides

public interface IFailValueSpout { void fail(Object msgId, List<Object> values); }

which is more reasonable: the fail method receives the system-cached message values directly.

Storm's bolts come in two flavors, BasicBolt and RichBolt:
With BasicBolt, the BasicOutputCollector automatically anchors emitted data to the input tuple, and the input tuple is automatically acked at the end of the execute method.
With RichBolt, when you emit data you must explicitly specify its source tuple via the second "anchor tuple" parameter to keep the tracking chain intact, i.e. collector.emit(oldTuple, newTuple); and you must call OutputCollector.ack(tuple) yourself after execute succeeds, or OutputCollector.fail(tuple) when processing fails.

A tuple producing a new tuple is called anchoring: when you emit a tuple anchored to its source, you also complete an anchoring.
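With a rich bolt, the anchored-emit-then-ack pattern described above might be sketched like this (a sketch assuming the classic backtype.storm package; the field names and suffix logic are illustrative):

```java
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class SuffixBolt extends BaseRichBolt {
    private OutputCollector collector;

    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    public void execute(Tuple input) {
        try {
            String word = input.getStringByField("word");
            // anchored emit: the new tuple is tied to its source tuple,
            // so it joins the same tuple tree and stays tracked
            collector.emit(input, new Values(word + "1"));
            collector.ack(input);        // rich bolts must ack explicitly
        } catch (Exception e) {
            collector.fail(input);       // triggers fail for the root spout tuple
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
```

With BaseBasicBolt the same bolt would omit the ack/fail calls, since BasicOutputCollector anchors and acks automatically.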

The ACK mechanism works as follows: for every message the spout sends,

if, within the specified time, the spout receives an ack response from the acker, the tuple was successfully processed by the downstream bolts;

if, within the specified time (30 seconds by default), the spout receives no ack response from the acker, the fail action is triggered, i.e. the tuple is considered to have failed (the timeout can be set with Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS);

and if the spout receives a fail response from the acker for the tuple, it is likewise considered failed and the fail action is triggered.
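Raising the timeout from the 30-second default might look like this (a sketch; the classic backtype.storm package is assumed and the value is illustrative):

```java
import backtype.storm.Config;

public class TimeoutConfigSketch {
    public static void main(String[] args) {
        Config conf = new Config();
        // tuples not fully acked within 60 seconds are failed and can be resent
        conf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 60);
    }
}
```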

Note: I initially thought that if I extended BaseBasicBolt and the program threw an exception, the spout would be notified of the failure. I was wrong: an uncaught exception simply stops the program abnormally.

Here I use the distributed-programming primer case WordCount as an example.

Please look at the following big screen: no mistake, I'm Liu Yang, the name you often hear on the road.


Here the spout task Spout1-1 sends the sentence "I am Liu Yang" to task Bolt2-2, which splits the sentence into words and distributes them by field grouping to the next bolts; Bolt2-2, Bolt4-4, and Bolt5-5 each append the suffix "1" to a word and send it on to the next bolt to be stored in the database. This time task Bolt7-7 fails to store its data in the database and sends a fail response to the spout; on receiving the message, the spout sends the data again.

OK, this raises a question: how does the spout guarantee that the data it resends is exactly the data that previously failed? In the spout instance we therefore define a map as a cache and cache every piece of data we send, with the messageId as the key. When the spout instance receives the responses from all the bolts, if the result is ack, the ack method we overrode is called, and in it we delete the key-value pair by messageId; if the spout instance finds, after receiving all the bolt responses, that the result is fail, the fail method we overrode is called, and in it we look up the corresponding data by messageId and send it out again.

The spout code is as follows

public class MySpout extends BaseRichSpout {
    private static final long serialVersionUID = 5028304756439810609L;
    // key: messageId, value: data
    private HashMap<String, String> waitAck = new HashMap<String, String>();
    private SpoutOutputCollector collector;

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    public void nextTuple() {
        String sentence = "I am Liu Yang";
        String messageId = UUID.randomUUID().toString().replaceAll("-", "");
        waitAck.put(messageId, sentence);
        // specify a messageId to enable the ack/fail mechanism
        collector.emit(new Values(sentence), messageId);
    }

    @Override
    public void ack(Object msgId) {
        System.out.println("Message processing succeeded: " + msgId);
        System.out.println("Deleting data from cache ...");
        waitAck.remove(msgId);
    }

    @Override
    public void fail(Object msgId) {
        System.out.println("Message processing failed: " + msgId);
        System.out.println("Resending failed message ...");
        // if the ack/fail mechanism is not enabled, the data in this map is never removed
        collector.emit(new Values(waitAck.get(msgId)), msgId);
    }
}

In Storm projects our spout source is usually Kafka, and we use the KafkaSpout utility class that Storm provides; that class in fact maintains exactly this kind of collection of <messageId, Tuple> pairs internally.

What does Storm do about duplicate tuples? Because Storm wants to guarantee reliable processing, when a tuple fails or times out, the spout fails and resends it, so repeated computation of the same tuple becomes a problem. This problem is hard to solve, and Storm provides no mechanism to solve it for you. Some feasible strategies: (1) do nothing, which is also a strategy, since real-time computation usually does not demand very high accuracy and subsequent batch computation can correct the errors of the real-time pass; (2) deduplicate through third-party centralized storage, for example using MySQL, Memcached, or Redis keyed on a logical primary key; (3) filter with a Bloom filter, which is simple and efficient.
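Strategy (2) can be sketched without Storm at all: a filter keyed on the tuple's logical primary key, with an in-memory set standing in for the Redis/Memcached/MySQL store (class and key names are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

// Idempotent-processing sketch for strategy (2): a replayed tuple with the
// same logical primary key is detected and skipped. In a real topology the
// set would live in Redis/Memcached/MySQL so that all bolt tasks share it.
public class DedupFilter {
    private final Set<String> seenKeys = new HashSet<String>();

    /** Returns true the first time a key is seen, false for replays. */
    public boolean shouldProcess(String primaryKey) {
        return seenKeys.add(primaryKey);   // add() returns false if already present
    }

    public static void main(String[] args) {
        DedupFilter filter = new DedupFilter();
        System.out.println(filter.shouldProcess("order-42"));  // true: first delivery
        System.out.println(filter.shouldProcess("order-42"));  // false: replayed tuple
    }
}
```

A Bloom filter (strategy 3) has the same shape but trades a small false-positive rate for constant memory.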


Question one: have you ever wondered what happens if the tuples processed by some task node keep failing and the messages keep being resent?

As we all know, the spout, as the source of a message, does not delete a tuple from its cache until the acks for it come back from the downstream bolts. So if messages keep failing, the spout node accumulates more and more cached tuple data, eventually causing a memory overflow.

Question two: have you ever thought about what happens when a tuple derives many child tuples and one of them fails? The other child tuples continue to execute; if a child tuple performs a data-store operation, then even though the message as a whole has failed, those child tuples still complete successfully and are not rolled back.

Question three: the tracking of a tuple does not necessarily run from the spout all the way to the last bolt. Tracking always starts at the spout, but it can stop at any bolt level: a bolt that does not anchor its output ends the tracked tree there.

The number of acker tasks inside a topology is configurable; the default is one. If your topology handles a large number of tuples, set a larger number of ackers for better efficiency.

Adjust Reliability
Acker tasks are very lightweight, so a topology does not need many of them. You can track their performance through the Storm UI (id: -1). If their throughput looks abnormal, you need to add more ackers.
If reliability is not that important to you, that is, you do not mind losing some data in failure scenarios, you can get better performance by not tracking tuple trees. Not tracking messages halves the number of messages in the system, since an ack message is sent for every tuple; it also means fewer IDs have to be held for downstream tuples, reducing bandwidth usage.
Reliability Configuration

There are three ways to get rid of message reliability:

Set the parameter Config.TOPOLOGY_ACKERS to 0. With this method, the spout's ack method is called immediately after it sends a message, which means no tuple tree is tracked;

Do not specify a messageId when the spout sends a message. You can use this method when you want to turn off reliability for particular messages only.

Finally, if you do not care about the reliability of the descendant messages derived from some message, do not anchor them when they are emitted, that is, do not pass the input tuple to the emit method. Because these descendant messages are not anchored to any tuple tree, their failure does not cause any spout to resend anything.
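The three methods above can be sketched as follows (a sketch assuming the classic backtype.storm package; the emit calls in comments would live inside a spout and a bolt respectively):

```java
import backtype.storm.Config;

public class ReliabilityOffSketch {
    public static void main(String[] args) {
        Config conf = new Config();

        // Method 1: zero ackers disables tuple-tree tracking for the whole
        // topology; the spout's ack method is called immediately after emit
        conf.setNumAckers(0);

        // Method 2 (inside the spout): emit without a messageId,
        // so this particular message is never tracked:
        //   collector.emit(new Values(sentence));

        // Method 3 (inside a bolt): emit without the input-tuple anchor,
        // so the descendant tuple joins no tuple tree:
        //   collector.emit(new Values(word));
    }
}
```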

How to turn off the ACK mechanism

There are two ways:

the spout does not attach a msgId when it sends data;

set the number of ackers to 0.

It is important to note that Storm always calls ack or fail on the task that produced the tuple. So even if a spout is split across many tasks, the success or failure of a message is always reported to the task that first emitted that tuple.

As a Storm user, there are two things you must do to benefit from Storm's reliability features: first, notify Storm whenever you generate a new tuple (anchoring); second, notify Storm when you have completely processed a tuple (acking or failing). This lets Storm detect whether the whole tuple tree has finished processing and notify the source spout of the result.

1. A tuple is not acked because the corresponding task died:

Storm's timeout mechanism marks the tuple as failed after the timeout, allowing it to be reprocessed.

2. The acker died: in this case, all spout tuples tracked by that acker will time out and be reprocessed.

3. The spout died: in this case, the message source feeding the spout is responsible for resending the messages.

These three basic mechanisms ensure that Storm is fully distributed, scalable, and highly fault-tolerant.

In addition, the ACK mechanism is often used for flow control: to keep the spout from sending data faster than the bolts can process it, a pending count is usually set. When the number of the spout's tuples that have received neither an ack nor a fail response reaches or exceeds the pending count, nextTuple is skipped, which throttles the spout.

Set the spout's pending count with conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, pending).
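For example (a sketch; the classic backtype.storm package is assumed and the pending value is illustrative):

```java
import backtype.storm.Config;

public class PendingConfigSketch {
    public static void main(String[] args) {
        Config conf = new Config();
        // at most 1000 spout tuples may be pending (emitted but neither
        // acked nor failed) at once; nextTuple is skipped above this limit
        conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1000);
    }
}
```

Note that this limit only takes effect when messageIds are specified, since without the ACK mechanism nothing is ever "pending".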

