Document directory
- What does it mean for a message to be "fully processed"?
- What happens if a message is fully processed or fails to be fully processed?
- What is Storm's reliability API?
- How do I make my applications work correctly given that tuples can be replayed?
- How does storm implement reliability in an efficient way?
- Tuning Reliability
https://github.com/nathanmarz/storm/wiki/Guaranteeing-message-processing
http://xumingming.sinaapp.com/127/twitter-storm%E5%A6%82%E4%BD%95%E4%BF%9D%E8%AF%81%E6%B6%88%E6%81%AF%E4%B8%8D%E4%B8%A2%E5%A4%B1/
This chapter discusses Storm's reliability capabilities: how can we ensure that every tuple emitted from a spout is correctly executed (fully processed)?
What does it mean for a message to be "fully processed"?
The first question is: what does it mean for a tuple or message to be fully processed? A tuple may be processed by several levels of bolts after it is emitted, and each bolt may in turn emit multiple tuples based on it, so the situation is fairly involved.
Eventually, all the tuples triggered by one spout tuple form a tree, or more generally a DAG (directed acyclic graph).
Storm considers the spout tuple to be fully processed only when every node in this tuple tree has been successfully processed.
If any node in the tuple tree fails or times out, the spout tuple is considered failed.
Storm considers a tuple coming off a spout "fully processed" when the tuple tree has been exhausted and every message in the tree has been processed.
A tuple is considered failed when its tree of messages fails to be fully processed within a specified timeout.
This timeout can be configured on a topology-specific basis using the Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS configuration and defaults to 30 seconds.
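For reference, a minimal snippet showing how that timeout might be set on a topology's Config (the 60-second value is just an example):
Config conf = new Config();
// Raise the per-topology message timeout from the default 30s to 60s (example value).
conf.setMessageTimeoutSecs(60);
// Equivalent to: conf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 60);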
What happens if a message is fully processed or fails to be fully processed?
How is this mechanism implemented?
First, every tuple has a unique msgId, which is specified when the tuple is emitted.
_collector.emit(new Values("field1", "field2", 3), msgId);
Next, let's look at the ISpout interface below. Besides nextTuple, which fetches the next tuple, it also declares ack and fail:
when Storm detects that a tuple is fully processed, ack is called; if it times out or a failure is detected, fail is called.
It should be noted that a tuple can only be acked or failed by the spout task that produced it. The specific reason is as follows:
A tuple will be acked or failed by the exact same Spout task that created it. So if a Spout is executing as many tasks across the cluster, a tuple won't be acked or failed by a different task than the one that created it.
public interface ISpout extends Serializable {
    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
    void close();
    void nextTuple();
    void ack(Object msgId);
    void fail(Object msgId);
}
Finally, implementing such a spout is actually relatively simple.
For a queue-backed spout, fetching a message only "opens" it rather than popping it, and its status is changed to pending so that it is not sent again.
The message is only popped from the queue once it is acked; if it fails, its status is changed back to the initial state.
This also explains why a tuple can only be acked or failed by the spout task that emitted it: only that task's queue holds the pending message.
When KestrelSpout takes a message off the Kestrel queue, it "opens" the message.
This means the message is not actually taken off the queue yet, but instead placed in a "pending" state waiting for acknowledgement that the message is completed.
While in the pending state, a message will not be sent to other consumers of the queue. Additionally, if a client disconnects all pending messages for that client are put back on the queue.
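To make the open/pending/ack pattern concrete, here is a minimal sketch of a reliable spout that keeps its own pending map over an in-memory queue. This is an illustration of the pattern, not KestrelSpout (which keeps the pending state on the Kestrel server); the class and field names are placeholders, and the imports assume the backtype.storm packages of the Storm version this article covers.
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class PendingQueueSpout extends BaseRichSpout {
    private SpoutOutputCollector _collector;
    private LinkedBlockingQueue<String> _queue;   // the message source
    private Map<Object, String> _pending;         // msgId -> message, "opened" but not yet done

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
        _queue = new LinkedBlockingQueue<String>();
        _pending = new ConcurrentHashMap<Object, String>();
    }

    public void nextTuple() {
        String msg = _queue.poll();               // "open" the next message, if any
        if (msg == null) return;
        Object msgId = UUID.randomUUID().toString();
        _pending.put(msgId, msg);                 // mark as pending so it is not re-sent
        _collector.emit(new Values(msg), msgId);  // emit with a message id so Storm tracks it
    }

    public void ack(Object msgId) {
        _pending.remove(msgId);                   // fully processed: drop it for good
    }

    public void fail(Object msgId) {
        String msg = _pending.remove(msgId);      // failed or timed out: put it back
        if (msg != null) _queue.offer(msg);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("msg"));
    }
}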
What is Storm's reliability API?
One question not addressed so far is: by what mechanism does Storm determine whether a tuple has been successfully and fully processed?
There are two problems to solve:
1. How do I know the structure of the tuple tree?
2. How do I know the running status of each node on the tuple tree, success or fail?
The answer is simple: you have to tell Storm yourself. How do you tell it?
1. For the structure of the tuple tree, Storm needs to know which tuple each new tuple was generated from, that is, the links between tree nodes.
A link between tree nodes is called an anchoring. Every time a new tuple is emitted, the anchoring must be established explicitly through the API.
Specifying a link in the tuple tree is called anchoring. Anchoring is done at the same time you emit a new tuple. Each word tuple is anchored by specifying the input tuple as the first argument to emit.
Let's look at the following code example,
_collector.emit(tuple, new Values(word));
The first parameter of emit is the input tuple, which is used to create the anchoring.
Of course, you can also call the unanchored version of emit directly; if you do not need reliability guarantees, this is more efficient.
_collector.emit(new Values(word));
As mentioned above, a tuple may depend on multiple inputs,
An output tuple can be anchored to more than one input tuple.
This is useful when doing streaming joins or aggregations. A multi-anchored tuple failing to be processed will cause multiple tuples to be replayed from the spouts.
List<Tuple> anchors = new ArrayList<Tuple>();
anchors.add(tuple1);
anchors.add(tuple2);
_collector.emit(anchors, new Values(1, 2, 3));
With multi-anchoring, the tuple tree becomes a tuple DAG. Current Storm versions support DAGs well.
Multi-anchoring adds the output tuple into multiple tuple trees.
Note that it's also possible for multi-anchoring to break the tree structure and create tuple DAGs.
2. For the running status of each node on the tuple tree, you need to report it explicitly by calling ack and fail on the OutputCollector.
This is done by using the ack and fail methods on OutputCollector.
You can use the fail method on OutputCollector to immediately fail the spout tuple at the root of the tuple tree.
Let's look at the example below, where ack is called at the end of the execute method:
_collector.ack(tuple);
I'm confused: why is ack a method on OutputCollector rather than on Tuple itself?
Also, what is acked is the bolt's input tuple. Perhaps it lives on the output collector because every input is produced as some other bolt's output... this design seems unreasonable to me.
public class SplitSentence extends BaseRichBolt {
    OutputCollector _collector;

    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    public void execute(Tuple tuple) {
        String sentence = tuple.getString(0);
        for (String word : sentence.split(" ")) {
            _collector.emit(tuple, new Values(word));
        }
        _collector.ack(tuple);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
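For completeness, here is a sketched variant of the execute method above showing how fail could be reported explicitly; the try/catch is illustrative and not part of the original example:
public void execute(Tuple tuple) {
    try {
        String sentence = tuple.getString(0);
        for (String word : sentence.split(" ")) {
            _collector.emit(tuple, new Values(word));   // anchored to the input tuple
        }
        _collector.ack(tuple);                          // report success for the input tuple
    } catch (Exception e) {
        _collector.fail(tuple);                         // fail immediately so the spout can replay sooner
    }
}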
Storm must sacrifice some efficiency to ensure reliability: it records the structure and running status of the tuple tree you report in task memory.
However, a tuple node is only deleted from memory after it is acked or failed, so if you fail to ack or fail tuples, the task will eventually run out of memory.
Every tuple you process must be acked or failed. Storm uses memory to track each tuple, so if you don't ack/fail every tuple, the task will eventually run out of memory.
A simpler version: BasicBolt
This mechanism puts a burden on programmers, especially in many simple cases such as filters, where we would have to explicitly establish the anchoring and call ack every time...
Therefore, Storm provides a simpler version that automatically creates the anchoring and automatically calls ack after the bolt's execute finishes.
A lot of bolts follow a common pattern of reading an input tuple, emitting tuples based on it, and then acking the tuple at the end of the execute method. These bolts fall into the categories of filters and simple functions. Storm has an interface called BasicBolt that encapsulates this pattern for you.
public class SplitSentence extends BaseBasicBolt {
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String sentence = tuple.getString(0);
        for (String word : sentence.split(" ")) {
            collector.emit(new Values(word));
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
How do I make my applications work correctly given that tuples can be replayed?
The problem is how to achieve "fully fault-tolerant exactly-once messaging semantics": replay can cause the same message to arrive at a bolt multiple times, which has a big impact on applications such as counting.
Starting with Storm 0.7, the transactional topologies feature is provided to solve this problem properly.
As always in software design, the answer is "it depends." Storm 0.7.0 introduced the "transactional topologies" feature, which enables you to get fully fault-tolerant exactly-once messaging semantics for most computations. Read more about transactional topologies here.
How does storm implement reliability in an efficient way?
We will now discuss how Storm implements the reliability mechanism. Storm uses a special set of "acker" tasks to track every spout tuple; you can configure the number of acker tasks according to the message volume.
A Storm topology has a set of special "acker" tasks that track the DAG of tuples for every spout tuple.
When an Acker sees that a DAG is complete, it sends a message to the spout task that created the spout tuple to ack the message.
You can set the number of acker tasks for a topology in the topology configuration using Config.TOPOLOGY_ACKERS. Storm defaults TOPOLOGY_ACKERS to one task -- you will need to increase this number for topologies processing large amounts of messages.
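As a sketch, the acker count might be raised like this when building the topology configuration (the value 4 is a placeholder):
Config conf = new Config();
// Run 4 acker tasks instead of the default of 1; same as conf.put(Config.TOPOLOGY_ACKERS, 4)
conf.setNumAckers(4);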
Every generated tuple has a random 64-bit ID used for tracking.
Tuples form a tuple tree through anchoring at emit time, and each tuple knows the IDs of the spout tuples that produced it (the IDs are copied along from tuple to tuple).
When any tuple is acked, a message is sent to the corresponding acker. For example:
When a tuple is created in a topology, whether in a spout or a bolt, it is given a random 64-bit ID. These IDs are used by ackers to track the tuple DAG for every spout tuple.
Every tuple knows the IDs of all the spout tuples for which it exists in their tuple trees. When you emit a new tuple in a bolt, the spout tuple IDs from the tuple's anchors are copied into the new tuple. When a tuple is acked, it sends a message to the appropriate acker tasks with information about how the tuple tree changed. In particular it tells the acker "I am now completed within the tree for this spout tuple, and here are the new tuples in the tree that were anchored to me".
For example, if tuples "D" and "E" were created based on tuple "C", here's how the tuple tree changes when "C" is acked:
Of course, to track all tuples with acker tasks, Storm also needs to solve the following problems:
1. When there are multiple ackers and a tuple is acked, how does it know which acker to send the message to?
Because every tuple knows the ID of the spout tuple that produced it, mod hashing (spout tuple ID mod number of ackers) is used to assign spout tuple IDs to ackers, which guarantees that the entire tuple tree of a given spout tuple is tracked by a single acker.
When a tuple is acked, the same hash is used to find the corresponding acker.
You can have an arbitrary number of acker tasks in a topology. This leads to the following question: when a tuple is acked in the topology, how does it know to which acker task to send that information? Storm uses mod hashing to map a spout tuple ID to an acker task. Since every tuple carries with it the spout tuple IDs of all the trees they exist within, they know which acker tasks to communicate with.
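A tiny illustrative helper for the mod-hashing idea (not Storm's internal code): every tuple carrying a given spout tuple ID maps to the same acker task index.
// Illustrative only: pick an acker task index from a spout tuple id.
static int ackerFor(long spoutTupleId, int numAckerTasks) {
    // floorMod keeps the result non-negative even for negative 64-bit ids
    return (int) Math.floorMod(spoutTupleId, (long) numAckerTasks);
}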
2. If there are multiple spout tasks, how does Storm know which spout task to notify when a spout tuple is finally acked? (Remember that a tuple must be acked by the spout task that produced it.)
The answer is simple: when a spout task emits a new tuple, it sends its task id to the corresponding acker, so the acker knows the mapping from spout tuple id to task id.
How do the acker tasks know which spout tasks are responsible for each spout tuple they're tracking?
When a spout task emits a new tuple, it simply sends a message to the appropriate acker telling it that its task id is responsible for that spout tuple. Then when an acker sees a tree has been completed, it knows to which task id to send the completion message.
3. If the acker explicitly tracked every tuple tree in memory, it would not scale: with large numbers of tuples or complex workflows, memory would very likely be exhausted. How is this problem solved?
Storm adopts a special approach, which is one of its major breakthroughs: for every spout tuple, the memory required is fixed at roughly 20 bytes no matter how complex the tuple tree is.
The acker only needs to store, for each spout tuple, the spout tuple ID, a task ID, and an "ack val".
The ack val, a 64-bit number, represents the state of the entire tuple tree. It is computed by XORing together the IDs of all tuples in the tree that have been created and acked.
When the ack val becomes 0, the tuple tree is complete.
The idea is quite clever: XORing two identical numbers yields 0, and each tuple ID is XORed in once when the tuple is created and once when it is acked, so once every created tuple has been acked the overall XOR is 0.
I wondered whether different tuple IDs with overlapping bits would interfere with each other; trying it out, they do not, since each ID cancels only itself regardless of which bits overlap.
For details about how the acker works, refer to the "Acker workflow" article in the Twitter Storm source code analysis series.
Acker tasks do not track the tree of tuples explicitly. For large tuple trees with tens of thousands of nodes (or more), tracking all the tuple trees could overwhelm the memory used by the ackers. Instead, the ackers take a different strategy that only requires a fixed amount of space per spout tuple (about 20 bytes). This tracking algorithm is the key to how Storm works and is one of its major breakthroughs. An acker task stores a map from a spout tuple ID to a pair of values. The first value is the task id that created the spout tuple which is used later on to send completion messages. The second value is a 64-bit number called the "ack val". The ack val is a representation of the state of the entire tuple tree, no matter how big or how small. It is simply the XOR of all tuple IDs that have been created and/or acked in the tree. When an acker task sees that an "ack val" has become 0, then it knows that the tuple tree is completed.
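Here is a toy demonstration of the ack-val bookkeeping (a sketch of the idea, not Storm's acker code), reusing the C/D/E example above: each tuple ID is XORed into the value once at creation and once at ack, and the value returns to 0 exactly when the whole tree has been acked.
import java.util.Random;

public class AckValDemo {
    public static void main(String[] args) {
        Random rand = new Random();
        long ackVal = 0;

        long c = rand.nextLong();   // spout tuple "C"
        long d = rand.nextLong();   // tuple "D", anchored to C
        long e = rand.nextLong();   // tuple "E", anchored to C

        ackVal ^= c;                // C created by the spout
        ackVal ^= c ^ d ^ e;        // C acked; D and E created (one acker update)
        ackVal ^= d;                // D acked
        ackVal ^= e;                // E acked

        System.out.println(ackVal == 0);   // true: the tuple tree is complete
    }
}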
Finally, consider the cases of task failure:
A normal task failure causes a timeout, and the spout will replay the tuple.
If an acker task fails, all the tuples it was tracking can no longer be acked, so they all time out and are replayed by the spout.
If the spout task itself fails, the source it reads from must be responsible for the replay, for example RabbitMQ or Kafka.
Now that you understand the reliability algorithm, let's go over all the failure cases and see how in each case storm avoids data loss:
- Task dies: In this case the spout tuple IDs at the root of the trees for the failed tuple will time out and be replayed.
- Acker task dies: In this case all the spout tuples the Acker was tracking will time out and be replayed.
- Spout task dies: In this case the source that the spout talks to is responsible for replaying the messages. For example, queues like Kestrel and RabbitMQ will place all pending messages back on the queue when a client disconnects.
As you have seen, Storm's reliability mechanisms are completely distributed, scalable, and fault-tolerant.
Tuning Reliability
Of course, reliability inevitably adds significant overhead to the system; for example, the number of messages roughly doubles because of the communication with the ackers.
If you do not need reliability, you can disable it in the following ways:
Acker tasks are lightweight, so you don't need very many of them in a topology. You can track their performance through the Storm UI (component id "__acker"). If the throughput doesn't look right, you'll need to add more acker tasks.
If reliability isn't important to you -- that is, you don't care about losing tuples in failure situations -- then you can improve performance by not tracking the tuple tree for spout tuples. Not tracking a tuple tree halves the number of messages transferred since normally there's an ack message for every tuple in the tuple tree. Additionally, it requires fewer IDs to be kept in each downstream tuple, reducing bandwidth usage.
There are three ways to remove reliability, sketched in code after this list.
1. The first is to set Config.TOPOLOGY_ACKERS to 0. In this case, Storm will call the ack method on the spout immediately after the spout emits a tuple; the tuple tree won't be tracked.
2. The second way is to omit a message ID in the SpoutOutputCollector.emit method.
3. Finally, emit the tuples unanchored.
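Sketches of the three options, assuming a Config named conf, a spout collector _spoutCollector, and a bolt collector _collector (all placeholder names):
// 1. Turn off ackers for the whole topology.
conf.setNumAckers(0);                                   // Config.TOPOLOGY_ACKERS = 0

// 2. Emit from the spout without a message id, so the tuple is never tracked.
_spoutCollector.emit(new Values("field1", "field2", 3));

// 3. Emit from a bolt without anchoring to the input tuple.
_collector.emit(new Values(word));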