I had some questions about this mechanism, so I went through it carefully and wrote down what I found.
The prerequisites for enabling Storm's tuple-tracking (acker) mechanism are:
1. When the spout emits a tuple, it must also pass a message ID (the messageId argument of emit).
2. The topology configuration must set the number of ackers to at least 1.
3. When a bolt emits, it must pass the input tuple as the anchor argument, so that the new tuple stays linked to the tracked tree.
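The three prerequisites boil down to two simple conditions. The sketch below is a hypothetical model, not Storm code: TrackingPreconditions and its methods are made-up names, standing in for the messageId argument of the spout's emit, the acker count in the topology configuration, and the anchor argument of the bolt's emit.

```java
// Hypothetical summary of the three prerequisites; these are not Storm methods.
public class TrackingPreconditions {
    // Prerequisites 1 and 2: the spout tuple needs a message ID, and at least one acker must run.
    public static boolean spoutTupleTracked(Object messageId, int ackerCount) {
        return messageId != null && ackerCount >= 1;
    }

    // Prerequisite 3: a bolt's output joins the tracked tree only if its input was
    // tracked and the output was emitted with that input as its anchor.
    public static boolean boltOutputTracked(boolean inputTracked, boolean anchored) {
        return inputTracked && anchored;
    }
}
```

An unanchored emit is the easy one to miss: the downstream tuple is still processed, but a failure there never reaches the spout.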
The process works as follows:
1. When a tuple carries a message ID, the spout adds it to its pending list and sends a message to the acker so that the acker starts tracking it.
2. In the downstream bolts, every tuple that is processed must then be explicitly acked or failed.
If the tuple is processed successfully through the entire DAG, the acker finds that the XOR value it keeps for that tuple's tree has dropped to 0, and therefore sends an ack message to the spout.
Conversely, if any bolt in the DAG fails a tuple in the tree, the acker treats the original tuple as failed and therefore sends a fail message to the spout.
3. How the spout handles a received ack or fail message:
First, it removes the tuple from the pending list: whether the result is an ack or a fail, once the result is known there is no reason to keep the tuple cached.
Then it calls spout.ack or spout.fail.
Beyond that, the system does nothing by default. Even retransmission after a failure is something you have to implement yourself in fail; how to do that is discussed after step 4.
4. If a tuple is neither acked nor failed, it eventually times out.
The spout rotates its pending list on the system tick and calls spout.fail for every tuple that has expired.
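Steps 1 to 4 can be modelled in a few dozen lines without Storm. This is a minimal sketch under stated assumptions: AckerModel and its methods are hypothetical names, tuple IDs are passed in by the caller rather than generated randomly, and the acker state is keyed by the spout's message ID for simplicity, whereas Storm keys it by an internal root ID. The XOR in boltAck is what the "value is 0" in step 2 refers to.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Storm-free model of steps 1-4; names are illustrative, not Storm's API.
public class AckerModel {
    // Acker: message ID -> XOR of every tuple ID in the tree not yet acked.
    private final Map<Object, Long> ackVals = new HashMap<>();
    // Spout pending list, split into buckets so each tick can expire the oldest one.
    private final Deque<Map<Object, List<Object>>> pending = new ArrayDeque<>();
    // Stands in for the user's spout.ack / spout.fail overrides.
    final List<String> callbacks = new ArrayList<>();

    public AckerModel(int numBuckets) {
        for (int i = 0; i < numBuckets; i++) {
            pending.addFirst(new HashMap<>());
        }
    }

    // Step 1: a spout tuple with a message ID is cached and registered with the acker.
    public void spoutEmit(Object msgId, List<Object> values, long rootTupleId) {
        pending.peekFirst().put(msgId, values);
        ackVals.put(msgId, rootTupleId);
    }

    // Step 2: a bolt acks its input after emitting children anchored to it.
    // Each ID is XOR-ed in twice over its life (when created, when acked),
    // so the value returns to 0 once every tuple in the tree has been acked.
    public void boltAck(Object msgId, long inputId, long... anchoredChildIds) {
        Long val = ackVals.get(msgId);
        if (val == null) {
            return; // already completed, failed or timed out
        }
        long x = val ^ inputId;
        for (long child : anchoredChildIds) {
            x ^= child;
        }
        if (x == 0) {
            ackVals.remove(msgId);
            complete(msgId, true);
        } else {
            ackVals.put(msgId, x);
        }
    }

    // Step 2 (failure): one failed tuple fails the whole tree.
    public void boltFail(Object msgId) {
        if (ackVals.remove(msgId) != null) {
            complete(msgId, false);
        }
    }

    // Step 3: drop the tuple from the pending list first, then invoke the callback.
    private void complete(Object msgId, boolean success) {
        for (Map<Object, List<Object>> bucket : pending) {
            if (bucket.remove(msgId) != null) {
                callbacks.add((success ? "ack:" : "fail:") + msgId);
                return;
            }
        }
    }

    // Step 4: each tick rotates the buckets; everything in the expired bucket is failed.
    public void tick() {
        Map<Object, List<Object>> expired = pending.removeLast();
        pending.addFirst(new HashMap<>());
        for (Object msgId : expired.keySet()) {
            ackVals.remove(msgId); // so late acks for it are ignored in this model
            callbacks.add("fail:" + msgId);
        }
    }
}
```

With numBuckets = 2, a tuple that is never acked is failed on the second tick after it was emitted. Rotating whole buckets means a tick only touches the expired entries instead of scanning every pending tuple's age.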
That leaves the question of how to replay a failed tuple.
The user must handle failures; the system will not do it automatically. The callback it provides is:
public void fail(Object msgId)
Look at this interface: it receives only the msgId. I think the design is unreasonable. The system actually caches the entire message, yet hands the user only the message ID, so how is the user supposed to recover the original message?
It seems you have to cache the message yourself and use the msgId to look it up.
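A minimal sketch of that workaround, assuming the spout keeps its own map from message ID to values. ReplayingSpout, its log field and the maxRetries limit are hypothetical; the class only mirrors the shape of the ack(Object msgId) and fail(Object msgId) callbacks and does not depend on Storm. Instead of calling a real collector, it records each emit in log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical spout-side cache so that fail(msgId) can recover and replay the values.
public class ReplayingSpout {
    private final Map<Object, List<Object>> inFlight = new HashMap<>(); // our own copy of the message
    private final Map<Object, Integer> attempts = new HashMap<>();      // replays per message ID
    private final int maxRetries;
    final List<String> log = new ArrayList<>(); // stands in for collector.emit(values, msgId)

    public ReplayingSpout(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Emit a tuple and remember its values under its message ID.
    public void emit(Object msgId, List<Object> values) {
        inFlight.put(msgId, values);
        log.add("emit " + msgId + " " + values);
    }

    // Fully processed: the cached copy is no longer needed.
    public void ack(Object msgId) {
        inFlight.remove(msgId);
        attempts.remove(msgId);
    }

    // Only the ID arrives, so look the values up in our own cache and replay them.
    public void fail(Object msgId) {
        List<Object> values = inFlight.get(msgId);
        if (values == null) {
            return; // unknown, already acked, or already dropped
        }
        int n = attempts.merge(msgId, 1, Integer::sum);
        if (n > maxRetries) {
            inFlight.remove(msgId);
            attempts.remove(msgId);
            log.add("drop " + msgId);
            return;
        }
        emit(msgId, values); // replay under the same message ID
    }
}
```

Re-emitting directly from fail keeps the sketch short; a real spout might instead queue the ID and re-emit it from nextTuple. The retry cap is a design choice that keeps a permanently bad message from looping forever.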
Alibaba's JStorm provides an extended interface:
public interface IFailValueSpout {
    void fail(Object msgId, List<Object> values);
}
This is more reasonable: the message values the system has already cached are passed in directly.
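To see why passing the values costs the framework almost nothing, here is a hypothetical framework-side sketch: the pending map it already keeps for timeouts is enough to hand the original values to fail. The IFailValueSpout below is a local copy of the interface shape quoted above, not an import of JStorm's class, and PendingTracker is a made-up name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FailValueDemo {
    // Local copy of the interface shape quoted above (not JStorm's actual class).
    interface IFailValueSpout {
        void fail(Object msgId, List<Object> values);
    }

    // Hypothetical framework side: reuses the pending cache it already keeps.
    static class PendingTracker {
        private final Map<Object, List<Object>> pending = new HashMap<>();
        private final IFailValueSpout spout;

        PendingTracker(IFailValueSpout spout) {
            this.spout = spout;
        }

        // Called when the spout emits a tracked tuple.
        void emitted(Object msgId, List<Object> values) {
            pending.put(msgId, values);
        }

        // On failure or timeout, remove the entry and pass its values along with the ID.
        void failed(Object msgId) {
            List<Object> values = pending.remove(msgId);
            if (values != null) {
                spout.fail(msgId, values);
            }
        }
    }
}
```

With this shape the spout no longer needs its own msgId-to-values map just to support replay.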