When reprinting, please indicate the source: http://blog.csdn.net/luonanqin
I recently studied stream grouping in Storm and did not have a thorough understanding of fields grouping and shuffle grouping, so I could not fully follow WordCountTopology. I then added a line of code to it and ran it again; I can only say I was not yet familiar with Storm's basic concepts. (For the WordCountTopology example, see storm-starter.)
public void execute(Tuple tuple, BasicOutputCollector collector) {
    String word = tuple.getString(0);
    // Added this line to see whether words with the same value are always
    // executed by the same bolt instance. Testing shows that they are.
    System.out.println(this + "====" + word);
    Integer count = counts.get(word);
    if (count == null) count = 0;
    count++;
    counts.put(word, count);
    collector.emit(new Values(word, count));
}

After repeated tests, the following is my personal summary. If anything is missing or wrong, I will correct it promptly.
The official documentation contains the following sentence: "If the stream is grouped by the 'user-id' field, tuples with the same 'user-id' will always go to the same task."
A task is an instance of the processing logic. Fields grouping groups the stream on a declared output field, i.e. the "xxx" defined below.
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("xxx"));
}

Each value of "xxx" will be processed by a task, and tuples carrying the same value of "xxx" will be processed by the same task instance.
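Under the hood, fields grouping essentially hashes the selected field values and takes the result modulo the number of target tasks, which is why equal values always land on the same task. The class and helper below are hypothetical names, a minimal sketch of that idea in plain Java:

```java
// Sketch of how fields grouping can map a field value to a task:
// hash the grouping field, then take it modulo the number of target tasks.
public class FieldsGroupingSketch {
    // Hypothetical helper: pick a task index for a given field value.
    static int taskFor(String fieldValue, int numTasks) {
        return Math.abs(fieldValue.hashCode()) % numTasks;
    }

    public static void main(String[] args) {
        int numTasks = 3;
        String[] words = {"luonq", "pangyang", "qinnl", "luonq", "qinnl"};
        for (String w : words) {
            // The same word always maps to the same instance.
            System.out.println(w + " -> instance" + (taskFor(w, numTasks) + 1));
        }
    }
}
```

Because the mapping is a pure function of the value, no coordination between tasks is needed to keep it consistent.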
For example:
The bolt's first emit produces three tuples, whose "xxx" values are luonq, pangyang, and qinnl. Suppose three task instances are created to process them:

luonq    -> instance1
pangyang -> instance2
qinnl    -> instance3
The second emit produces four tuples, with "xxx" values luonq, qinnanluo, py, and pangyang. The same three task instances process them:

luonq     -> instance1
qinnanluo -> instance2
py        -> instance3
pangyang  -> instance2
The third emit produces two tuples, with "xxx" values py and qinnl. Again the same three task instances process them:

py    -> instance3
qinnl -> instance3
Finally, let's look at which values each task instance processed, and how many times:
Instance1: luonq (processed twice)
Instance2: pangyang (processed twice), qinnanluo (processed once)
Instance3: qinnl (processed twice), py (processed twice)
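The behavior in this walkthrough can be sketched as a "sticky" assignment: the first time a value appears it is bound to some task, and every later occurrence reuses that binding. The class below is a hypothetical illustration (it picks the initial task at random, so the concrete instance numbers will differ from the example above, but the stickiness is the same):

```java
import java.util.*;

// Sketch of the example above: the first time a value is seen it is
// assigned to a task; every later occurrence goes to that same task.
public class StickyAssignment {
    private final Map<String, Integer> assigned = new HashMap<>();
    private final int numTasks;
    private final Random rnd = new Random();

    StickyAssignment(int numTasks) { this.numTasks = numTasks; }

    int taskFor(String value) {
        // First occurrence: pick a task (randomly here, for illustration);
        // afterwards the mapping never changes.
        return assigned.computeIfAbsent(value, v -> rnd.nextInt(numTasks));
    }

    public static void main(String[] args) {
        StickyAssignment grouping = new StickyAssignment(3);
        String[] emits = {"luonq", "pangyang", "qinnl",           // 1st emit
                          "luonq", "qinnanluo", "py", "pangyang", // 2nd emit
                          "py", "qinnl"};                         // 3rd emit
        Map<Integer, List<String>> byTask = new TreeMap<>();
        for (String word : emits) {
            byTask.computeIfAbsent(grouping.taskFor(word), k -> new ArrayList<>())
                  .add(word);
        }
        byTask.forEach((task, words) ->
            System.out.println("instance" + (task + 1) + ": " + words));
    }
}
```

Running this prints a tally like the one above: each word appears under exactly one instance, however many times it was emitted.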
Conclusion:
1. Which task instance processes a value the first time it is emitted is random; from then on, every tuple with that value is processed by the task instance that first handled it, until the topology terminates.
2. One task instance can process values from many different emits.
3. The difference from shuffle grouping: under shuffle grouping, even when the same value is emitted again, the task that processes it is chosen at random.
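Conclusion 3 can be made concrete with a small sketch (hypothetical helper names): fields grouping, modeled as a deterministic hash modulo the task count, always yields the same task for the same value, while shuffle grouping picks a task independently of the value each time:

```java
import java.util.*;

// Contrast the two groupings for repeated emits of the same value:
// fields grouping is deterministic per value, shuffle grouping is not.
public class GroupingContrast {
    static int fieldsGrouping(String value, int numTasks) {
        return Math.abs(value.hashCode()) % numTasks;  // depends only on value
    }

    static int shuffleGrouping(Random rnd, int numTasks) {
        return rnd.nextInt(numTasks);                  // ignores the value
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        int numTasks = 3;
        Set<Integer> fieldsTasks = new HashSet<>();
        Set<Integer> shuffleTasks = new HashSet<>();
        for (int i = 0; i < 20; i++) {
            fieldsTasks.add(fieldsGrouping("luonq", numTasks));
            shuffleTasks.add(shuffleGrouping(rnd, numTasks));
        }
        // fieldsTasks always contains exactly one task;
        // shuffleTasks usually contains several.
        System.out.println("fields grouping used tasks:  " + fieldsTasks);
        System.out.println("shuffle grouping used tasks: " + shuffleTasks);
    }
}
```

Note that real shuffle grouping in Storm distributes tuples evenly (closer to round-robin than pure random); the point here is only that the choice does not depend on the tuple's field values.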