Storm miscellany: the difference between fields grouping and shuffle grouping

If you repost this article, please credit the source: http://blog.csdn.net/luonanqin



While studying stream groupings in Storm recently, I realized I did not fully understand fields grouping and shuffle grouping, and WordCountTopology did not quite make sense to me. So I added one line of code and ran it again. All I can say is that I was not familiar enough with Storm's basic concepts. (For the WordCountTopology example, see storm-starter.)

public void execute(Tuple tuple, BasicOutputCollector collector) {
    String word = tuple.getString(0);
    // Added line: print the bolt instance and the word to check whether
    // equal words are handled by the same instance. Testing confirms they are.
    System.out.println(this + "====" + word);
    Integer count = counts.get(word);
    if (count == null)
        count = 0;
    count++;
    counts.put(word, count);
    collector.emit(new Values(word, count));
}

After repeated tests, here is my personal summary. If anything is missing or wrong, I will correct it promptly.

The official documentation contains the following sentence: "if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task"

A task is an instance of the processing logic. The field named in a fields grouping is a field that the emitting component declares for its output tuples, that is, the xxx declared below.

public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("xxx"));
}

Each concrete value of xxx is handled by one task, and tuples carrying the same xxx value are always handled by the same task instance.
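
This rule can be pictured as a small routing function. The sketch below is only an illustration, not Storm's own code: the class FieldsRoutingSketch and its chooseTask method are hypothetical names, and it assumes the target task is picked by hashing the grouping field values modulo the number of tasks.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of Storm's API.
final class FieldsRoutingSketch {
    // Assumption: the task index is the hash of the grouping field values
    // modulo the number of tasks. floorMod keeps the result non-negative.
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 3;
        for (String xxx : Arrays.asList("luonq", "pangyang", "luonq")) {
            List<Object> key = Arrays.<Object>asList(xxx);
            System.out.println(xxx + " -> task " + chooseTask(key, numTasks));
        }
    }
}
```

Both luonq lines print the same task index. Which index that is depends only on the hash of the value and on the number of tasks.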

For example:

The first time, the bolt emits three tuples, with xxx taking the three values luonq, pangyang, and qinnl. Assume that three task instances are created to process them:

luonq -> instance1
pangyang -> instance2
qinnl -> instance3


The second time, four tuples are emitted, with xxx taking the four values luonq, qinnanluo, py, and pangyang. The same three task instances process them:
luonq -> instance1
qinnanluo -> instance2
py -> instance3
pangyang -> instance2

The third time, two tuples are emitted, with xxx taking the two values py and qinnl. The same three task instances process them:
py -> instance3
qinnl -> instance3

Finally, let's look at which values each of the three task instances processed, and how many times:

Instance1: luonq (processed twice)
Instance2: pangyang (processed twice) qinnanluo (processed once)
Instance3: qinnl (processed twice) py (processed twice)
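
The three rounds above can be replayed in a few lines of plain Java. This is a sketch under an assumption, not a reproduction of the run above: FieldsGroupingReplay is a hypothetical class, and it routes by value hash modulo the task count, so the task each value lands on may differ from the instance numbers listed above. What carries over is that every value stays on one task and that the per-value counts match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not part of Storm's API.
final class FieldsGroupingReplay {
    // Assumed routing rule: hash of the value modulo the number of tasks.
    static int chooseTask(String value, int numTasks) {
        return Math.floorMod(value.hashCode(), numTasks);
    }

    // Returns task index -> (value -> number of times processed).
    static Map<Integer, Map<String, Integer>> replay(List<List<String>> rounds, int numTasks) {
        Map<Integer, Map<String, Integer>> perTask = new TreeMap<>();
        for (List<String> round : rounds) {
            for (String value : round) {
                int task = chooseTask(value, numTasks);
                perTask.computeIfAbsent(task, t -> new TreeMap<>())
                       .merge(value, 1, Integer::sum);
            }
        }
        return perTask;
    }

    public static void main(String[] args) {
        List<List<String>> rounds = Arrays.asList(
            Arrays.asList("luonq", "pangyang", "qinnl"),
            Arrays.asList("luonq", "qinnanluo", "py", "pangyang"),
            Arrays.asList("py", "qinnl"));
        replay(rounds, 3).forEach((task, counts) ->
            System.out.println("task " + task + ": " + counts));
    }
}
```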

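Conclusion:
1. Which task instance handles a value the first time it is emitted looks random from the outside. In practice the task is chosen by hashing the value of the grouping field modulo the number of tasks, so the choice is arbitrary but fixed. From then on, the same task instance handles that value every time, until the topology ends.

2. A single task instance can process several different values.

3. The difference from shuffle grouping: with shuffle grouping, tuples carrying the same value are distributed to tasks at random, so the same value can end up on different task instances.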
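
The contrast in point 3 can be sketched side by side. Everything here is a hypothetical illustration rather than Storm's implementation: the class ShuffleVsFieldsSketch and its methods are made-up names, the shuffle side is modeled as a round-robin walk over a randomly shuffled task list (an assumption that keeps the spread even and the example repeatable), and the fields side uses value hash modulo task count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration; not part of Storm's API.
final class ShuffleVsFieldsSketch {
    private final List<Integer> shuffledTasks = new ArrayList<>();
    private int next = 0;

    ShuffleVsFieldsSketch(int numTasks, long seed) {
        for (int i = 0; i < numTasks; i++) {
            shuffledTasks.add(i);
        }
        Collections.shuffle(shuffledTasks, new Random(seed));
    }

    // Shuffle-style routing: the value is ignored on purpose.
    int shuffleTask(String value) {
        int task = shuffledTasks.get(next);
        next = (next + 1) % shuffledTasks.size();
        return task;
    }

    // Fields-style routing: the task depends only on the value.
    int fieldsTask(String value) {
        return Math.floorMod(value.hashCode(), shuffledTasks.size());
    }

    public static void main(String[] args) {
        ShuffleVsFieldsSketch sketch = new ShuffleVsFieldsSketch(3, 42L);
        Set<Integer> shuffleHits = new HashSet<>();
        Set<Integer> fieldsHits = new HashSet<>();
        for (int i = 0; i < 6; i++) {
            shuffleHits.add(sketch.shuffleTask("luonq"));
            fieldsHits.add(sketch.fieldsTask("luonq"));
        }
        System.out.println("shuffle grouping tasks hit: " + shuffleHits.size());
        System.out.println("fields grouping tasks hit: " + fieldsHits.size());
    }
}
```

Emitting luonq six times reaches all three tasks under the shuffle-style routing but only one task under the fields-style routing. That is why a per-word counter such as the one in WordCountTopology needs fields grouping to keep each word's count in one place.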
