When reprinting, please indicate the source: http://blog.csdn.net/luonanqin
I recently studied stream grouping in Storm and did not have a thorough understanding of fields grouping and shuffle grouping, so I could not fully follow WordCountTopology. I then added a line of code to it and ran it again; I can only say I was not yet familiar with Storm's basic concepts. (For the WordCountTopology example, see storm-starter.)
public void execute(Tuple tuple, BasicOutputCollector collector) {
    String word = tuple.getString(0);
    // Added this line to see whether words with the same value are always
    // executed by the same bolt instance. Testing shows that they are.
    System.out.println(this + "====" + word);
    Integer count = counts.get(word);
    if (count == null) count = 0;
    count++;
    counts.put(word, count);
    collector.emit(new Values(word, count));
}

After repeated tests, the following is my personal summary. If anything is missing or wrong, I will correct it promptly.
The official documentation contains the following sentence: "If the stream is grouped by the 'user-id' field, tuples with the same 'user-id' will always go to the same task."
A task is an instance of the processing logic. Fields grouping groups the stream on a declared output field, i.e. the "xxx" defined below.
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("xxx"));
}

Each value of "xxx" will be processed by a task, and tuples carrying the same value of "xxx" will be processed by the same task instance.
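Under the hood, fields grouping essentially hashes the selected field values and takes the result modulo the number of target tasks, which is why equal values always land on the same task. The class and helper below are hypothetical names, a minimal sketch of that idea in plain Java:

```java
// Sketch of how fields grouping can map a field value to a task:
// hash the grouping field, then take it modulo the number of target tasks.
public class FieldsGroupingSketch {
    // Hypothetical helper: pick a task index for a given field value.
    static int taskFor(String fieldValue, int numTasks) {
        return Math.abs(fieldValue.hashCode()) % numTasks;
    }

    public static void main(String[] args) {
        int numTasks = 3;
        String[] words = {"luonq", "pangyang", "qinnl", "luonq", "qinnl"};
        for (String w : words) {
            // The same word always maps to the same instance.
            System.out.println(w + " -> instance" + (taskFor(w, numTasks) + 1));
        }
    }
}
```

Because the mapping is a pure function of the value, no coordination between tasks is needed to keep it consistent.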
For example:
The bolt's first emit produces three tuples, whose "xxx" values are luonq, pangyang, and qinnl. Suppose three task instances are created to process them:

luonq    -> instance1
pangyang -> instance2
qinnl    -> instance3
The second emit produces four tuples, with "xxx" values luonq, qinnanluo, py, and pangyang. The same three task instances process them:

luonq     -> instance1
qinnanluo -> instance2
py        -> instance3
pangyang  -> instance2
The third emit produces two tuples, with "xxx" values py and qinnl. Again the same three task instances process them:

py    -> instance3
qinnl -> instance3
Finally, let's look at which values each task instance processed, and how many times:
Instance1: luonq (processed twice)
Instance2: pangyang (processed twice), qinnanluo (processed once)
Instance3: qinnl (processed twice), py (processed twice)
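The behavior in this walkthrough can be sketched as a "sticky" assignment: the first time a value appears it is bound to some task, and every later occurrence reuses that binding. The class below is a hypothetical illustration (it picks the initial task at random, so the concrete instance numbers will differ from the example above, but the stickiness is the same):

```java
import java.util.*;

// Sketch of the example above: the first time a value is seen it is
// assigned to a task; every later occurrence goes to that same task.
public class StickyAssignment {
    private final Map<String, Integer> assigned = new HashMap<>();
    private final int numTasks;
    private final Random rnd = new Random();

    StickyAssignment(int numTasks) { this.numTasks = numTasks; }

    int taskFor(String value) {
        // First occurrence: pick a task (randomly here, for illustration);
        // afterwards the mapping never changes.
        return assigned.computeIfAbsent(value, v -> rnd.nextInt(numTasks));
    }

    public static void main(String[] args) {
        StickyAssignment grouping = new StickyAssignment(3);
        String[] emits = {"luonq", "pangyang", "qinnl",           // 1st emit
                          "luonq", "qinnanluo", "py", "pangyang", // 2nd emit
                          "py", "qinnl"};                         // 3rd emit
        Map<Integer, List<String>> byTask = new TreeMap<>();
        for (String word : emits) {
            byTask.computeIfAbsent(grouping.taskFor(word), k -> new ArrayList<>())
                  .add(word);
        }
        byTask.forEach((task, words) ->
            System.out.println("instance" + (task + 1) + ": " + words));
    }
}
```

Running this prints a tally like the one above: each word appears under exactly one instance, however many times it was emitted.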
Conclusion:
1. Which task instance processes a value the first time it is emitted is random; from then on, every tuple with that value is processed by the task instance that first handled it, until the topology terminates.
2. One task instance can process values from many different emits.
3. The difference from shuffle grouping: under shuffle grouping, even when the same value is emitted again, the task that processes it is chosen at random.
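Conclusion 3 can be made concrete with a small sketch (hypothetical helper names): fields grouping, modeled as a deterministic hash modulo the task count, always yields the same task for the same value, while shuffle grouping picks a task independently of the value each time:

```java
import java.util.*;

// Contrast the two groupings for repeated emits of the same value:
// fields grouping is deterministic per value, shuffle grouping is not.
public class GroupingContrast {
    static int fieldsGrouping(String value, int numTasks) {
        return Math.abs(value.hashCode()) % numTasks;  // depends only on value
    }

    static int shuffleGrouping(Random rnd, int numTasks) {
        return rnd.nextInt(numTasks);                  // ignores the value
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        int numTasks = 3;
        Set<Integer> fieldsTasks = new HashSet<>();
        Set<Integer> shuffleTasks = new HashSet<>();
        for (int i = 0; i < 20; i++) {
            fieldsTasks.add(fieldsGrouping("luonq", numTasks));
            shuffleTasks.add(shuffleGrouping(rnd, numTasks));
        }
        // fieldsTasks always contains exactly one task;
        // shuffleTasks usually contains several.
        System.out.println("fields grouping used tasks:  " + fieldsTasks);
        System.out.println("shuffle grouping used tasks: " + shuffleTasks);
    }
}
```

Note that real shuffle grouping in Storm distributes tuples evenly (closer to round-robin than pure random); the point here is only that the choice does not depend on the tuple's field values.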