Storm Stream Grouping and Concurrency Policies

Source: Internet
Author: User

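Spout data sources: in practice a message queue (MQ) should be used.

Possible sources: MQ, database, file.

Direct streaming data source: MQ.

Database: read only configuration data from it.

Incremental log file data: (1) read the new content and write it to the MQ; (2) let Storm process it from there.

Reading files directly in a spout: useful for learning only, not otherwise.

Problems with reading a file in a spout: (1) a distributed application cannot reliably read a local file; (2) once spout concurrency is turned on, the file is read repeatedly.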

A stream grouping policy only matters when the receiver runs with a concurrency greater than one.

A stream grouping defines how the tuples of a stream are distributed among the multiple executors (threads, concurrency) of a bolt.

Note: this is not the same as one spout or bolt emitting to several different bolts (that is broadcast mode).

Six stream groupings in Storm are covered here.

With a single thread, every grouping is equivalent to All Grouping: that thread gets all the data.



1. Shuffle Grouping

Round-robin style, even distribution. Tuples in the stream are distributed randomly so that each bolt task receives roughly the same number of tuples.



Even distribution:

Results:

The data is distributed evenly between the two threads.
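The even split described above can be imitated in plain Java. This is a minimal sketch, not Storm's own implementation: the class `ShuffleSketch` and its `distribute` method are hypothetical names, and the strategy (shuffle the task order, then hand out tuples one per task per pass) is an assumption about one way to get a random but balanced distribution.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Plain-Java simulation of shuffle-style routing; hypothetical, not a Storm class. */
final class ShuffleSketch {

    /** Sends tupleCount tuples to numTasks tasks and returns how many each task received. */
    static int[] distribute(int tupleCount, int numTasks, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            order.add(task);
        }
        Random random = new Random(seed);
        Collections.shuffle(order, random);

        int[] received = new int[numTasks];
        for (int i = 0; i < tupleCount; i++) {
            int slot = i % numTasks;
            if (slot == 0 && i > 0) {
                // Start of a new pass: pick a fresh random order of the tasks.
                Collections.shuffle(order, random);
            }
            received[order.get(slot)]++;
        }
        return received;
    }

    public static void main(String[] args) {
        int[] counts = distribute(13, 2, 42L);
        System.out.println("task 0: " + counts[0] + ", task 1: " + counts[1]);
    }
}
```

Because every pass gives each task exactly one tuple, the per-task counts can differ by at most one, which matches the "two threads share the data evenly" observation.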



2. None Grouping: no grouping is specified. The effect is the same as Shuffle Grouping, except that under multiple threads the distribution is not necessarily even.


3. Fields Grouping: tuples are grouped by the value of a field. For example, when grouping by word, tuples with the same word always go to the same bolt task, while tuples with different words may go to different tasks.

What it does: 1. filtering: it selects some of the fields from the multiple output fields of the source (a spout or an upstream bolt);

2. tuples with the same field values are sent to the same executor or task for processing.

Typical scenarios: deduplication, join.

A join needs two sources, and the sources must stay synchronized in time; otherwise it is error-prone.





Shuffle Grouping:

=============================================a b c d

=============================================b c d e f

=============================================a d e f

Thread-16-count------------------------word=c; count=1

Thread-18-count------------------------word=a; count=1

Thread-18-count------------------------word=d; count=1

Thread-16-count------------------------word=d; count=1

Thread-16-count------------------------word=f; count=1

Thread-18-count------------------------word=e; count=1

Thread-16-count------------------------word=c; count=2

Thread-18-count------------------------word=d; count=2

Thread-18-count------------------------word=e; count=2

Thread-20-count------------------------word=b; count=1

Thread-20-count------------------------word=a; count=1

Thread-20-count------------------------word=b; count=2

Thread-20-count------------------------word=f; count=1



You can see that the assignment of words to threads follows no pattern.

Thread 18 counts d and thread 16 also counts d, so the per-word counts are split across threads and the result is wrong.

Fields Grouping:



=============================================a b c d

=============================================b c d e f

=============================================a d e f

Thread-20-count------------------------word=a; count=1

Thread-20-count------------------------word=d; count=1

Thread-16-count------------------------word=b; count=1

Thread-16-count------------------------word=b; count=2

Thread-18-count------------------------word=c; count=1

Thread-16-count------------------------word=e; count=1

Thread-16-count------------------------word=e; count=2

Thread-20-count------------------------word=a; count=2

Thread-20-count------------------------word=d; count=2

Thread-18-count------------------------word=c; count=2

Thread-20-count------------------------word=d; count=3

Thread-18-count------------------------word=f; count=1

Thread-18-count------------------------word=f; count=2



Each word is always handled by the same thread:

Thread 16: b, e

Thread 18: c, f

Thread 20: a, d

That is, tuples with the same word are always routed to the same bolt task.
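The "same word, same task" behaviour can be sketched in plain Java by hashing the grouping field and taking the result modulo the number of tasks. The class `FieldsGroupingSketch` and its methods are hypothetical, and the use of `List.hashCode()` is an assumption for illustration; the hash function Storm actually applies is an implementation detail not shown in this article, so the word-to-task mapping printed here will not necessarily match the thread numbers in the log above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of fields-style routing; hypothetical, not a Storm class. */
final class FieldsGroupingSketch {

    /** Picks a task index from the values of the grouping fields. */
    static int taskFor(List<Object> groupingValues, int numTasks) {
        // floorMod keeps the result in [0, numTasks) even for negative hash codes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    /** Routes every word of every line and records which task each word went to. */
    static Map<String, Integer> route(String[] lines, int numTasks) {
        Map<String, Integer> wordToTask = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                int task = taskFor(Arrays.<Object>asList(word), numTasks);
                Integer previous = wordToTask.put(word, task);
                if (previous != null && previous != task) {
                    throw new IllegalStateException("word routed to two tasks: " + word);
                }
            }
        }
        return wordToTask;
    }

    public static void main(String[] args) {
        String[] lines = {"a b c d", "b c d e f", "a d e f"};
        System.out.println(route(lines, 3));
    }
}
```

Since the task depends only on the field value, each task sees every occurrence of the words it owns, which is what makes per-word counting correct under concurrency.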









4. All Grouping: broadcast. Every tuple is sent to all of the bolt's tasks.
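As a rough illustration of the broadcast behaviour, the following plain-Java sketch gives every task its own copy of the whole stream. `AllGroupingSketch` and `broadcast` are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java simulation of broadcast routing; hypothetical, not a Storm class. */
final class AllGroupingSketch {

    /** Returns one inbox per task; each inbox holds a copy of every tuple. */
    static List<List<String>> broadcast(List<String> tuples, int numTasks) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            inboxes.add(new ArrayList<>(tuples)); // each task gets the full stream
        }
        return inboxes;
    }

    public static void main(String[] args) {
        List<List<String>> inboxes = broadcast(Arrays.asList("a", "b", "c"), 3);
        for (int task = 0; task < inboxes.size(); task++) {
            System.out.println("task " + task + " received " + inboxes.get(task));
        }
    }
}
```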




5. Global Grouping: the whole stream is assigned to a single one of the bolt's tasks; more specifically, the task with the lowest ID. Suitable scenarios: hard to think of any.

Only one task receives the tuples; the others receive no data.



In the test run only one thread received tuples and none of the others produced any output; the task with the smallest ID is the one that receives them.
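The "lowest task ID wins" rule can be sketched in a few lines of plain Java. `GlobalGroupingSketch`, `chooseTask`, and `deliver` are hypothetical names, and the task IDs 16, 18, and 20 are borrowed from the thread numbers in the logs above purely for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of global-style routing; hypothetical, not a Storm class. */
final class GlobalGroupingSketch {

    /** The single receiver is the task with the smallest ID. */
    static int chooseTask(List<Integer> targetTaskIds) {
        return Collections.min(targetTaskIds);
    }

    /** Delivers tupleCount tuples and returns how many each task ID received. */
    static Map<Integer, Integer> deliver(int tupleCount, List<Integer> targetTaskIds) {
        Map<Integer, Integer> received = new TreeMap<>();
        for (int id : targetTaskIds) {
            received.put(id, 0);
        }
        int chosen = chooseTask(targetTaskIds);
        for (int i = 0; i < tupleCount; i++) {
            received.merge(chosen, 1, Integer::sum);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(deliver(13, Arrays.asList(20, 16, 18)));
    }
}
```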



6. Direct Grouping: a special kind of grouping in which the sender of a tuple decides which task of the receiver will process it. Only streams declared as direct streams can use this grouping, and their tuples must be emitted with the emitDirect method. A bolt can obtain the task IDs of its consumers from the TopologyContext, or by keeping track of the return value of OutputCollector.emit, which also returns the task IDs the tuple was sent to.
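To make the "sender decides" idea concrete, here is a minimal plain-Java sketch. It does not use Storm's API: `DirectGroupingSketch` is a hypothetical class, its `emitDirect` method is only a stand-in that mimics the shape of a direct emit, and the routing rule in `send` (short words to the first consumer, longer words to the last) is an arbitrary example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of sender-chosen routing; hypothetical, not a Storm class. */
final class DirectGroupingSketch {
    private final List<Integer> consumerTaskIds;
    private final Map<Integer, List<String>> inboxes = new TreeMap<>();

    DirectGroupingSketch(List<Integer> consumerTaskIds) {
        this.consumerTaskIds = new ArrayList<>(consumerTaskIds);
        for (int id : consumerTaskIds) {
            inboxes.put(id, new ArrayList<>());
        }
    }

    /** Stand-in for a direct emit: the caller names the receiving task explicitly. */
    void emitDirect(int taskId, String tuple) {
        List<String> inbox = inboxes.get(taskId);
        if (inbox == null) {
            throw new IllegalArgumentException("unknown task " + taskId);
        }
        inbox.add(tuple);
    }

    /** Sender-side rule (arbitrary): words of up to 3 letters go to the first task. */
    void send(String word) {
        int first = consumerTaskIds.get(0);
        int last = consumerTaskIds.get(consumerTaskIds.size() - 1);
        emitDirect(word.length() <= 3 ? first : last, word);
    }

    List<String> inboxOf(int taskId) {
        return inboxes.get(taskId);
    }

    public static void main(String[] args) {
        DirectGroupingSketch sender = new DirectGroupingSketch(Arrays.asList(16, 18, 20));
        sender.send("cat");
        sender.send("storm");
        System.out.println(sender.inboxOf(16) + " " + sender.inboxOf(18) + " " + sender.inboxOf(20));
    }
}
```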





Degree of concurrency

Scenario Analysis:

Single-threaded: arithmetic such as subtraction, any processing-type operation, and the final summary.

Multi-threaded:

1. local (partial) arithmetic

2. processing-type operations, such as split

3. persistence, such as writing to a DB

WordCountTopology.java is used as the example.

Exercise: how do you compute the total word count and the number of distinct words, and do so under high concurrency?

The former is the total number of rows; the latter is the number of words after deduplication.

A similar real-world scenario: computing a website's PV (page views) and UV (unique visitors).
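One way to approach the exercise is a two-stage computation: partition words by value across several workers (as a fields grouping would), let each worker keep a partial total and a local set of distinct words, then add the partial results in a single final step (as a single-threaded summary would). The sketch below simulates this in plain Java; `WordTotalsSketch` and `count` are hypothetical names, and hashing with `String.hashCode()` is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java simulation of partitioned counting; hypothetical, not a Storm class. */
final class WordTotalsSketch {

    /** Returns {total number of words, number of distinct words}. */
    static long[] count(String[] lines, int numTasks) {
        long[] partialTotals = new long[numTasks];
        List<Set<String>> partialDistinct = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            partialDistinct.add(new HashSet<>());
        }

        // Stage 1: each word is owned by exactly one task, chosen by its value.
        for (String line : lines) {
            for (String word : line.split(" ")) {
                int task = Math.floorMod(word.hashCode(), numTasks);
                partialTotals[task]++;
                partialDistinct.get(task).add(word);
            }
        }

        // Stage 2: a single summary step adds the partial results.
        long total = 0;
        long distinct = 0;
        for (int task = 0; task < numTasks; task++) {
            total += partialTotals[task];
            distinct += partialDistinct.get(task).size();
        }
        return new long[] {total, distinct};
    }

    public static void main(String[] args) {
        long[] result = count(new String[] {"a b c d", "b c d e f", "a d e f"}, 3);
        System.out.println("total=" + result[0] + ", distinct=" + result[1]); // total=13, distinct=6
    }
}
```

Summing the local set sizes is only valid because partitioning by value keeps the sets disjoint; with shuffle-style routing the same word could land in several sets and be counted more than once.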





Degree of parallelism and concurrency



With a file source and two spout threads, each thread reads the file, so the data is read twice.

1. A distributed application cannot read such a source.

2. With concurrency turned on, the spout reads the data repeatedly.

If the spout reads from a message queue, a message that has been consumed is not consumed again, so the data is not duplicated regardless of the concurrency.
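The difference between the two kinds of source can be simulated in plain Java. In this sketch `SpoutSourceSketch`, `readFromFile`, and `readFromQueue` are hypothetical names; the file is modelled as a list that every spout instance reads in full, and the queue is modelled as an in-memory `ArrayDeque` where a polled message is gone for everyone else. Real message queues have their own delivery guarantees, which this simplification does not cover.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Plain-Java simulation of file vs. queue sources; hypothetical, not a Storm class. */
final class SpoutSourceSketch {

    /** Every spout instance opens the file itself and reads all of it. */
    static List<String> readFromFile(List<String> fileLines, int spoutInstances) {
        List<String> emitted = new ArrayList<>();
        for (int instance = 0; instance < spoutInstances; instance++) {
            emitted.addAll(fileLines);
        }
        return emitted;
    }

    /** Spout instances take turns polling one shared queue; each message is removed once. */
    static List<String> readFromQueue(List<String> messages, int spoutInstances) {
        Queue<String> queue = new ArrayDeque<>(messages);
        List<String> emitted = new ArrayList<>();
        int turn = 0;
        while (!queue.isEmpty()) {
            int instance = turn % spoutInstances; // which instance polls this time
            String message = queue.poll();
            emitted.add("spout-" + instance + ":" + message);
            turn++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a b c d", "b c d e f", "a d e f");
        System.out.println("file:  " + readFromFile(data, 2).size() + " tuples");
        System.out.println("queue: " + readFromQueue(data, 2).size() + " tuples");
    }
}
```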



As for bolts, there are six grouping policies; with Shuffle Grouping the data is handed out evenly in a round-robin fashion.

One spout emitting to multiple bolts is a broadcast.


