Spout data sources: MQ, DB, or file. A message queue is the one that must be used.
Source for streaming data: MQ.
DB: read only configuration from it.
Incremental log-file data: 1. read the new content and write it to MQ; 2. process it in Storm.
Reading files in a spout: useful for learning only, not otherwise.
Problems with reading a file: 1. a distributed application cannot read it; 2. with spout concurrency enabled, the file is read repeatedly.
Stream grouping policies only matter when a bolt runs with multiple concurrent executors.
A stream grouping defines how the tuples of a stream are distributed among the multiple executors of a bolt (multi-threading, concurrency).
Note: this is not the same as one spout or bolt emitting to multiple bolts (broadcast mode).
These notes cover 6 types of stream grouping in Storm.
With a single thread, every grouping behaves like All Grouping (the one thread gets all the data).
1. Shuffle Grouping
Polling, even distribution. Tuples in the stream are distributed randomly across the bolt's tasks so that each task receives the same number of tuples.
Result: with two threads, the data is distributed evenly.
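The even distribution described above can be illustrated with a small plain-Java simulation. This is only a sketch: the class ShuffleGroupingSketch and its methods are hypothetical names, it does not use the Storm API, and Storm's own implementation may differ. It walks a shuffled list of task indexes and reshuffles after each full pass, so the order is random but the counts stay balanced.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Plain-Java simulation of shuffle grouping; not Storm's real implementation.
class ShuffleGroupingSketch {
    private final List<Integer> order = new ArrayList<>();
    private final Random random;
    private int next = 0;

    ShuffleGroupingSketch(int numTasks, long seed) {
        for (int i = 0; i < numTasks; i++) {
            order.add(i);
        }
        random = new Random(seed);
        Collections.shuffle(order, random);
    }

    // Picks the target task for the next tuple: walk the shuffled list of
    // task indexes and reshuffle once every task has been used.
    int chooseTask() {
        if (next == order.size()) {
            Collections.shuffle(order, random);
            next = 0;
        }
        return order.get(next++);
    }

    // Sends numTuples tuples and returns how many each task received.
    static int[] distribute(int numTuples, int numTasks, long seed) {
        ShuffleGroupingSketch grouping = new ShuffleGroupingSketch(numTasks, seed);
        int[] received = new int[numTasks];
        for (int i = 0; i < numTuples; i++) {
            received[grouping.chooseTask()]++;
        }
        return received;
    }

    public static void main(String[] args) {
        int[] received = distribute(10, 2, 42L);
        System.out.println("task 0: " + received[0] + ", task 1: " + received[1]);
    }
}
```

With 10 tuples and 2 tasks, each simulated task receives 5 tuples, whatever the seed.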
2. None Grouping: no grouping. This grouping has the same effect as Shuffle Grouping; under multi-threading the tuples are not evenly distributed.
3. Fields Grouping: grouped by field. For example, when grouping by word, tuples with the same word go to the same bolt task, and different words may go to different tasks.
Purpose: 1. filtering: select some fields from the multiple output fields of the source (spout or upstream bolt)
2. tuples with the same field value are sent to the same executor or task for processing
Typical scenarios: deduplication, join
A join requires two sources, and the data sources must be kept in sync in a timely manner; otherwise it is error-prone.
Shuffle Grouping:
=============================================a b c d
=============================================b c d e f
=============================================a d e f
Thread-16-count------------------------word=c; count=1
Thread-18-count------------------------word=a; count=1
Thread-18-count------------------------word=d; count=1
Thread-16-count------------------------word=d; count=1
Thread-16-count------------------------word=f; count=1
Thread-18-count------------------------word=e; count=1
Thread-16-count------------------------word=c; count=2
Thread-18-count------------------------word=d; count=2
Thread-18-count------------------------word=e; count=2
Thread-20-count------------------------word=b; count=1
Thread-20-count------------------------word=a; count=1
Thread-20-count------------------------word=b; count=2
Thread-20-count------------------------word=f; count=1
You can see that words are assigned to threads without any pattern.
For example, thread 18 counts d and thread 16 also counts d, so the per-word counts are wrong.
Fields Grouping:
=============================================a b c d
=============================================b c d e f
=============================================a d e f
Thread-20-count------------------------word=a; count=1
Thread-20-count------------------------word=d; count=1
Thread-16-count------------------------word=b; count=1
Thread-16-count------------------------word=b; count=2
Thread-18-count------------------------word=c; count=1
Thread-16-count------------------------word=e; count=1
Thread-16-count------------------------word=e; count=2
Thread-20-count------------------------word=a; count=2
Thread-20-count------------------------word=d; count=2
Thread-18-count------------------------word=c; count=2
Thread-20-count------------------------word=d; count=3
Thread-18-count------------------------word=f; count=1
Thread-18-count------------------------word=f; count=2
Words handled by each thread:
Thread 16: b, e
Thread 18: c, f
Thread 20: a, d
That is, tuples with the same word are always sent to the same bolt task.
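The routing behaviour seen in the Fields Grouping log can be sketched in plain Java. This is a simulation under stated assumptions, not Storm code: the class FieldsGroupingSketch is a hypothetical name, the target task is assumed to be the word's hash modulo the number of tasks, and the three input sentences are taken to be "a b c d", "b c d e f" and "a d e f". Storm's real hashing differs, so the word-to-thread mapping need not match the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of fields grouping; not Storm's real implementation.
class FieldsGroupingSketch {
    // Assumption: the target task is the word's hash modulo the task count.
    static int chooseTask(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    // Splits each sentence into words and lets every simulated task keep
    // its own word -> count map, as each count thread does in the log.
    static List<Map<String, Integer>> count(String[] sentences, int numTasks) {
        List<Map<String, Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new TreeMap<>());
        }
        for (String sentence : sentences) {
            for (String word : sentence.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                tasks.get(chooseTask(word, numTasks)).merge(word, 1, Integer::sum);
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        String[] sentences = {"a b c d", "b c d e f", "a d e f"};
        List<Map<String, Integer>> tasks = count(sentences, 3);
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("task " + i + " -> " + tasks.get(i));
        }
    }
}
```

Because a given word always hashes to the same task, no word is counted by two tasks, and each per-task count is already the full count for that word (for example d=3).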
4. All Grouping: broadcast. Every tuple is received by all of the bolt's tasks.
5. Global Grouping: global grouping. The entire stream goes to a single one of the bolt's tasks, specifically the task with the lowest ID. Suitable scenarios: none come to mind.
Only one task is assigned the data, and the others receive none.
Only one thread receives tuples and the other threads output nothing; within that thread, the task with the smallest ID receives them.
6. Direct Grouping: direct grouping. This is a special kind of grouping in which the sender of the tuple decides which task of the receiver will process it. Only streams that have been declared as direct streams can use this grouping, and the tuples must be emitted with the emitDirect method. A bolt can get the task IDs of its consumers from the TopologyContext, or by keeping track of the return value of OutputCollector.emit, which also returns the task IDs the tuple was sent to.
Degree of concurrency
Scenario analysis:
Single-threaded: subtraction, any processing-type operation, and summarization
Multi-threaded:
1. local subtraction
2. processing-type operations, such as split
3. persistence, such as writing to a DB
WordCountTopology.java is used as the example for the explanation.
Study question: how do you compute the total word count and the distinct word count, and do so under high concurrency?
The former is the total number of rows; the latter is the number of words after deduplication.
Similar enterprise scenario: computing a website's PV and UV.
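One possible way to approach the study question, sketched in plain Java rather than with the Storm API: words are routed by hash so each word is seen by only one simulated task (as with Fields Grouping), each task keeps a local total and a local set of distinct words, and a single summary step adds the partial results. The class WordTotalsSketch, its methods, and the sample sentences are assumptions made for illustration, not the content of WordCountTopology.java.

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java simulation: multi-threaded partial counts, single-threaded summary.
class WordTotalsSketch {
    // State that one task of the concurrent counting bolt would keep.
    static final class PartialCount {
        long total = 0;
        final Set<String> distinct = new HashSet<>();
    }

    // Assumption: fields-style routing by hash, so a word has exactly one owner.
    static int chooseTask(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    // Returns {total word count, distinct word count}.
    static long[] summarize(String[] sentences, int numTasks) {
        PartialCount[] tasks = new PartialCount[numTasks];
        for (int i = 0; i < numTasks; i++) {
            tasks[i] = new PartialCount();
        }
        // Local step: each task only updates its own state.
        for (String sentence : sentences) {
            for (String word : sentence.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                PartialCount task = tasks[chooseTask(word, numTasks)];
                task.total++;
                task.distinct.add(word);
            }
        }
        // Summary step: the distinct sets are disjoint, so their sizes add up.
        long total = 0;
        long distinct = 0;
        for (PartialCount task : tasks) {
            total += task.total;
            distinct += task.distinct.size();
        }
        return new long[] {total, distinct};
    }

    public static void main(String[] args) {
        long[] result = summarize(new String[] {"a b c d", "b c d e f", "a d e f"}, 3);
        System.out.println("total=" + result[0] + ", distinct=" + result[1]);
    }
}
```

Adding the distinct set sizes is only valid because each word is owned by exactly one task; with shuffle-style routing the same word could sit in several sets and be counted more than once.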
Degree of parallelism and concurrency
When the source is a file and two threads are opened to read it, two copies of the data are read.
1. so a distributed application cannot read the source this way
2. with concurrency turned on, the spout reads the data repeatedly
If the spout reads from a message queue, data that has already been consumed is not consumed again, so the data is never duplicated.
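The difference between the two sources can be shown with a small plain-Java simulation. It is a sketch only: SpoutSourceSketch and its methods are hypothetical names, the spout instances are simulated in a single thread, and failure handling and message replay are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Plain-Java simulation of file-backed versus queue-backed spout instances.
class SpoutSourceSketch {
    // File source: every spout instance reads the whole file from the start.
    static List<String> readFromFile(List<String> fileLines, int spoutInstances) {
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < spoutInstances; i++) {
            emitted.addAll(fileLines);
        }
        return emitted;
    }

    // Queue source: all instances poll one shared queue, and a message that
    // has been polled is no longer available to the other instances.
    static List<String> readFromQueue(List<String> messages, int spoutInstances) {
        Queue<String> queue = new ConcurrentLinkedQueue<>(messages);
        List<String> emitted = new ArrayList<>();
        boolean gotSomething = true;
        while (gotSomething) {
            gotSomething = false;
            for (int i = 0; i < spoutInstances; i++) {
                String message = queue.poll();
                if (message != null) {
                    emitted.add(message);
                    gotSomething = true;
                }
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b c d", "b c d e f", "a d e f");
        System.out.println("file:  " + readFromFile(lines, 2).size() + " tuples");
        System.out.println("queue: " + readFromQueue(lines, 2).size() + " tuples");
    }
}
```

With three lines and two spout instances, the file source emits 6 tuples and the queue source emits 3.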
As for bolts, there are 6 grouping policies. With Shuffle Grouping, the data is distributed evenly by polling.
One spout emitting to multiple bolts is broadcasting.
Stream Grouping concurrency Policy