Very good references:
English: HTTPS://GITHUB.COM/XETORTHIO/GETTING-STARTED-WITH-STORM/BLOB/MASTER/CH03TOPOLOGIES.ASC
Chinese translation: http://ifeve.com/getting-started-with-storm-3/
Here is a detailed description of the grouping strategies available in Storm:
Storm Grouping
Shuffle Grouping
Tuples from the source are distributed randomly across the tasks of the bolt, so each task receives a roughly equal share of the tuples.
Fields Grouping
This grouping guarantees that tuples with the same value in the grouping field always go to the same task. This is critical for word count: if the same word did not always reach the same task, the counts would be wrong.
All Grouping
Broadcast: every tuple is copied to every task of the bolt.
Global Grouping
All tuples in the stream are sent to a single task of the bolt, specifically the task with the lowest task ID.
None Grouping
With this grouping we do not care how the stream is distributed or load-balanced across tasks. At present it is equivalent to shuffle grouping, and Storm will try to run the bolt's task in the same thread as the upstream task that supplies its data.
Direct Grouping
With direct grouping, the producer of a tuple decides which task of the consuming bolt receives it. This is a very special grouping: the sender of the message specifies which task of the receiver handles the message. Only streams that are declared as direct streams can use this grouping, and tuples on such a stream must be emitted with the emitDirect method. A bolt can obtain the task IDs of its consumers through TopologyContext (the OutputCollector.emit method also returns the IDs of the tasks the tuple was sent to).
In this chapter, we'll see how to pass tuples between the different components of a Storm topology, and how to deploy a topology into a running Storm cluster.
Stream Grouping
One of the most important things that we need to do when designing a topology is to define how data is exchanged between components (how streams are consumed by the bolts). A stream grouping specifies which stream(s) are consumed by each bolt and how the stream will be consumed.
| Tip |
A node can emit more than one stream of data. A Stream grouping allows us to choose which stream to receive. |
The stream grouping is set when the topology is defined, as we saw in Chapter 2, Getting Started:
    ....
    builder.setBolt("word-normalizer", new WordNormalizer())
           .shuffleGrouping("word-reader");
    ....
Here a bolt is set on the topology builder, and then a source is set using the shuffle stream grouping. A stream grouping normally takes the source component ID as a parameter, and optionally other parameters as well, depending on the kind of stream grouping.
| Tip |
There can be more than one source per InputDeclarer, and each source can be grouped with a different stream grouping. |
Shuffle Grouping
Shuffle Grouping is the most commonly used grouping. It takes a single parameter (the source component) and sends each tuple emitted by the source to a randomly chosen bolt, so that each consumer receives roughly the same number of tuples.
The shuffle grouping is useful for atomic operations, for example a math operation. However, if the operation can't be randomly distributed, as in the example in Chapter 2 where we needed to count words, we should consider using another grouping.
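To make the "roughly the same number of tuples" property concrete, here is a minimal stand-alone sketch. It assumes one simple way of spreading tuples evenly: walking a randomly ordered list of target tasks in round-robin fashion. The class ShuffleSketch and its methods are hypothetical names used only for illustration; they are not part of Storm, and this is not meant as a description of Storm's internals.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-alone model of a shuffle grouping; not a Storm class.
class ShuffleSketch {
    private final List<Integer> order; // target task ids in random order
    private int next = 0;              // position of the next task to use

    ShuffleSketch(List<Integer> targetTasks, long seed) {
        order = new ArrayList<>(targetTasks);
        Collections.shuffle(order, new Random(seed));
    }

    // Returns the task id that receives the next tuple.
    int chooseTask() {
        int task = order.get(next);
        next = (next + 1) % order.size();
        return task;
    }

    // Sends `tuples` tuples and counts how many each task id received.
    static Map<Integer, Integer> countPerTask(List<Integer> tasks, int tuples, long seed) {
        ShuffleSketch grouping = new ShuffleSketch(tasks, seed);
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i < tuples; i++) {
            counts.merge(grouping.chooseTask(), 1, Integer::sum);
        }
        return counts;
    }
}
```

With three target tasks and 300 tuples, each task in this model receives exactly 100 tuples, whatever the seed. Which task handles a particular tuple is arbitrary, which is why this kind of grouping does not suit the word count.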
Fields Grouping
Fields Grouping allows us to control how tuples are sent to bolts, based on one or more fields of the tuple. It guarantees that a given set of values for a combination of fields is always sent to the same bolt. Coming back to the word count example, if we group the stream by the word field, the word-normalizer bolt will always send tuples with a given word to the same instance of the word-counter bolt.
    ....
    builder.setBolt("word-counter", new WordCounter(), 2)
           .fieldsGrouping("word-normalizer", new Fields("word"));
    ....
| Tip |
All fields set in the fields grouping must exist in the source's fields declaration. |
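The guarantee that equal field values reach the same bolt instance can be illustrated with a small stand-alone sketch. It assumes a common scheme: hash the values of the grouping fields and take the result modulo the number of target tasks. FieldsGroupingSketch and chooseIndex are hypothetical names, and we do not claim this is exactly how Storm computes the target.

```java
import java.util.List;

// Hypothetical stand-alone model of a fields grouping; not a Storm class.
class FieldsGroupingSketch {
    // Picks an index in [0, numTasks) from the values of the grouping fields.
    // Equal value lists have equal hash codes, so they map to the same index.
    static int chooseIndex(List<Object> groupingValues, int numTasks) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }
}
```

Because the index depends only on the field values, every tuple carrying the word "storm" lands on the same instance of the word-counter bolt, while different words may or may not share an instance.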
All Grouping
All Grouping sends a single copy of each tuple to all instances of the receiving bolt. This kind of grouping is used to send signals to bolts. For example, if we need to refresh a cache, we can send a refresh cache signal to all bolts. In the word-count example, we could use an all grouping to add the ability to clear the counter cache (see Topologies Example):
    public void execute(Tuple input) {
        String str = null;
        try {
            // Tuples arriving on the "signals" stream carry control actions
            if (input.getSourceStreamId().equals("signals")) {
                str = input.getStringByField("action");
                if ("refreshCache".equals(str))
                    counters.clear();
            }
        } catch (IllegalArgumentException e) {
            // Do nothing
        }
        ...
    }
We've added an if to check the stream source. Storm gives us the possibility to declare named streams (if we don't send a tuple to a named stream, the stream is "default"). It's an excellent way to identify the source of the tuples, as in this case where we want to identify the signals.
In the topology definition, we add a second stream to the word-counter bolt, which sends each tuple from the signals-spout stream to all instances of the bolt.
    builder.setBolt("word-counter", new WordCounter(), 2)
           .fieldsGrouping("word-normalizer", new Fields("word"))
           .allGrouping("signals-spout", "signals");
The implementation of signals-spout can be found in the Git repository.
Custom Grouping
We can create our own custom stream grouping by implementing the backtype.storm.grouping.CustomStreamGrouping interface. This gives us the power to decide which bolt(s) will receive each tuple.
Let's modify the word count example to group tuples so that all words that start with the same letter are received by the same bolt.
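    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;
    import backtype.storm.grouping.CustomStreamGrouping;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.tuple.Fields;

    public class ModuleGrouping implements CustomStreamGrouping, Serializable {
        int numTasks = 0;

        @Override
        public List<Integer> chooseTasks(List<Object> values) {
            List<Integer> boltIds = new ArrayList<Integer>();
            if (values.size() > 0) {
                String str = values.get(0).toString();
                if (str.isEmpty())
                    boltIds.add(0);
                else
                    // Character code of the first letter, modulo the task count
                    boltIds.add(str.charAt(0) % numTasks);
            }
            return boltIds;
        }

        @Override
        public void prepare(TopologyContext context, Fields outFields, List<Integer> targetTasks) {
            numTasks = targetTasks.size();
        }
    }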
Here we can see a simple implementation of CustomStreamGrouping, where we use the number of tasks to take the modulus of the integer value of the first character of the word, thus selecting which bolt will receive the tuple.
To use this grouping in our example, we should change the word-normalizer grouping to the following:
    builder.setBolt("word-normalizer", new WordNormalizer())
           .customGrouping("word-reader", new ModuleGrouping());
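The modulus arithmetic used by the custom grouping is easy to check in isolation. The following stand-alone sketch repeats only that calculation; FirstLetterIndex and indexFor are hypothetical names introduced for this illustration and do not depend on Storm.

```java
// Hypothetical helper that repeats the first-letter arithmetic on its own.
class FirstLetterIndex {
    // Character code of the first letter modulo the number of tasks;
    // empty words fall back to index 0.
    static int indexFor(String word, int numTasks) {
        if (word.isEmpty()) {
            return 0;
        }
        return word.charAt(0) % numTasks;
    }
}
```

With two tasks, "storm" and "spout" both map to 115 % 2 = 1 (the code of 's' is 115), while "bolt" maps to 98 % 2 = 0. The calculation is case-sensitive: with three tasks, "storm" maps to 115 % 3 = 1 but "Storm" maps to 83 % 3 = 2, so this approach relies on the words having a consistent case by the time they are grouped.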
Direct Grouping
This is a special grouping where the source decides which component will receive the tuple. Similarly to the previous example, the source will decide which bolt receives the tuple based on the first letter of the word. To use direct grouping, in the WordNormalizer bolt we use the emitDirect method instead of emit.
    public void execute(Tuple input) {
        ....
        for (String word : words) {
            if (!word.isEmpty()) {
                ....
                collector.emitDirect(getWordCountIndex(word), new Values(word));
            }
        }
        // Acknowledge the tuple
        collector.ack(input);
    }

    public Integer getWordCountIndex(String word) {
        word = word.trim().toUpperCase();
        if (word.isEmpty())
            return 0;
        else
            return word.charAt(0) % numCounterTasks;
    }
We work out the number of target tasks in the prepare method:
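    public void prepare(TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // getComponentTasks returns the list of task ids; we need how many there are
        this.numCounterTasks = context.getComponentTasks("word-counter").size();
    }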
And in the topology definition, we specify that the stream will be grouped directly:
    builder.setBolt("word-counter", new WordCounter(), 2)
           .directGrouping("word-normalizer");
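One detail worth keeping in mind is that emitDirect addresses a task by its task ID, while the modulus calculation above yields a position between 0 and the number of counter tasks minus one. If the task IDs of word-counter do not happen to start at 0, the position has to be translated into an ID. The stand-alone sketch below shows one way to do that, assuming the list of IDs is the one returned by getComponentTasks; DirectTargetSketch and taskFor are hypothetical names, not part of Storm.

```java
import java.util.List;

// Hypothetical helper: turns a first-letter position into a real task id.
class DirectTargetSketch {
    // Returns the task id (not the list position) that should receive the word.
    static int taskFor(String word, List<Integer> counterTasks) {
        String normalized = word.trim().toUpperCase();
        int position = normalized.isEmpty()
                ? 0
                : normalized.charAt(0) % counterTasks.size();
        return counterTasks.get(position);
    }
}
```

For example, with counter task IDs 7, 8 and 9, the word "storm" is normalized to "STORM", 'S' has code 83, 83 % 3 = 2, and the tuple would be sent to task 9.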
Global Grouping
Global Grouping sends tuples generated by all instances of the source to a single target instance (specifically, the task with the lowest ID).
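The difference between global grouping and all grouping comes down to how many targets are selected for each tuple. The stand-alone sketch below models just that choice; TargetSelectionSketch and its methods are hypothetical names used for illustration only and are not Storm classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of target selection for global and all groupings.
class TargetSelectionSketch {
    // Global grouping: every tuple goes to the single task with the lowest id.
    static int globalTarget(List<Integer> targetTasks) {
        return Collections.min(targetTasks);
    }

    // All grouping: every tuple is copied to every target task.
    static List<Integer> allTargets(List<Integer> targetTasks) {
        return new ArrayList<>(targetTasks);
    }
}
```

Given target tasks 5, 3 and 4, the global choice is always task 3, whereas the all grouping delivers a copy to each of the three tasks.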
None Grouping
At the time of writing (Storm version 0.7.1), using this grouping is the same as using shuffle grouping. In other words, when using this grouping, we don't care how streams are grouped.