Getting Started with Storm learning-storm

Source: Internet
Author: User

Useful references:

English: https://github.com/xetorthio/getting-started-with-storm/blob/master/ch03Topologies.asc

Chinese: http://ifeve.com/getting-started-with-storm-3/

Here is a detailed explanation of the stream grouping strategies in Storm:

Storm Grouping
    1. Shuffle Grouping

      Tuples from the upstream component are distributed randomly across the tasks of this bolt, so that each task receives a roughly equal number of tuples.

    2. Fields Grouping

      This grouping guarantees that tuples with the same value for the grouping field always go to the same task. This is critical for a word count: if the same word were sent to different tasks, the counts would be wrong.

    3. All Grouping

      Broadcast: each tuple is copied to every task of the bolt.

    4. Global Grouping

      All tuples in the stream are sent to a single task of the bolt, specifically the task with the lowest task ID.

    5. None Grouping

      This grouping means that we do not care how the stream is distributed. It is currently equivalent to shuffle grouping. Eventually, Storm is expected to run a bolt with a none grouping in the same thread as the upstream component that provides its data, when possible.

    6. Direct Grouping

      The sender of a tuple decides which task of the receiving bolt will process it. This is a very special grouping method. Only streams that are declared as direct streams can use this grouping, and tuples on such a stream must be emitted with the emitDirect method. A component can get the task IDs of its consumers from the TopologyContext (the OutputCollector.emit method also returns the IDs of the tasks the tuple was sent to).

In this chapter, we'll see how to pass tuples between the different components of a Storm topology, and how to deploy a topology into a running Storm cluster.

Stream Grouping

One of the most important things that we need to do when designing a topology is to define how data is exchanged between components (how streams are consumed by the bolts). A Stream Grouping specifies which stream(s) are consumed by each bolt and how the stream will be consumed.

Tip A node can emit more than one stream of data. A Stream grouping allows us to choose which stream to receive.

The stream grouping is set when the topology is defined, as we saw in Chapter 2, Getting Started:

    ....
    builder.setBolt("word-normalizer", new WordNormalizer())
           .shuffleGrouping("word-reader");
    ....

Here a bolt is set on the topology builder, and then a source is set using the shuffle stream grouping. A stream grouping normally takes the source component ID as a parameter, and optionally other parameters as well, depending on the kind of stream grouping.

Tip There can be more than one source per InputDeclarer, and each source can be grouped with a different stream grouping.

Shuffle Grouping

Shuffle Grouping is the most commonly used grouping. It takes a single parameter (the source component) and sends each tuple emitted by the source to a randomly chosen bolt, guaranteeing that each consumer will receive the same number of tuples.

The shuffle grouping is useful for doing atomic operations, for example a math operation. However, if the operation can't be randomly distributed, such as the example in Chapter 2 where we needed to count words, we should consider using another grouping.
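
To make the "same number of tuples" guarantee concrete, here is a minimal plain-Java sketch. It is not Storm's actual implementation and does not use the Storm API. It assumes one simple strategy: visit the tasks in a shuffled order and reshuffle after each full pass. The class ShuffleGroupingSketch and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a shuffle grouping; not part of Storm.
class ShuffleGroupingSketch {
    private final List<Integer> tasks = new ArrayList<>();
    private final Random random;
    private int next = 0;

    ShuffleGroupingSketch(int numTasks, long seed) {
        for (int i = 0; i < numTasks; i++) {
            tasks.add(i);
        }
        random = new Random(seed);
        Collections.shuffle(tasks, random);
    }

    // Returns the task for the next tuple. Each full pass over the
    // shuffled list visits every task exactly once.
    int chooseTask() {
        if (next == tasks.size()) {
            Collections.shuffle(tasks, random);
            next = 0;
        }
        return tasks.get(next++);
    }

    // Sends numTuples tuples and reports how many each task received.
    static int[] distribute(int numTasks, int numTuples, long seed) {
        ShuffleGroupingSketch grouping = new ShuffleGroupingSketch(numTasks, seed);
        int[] counts = new int[numTasks];
        for (int i = 0; i < numTuples; i++) {
            counts[grouping.chooseTask()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 1000 tuples over 4 tasks is 250 full passes: [250, 250, 250, 250]
        System.out.println(Arrays.toString(distribute(4, 1000, 42L)));
    }
}
```

The order in which tasks are chosen is random, but the per-task totals stay balanced, which is why this grouping suits operations that do not depend on which task sees which tuple.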

Fields Grouping

Fields Grouping allows us to control how tuples are sent to bolts, based on one or more fields of the tuple. It guarantees that a given set of values, for a combination of fields, is always sent to the same bolt. Coming back to the word count example, if we group the stream by the word field, the word-normalizer bolt will always send tuples with a given word to the same instance of the word-counter bolt.

    ....
    builder.setBolt("word-counter", new WordCounter(), 2)
           .fieldsGrouping("word-normalizer", new Fields("word"));
    ....

Tip All fields set in the fields grouping must exist in the source's fields declaration.
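
The guarantee behind fields grouping can be illustrated with a small plain-Java sketch. It assumes the target task is picked by hashing the grouping values and taking the result modulo the number of tasks. This is an illustration of the idea rather than Storm's own code, and FieldsGroupingSketch, chooseTask and countWords are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a fields grouping; not part of Storm.
class FieldsGroupingSketch {
    // Equal grouping values always produce the same task index.
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        // floorMod keeps the result in [0, numTasks) even for negative hash codes
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    // Routes each word to a task and lets that task keep its own counter map.
    static List<Map<String, Integer>> countWords(String[] words, int numTasks) {
        List<Map<String, Integer>> perTask = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            perTask.add(new HashMap<>());
        }
        for (String word : words) {
            int task = chooseTask(Arrays.<Object>asList(word), numTasks);
            perTask.get(task).merge(word, 1, Integer::sum);
        }
        return perTask;
    }

    public static void main(String[] args) {
        String[] words = {"storm", "bolt", "storm", "spout", "bolt", "storm"};
        List<Map<String, Integer>> perTask = countWords(words, 2);
        for (int i = 0; i < perTask.size(); i++) {
            System.out.println("task " + i + " -> " + perTask.get(i));
        }
    }
}
```

Because every occurrence of a word lands on the same task, each word's total lives in exactly one counter map, which is what the word count needs.
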

All Grouping

All Grouping sends a single copy of each tuple to all instances of the receiving bolt. This kind of grouping is used to send signals to bolts. For example, if we need to refresh a cache, we can send a refresh cache signal to all bolts. In the word-count example, we could use an all grouping to add the ability to clear the counter cache (see Topologies Example).

    public void execute(Tuple input) {
        String str = null;
        try {
            // Only tuples arriving on the "signals" stream carry an action
            if (input.getSourceStreamId().equals("signals")) {
                str = input.getStringByField("action");
                if ("refreshCache".equals(str))
                    counters.clear();
            }
        } catch (IllegalArgumentException e) {
            // Do nothing
        }
        ...
    }

We've added an if to check the stream source. Storm gives us the possibility to declare named streams (if we don't send a tuple to a named stream, the stream is "default"). It's an excellent way to identify the source of the tuples, as in this case where we want to identify the signals.

In the topology definition, we add a second stream to the word-counter bolt, which sends each tuple from the signals-spout stream to all instances of the bolt.

    builder.setBolt("word-counter", new WordCounter(), 2)
           .fieldsGrouping("word-normalizer", new Fields("word"))
           .allGrouping("signals-spout", "signals");

The implementation of signals-spout can be found in the Git repository.
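
The combined effect of the two groupings can be simulated in plain Java without a cluster. In the sketch below, word tuples go to a single task while a signal is delivered to every task. AllGroupingSketch, CounterTask and broadcast are hypothetical names, and the code only imitates the behavior described above. It does not use the Storm API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of an all grouping; not part of Storm.
class AllGroupingSketch {
    // Stand-in for one word-counter task with its own private cache.
    static class CounterTask {
        final Map<String, Integer> counters = new HashMap<>();

        void execute(String streamId, String value) {
            if ("signals".equals(streamId)) {
                // Signal tuples carry an action instead of a word
                if ("refreshCache".equals(value)) {
                    counters.clear();
                }
            } else {
                counters.merge(value, 1, Integer::sum);
            }
        }
    }

    // All grouping: every task receives its own copy of the tuple.
    static void broadcast(List<CounterTask> tasks, String streamId, String value) {
        for (CounterTask task : tasks) {
            task.execute(streamId, value);
        }
    }

    public static void main(String[] args) {
        List<CounterTask> tasks = new ArrayList<>();
        tasks.add(new CounterTask());
        tasks.add(new CounterTask());

        // Word tuples reach a single task each (as fields grouping would do)
        tasks.get(0).execute("default", "storm");
        tasks.get(1).execute("default", "bolt");

        // The signal reaches all tasks, so every cache is cleared
        broadcast(tasks, "signals", "refreshCache");
        System.out.println(tasks.get(0).counters.isEmpty()
                && tasks.get(1).counters.isEmpty()); // prints true
    }
}
```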

Custom Grouping

We can create our own custom stream grouping by implementing the backtype.storm.grouping.CustomStreamGrouping interface. This gives us the power to decide which bolt(s) will receive each tuple.

Let's modify the word count example to group tuples so that all words that start with the same letter are received by the same bolt.

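    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;

    import backtype.storm.grouping.CustomStreamGrouping;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.tuple.Fields;

    public class ModuleGrouping implements CustomStreamGrouping, Serializable {
        int numTasks = 0;

        @Override
        public List<Integer> chooseTasks(List<Object> values) {
            List<Integer> boltIds = new ArrayList<Integer>();
            if (values.size() > 0) {
                String str = values.get(0).toString();
                if (str.isEmpty())
                    boltIds.add(0);
                else
                    // Character code of the first letter, modulo the task count
                    boltIds.add(str.charAt(0) % numTasks);
            }
            return boltIds;
        }

        @Override
        public void prepare(TopologyContext context, Fields outFields,
                List<Integer> targetTasks) {
            numTasks = targetTasks.size();
        }
    }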

Here we can see a simple implementation of CustomStreamGrouping, where we take the integer value of the first character of the word modulo the number of tasks, thus selecting which bolt will receive the tuple.
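
The modulus arithmetic is easy to check in isolation. The following plain-Java sketch repeats only the index calculation. FirstLetterModulo and taskIndex are hypothetical names and no Storm classes are involved. It also shows one consequence of working with character codes: upper and lower case letters can map to different indices.

```java
// Hypothetical helper that mirrors the first-letter modulus; not part of Storm.
class FirstLetterModulo {
    static int taskIndex(String word, int numTasks) {
        if (word.isEmpty()) {
            return 0;
        }
        // char is promoted to its integer code, e.g. 'a' is 97 and 'A' is 65
        return word.charAt(0) % numTasks;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "avocado", "banana", "Apple"};
        for (String word : words) {
            // With 3 tasks: apple -> 1, avocado -> 1, banana -> 2, Apple -> 2
            System.out.println(word + " -> task index " + taskIndex(word, 3));
        }
    }
}
```

If words that differ only in case should land on the same task, the word can be normalized (for example to upper case) before the first character is read.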

To use this grouping in our example, we should change the word-normalizer grouping to the following:

    builder.setBolt("word-normalizer", new WordNormalizer())
           .customGrouping("word-reader", new ModuleGrouping());

Direct Grouping

This is a special grouping where the source decides which component will receive the tuple. Similarly to the previous example, the source will decide which bolt receives the tuple based on the first letter of the word. To use direct grouping, in the WordNormalizer bolt we use the emitDirect method instead of emit.

    public void execute(Tuple input) {
        ....
        for (String word : words) {
            if (!word.isEmpty()) {
                ....
                collector.emitDirect(getWordCountIndex(word), new Values(word));
            }
        }
        // Acknowledge the tuple
        collector.ack(input);
    }

    public Integer getWordCountIndex(String word) {
        word = word.trim().toUpperCase();
        if (word.isEmpty())
            return 0;
        else
            return word.charAt(0) % numCounterTasks;
    }

We work out the number of target tasks in the prepare method:

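    public void prepare(Map stormConf, TopologyContext context,
            OutputCollector collector) {
        this.collector = collector;
        // getComponentTasks returns the list of task IDs, so we take its size
        this.numCounterTasks = context.getComponentTasks("word-counter").size();
    }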

And in the topology definition, we specify that the stream will be grouped directly:

    builder.setBolt("word-counter", new WordCounter(), 2)
           .directGrouping("word-normalizer");
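
One detail is worth checking when adapting this example: emitDirect expects the ID of a target task, while the modulus yields an index between zero and the number of tasks minus one. The plain-Java sketch below assumes that task IDs do not necessarily start at zero and therefore looks the ID up in the list of target task IDs. DirectGroupingSketch and chooseTaskId are hypothetical names, and the IDs 7 and 9 are made up for the demonstration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for picking a direct-grouping target; not part of Storm.
class DirectGroupingSketch {
    // Maps a word to one of the given task IDs based on its first letter.
    static int chooseTaskId(String word, List<Integer> targetTaskIds) {
        String normalized = word.trim().toUpperCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return targetTaskIds.get(0);
        }
        int index = normalized.charAt(0) % targetTaskIds.size();
        // Translate the index into an actual task ID
        return targetTaskIds.get(index);
    }

    public static void main(String[] args) {
        List<Integer> counterTaskIds = Arrays.asList(7, 9); // assumed IDs
        System.out.println(chooseTaskId("apple", counterTaskIds));  // 'A' = 65, 65 % 2 = 1 -> 9
        System.out.println(chooseTaskId("Banana", counterTaskIds)); // 'B' = 66, 66 % 2 = 0 -> 7
    }
}
```
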

Global Grouping

Global Grouping sends tuples generated by all instances of the source to a single target instance (specifically, the task with the lowest ID).
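
As a minimal plain-Java illustration of that rule, the sketch below picks the target from a list of task IDs. GlobalGroupingSketch and chooseTask are hypothetical names, the IDs are made up, and nothing here uses the Storm API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a global grouping; not part of Storm.
class GlobalGroupingSketch {
    // Every tuple goes to the same task: the one with the lowest ID.
    static int chooseTask(List<Integer> targetTaskIds) {
        return Collections.min(targetTaskIds);
    }

    public static void main(String[] args) {
        System.out.println(chooseTask(Arrays.asList(4, 2, 9))); // prints 2
    }
}
```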

None Grouping

At the time of writing (Storm version 0.7.1), using this grouping is the same as using shuffle grouping. In other words, when using this grouping, we don't care how streams are grouped.
