Learn Storm with Me, Tutorial 2: Parallelism and Stream Grouping

Source: Internet
Author: User
Tags: ack, emit, prepare, shuffle, uuid
Original address: http://blog.csdn.net/hongkangwl/article/details/71103019. Please do not reprint.

Four components of a topology

- Node (server): a supervisor node in a Storm cluster runs part of the topology; a Storm cluster typically contains multiple nodes.
- Worker (JVM process): a separate JVM process running on a node; each node can run one or more workers. A complex topology is spread across multiple workers.
- Executor (thread): a Java thread running inside a worker JVM. Multiple tasks can be assigned to the same executor. By default, Storm assigns one task per executor.
- Task (spout/bolt instance): a task is an instance of a spout or bolt; its nextTuple() or execute() method is called by its executor thread.

Example
        builder.setSpout(SENTENCE_SPOUT_ID, spout, 2);
        // SentenceSpout --> SplitSentenceBolt
        builder.setBolt(SPLIT_BOLT_ID, splitBolt, 2)
                .setNumTasks(4)
                .shuffleGrouping(SENTENCE_SPOUT_ID);
        // SplitSentenceBolt --> WordCountBolt
        builder.setBolt(COUNT_BOLT_ID, countBolt, 6)
                .fieldsGrouping(SPLIT_BOLT_ID, new Fields("word"));
        // WordCountBolt --> ReportBolt
        builder.setBolt(REPORT_BOLT_ID, reportBolt)
                .globalGrouping(COUNT_BOLT_ID);
        Config conf = JStormHelper.getConfig(null);
        conf.setNumWorkers(2);
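As a sanity check, the executor and task totals implied by this configuration can be computed with plain arithmetic: each component gets as many executors as its parallelism hint, and one task per executor unless setNumTasks overrides it. The class below is an illustrative sketch, not the Storm API; note the totals include the single default ReportBolt executor and task.

```java
public class ParallelismMath {
    // Each row: {executors (parallelism hint), tasks (setNumTasks, or same as executors)}
    static final int[][] COMPONENTS = {
            {2, 2},  // SentenceSpout: parallelism 2
            {2, 4},  // SplitSentenceBolt: parallelism 2, setNumTasks(4)
            {6, 6},  // WordCountBolt: parallelism 6
            {1, 1},  // ReportBolt: defaults
    };

    static int totalExecutors(int[][] components) {
        int sum = 0;
        for (int[] c : components) sum += c[0];
        return sum;
    }

    static int totalTasks(int[][] components) {
        int sum = 0;
        for (int[] c : components) sum += c[1];
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(totalExecutors(COMPONENTS) + " executors, "
                + totalTasks(COMPONENTS) + " tasks");
        // prints: 11 executors, 13 tasks
    }
}
```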

With the configuration above, the concurrency diagram is shown in the following figure: 2 workers and 2 + 2 + 6 + 1 = 11 executors (the rounded rectangles), for a total of 2 + 4 + 6 + 1 = 13 tasks (2 spout tasks and 11 bolt tasks).

Data flow grouping

A stream grouping tells the topology how to send tuples between two components. One step in defining a topology is declaring, for each bolt, which streams it receives as input; the stream grouping then defines how each stream's tuples are distributed among that bolt's tasks. Storm provides seven built-in groupings:

- Shuffle grouping: tuples are distributed randomly across the bolt's tasks, so each task receives approximately the same number of tuples.
- Fields grouping: the stream is partitioned by the specified fields. For example, when grouping by the field "user-id", tuples with the same "user-id" always go to the same task, while tuples with different "user-id" values may go to different tasks.
- All grouping: the stream is broadcast; every task of the bolt receives a copy of every tuple.
- Global grouping: the entire stream goes to a single one of the bolt's tasks, specifically the task with the lowest task ID.
- None grouping: the topology does not care how the stream is grouped. Currently this behaves the same as shuffle grouping, except that Storm will, when possible, push the bolt into the same thread as its upstream subscriber.
- Direct grouping: a special grouping in which the emitter of a tuple decides which task of the consumer processes it. It can only be used on streams declared as direct streams, and tuples on such streams must be emitted with emitDirect. A component can obtain the task IDs of its consumers through TopologyContext (the OutputCollector.emit method also returns the IDs of the tasks a tuple was sent to).
- Local or shuffle grouping: if the target bolt has one or more tasks in the same worker process as the source, tuples are shuffled only among those in-process tasks; otherwise it behaves like a normal shuffle grouping.

Other concepts

- messageId: uniquely identifies a message. With it we can trace a message's processing through the topology and verify that grouping behaves as expected. It can be obtained with Tuple.getMessageId().
- taskId: each task in Storm has a unique taskId, which can be obtained with TopologyContext.getThisTaskId().

Demo

We trace the lifecycle of a message through its messageId, as shown in the following figure.
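The fields-grouping behavior described above (same field value, same task) can be illustrated with a simple hash-mod rule. The helper below is a hypothetical stand-in for demonstration, not Storm's actual partitioner:

```java
import java.util.Arrays;

public class FieldsGroupingSketch {
    // Illustrative only: picks a target task index from the grouping-field values.
    static int targetTask(int numTasks, Object... fieldValues) {
        return Math.floorMod(Arrays.hashCode(fieldValues), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 6; // WordCountBolt parallelism in this tutorial
        // Repeated values of the "word" field always land on the same task:
        System.out.println(targetTask(numTasks, "dog") == targetTask(numTasks, "dog")); // true
    }
}
```

Because the task index depends only on the hashed field values, the mapping is deterministic across tuples, which is exactly why the demo below shows the same word arriving at the same WordCountBolt task.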


From the logs we can clearly see that a sentence received by SplitSentenceBolt is split into single words and sent to WordCountBolt; WordCountBolt counts each word it receives and then sends the result to ReportBolt for printing. Since the words emitted by SplitSentenceBolt are passed to WordCountBolt via a fields grouping, you can see from the image below that the same word is always sent to the same WordCountBolt task. You can also grep for another word to verify the results.

Code

SentenceSpout

public class SentenceSpout extends BaseRichSpout {
    private static final Logger LOGGER = LoggerFactory.getLogger(SentenceSpout.class);
    private ConcurrentHashMap<UUID, Values> pending;
    private SpoutOutputCollector collector;
    private String[] sentences = {
            "my dog has fleas", "i like cold beverages", "the dog ate my homework",
            "don't have a cow man", "i don't think i like fleas"};
    private AtomicInteger index = new AtomicInteger(0);
    private Integer taskId = null;

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }

    public void open(Map config, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.pending = new ConcurrentHashMap<UUID, Values>();
        this.taskId = context.getThisTaskId();
    }

    public void nextTuple() {
        Values values = new Values(sentences[index.getAndIncrement()]);
        UUID msgId = UUID.randomUUID();
        this.pending.put(msgId, values);
        this.collector.emit(values, msgId);
        if (index.get() >= sentences.length) {
            index = new AtomicInteger(0);
        }
        LOGGER.warn(String.format("SentenceSpout with taskId: %d emit msgId: %s and tuple is: %s",
                taskId, msgId, JSONObject.toJSON(values)));
        Utils.waitForMillis(100);
    }

    public void ack(Object msgId) {
        this.pending.remove(msgId);
        LOGGER.warn(String.format("SentenceSpout taskId: %d receive msgId: %s and remove it from the pending map",
                taskId, JSONObject.toJSONString(msgId)));
    }

    public void fail(Object msgId) {
        LOGGER.error(String.format("SentenceSpout taskId: %d receive failed msgId: %s and re-emit it",
                taskId, JSONObject.toJSONString(msgId)));
        this.collector.emit(this.pending.get(msgId), msgId);
    }
}
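SentenceSpout keeps each emitted tuple in a pending map keyed by messageId, removes it on ack(), and re-emits it on fail(). That reliability pattern can be exercised without a Storm cluster; the class below is a hypothetical stand-alone sketch of just that bookkeeping:

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class PendingMapSketch {
    // Stand-in for the spout's pending map: messageId -> payload.
    private final ConcurrentHashMap<UUID, String> pending = new ConcurrentHashMap<>();

    public UUID emit(String payload) {
        UUID msgId = UUID.randomUUID();
        pending.put(msgId, payload); // remember until acked
        return msgId;
    }

    public void ack(UUID msgId) {
        pending.remove(msgId); // fully processed, safe to forget
    }

    public String fail(UUID msgId) {
        return pending.get(msgId); // payload to re-emit
    }

    public static void main(String[] args) {
        PendingMapSketch s = new PendingMapSketch();
        UUID a = s.emit("my dog has fleas");
        UUID b = s.emit("i like cold beverages");
        s.ack(a); // acked sentences are forgotten
        System.out.println(s.fail(b)); // prints the sentence to replay
    }
}
```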
SplitSentenceBolt
public class SplitSentenceBolt extends BaseRichBolt {
    private static final Logger LOGGER = LoggerFactory.getLogger(SplitSentenceBolt.class);
    private OutputCollector collector;
    private Integer taskId = null;

    public void prepare(Map config, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.taskId = context.getThisTaskId();
    }

    public void execute(Tuple tuple) {
        String sentence = tuple.getStringByField("sentence");
        String[] words = sentence.split(" ");
        for (String word : words) {
            this.collector.emit(tuple, new Values(word));
        }
        this.collector.ack(tuple);
        LOGGER.warn(String.format("SplitSentenceBolt taskId: %d acked tuple: %s and messageId is: %s",
                taskId, JSONObject.toJSONString(tuple, SerializerFeature.WriteMapNullValue),
                tuple.getMessageId()));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
WordCountBolt
public class WordCountBolt extends BaseRichBolt {
    private static final Logger LOGGER = LoggerFactory.getLogger(WordCountBolt.class);
    private OutputCollector collector;
    private HashMap<String, Long> counts = null;
    private Integer taskId = null;

    public void prepare(Map config, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.counts = new HashMap<String, Long>();
        this.taskId = context.getThisTaskId();
    }

    public void execute(Tuple tuple) {
        String word = tuple.getStringByField("word");
        Long count = this.counts.get(word);
        if (count == null) {
            count = 0L;
        }
        count++;
        this.counts.put(word, count);
        this.collector.ack(tuple);
        LOGGER.warn(String.format("WordCountBolt taskId: %d receive tuple: %s messageId is: %s and going to emit it",
                taskId, JSONObject.toJSONString(tuple), tuple.getMessageId()));
        this.collector.emit(tuple, new Values(word, count));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
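The counting logic inside WordCountBolt.execute() can be tested without Storm. The class below (a hypothetical name, not part of the tutorial's code) isolates that logic from the collector plumbing:

```java
import java.util.HashMap;

public class WordCountSketch {
    private final HashMap<String, Long> counts = new HashMap<>();

    // Same null-check-then-increment update as WordCountBolt.execute.
    public long countWord(String word) {
        Long count = counts.get(word);
        if (count == null) {
            count = 0L;
        }
        count++;
        counts.put(word, count);
        return count;
    }

    public static void main(String[] args) {
        WordCountSketch sketch = new WordCountSketch();
        for (String w : "my dog has fleas my dog".split(" ")) {
            sketch.countWord(w);
        }
        System.out.println(sketch.countWord("dog")); // prints 3
    }
}
```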
WordCountTopology
public class WordCountTopology {
    private static final String SENTENCE_SPOUT_ID = "sentence-spout";
    private static final String SPLIT_BOLT_ID = "split-bolt";
    private static final String COUNT_BOLT_ID = "count-bolt";
    private static final String REPORT_BOLT_ID = "report-bolt";
    private static final String TOPOLOGY_NAME = "word-count-topology";

    public static void main(String[] args) throws Exception {
        SentenceSpout spout = new SentenceSpout();
        SplitSentenceBolt splitBolt = new SplitSentenceBolt();
        WordCountBolt countBolt = new WordCountBolt();
        ReportBolt reportBolt = new ReportBolt();

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout(SENTENCE_SPOUT_ID, spout, 2);
        // SentenceSpout --> SplitSentenceBolt
        builder.setBolt(SPLIT_BOLT_ID, splitBolt, 2)
                .setNumTasks(4)
                .shuffleGrouping(SENTENCE_SPOUT_ID);
        // SplitSentenceBolt --> WordCountBolt
        builder.setBolt(COUNT_BOLT_ID, countBolt, 6)
                .fieldsGrouping(SPLIT_BOLT_ID, new Fields("word"));
        // WordCountBolt --> ReportBolt
        builder.setBolt(REPORT_BOLT_ID, reportBolt)
                .globalGrouping(COUNT_BOLT_ID);
        Config conf = JStormHelper.getConfig(null);
        conf.setNumWorkers(2);
        conf.setDebug(true);

        boolean isLocal = true;
        JStormHelper.runTopology(builder.createTopology(), TOPOLOGY_NAME, conf,
                new JStormHelper.CheckAckedFail(conf), isLocal);
    }
}

GitHub Code address
