Learn Storm with Me, Tutorial 2: Parallelism and Stream Grouping

Source: Internet
Author: User
Tags: ack, emit, prepare, shuffle, uuid
Original address: http://blog.csdn.net/hongkangwl/article/details/71103019. Please do not reprint.

Four components of a topology

- Node (server): a supervisor node in a Storm cluster runs part of the topology; a Storm cluster typically contains multiple nodes.
- Worker (JVM process): a separate JVM process running on a node; each node can run one or more workers. A complex topology is spread across multiple workers.
- Executor (thread): a Java thread running inside a worker JVM. Multiple tasks can be assigned to the same executor. By default, Storm assigns one task per executor.
- Task (spout/bolt instance): a task is an instance of a spout or bolt; its nextTuple() or execute() method is called by its executor thread.

Example
        builder.setSpout(SENTENCE_SPOUT_ID, spout, 2);
        // SentenceSpout --> SplitSentenceBolt
        builder.setBolt(SPLIT_BOLT_ID, splitBolt, 2)
                .setNumTasks(4)
                .shuffleGrouping(SENTENCE_SPOUT_ID);
        // SplitSentenceBolt --> WordCountBolt
        builder.setBolt(COUNT_BOLT_ID, countBolt, 6)
                .fieldsGrouping(SPLIT_BOLT_ID, new Fields("word"));
        // WordCountBolt --> ReportBolt
        builder.setBolt(REPORT_BOLT_ID, reportBolt)
                .globalGrouping(COUNT_BOLT_ID);
        Config conf = JStormHelper.getConfig(null);
        conf.setNumWorkers(2);
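As a sanity check, the executor and task totals implied by this configuration can be computed with plain arithmetic: each component gets as many executors as its parallelism hint, and one task per executor unless setNumTasks overrides it. The class below is an illustrative sketch, not the Storm API; note the totals include the single default ReportBolt executor and task.

```java
public class ParallelismMath {
    // Each row: {executors (parallelism hint), tasks (setNumTasks, or same as executors)}
    static final int[][] COMPONENTS = {
            {2, 2},  // SentenceSpout: parallelism 2
            {2, 4},  // SplitSentenceBolt: parallelism 2, setNumTasks(4)
            {6, 6},  // WordCountBolt: parallelism 6
            {1, 1},  // ReportBolt: defaults
    };

    static int totalExecutors(int[][] components) {
        int sum = 0;
        for (int[] c : components) sum += c[0];
        return sum;
    }

    static int totalTasks(int[][] components) {
        int sum = 0;
        for (int[] c : components) sum += c[1];
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(totalExecutors(COMPONENTS) + " executors, "
                + totalTasks(COMPONENTS) + " tasks");
        // prints: 11 executors, 13 tasks
    }
}
```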

With the configuration above, the concurrency diagram is shown in the following figure: 2 workers and 2 + 2 + 6 + 1 = 11 executors (the rounded rectangles), for a total of 2 + 4 + 6 + 1 = 13 tasks (2 spout tasks and 11 bolt tasks).

Data flow grouping

A stream grouping tells the topology how to send tuples between two components. One step in defining a topology is declaring, for each bolt, which streams it receives as input; the stream grouping then defines how each stream's tuples are distributed among that bolt's tasks. Storm provides seven built-in groupings:

- Shuffle grouping: tuples are distributed randomly across the bolt's tasks, so each task receives approximately the same number of tuples.
- Fields grouping: the stream is partitioned by the specified fields. For example, when grouping by the field "user-id", tuples with the same "user-id" always go to the same task, while tuples with different "user-id" values may go to different tasks.
- All grouping: the stream is broadcast; every task of the bolt receives a copy of every tuple.
- Global grouping: the entire stream goes to a single one of the bolt's tasks, specifically the task with the lowest task ID.
- None grouping: the topology does not care how the stream is grouped. Currently this behaves the same as shuffle grouping, except that Storm will, when possible, push the bolt into the same thread as its upstream subscriber.
- Direct grouping: a special grouping in which the emitter of a tuple decides which task of the consumer processes it. It can only be used on streams declared as direct streams, and tuples on such streams must be emitted with emitDirect. A component can obtain the task IDs of its consumers through TopologyContext (the OutputCollector.emit method also returns the IDs of the tasks a tuple was sent to).
- Local or shuffle grouping: if the target bolt has one or more tasks in the same worker process as the source, tuples are shuffled only among those in-process tasks; otherwise it behaves like a normal shuffle grouping.

Other concepts

- messageId: uniquely identifies a message. With it we can trace a message's processing through the topology and verify that grouping behaves as expected. It can be obtained with Tuple.getMessageId().
- taskId: each task in Storm has a unique taskId, which can be obtained with TopologyContext.getThisTaskId().

Demo

We trace the lifecycle of a message through its messageId, as shown in the following figure.
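The fields-grouping behavior described above (same field value, same task) can be illustrated with a simple hash-mod rule. The helper below is a hypothetical stand-in for demonstration, not Storm's actual partitioner:

```java
import java.util.Arrays;

public class FieldsGroupingSketch {
    // Illustrative only: picks a target task index from the grouping-field values.
    static int targetTask(int numTasks, Object... fieldValues) {
        return Math.floorMod(Arrays.hashCode(fieldValues), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 6; // WordCountBolt parallelism in this tutorial
        // Repeated values of the "word" field always land on the same task:
        System.out.println(targetTask(numTasks, "dog") == targetTask(numTasks, "dog")); // true
    }
}
```

Because the task index depends only on the hashed field values, the mapping is deterministic across tuples, which is exactly why the demo below shows the same word arriving at the same WordCountBolt task.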


From the logs we can clearly see that a sentence received by SplitSentenceBolt is split into single words and sent to WordCountBolt; WordCountBolt counts each word it receives and then sends the result to ReportBolt for printing. Since the words emitted by SplitSentenceBolt are passed to WordCountBolt via a fields grouping, you can see from the image below that the same word is always sent to the same WordCountBolt task. You can also grep for another word to verify the results.

Code

SentenceSpout

public class SentenceSpout extends BaseRichSpout {
    private static final Logger LOGGER = LoggerFactory.getLogger(SentenceSpout.class);
    private ConcurrentHashMap<UUID, Values> pending;
    private SpoutOutputCollector collector;
    private String[] sentences = {
            "my dog has fleas", "i like cold beverages", "the dog ate my homework",
            "don't have a cow man", "i don't think i like fleas"};
    private AtomicInteger index = new AtomicInteger(0);
    private Integer taskId = null;

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }

    public void open(Map config, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.pending = new ConcurrentHashMap<UUID, Values>();
        this.taskId = context.getThisTaskId();
    }

    public void nextTuple() {
        Values values = new Values(sentences[index.getAndIncrement()]);
        UUID msgId = UUID.randomUUID();
        this.pending.put(msgId, values);
        this.collector.emit(values, msgId);
        if (index.get() >= sentences.length) {
            index = new AtomicInteger(0);
        }
        LOGGER.warn(String.format("SentenceSpout with taskId: %d emit msgId: %s and tuple is: %s",
                taskId, msgId, JSONObject.toJSON(values)));
        Utils.waitForMillis(100);
    }

    public void ack(Object msgId) {
        this.pending.remove(msgId);
        LOGGER.warn(String.format("SentenceSpout taskId: %d receive msgId: %s and remove it from the pending map",
                taskId, JSONObject.toJSONString(msgId)));
    }

    public void fail(Object msgId) {
        LOGGER.error(String.format("SentenceSpout taskId: %d receive failed msgId: %s and re-emit it",
                taskId, JSONObject.toJSONString(msgId)));
        this.collector.emit(this.pending.get(msgId), msgId);
    }
}
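SentenceSpout keeps each emitted tuple in a pending map keyed by messageId, removes it on ack(), and re-emits it on fail(). That reliability pattern can be exercised without a Storm cluster; the class below is a hypothetical stand-alone sketch of just that bookkeeping:

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class PendingMapSketch {
    // Stand-in for the spout's pending map: messageId -> payload.
    private final ConcurrentHashMap<UUID, String> pending = new ConcurrentHashMap<>();

    public UUID emit(String payload) {
        UUID msgId = UUID.randomUUID();
        pending.put(msgId, payload); // remember until acked
        return msgId;
    }

    public void ack(UUID msgId) {
        pending.remove(msgId); // fully processed, safe to forget
    }

    public String fail(UUID msgId) {
        return pending.get(msgId); // payload to re-emit
    }

    public static void main(String[] args) {
        PendingMapSketch s = new PendingMapSketch();
        UUID a = s.emit("my dog has fleas");
        UUID b = s.emit("i like cold beverages");
        s.ack(a); // acked sentences are forgotten
        System.out.println(s.fail(b)); // prints the sentence to replay
    }
}
```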
SplitSentenceBolt
public class SplitSentenceBolt extends BaseRichBolt {
    private static final Logger LOGGER = LoggerFactory.getLogger(SplitSentenceBolt.class);
    private OutputCollector collector;
    private Integer taskId = null;

    public void prepare(Map config, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.taskId = context.getThisTaskId();
    }

    public void execute(Tuple tuple) {
        String sentence = tuple.getStringByField("sentence");
        String[] words = sentence.split(" ");
        for (String word : words) {
            this.collector.emit(tuple, new Values(word));
        }
        this.collector.ack(tuple);
        LOGGER.warn(String.format("SplitSentenceBolt taskId: %d acked tuple: %s and messageId is: %s",
                taskId, JSONObject.toJSONString(tuple, SerializerFeature.WriteMapNullValue),
                tuple.getMessageId()));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
WordCountBolt
public class WordCountBolt extends BaseRichBolt {
    private static final Logger LOGGER = LoggerFactory.getLogger(WordCountBolt.class);
    private OutputCollector collector;
    private HashMap<String, Long> counts = null;
    private Integer taskId = null;

    public void prepare(Map config, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.counts = new HashMap<String, Long>();
        this.taskId = context.getThisTaskId();
    }

    public void execute(Tuple tuple) {
        String word = tuple.getStringByField("word");
        Long count = this.counts.get(word);
        if (count == null) {
            count = 0L;
        }
        count++;
        this.counts.put(word, count);
        this.collector.ack(tuple);
        LOGGER.warn(String.format("WordCountBolt taskId: %d receive tuple: %s messageId is: %s and going to emit it",
                taskId, JSONObject.toJSONString(tuple), tuple.getMessageId()));
        this.collector.emit(tuple, new Values(word, count));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
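The counting logic inside WordCountBolt.execute() can be tested without Storm. The class below (a hypothetical name, not part of the tutorial's code) isolates that logic from the collector plumbing:

```java
import java.util.HashMap;

public class WordCountSketch {
    private final HashMap<String, Long> counts = new HashMap<>();

    // Same null-check-then-increment update as WordCountBolt.execute.
    public long countWord(String word) {
        Long count = counts.get(word);
        if (count == null) {
            count = 0L;
        }
        count++;
        counts.put(word, count);
        return count;
    }

    public static void main(String[] args) {
        WordCountSketch sketch = new WordCountSketch();
        for (String w : "my dog has fleas my dog".split(" ")) {
            sketch.countWord(w);
        }
        System.out.println(sketch.countWord("dog")); // prints 3
    }
}
```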
WordCountTopology
public class WordCountTopology {
    private static final String SENTENCE_SPOUT_ID = "sentence-spout";
    private static final String SPLIT_BOLT_ID = "split-bolt";
    private static final String COUNT_BOLT_ID = "count-bolt";
    private static final String REPORT_BOLT_ID = "report-bolt";
    private static final String TOPOLOGY_NAME = "word-count-topology";

    public static void main(String[] args) throws Exception {
        SentenceSpout spout = new SentenceSpout();
        SplitSentenceBolt splitBolt = new SplitSentenceBolt();
        WordCountBolt countBolt = new WordCountBolt();
        ReportBolt reportBolt = new ReportBolt();

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout(SENTENCE_SPOUT_ID, spout, 2);
        // SentenceSpout --> SplitSentenceBolt
        builder.setBolt(SPLIT_BOLT_ID, splitBolt, 2)
                .setNumTasks(4)
                .shuffleGrouping(SENTENCE_SPOUT_ID);
        // SplitSentenceBolt --> WordCountBolt
        builder.setBolt(COUNT_BOLT_ID, countBolt, 6)
                .fieldsGrouping(SPLIT_BOLT_ID, new Fields("word"));
        // WordCountBolt --> ReportBolt
        builder.setBolt(REPORT_BOLT_ID, reportBolt)
                .globalGrouping(COUNT_BOLT_ID);
        Config conf = JStormHelper.getConfig(null);
        conf.setNumWorkers(2);
        conf.setDebug(true);

        boolean isLocal = true;
        JStormHelper.runTopology(builder.createTopology(), TOPOLOGY_NAME, conf,
                new JStormHelper.CheckAckedFail(conf), isLocal);
    }
}

GitHub Code address
