Storm (VI): Splitting and merging data streams

Source: Internet
Author: User
Tags: config, emit, sleep

When Storm processes data, different kinds of data often need to go to different bolts for processing (splitting the stream), and the processed results then need to reach the same bolt, for example to be stored in a database (merging the streams). The following example walks through both splitting and merging.

A spout reads the text and sends it to the first layer of bolts, which cut it into words: sentences whose words are separated by spaces go to bolt 1, and sentences whose words are separated by commas go to bolt 2 — that is the split. After cutting, identical words are sent to the same task of the second-layer bolt to be counted — that is the merge. Every one of these steps can be spread across multiple servers.
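Before looking at the Storm code, the split/merge idea can be sketched in plain Java. This is only an illustration of the logic in a single process (the class and method names here are made up for the sketch, not part of the Storm API):

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the flow described above: each sentence is routed to
// one of two "bolts" by its delimiter, each bolt cuts the sentence into
// words, and all words are merged into a single count map.
public class SplitMergeSketch {

    // Bolt 1: cuts space-separated sentences
    static String[] splitBySpace(String sentence) {
        return sentence.split(" ");
    }

    // Bolt 2: cuts comma-separated sentences
    static String[] splitByComma(String sentence) {
        return sentence.split(",");
    }

    // Merge step: identical words land in the same counter entry
    static Map<String, Long> count(String[] sentences) {
        Map<String, Long> counts = new HashMap<>();
        for (String sentence : sentences) {
            // Split step: pick the "bolt" by delimiter
            String[] words = sentence.contains(",")
                    ? splitByComma(sentence)
                    : splitBySpace(sentence);
            for (String word : words) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }
}
```

In the real topology the routing and counting happen across parallel tasks, possibly on different machines; Storm's stream IDs and groupings decide which task each tuple reaches.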

1. Splitting the stream

1) First, declare the output streams in declareOutputFields of the component that emits them:

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		declarer.declareStream("streamId1", new Fields("field"));
		declarer.declareStream("streamId2", new Fields("field"));
	}
2) Then specify the stream ID when emitting:

	collector.emit("streamId1", new Values(sentence));

3) Finally, subscribe each bolt to its stream ID when building the topology:

       builder.setBolt("split1", new SplitSentence1Bolt(), 2).shuffleGrouping("spout", "streamId1");
       builder.setBolt("split2", new SplitSentence2Bolt(), 2).shuffleGrouping("spout", "streamId2");

2. Merging the streams

When building the topology, declare which upstream bolts the merging bolt subscribes to:

       builder.setBolt("count", new WordCountBolt(), 2).fieldsGrouping("split1", new Fields("word"))
               .fieldsGrouping("split2", new Fields("word"));
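Why does fieldsGrouping guarantee that the same word always reaches the same counting task? Conceptually, the target task is chosen from a hash of the grouping field's value, as in this illustrative sketch (the class and method here are invented for illustration and only mirror the idea, not Storm's exact internals):

```java
// Sketch of fields-grouping routing: tuples with equal "word" values always
// map to the same task index, so their counts accumulate in one place.
public class FieldsGroupingSketch {

    static int taskFor(String word, int numTasks) {
        // Math.floorMod keeps the index non-negative even when hashCode() is negative
        return Math.floorMod(word.hashCode(), numTasks);
    }
}
```

By contrast, shuffleGrouping (used above between the spout and the splitting bolts) distributes tuples randomly, which is fine there because any splitting task can cut any sentence.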

Now let's look at the whole example:

Step one: Create the spout data source

import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

/**
 * Data source
 * @author zhengcy
 */
@SuppressWarnings("serial")
public class SentenceSpout extends BaseRichSpout {

    private SpoutOutputCollector collector;
    private String[] sentences = {
            "Apache Storm is a free and open source distributed realtime computation system",
            "Storm,makes,it,easy,to,reliably,process,unbounded,streams,of,data",
            "doing for realtime processing what Hadoop did for batch processing",
            "Can,be,used,with,any,programming,language",
            "and is a lot of fun to use" };
    private int index = 0;

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declare the two output streams used for the split
        declarer.declareStream("streamId1", new Fields("sentence"));
        declarer.declareStream("streamId2", new Fields("sentence"));
    }

    @SuppressWarnings("rawtypes")
    public void open(Map config, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    public void nextTuple() {
        if (index >= sentences.length) {
            return;
        }
        // Even indexes hold space-separated sentences, odd indexes comma-separated ones
        if (index % 2 == 0) {
            collector.emit("streamId1", new Values(sentences[index]));
        } else {
            collector.emit("streamId2", new Values(sentences[index]));
        }
        index++;
        Utils.sleep(1);
    }
}


Step two: Implement word-splitting bolt 1

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

/**
 * Cuts space-separated sentences into words
 * @author zhengcy
 */
@SuppressWarnings("serial")
public class SplitSentence1Bolt extends BaseBasicBolt {

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String sentence = input.getStringByField("sentence");
        String[] words = sentence.split(" ");
        for (String word : words) {
            collector.emit(new Values(word));
        }
    }
}


Step three: Implement word-splitting bolt 2

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

/**
 * Cuts comma-separated sentences into words
 * @author zhengcy
 */
@SuppressWarnings("serial")
public class SplitSentence2Bolt extends BaseBasicBolt {

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String sentence = input.getStringByField("sentence");
        String[] words = sentence.split(",");
        for (String word : words) {
            collector.emit(new Values(word));
        }
    }
}

Step four: Implement the word-counting bolt

import java.util.HashMap;
import java.util.Map;

import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

/**
 * Counts words
 * @author zhengcy
 */
@SuppressWarnings("serial")
public class WordCountBolt extends BaseBasicBolt {

    private Map<String, Long> counts = null;

    @SuppressWarnings("rawtypes")
    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        this.counts = new HashMap<String, Long>();
    }

    @Override
    public void cleanup() {
        // Print the final counts when the topology shuts down
        for (String key : counts.keySet()) {
            System.out.println(key + " : " + this.counts.get(key));
        }
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String word = input.getStringByField("word");
        Long count = this.counts.get(word);
        if (count == null) {
            count = 0L;
        }
        count++;
        this.counts.put(word, count);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }
}


Step five: Create the topology

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

/**
 * Word-count topology
 * @author zhengcy
 */
public class WordCountTopology {

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new SentenceSpout(), 1);
        // Split: each splitting bolt subscribes to its own stream from the spout
        builder.setBolt("split1", new SplitSentence1Bolt(), 2).shuffleGrouping("spout", "streamId1");
        builder.setBolt("split2", new SplitSentence2Bolt(), 2).shuffleGrouping("spout", "streamId2");
        // Merge: the counting bolt subscribes to both splitting bolts
        builder.setBolt("count", new WordCountBolt(), 2).fieldsGrouping("split1", new Fields("word"))
                .fieldsGrouping("split2", new Fields("word"));

        Config conf = new Config();
        conf.setDebug(false);
        if (args != null && args.length > 0) {
            // Cluster mode
            conf.setNumWorkers(2);
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        } else {
            // Local mode
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("word-count", conf, builder.createTopology());
            Thread.sleep(10000);
            cluster.shutdown();
        }
    }
}


