Storm Introductory Example: Word Counter
Concept
In Storm, a distributed computation is structured as a topology, which consists of streams (data streams), spouts (the producers of the data streams), and bolts (operations).
The core data structure of Storm is the tuple. A tuple is a list of one or more named values (fields), and a stream is an unbounded sequence of tuples.
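To make the tuple idea concrete, here is a minimal plain-Java sketch. TupleSketch and its method are hypothetical names used only for illustration; this is not Storm's Tuple class, just a model of "field names declared once, values carried per tuple".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Storm tuple (an assumption for illustration,
// not part of the Storm API): a list of values addressed by field name.
class TupleSketch {
    private final List<String> fields;
    private final List<Object> values;

    TupleSketch(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("fields and values must have the same length");
        }
        this.fields = fields;
        this.values = values;
    }

    // Look a value up by its declared field name.
    Object getValueByField(String field) {
        int i = fields.indexOf(field);
        if (i < 0) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return values.get(i);
    }

    public static void main(String[] args) {
        TupleSketch t = new TupleSketch(
                Arrays.asList("word", "count"),
                Arrays.<Object>asList("dog", 2L));
        System.out.println(t.getValueByField("word") + " -> " + t.getValueByField("count"));
    }
}
```

A stream would then simply be an endless sequence of such objects that all share the same field list.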
A spout is the main data entry point of a Storm topology. It acts as a collector: it connects to a data source, converts the data into tuples, and emits those tuples as a data stream.
A bolt can be understood as an operation or function in a program. It takes one or more data streams as input, processes the data, and optionally emits one or more output streams. A bolt can subscribe to multiple streams emitted by spouts or by other bolts, which makes it possible to build a complex network of stream transformations.
The data flow of the word count topology in this example is roughly as follows:
SentenceSpout -> SplitSentenceBolt -> WordCountBolt -> ReportBolt
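Before wiring anything into Storm, the same flow can be traced in plain Java. The sketch below is only a single-threaded model under that assumption: WordCountFlowSketch and its methods are hypothetical names, and nothing here uses the Storm API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process model of the topology stages (illustration only).
class WordCountFlowSketch {
    // Splitting stage: one sentence in, one word out per token.
    static List<String> split(String sentence) {
        return Arrays.asList(sentence.trim().split("\\s+"));
    }

    // Counting and reporting stages: keep a running count per word,
    // sorted by word so the report is easy to read.
    static Map<String, Long> count(List<String> sentences) {
        Map<String, Long> counts = new TreeMap<>();
        for (String sentence : sentences) {
            for (String word : split(sentence)) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("i have a dog", "my dog has fleas");
        // prints {a=1, dog=2, fleas=1, has=1, have=1, i=1, my=1}
        System.out.println(count(sentences));
    }
}
```

In the real topology each stage runs as a separate component, possibly on several threads, and tuples travel between them as streams instead of method calls.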
Project Building
Create a new class SentenceSpout.java (data stream generator)
import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

/**
 * Emits a stream of tuples to the downstream components.
 * @author Soul
 */
public class SentenceSpout extends BaseRichSpout {
    // BaseRichSpout is a simple implementation of the ISpout and IComponent interfaces.
    // It provides default implementations for the interface methods this example does not use.

    private SpoutOutputCollector collector;
    private String[] sentences = {"my name is soul", "im a boy", "i have a dog",
            "my dog has fleas", "my girl friend is beautiful"};
    private int index = 0;

    /**
     * open() is defined in the ISpout interface and is called when the spout component is initialized.
     * It accepts three parameters: a map holding the Storm configuration, a TopologyContext object
     * that provides information about the components in the topology, and a SpoutOutputCollector
     * object that provides the methods for emitting tuples.
     * This example needs no other initialization, so open() simply stores the SpoutOutputCollector
     * in an instance variable.
     */
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    /**
     * nextTuple() is the core of any spout implementation.
     * Storm calls this method to ask the spout to emit tuples to the output collector.
     * Here we emit the sentence at the current index and advance the index so that the next
     * call emits the next sentence.
     */
    public void nextTuple() {
        this.collector.emit(new Values(sentences[index]));
        index++;
        if (index >= sentences.length) {
            index = 0;
        }
        Utils.sleep(1);
    }

    /**
     * declareOutputFields() is defined in the IComponent interface, which all Storm components
     * (spouts and bolts) must implement. It tells Storm which streams the component emits and
     * which fields the tuples of each stream contain.
     */
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This component emits a stream whose tuples contain a single "sentence" field.
        declarer.declare(new Fields("sentence"));
    }
}
Create a new class SplitSentenceBolt.java (word splitter)
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

/**
 * Subscribes to the tuple stream emitted by the sentence spout and splits each sentence into words.
 * @author Soul
 */
public class SplitSentenceBolt extends BaseRichBolt {
    // BaseRichBolt implements the IComponent and IBolt interfaces.
    // By extending it, we do not have to implement the methods this example does not care about.

    private OutputCollector collector;

    /**
     * prepare() is similar to the open() method of ISpout.
     * It is called when the bolt is initialized and can be used to prepare the resources
     * the bolt needs, such as a database connection.
     * Like SentenceSpout, SplitSentenceBolt needs little initialization, so prepare()
     * only stores a reference to the OutputCollector.
     */
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    /**
     * The core of SplitSentenceBolt is the execute() method, which is defined in the IBolt interface.
     * It is called every time the bolt receives a tuple from a stream it subscribes to.
     * Here it reads the value of the "sentence" field, splits it into words,
     * and emits one new tuple per word.
     */
    public void execute(Tuple input) {
        String sentence = input.getStringByField("sentence");
        String[] words = sentence.split(" ");
        for (String word : words) {
            this.collector.emit(new Values(word)); // emit to the next bolt
        }
    }

    /**
     * SplitSentenceBolt declares one output stream whose tuples contain a single field ("word").
     */
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
Create a new class WordCountBolt.java (word counter)
import java.util.HashMap;
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

/**
 * Subscribes to the output stream of the split sentence bolt, counts the words,
 * and emits the current count to the next bolt.
 * @author Soul
 */
public class WordCountBolt extends BaseRichBolt {

    private OutputCollector collector;
    // Stores the words and their counts.
    // Note: non-serializable objects must be instantiated in prepare().
    private HashMap<String, Long> counts = null;

    /**
     * Most instance variables are usually instantiated in prepare(). This design pattern follows
     * from the way topologies are deployed: when a topology is deployed, its spouts and bolts
     * are serialized, instance variables included, and sent over the network.
     * If a spout or bolt has a non-serializable instance variable that is instantiated before
     * serialization (for example, in the constructor), a NotSerializableException is thrown
     * and the topology cannot be submitted.
     * In this example HashMap is serializable, so it could safely be instantiated in the constructor.
     * In general, however, it is preferable to assign primitive values and instantiate
     * serializable objects in the constructor, and to instantiate non-serializable objects in prepare().
     */
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.counts = new HashMap<String, Long>();
    }

    /**
     * In execute() we look up the count of the received word (initializing it to 0 if absent),
     * increment and store it, and then emit a new tuple made of the word and its current count.
     * Emitting the count as a stream lets other bolts in the topology subscribe to it
     * and perform further processing.
     */
    public void execute(Tuple input) {
        String word = input.getStringByField("word");
        Long count = this.counts.get(word);
        if (count == null) {
            count = 0L; // not present yet, initialize to 0
        }
        count++; // increment the count
        this.counts.put(word, count); // store the count
        this.collector.emit(new Values(word, count));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declares one output stream whose tuples consist of a word and its count.
        // Other bolts can subscribe to this stream for further processing.
        declarer.declare(new Fields("word", "count"));
    }
}
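The serialization rule described in the WordCountBolt comments can be reproduced with plain Java serialization. The sketch below assumes nothing about Storm itself; SerializationSketch, Connection, EagerComponent and LazyComponent are hypothetical names, with Connection standing in for any non-serializable resource.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationSketch {
    // Hypothetical non-serializable resource, e.g. a database connection.
    static class Connection { }

    // Resource created at construction time: the field is populated
    // before the component is serialized, so serialization fails.
    static class EagerComponent implements Serializable {
        private final Connection conn = new Connection();
    }

    // Resource created later in prepare(): the field is still null
    // (and marked transient) when the component is serialized.
    static class LazyComponent implements Serializable {
        private transient Connection conn;

        void prepare() {
            conn = new Connection();
        }

        boolean isPrepared() {
            return conn != null;
        }
    }

    // Returns true if Java serialization accepts the object.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("eager: " + canSerialize(new EagerComponent()));
        System.out.println("lazy:  " + canSerialize(new LazyComponent()));
    }
}
```

The lazy variant mirrors the prepare() pattern: the object that travels over the network carries no live resource, and each worker creates its own after deserialization.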
Create a new class ReportBolt.java (report generator)
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

/**
 * Generates a report.
 * @author Soul
 */
public class ReportBolt extends BaseRichBolt {

    private HashMap<String, Long> counts = null; // stores the words and their counts

    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.counts = new HashMap<String, Long>();
    }

    public void execute(Tuple input) {
        String word = input.getStringByField("word");
        Long count = input.getLongByField("count");
        this.counts.put(word, count);
        // Real-time output
        System.out.println("Result: " + this.counts);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This is the last bolt in the topology. It emits no stream, so nothing is declared.
    }

    /**
     * cleanup() is defined in the IBolt interface. Storm calls it before terminating a bolt.
     * In this example we use cleanup() to print the final counts when the topology shuts down.
     * Typically, cleanup() is used to release the resources a bolt holds, such as open file
     * handles or database connections.
     * However, when a Storm topology runs on a cluster, IBolt.cleanup() is not guaranteed to be
     * executed. Printing the report here is suitable for development mode only;
     * do not rely on it in a production environment.
     */
    public void cleanup() {
        System.out.println("---------- FINAL COUNTS -----------");
        ArrayList<String> keys = new ArrayList<String>();
        keys.addAll(this.counts.keySet());
        Collections.sort(keys);
        for (String key : keys) {
            System.out.println(key + " : " + this.counts.get(key));
        }
        System.out.println("----------------------------");
    }
}
Modify the program's main entry point, App.java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
import org.apache.storm.utils.Utils;

/**
 * Implements the word count topology.
 */
public class App {
    private static final String SENTENCE_SPOUT_ID = "sentence-spout";
    private static final String SPLIT_BOLT_ID = "split-bolt";
    private static final String COUNT_BOLT_ID = "count-bolt";
    private static final String REPORT_BOLT_ID = "report-bolt";
    private static final String TOPOLOGY_NAME = "word-count-topology";

    public static void main(String[] args) { // throws Exception
        // Instantiate the spout and the bolts
        SentenceSpout spout = new SentenceSpout();
        SplitSentenceBolt splitBolt = new SplitSentenceBolt();
        WordCountBolt countBolt = new WordCountBolt();
        ReportBolt reportBolt = new ReportBolt();

        // TopologyBuilder provides a fluent API for defining the data flow between topology components
        TopologyBuilder builder = new TopologyBuilder();

        // Register the sentence spout
        // builder.setSpout(SENTENCE_SPOUT_ID, spout);
        // Use two executors (threads); the default is one
        builder.setSpout(SENTENCE_SPOUT_ID, spout, 2);

        // SentenceSpout -> SplitSentenceBolt
        // Register a bolt and subscribe it to the stream emitted by the sentence spout.
        // shuffleGrouping tells Storm to distribute the tuples emitted by SentenceSpout
        // randomly and evenly across the SplitSentenceBolt instances.
        // builder.setBolt(SPLIT_BOLT_ID, splitBolt).shuffleGrouping(SENTENCE_SPOUT_ID);
        // The word splitter runs 4 tasks on 2 executors (threads)
        builder.setBolt(SPLIT_BOLT_ID, splitBolt, 2).setNumTasks(4).shuffleGrouping(SENTENCE_SPOUT_ID);

        // SplitSentenceBolt -> WordCountBolt
        // fieldsGrouping routes tuples that contain specific data to a specific bolt instance.
        // Here it guarantees that all tuples with the same "word" field value are routed
        // to the same WordCountBolt instance.
        // builder.setBolt(COUNT_BOLT_ID, countBolt).fieldsGrouping(SPLIT_BOLT_ID, new Fields("word"));
        // The word counter uses 4 executors (threads)
        builder.setBolt(COUNT_BOLT_ID, countBolt, 4).fieldsGrouping(SPLIT_BOLT_ID, new Fields("word"));

        // WordCountBolt -> ReportBolt
        // globalGrouping routes all tuples emitted by WordCountBolt to the single ReportBolt
        builder.setBolt(REPORT_BOLT_ID, reportBolt).globalGrouping(COUNT_BOLT_ID);

        // Config is a subclass of HashMap<String, Object> used to configure the topology's runtime behavior
        Config config = new Config();
        // Set the number of workers
        // config.setNumWorkers(2);

        LocalCluster cluster = new LocalCluster();
        // Submit locally
        cluster.submitTopology(TOPOLOGY_NAME, config, builder.createTopology());
        Utils.sleep(10000);
        cluster.killTopology(TOPOLOGY_NAME);
        cluster.shutdown();
    }
}
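The fieldsGrouping guarantee used for WordCountBolt (the same word always reaches the same instance) can be pictured as "hash the field, then pick a task". The sketch below is a simplified model under that assumption and is not Storm's actual routing code; FieldsGroupingSketch and chooseTask are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

class FieldsGroupingSketch {
    // Simplified routing rule (an assumption for illustration): hash the
    // grouping field and map the hash onto the range [0, numTasks).
    static int chooseTask(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4; // same number as the WordCountBolt executors in App.java
        List<String> words = Arrays.asList("my", "dog", "has", "fleas", "my", "dog");
        for (String word : words) {
            // Repeated words always print the same task index.
            System.out.println(word + " -> task " + chooseTask(word, numTasks));
        }
    }
}
```

Because the choice depends only on the word, each counter instance sees every occurrence of the words it owns, which is what keeps the per-instance HashMap counts correct. With shuffleGrouping instead, occurrences of one word would be spread across instances and each would hold only a partial count.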
Run the program to see the word counts printed in real time.
After running for 10 seconds, the topology is killed and the final report is printed.