Storm Details 2: Writing the First Storm Application

Before introducing Storm in depth, let's use a simple demo to get a feel for what Storm is. Storm has two running modes:
  1. Local mode: the topology runs in a single JVM on the local machine (we will explain this in detail later). This mode is mainly used for development and debugging.
  2. Remote mode: in this mode, we submit our topology to the cluster. All of Storm's components run in different JVMs or on different physical machines, so they are isolated from one another. This is the formal production mode. (See the sketch after this list for how submission differs between the two modes.)
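To make the two modes concrete, here is a minimal sketch of how the same topology would be submitted in each one. The class and topology names (SubmitModes, "demo-local", "demo-remote") are hypothetical; LocalCluster and StormSubmitter are the standard Storm classes for local and remote submission, and the full word-count example below uses the local form.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.StormTopology;

public class SubmitModes {
    // Local mode: the entire topology runs inside this single JVM.
    static void runLocal(Config conf, StormTopology topology) throws Exception {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("demo-local", conf, topology);
        Thread.sleep(10000); // let the topology process data for a while
        cluster.shutdown();
    }

    // Remote mode: the packaged topology jar is submitted to a real
    // cluster through Nimbus and runs across the cluster's machines.
    static void runRemote(Config conf, StormTopology topology) throws Exception {
        StormSubmitter.submitTopology("demo-remote", conf, topology);
    }
}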
Writing a "hello world" for Storm. We will create an application that counts the words in a text file; if you have studied Hadoop, you have almost certainly written its equivalent. We need a topology that uses a spout to read the text file, a first bolt to parse the lines into words, and a second bolt to count the parsed words; that is the overall structure. The source code can be downloaded here: https://github.com/storm-book/examples-ch02-getting_started/zipball/master. Writing a runnable demo takes only three steps:
  1. Create a spout to read data
  2. Create bolts to process data
  3. Create a topology and submit it to the cluster.
Next, write the following code and copy it into Eclipse to run it (the jar dependencies can be downloaded from the official website). 1. Create a spout as the data source. The spout implements the IRichSpout interface; its job is to read a text file and emit each line of it to the bolts.
package storm.demo.spout;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichSpout;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class WordReader implements IRichSpout {
    private static final long serialVersionUID = 1L;
    private SpoutOutputCollector collector;
    private FileReader fileReader;
    private boolean completed = false;

    public boolean isDistributed() {
        return false;
    }

    /**
     * This is the first method called. It receives three parameters: the
     * configuration used when the topology was created, the topology context,
     * and the collector that transmits the spout's data to the bolts.
     */
    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        try {
            // Obtain the path of the file to read from the topology configuration.
            this.fileReader = new FileReader(conf.get("wordsFile").toString());
        } catch (FileNotFoundException e) {
            throw new RuntimeException("Error reading file [" + conf.get("wordsFile") + "]");
        }
        // Initialize the collector.
        this.collector = collector;
    }

    /**
     * This is the most important spout method. Here we read the text file and
     * emit each of its lines (to the bolts). This method is called continuously,
     * so once the file has been read completely we let the task sleep to reduce
     * its CPU consumption.
     */
    @Override
    public void nextTuple() {
        if (completed) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                // Do nothing.
            }
            return;
        }
        String str;
        // Open the reader.
        BufferedReader reader = new BufferedReader(fileReader);
        try {
            // Read all lines.
            while ((str = reader.readLine()) != null) {
                // Emit each line. Values is an ArrayList implementation.
                this.collector.emit(new Values(str), str);
            }
        } catch (Exception e) {
            throw new RuntimeException("Error reading tuple", e);
        } finally {
            completed = true;
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }

    @Override
    public void close() {
    }

    @Override
    public void activate() {
    }

    @Override
    public void deactivate() {
    }

    @Override
    public void ack(Object msgId) {
        System.out.println("OK: " + msgId);
    }

    @Override
    public void fail(Object msgId) {
        System.out.println("FAIL: " + msgId);
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
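A note on the emit call above: the second argument to collector.emit(new Values(str), str) is a message ID. Supplying it tells Storm to track the tuple through the topology, which is why the ack and fail callbacks fire and print "OK:" or "FAIL:" for each line.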
2. Create two bolts to process the spout's data. The spout has read the file and emits each line as a tuple (all data in Storm is transmitted as tuples). We now need two bolts: one to parse each line into words, and one to count those words. The most important method of a bolt is execute, which is called every time a tuple arrives. The first bolt: WordNormalizer
package storm.demo.bolt;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class WordNormalizer implements IRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    /**
     * This is the most important method of a bolt; it is called whenever a
     * tuple is received. It splits each line of the text file into words and
     * emits them (to the next bolt for processing).
     */
    @Override
    public void execute(Tuple input) {
        String sentence = input.getString(0);
        String[] words = sentence.split(" ");
        for (String word : words) {
            word = word.trim();
            if (!word.isEmpty()) {
                word = word.toLowerCase();
                // Emit the word, anchored to the input tuple.
                List<Tuple> a = new ArrayList<Tuple>();
                a.add(input);
                collector.emit(a, new Values(word));
            }
        }
        // Confirm that the tuple has been processed successfully.
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    @Override
    public void cleanup() {
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
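Note the design choice in execute: each emitted word tuple is anchored to the input tuple (the list a passed as the first argument to emit). Anchoring links the new tuple to the line it came from, so if a word tuple later fails downstream, Storm can replay the original line from the spout; the final collector.ack(input) then marks the line as fully processed by this bolt.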
The second bolt: WordCounter
package storm.demo.bolt;

import java.util.HashMap;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Tuple;

public class WordCounter implements IRichBolt {
    Integer id;
    String name;
    Map<String, Integer> counters;
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.counters = new HashMap<String, Integer>();
        this.collector = collector;
        this.name = context.getThisComponentId();
        this.id = context.getThisTaskId();
    }

    @Override
    public void execute(Tuple input) {
        String str = input.getString(0);
        if (!counters.containsKey(str)) {
            counters.put(str, 1);
        } else {
            Integer c = counters.get(str) + 1;
            counters.put(str, c);
        }
        // Confirm that the tuple has been processed successfully.
        collector.ack(input);
    }

    /**
     * Operations such as closing connections and releasing resources go here,
     * to run after the topology finishes. Because this is only a demo, we use
     * this method to print our counters.
     */
    @Override
    public void cleanup() {
        System.out.println("-- Word Counter [" + name + "-" + id + "] --");
        for (Map.Entry<String, Integer> entry : counters.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
        counters.clear();
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
3. Create the topology in the main function. Here we build the topology, create a LocalCluster object, and configure a Config object.
package storm.demo;

import storm.demo.bolt.WordCounter;
import storm.demo.bolt.WordNormalizer;
import storm.demo.spout.WordReader;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class WordCountTopologyMain {
    public static void main(String[] args) throws InterruptedException {
        // Define the topology.
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("word-reader", new WordReader());
        builder.setBolt("word-normalizer", new WordNormalizer())
               .shuffleGrouping("word-reader");
        builder.setBolt("word-counter", new WordCounter(), 2)
               .fieldsGrouping("word-normalizer", new Fields("word"));

        // Configuration.
        Config conf = new Config();
        conf.put("wordsFile", "d:/text.txt");
        conf.setDebug(false);
        conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);

        // Create a local-mode cluster and submit the topology.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("getting-started-topology", conf, builder.createTopology());
        Thread.sleep(1000);
        cluster.shutdown();
    }
}
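Two details of this wiring are worth a sentence. shuffleGrouping distributes the spout's lines randomly across WordNormalizer tasks, while fieldsGrouping("word-normalizer", new Fields("word")) guarantees that every occurrence of a given word is routed to the same one of the two WordCounter tasks; that routing is what makes each task's private in-memory counters correct.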
Run this function and you will see the word counts printed in the console. (Note: because this runs in local mode, many error-looking log messages may be printed at startup; they can be ignored.)
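Purely as an illustration (the real output depends on your input file and on how the fields grouping hashes words across the two counter tasks), suppose the hypothetical d:/text.txt contained the two lines "hello storm" and "hello world". When the cluster shuts down, each WordCounter task prints its counters from cleanup, so the console output would look roughly like:

-- Word Counter [word-counter-2] --
hello: 2
storm: 1
-- Word Counter [word-counter-3] --
world: 1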

