Storm Tutorial (iii): Developing Storm with Java

1. Operation Mode

Before we start, it is necessary to understand how Storm runs. There are two different modes of operation.
Local mode
In local mode, the Storm topology runs on the local machine inside a single JVM process. This mode is used for development, testing, and debugging, because it is the easiest way to see all the topology components working together. In this mode we can also adjust parameters and see how our topology behaves under different Storm configurations. To run in local mode we only need to download the Storm development dependencies, which we use to develop and test the topology. Once we have created our first Storm project, you will quickly understand how to work in local mode.
Note: Running in local mode is similar to running in a cluster environment. However, it is important to make sure all components are thread-safe, because when they are deployed in remote mode they may run in different JVM processes, or even on different physical machines, with no direct communication or shared memory between them.
All the examples in this chapter are run in local mode.

Remote Mode
In remote mode, we submit the topology to a Storm cluster, which is typically composed of many processes running on different machines. Remote mode does not show debugging information, which is why it is also called production mode. Still, it is a good idea to set up a Storm cluster on your development machine, so you can verify that the topology has no problems in a cluster environment before deploying it to production.
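Remote mode is not used in this chapter, but for orientation, here is a minimal sketch of what a remote submission looks like with the 0.6-era backtype.storm API used throughout this tutorial. The class name and topology name are illustrative, and the spout/bolt wiring is elided:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.generated.AlreadyAliveException;
    import backtype.storm.generated.InvalidTopologyException;
    import backtype.storm.topology.TopologyBuilder;

    // Illustrative class; in practice you package this in a jar and run it against the cluster
    public class RemoteSubmitExample {
        public static void main(String[] args)
                throws AlreadyAliveException, InvalidTopologyException {
            TopologyBuilder builder = new TopologyBuilder();
            // ... setSpout()/setBolt() calls, exactly as in local mode ...

            Config conf = new Config();
            conf.setNumWorkers(2); // ask the cluster for two worker processes

            // Submits to the cluster configured in storm.yaml instead of a LocalCluster
            StormSubmitter.submitTopology("getting-started-remote", conf,
                    builder.createTopology());
        }
    }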

2. Create a Maven pom.xml file

To run our topology we need a pom.xml file containing the basic components shown below.

    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                                 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>storm.book</groupId>
        <artifactId>Getting-Started</artifactId>
        <version>0.0.1-SNAPSHOT</version>
        <build>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>2.3.2</version>
                    <configuration>
                        <source>1.6</source>
                        <target>1.6</target>
                        <compilerVersion>1.6</compilerVersion>
                    </configuration>
                </plugin>
            </plugins>
        </build>
        <repositories>
            <!-- Repository where we can find the Storm dependencies -->
            <repository>
                <id>clojars.org</id>
                <url>http://clojars.org/repo</url>
            </repository>
        </repositories>
        <dependencies>
            <!-- Storm dependency -->
            <dependency>
                <groupId>storm</groupId>
                <artifactId>storm</artifactId>
                <version>0.6.0</version>
            </dependency>
        </dependencies>
    </project>

The first lines specify the project name and version. Then we add a compiler plugin that tells Maven to compile our code with Java 1.6. Next we define the repositories (Maven supports several repositories for the same project); clojars is the repository where the Storm dependencies live. Maven automatically downloads all the sub-dependencies needed to run in local mode.
A typical Maven Java project has the following structure:

    our application directory/
    ├── pom.xml
    └── src
        └── main
            ├── java
            │   ├── spouts
            │   └── bolts
            └── resources

The subdirectories under java contain our code; the file whose words we want to count goes in the resources directory.
Note: The command mkdir -p creates all the required parent directories.

3. Spout

The spout WordReader class implements the IRichSpout interface (we will see more details in Chapter 4). WordReader is responsible for reading the file line by line and providing each line of text to the first bolt.
Note: A spout emits a defined list of fields. This architecture lets you consume the same spout stream with different kinds of bolts, whose outputs can in turn serve as fields for other bolts, and so on.
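As a quick illustration of that idea (a sketch, not part of this chapter's project; LineLoggerBolt is a hypothetical second bolt), two different bolts can subscribe to the stream of the same spout:

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("word-reader", new WordReader());
    // Two independent bolts consume the same "word-reader" stream
    builder.setBolt("word-normalizer", new WordNormalizer()).shuffleGrouping("word-reader");
    builder.setBolt("line-logger", new LineLoggerBolt()).shuffleGrouping("word-reader"); // hypothetical bolt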
Example 2-1 contains the complete code for the WordReader class (we will analyze each part of the code below).
    /**
     * Example 2-1. src/main/java/spouts/WordReader.java
     */
    package spouts;

    import java.io.BufferedReader;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.util.Map;
    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.IRichSpout;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    public class WordReader implements IRichSpout {
        private SpoutOutputCollector collector;
        private FileReader fileReader;
        private boolean completed = false;
        private TopologyContext context;

        public boolean isDistributed() { return false; }
        public void ack(Object msgId) { System.out.println("OK: " + msgId); }
        public void close() {}
        public void fail(Object msgId) { System.out.println("FAIL: " + msgId); }

        /**
         * The only thing this method does is emit each line in the file.
         * It keeps being called until the whole file has been read;
         * after that we just sleep for a while and return.
         */
        public void nextTuple() {
            if (completed) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    // Do nothing
                }
                return;
            }
            String str;
            // Create the reader
            BufferedReader reader = new BufferedReader(fileReader);
            try {
                // Read all lines
                while ((str = reader.readLine()) != null) {
                    // Emit one value per line
                    this.collector.emit(new Values(str), str);
                }
            } catch (Exception e) {
                throw new RuntimeException("Error reading tuple", e);
            } finally {
                completed = true;
            }
        }

        /**
         * Create the file reader and keep the collector object.
         */
        public void open(Map conf, TopologyContext context,
                SpoutOutputCollector collector) {
            try {
                this.context = context;
                this.fileReader = new FileReader(conf.get("wordsFile").toString());
            } catch (FileNotFoundException e) {
                throw new RuntimeException(
                        "Error reading file [" + conf.get("wordsFile") + "]");
            }
            this.collector = collector;
        }

        /**
         * Declare the output field "line".
         */
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("line"));
        }
    }

The first spout method to be called is public void open(Map conf, TopologyContext context, SpoutOutputCollector collector). It receives the following parameters: the configuration object, created at topology definition time; the TopologyContext object, which contains all the topology data; and the SpoutOutputCollector object, which lets us emit the data that the bolts will process. The following code block is the implementation of this method.
    public void open(Map conf, TopologyContext context,
            SpoutOutputCollector collector) {
        try {
            this.context = context;
            this.fileReader = new FileReader(conf.get("wordsFile").toString());
        } catch (FileNotFoundException e) {
            throw new RuntimeException(
                    "Error reading file [" + conf.get("wordsFile") + "]");
        }
        this.collector = collector;
    }
In this method we create a FileReader object to read the file. Next we implement public void nextTuple(), from which we emit the data to be processed by the bolts. In our example, the method reads the file and emits one tuple per line.

    public void nextTuple() {
        if (completed) {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                // Nothing to do
            }
            return;
        }
        String str;
        BufferedReader reader = new BufferedReader(fileReader);
        try {
            while ((str = reader.readLine()) != null) {
                this.collector.emit(new Values(str));
            }
        } catch (Exception e) {
            throw new RuntimeException("Error reading tuple", e);
        } finally {
            completed = true;
        }
    }

Note: Values is an ArrayList implementation whose elements are the parameters passed to its constructor.
nextTuple() is called periodically in the same loop as ack() and fail(). It must release control of the thread when there is no work to do, so that the other methods get a chance to run. Therefore the first thing nextTuple does is check whether processing has finished. If it has, it sleeps for a millisecond before returning, to reduce processor load. If not, each line in the file is read and emitted.
Note: A tuple is a named list of values, and the values can be any type of Java object (as long as it is serializable). By default, Storm can serialize strings, byte arrays, ArrayList, HashMap, and HashSet.
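As an illustration of such tuples (a sketch, not part of the word-count example; the field names are made up), a component can declare several named fields and emit one value per field, mixing the types Storm serializes by default:

    // Fragment of a hypothetical spout/bolt: declare three named output fields
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "synonyms", "counts"));
    }

    // Inside nextTuple()/execute(): positions in Values match the declared fields,
    // so downstream bolts can read each value by name.
    collector.emit(new Values("storm",
            new java.util.ArrayList<String>(),
            new java.util.HashMap<String, Integer>()));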

4. Bolts

We now have a spout that reads a file and emits one tuple per line. We need to create two bolts to process these tuples (see Figure 2-1). The bolts implement the backtype.storm.topology.IRichBolt interface.
The most important bolt method is void execute(Tuple input), which is called once per received tuple and may itself emit further tuples.
Note: A bolt or spout may emit as many tuples as needed. When the nextTuple or execute methods are called, they may emit zero, one, or many tuples. You will learn more about this in Chapter 5.
The first bolt, WordNormalizer, is responsible for receiving each line of text and normalizing it: it splits the line into words, converts them to lowercase, and trims leading and trailing blanks.
First we declare the bolt's output fields:
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
Here we declare that the bolt emits a single field named "word".
Next we implement public void execute(Tuple input), which processes the incoming tuples:
    public void execute(Tuple input) {
        String sentence = input.getString(0);
        String[] words = sentence.split(" ");
        for (String word : words) {
            word = word.trim();
            if (!word.isEmpty()) {
                word = word.toLowerCase();
                // Emit the word
                collector.emit(new Values(word));
            }
        }
        // Acknowledge the tuple
        collector.ack(input);
    }
The first line reads the value from the tuple. The value can be read either by position or by field name. The value is then processed and emitted with the collector object. Finally, the collector's ack() method is called each time to confirm that the tuple has been processed successfully.
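For example, inside WordNormalizer's execute() the incoming value can equally be read by position or by the field name the spout declared ("line"); a minimal sketch:

    String byPosition = input.getString(0);             // read by position
    String byName = input.getStringByField("line");      // read by the declared field name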
Example 2-2 is the complete code for this class.
Example 2-2. src/main/java/bolts/WordNormalizer.java
    package bolts;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.IRichBolt;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class WordNormalizer implements IRichBolt {
        private OutputCollector collector;

        public void cleanup() {}

        /**
         * The bolt receives a line of text from the words file and normalizes it.
         * The line is converted to lowercase and split into its words.
         */
        public void execute(Tuple input) {
            String sentence = input.getString(0);
            String[] words = sentence.split(" ");
            for (String word : words) {
                word = word.trim();
                if (!word.isEmpty()) {
                    word = word.toLowerCase();
                    // Emit the word, anchored to the input tuple
                    List a = new ArrayList();
                    a.add(input);
                    collector.emit(a, new Values(word));
                }
            }
            // Acknowledge the tuple
            collector.ack(input);
        }

        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        /**
         * This bolt only emits the "word" field.
         */
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

Note: In this example we learned to emit multiple tuples from a single execute call. If the method receives the sentence "This is the Storm book" in one call, it emits five tuples.
The next bolt, WordCounter, is responsible for counting the words. When the topology finishes (when the cleanup() method is called), we display the count for each word.
Note: This bolt emits nothing; it keeps the counts in a map. In a real-world scenario it could save them to a database instead.

    package bolts;

    import java.util.HashMap;
    import java.util.Map;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.IRichBolt;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.tuple.Tuple;

    public class WordCounter implements IRichBolt {
        Integer id;
        String name;
        Map<String, Integer> counters;
        private OutputCollector collector;

        /** At the end of the topology (cluster shutdown) we show the word counts. */
        @Override
        public void cleanup() {
            System.out.println("-- Word Counter [" + name + "-" + id + "] --");
            for (Map.Entry<String, Integer> entry : counters.entrySet()) {
                System.out.println(entry.getKey() + ": " + entry.getValue());
            }
        }

        /** Count each received word. */
        @Override
        public void execute(Tuple input) {
            String str = input.getString(0);
            // If the word is not yet in the map create an entry, otherwise add 1
            if (!counters.containsKey(str)) {
                counters.put(str, 1);
            } else {
                Integer c = counters.get(str) + 1;
                counters.put(str, c);
            }
            // Acknowledge the tuple
            collector.ack(input);
        }

        /** Initialization */
        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.counters = new HashMap<String, Integer>();
            this.collector = collector;
            this.name = context.getThisComponentId();
            this.id = context.getThisTaskId();
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {}
    }

The execute method uses a map to collect and count the words. When the topology finishes, the cleanup() method is called and prints the counters map. (This is just an example; normally you should use the cleanup() method to close active connections and other resources when the topology shuts down.)
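For instance, a bolt that opened a database connection in prepare() would release it in cleanup(); a minimal sketch, assuming a hypothetical field named connection of type java.sql.Connection:

    @Override
    public void cleanup() {
        try {
            if (connection != null) {
                connection.close(); // release the resource when the topology shuts down
            }
        } catch (java.sql.SQLException e) {
            e.printStackTrace(); // just log it; the topology is shutting down anyway
        }
    }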

5. The main class

In the main class you create the topology and a LocalCluster object, which makes it easy to test and debug the topology locally. Combined with a Config object, LocalCluster lets you try out different cluster configurations. For example, if a global or class variable were accidentally used, you would find the error while testing your topology with a different number of worker processes. (See Chapter 3 for more information.)
Note: All topology nodes must be able to run independently, without sharing data between processes (that is, no global or class variables), because when the topology runs in a real cluster these processes may run on different machines.
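A minimal sketch of the kind of settings the Config object exposes for such experiments (the values are illustrative, and builder refers to a TopologyBuilder like the one defined next):

    Config conf = new Config();
    conf.setDebug(true);           // log every message exchanged between nodes
    conf.setNumWorkers(2);         // number of worker processes to simulate
    conf.setMaxTaskParallelism(4); // cap on task instances per component
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("config-test", conf, builder.createTopology());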
Next, a TopologyBuilder is used to create the topology; it tells Storm how the nodes are arranged and how they exchange data.
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("word-reader", new WordReader());
    builder.setBolt("word-normalizer", new WordNormalizer()).shuffleGrouping("word-reader");
    builder.setBolt("word-counter", new WordCounter()).shuffleGrouping("word-normalizer");
The spout and the bolts are connected through the shuffleGrouping method. This grouping tells Storm to send messages from the source node to its target instances in a randomly distributed fashion.
Next, create a Config object containing the topology configuration, which is merged with the cluster configuration at run time and sent to all nodes via the prepare method.
    Config conf = new Config();
    conf.put("wordsFile", args[0]);
    conf.setDebug(true);
Set the wordsFile property to the name of the file to be read by the spout, and set the debug property to true because we are in the development phase. With debug on, Storm prints all messages exchanged between nodes, along with other debugging data that helps you understand how the topology runs.
As mentioned earlier, you run this topology with a LocalCluster object. In a production environment the topology runs continuously, but for this example you just run it for a few seconds to see the results.
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("Getting-Started-Topologie", conf, builder.createTopology());
    Thread.sleep(2000);
    cluster.shutdown();
The cluster is created, createTopology and submitTopology are called to run the topology, the main thread sleeps for two seconds (the topology runs on other threads), and then the cluster is shut down.
Example 2-3 is the complete code.
Example 2-3. src/main/java/TopologyMain.java
    import spouts.WordReader;
    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;
    import bolts.WordCounter;
    import bolts.WordNormalizer;

    public class TopologyMain {
        public static void main(String[] args) throws InterruptedException {
            // Topology definition
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("word-reader", new WordReader());
            builder.setBolt("word-normalizer", new WordNormalizer())
                    .shuffleGrouping("word-reader");
            builder.setBolt("word-counter", new WordCounter(), 2)
                    .fieldsGrouping("word-normalizer", new Fields("word"));

            // Configuration
            Config conf = new Config();
            conf.put("wordsFile", args[0]);
            conf.setDebug(false);

            // Run the topology
            conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("Getting-Started-Topologie", conf, builder.createTopology());
            Thread.sleep(1000);
            cluster.shutdown();
        }
    }

6. See it in action
You are ready to run your first topology! Create a file at src/main/resources/words.txt with one word per line, then run the topology with the following command:

    mvn exec:java -Dexec.mainClass="TopologyMain" -Dexec.args="src/main/resources/words.txt"
For example, if your words.txt file contains:

    Storm
    test
    are
    great
    is
    an
    Storm
    simple
    application
    but
    very
    powerful
    really
    Storm
    is
    great

you should see something like the following in the log:

    is: 2
    application: 1
    but: 1
    great: 1
    test: 1
    simple: 1
    storm: 3
    really: 1
    are: 1
    great: 1
    an: 1
    powerful: 1
    very: 1

In this example there is only one instance of each type of node. But what if you had a very large log file? You can easily change the number of nodes in the system to parallelize the work. This time you create two WordCounter instances:
Builder.setbolt ("Word-counter", New WordCounter (), 2). shufflegrouping ("Word-normalizer");
When the program returns, you will see:

    -- Word Counter [word-counter-2] --
    application: 1
    is: 1
    great: 1
    are: 1
    powerful: 1
    storm: 3
    -- Word Counter [word-counter-3] --
    really: 1
    is: 1
    but: 1
    great: 1
    test: 1
    simple: 1
    an: 1
    very: 1

Wonderful! It is that easy to change the level of parallelism (in a real cluster each instance would run on a separate machine). But there seems to be a problem: the words "is" and "great" are counted once in each WordCounter. Why? When you use shuffleGrouping, you tell Storm to send each message to an instance of your bolt in a randomly distributed fashion. In this example it would be ideal to always send the same word to the same WordCounter instance. To achieve this, change shuffleGrouping("word-normalizer") to fieldsGrouping("word-normalizer", new Fields("word")), as shown below. Try it and run the program again to confirm the results. You will learn more about groupings and message flows in later chapters.
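For reference, the modified setBolt call (the same wiring already used in Example 2-3):

    builder.setBolt("word-counter", new WordCounter(), 2)
            .fieldsGrouping("word-normalizer", new Fields("word"));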
