Real-Time Computing Framework II: A Getting-Started Example with Storm

Source: Internet
Author: User

Ready, fire, aim ...


1 Summary and Improvement

Since January, things have been full of ups and downs, twists and turns.

First, I took part in the company's creative marathon competition. We finished our entry within 24 hours, but I felt the result was poor, and naturally the score was not high. Through those 24 hours of continuous effort and the product presentations that followed, I found many shortcomings in our development. First, we gained a deeper understanding of our own products and found more ideas that could grow into something successful, which I think was the best part. Second, communicating with team members and building a product together in a short time is very important. Third, we chose C++ as the language for the 24 hours; development was slow and the result was poor. Never fight an unprepared battle, especially when the opponent is strong. Finally, our presentation was too weak: when showing the work we did not bring out its highlights, and the demo fell flat.

Right after the creative marathon, I took charge of the bullet-screen project for the company's annual meeting. Employees used the company's official account to open a page for sending bullet comments, and the comments they sent were displayed on the backdrop screens at the annual meeting. It worked very well.

There were also various hardware competitions, such as the Intel Edison creative contest, which I had no time to work on before the New Year arrived.

I kept wanting to set aside some time to think and sum up, but the pace was so tense that it left no room for reflection, and that is how these battles were fought. Still, in this busy rhythm I found more and more of my own shortcomings, and I also saw more and more opportunities and challenges in the way the Internet is developing.

The winner takes all. I am not a cowardly person, nor a low-key one. We will seize this opportunity, and one day you will see our work on everyone's phones, computers, and all kinds of smart devices!

The previous article summed up how to set up the real-time computing framework Storm. Since then I have developed further on top of it and ran into many difficulties along the way. The sections below introduce them one by one, to show how powerful a real-time cloud computing framework can be!

The quotation at the start of this article comes from the opening of the "Tracer Bullets" section of "The Pragmatic Programmer: From Journeyman to Master". I have always liked this approach: quickly produce a demo that runs, then keep extending and revising it until it is polished into a production product. I recommend the method to everyone and hope it helps in your work!


2 Basic Components of Storm

After the previous article, we can set up a Storm runtime environment and open its admin page in a browser. If you have reached this point, congratulations: the Storm framework is up and running, and the next step is learning how to use it. First, here are Storm's core components, which are what our development mostly relies on.

2.1 Topology

To have Storm run a real-time application, we submit the application to Storm. Such an application is called a topology.

Why is it called a topology? In computer networking, a topology abstracts computers and communication equipment into points and the transmission media into lines; the geometry formed by these points and lines is the topological structure of the network. An application that we submit to a Storm cluster runs in just such a structure of points and lines.


Spouts and bolts are described below. Intuitively, the application we run is a topology.
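The "points and lines" idea can be sketched in plain Java. This is only an illustrative model and does not use Storm's API: the class `TopologyGraph` and its methods are hypothetical, and the component names "name" and "exclaim" are just sample labels.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of a topology: components are points, streams are lines.
// It does not use or imitate Storm's real classes.
public class TopologyGraph {
    // component name -> names of the components that consume its output
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    public void addComponent(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
    }

    // Draws a line from one component to another, registering both if needed.
    public void connect(String from, String to) {
        addComponent(from);
        addComponent(to);
        edges.get(from).add(to);
    }

    public List<String> consumersOf(String name) {
        return edges.getOrDefault(name, new ArrayList<>());
    }

    public int componentCount() {
        return edges.size();
    }

    public static void main(String[] args) {
        TopologyGraph g = new TopologyGraph();
        g.connect("name", "exclaim"); // a data source feeding one processing node
        System.out.println(g.consumersOf("name")); // prints "[exclaim]"
    }
}
```

A real topology can have many sources and many processing nodes; the graph shape stays the same, only with more points and lines.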


2.2 Spout

A spout is the source of the data streams for the whole topology. In general, a spout reads data from an external data source, converts it into the topology's internal data format, and sends it to bolts for processing.

The main function of a spout is nextTuple. Storm calls this function over and over, so the data-acquisition work is written inside it.

2.3 Bolt

In a topology, all processing is done in bolts; a bolt is a node that processes the stream. A bolt receives data from within the topology and processes it.

A bolt has an execute function. When the bolt receives data, this function is called to process it.
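The calling pattern of the two functions can be sketched without Storm. The interfaces `SimpleSpout` and `SimpleBolt` below are hypothetical stand-ins, not Storm's interfaces: in Storm itself nextTuple returns nothing and emits through a collector, while here it simply returns a string to keep the sketch short. The `run` method plays the role of the framework.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for a spout and a bolt; these are NOT Storm's interfaces.
public class SpoutBoltLoop {
    public interface SimpleSpout { String nextTuple(); }
    public interface SimpleBolt { void execute(String tuple); }

    // Plays the role of the framework: keeps calling nextTuple
    // and hands every tuple it gets to the bolt's execute.
    public static void run(SimpleSpout spout, SimpleBolt bolt, int rounds) {
        for (int i = 0; i < rounds; i++) {
            bolt.execute(spout.nextTuple());
        }
    }

    // Runs a spout that cycles through fixed data against a bolt
    // that appends "!!!" and records what it produced.
    public static List<String> demo(String[] data, int rounds) {
        final int[] next = {0};
        List<String> seen = new ArrayList<>();
        SimpleSpout spout = () -> data[next[0]++ % data.length];
        SimpleBolt bolt = tuple -> seen.add(tuple + "!!!");
        run(spout, bolt, rounds);
        return seen;
    }

    public static void main(String[] args) {
        // prints "[Nathan!!!, Mike!!!, Nathan!!!]"
        System.out.println(demo(new String[]{"Nathan", "Mike"}, 3));
    }
}
```

The point of the sketch is the division of labor: the spout only produces data, the bolt only processes it, and something outside both of them drives the loop.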

2.4 Stream

A stream is an unbounded sequence of tuples: tuple after tuple, in order, forms the stream. The data that spouts and bolts process is a stream.

2.5 Stream Grouping

A stream grouping defines how a stream's tuples are distributed among a bolt's tasks. Put simply, it decides which data should always be handed to the same task for processing. A simple case appears in the example in section 3.
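As a rough illustration of "which task gets which tuple", here is a minimal plain-Java sketch. It is not Storm's implementation: `GroupingSketch`, `byField`, and `byShuffle` are hypothetical names, the field-based choice is assumed to be a hash of the key modulo the task count, and round-robin is used as a deterministic stand-in for a random shuffle.

```java
// Hypothetical sketch of how a grouping could pick a target task; not Storm's implementation.
public class GroupingSketch {
    // Field-based choice (assumed: hash modulo task count):
    // the same key always maps to the same task index.
    public static int byField(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // Shuffle-like choice, approximated by round-robin on a sequence number:
    // the key is ignored, so equal keys can land on different tasks.
    public static int byShuffle(long sequence, int numTasks) {
        return (int) (sequence % numTasks);
    }

    public static void main(String[] args) {
        System.out.println(byField("Mike", 5) == byField("Mike", 5)); // prints "true"
        System.out.println(byShuffle(0, 5) + " " + byShuffle(1, 5));   // prints "0 1"
    }
}
```

With the field-based choice, every tuple carrying the same name reaches the same task; with the shuffle-like choice, load is spread evenly but a given name has no fixed home.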



3 A Storm Example

The sections above may still feel abstract, so here is a concrete example that shows in detail how these parts fit together and are used.


3.1 Requirements Description

Suppose a country needs statistics on the names of local residents, that is, how many times each name is used. For example, the local residents (assume 10 people) have the following names:

Nathan

Mike

Jackson

Jackson

Mike

Mike

Golda

Bertels

Golda

Bertels

Counting them gives the following result:

Nathan 1

Mike 3

Jackson 2

Golda 2

Bertels 2

In addition, to make the processing visible, "!!!" is appended to each name and the result is printed. For Nathan, for example, the printed result is Nathan!!!.

Now assume there are n people to count, while the names are still only these five. How do we use Storm to compute the statistics and print the results?
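Before bringing Storm in, the requirement itself can be pinned down with a small single-process baseline. This is a plain-Java sketch with no Storm involved; the class name `NameCounter` and its methods are made up for illustration, and the input is the ten-resident list from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java baseline for the requirement, with no Storm involved.
public class NameCounter {
    // Counts how many times each name occurs.
    public static Map<String, Integer> count(String[] names) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        for (String name : names) {
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    // Appends "!!!" to a name, as the requirement asks.
    public static String exclaim(String name) {
        return name + "!!!";
    }

    public static void main(String[] args) {
        String[] residents = {"Nathan", "Mike", "Jackson", "Jackson", "Mike",
                              "Mike", "Golda", "Bertels", "Golda", "Bertels"};
        // prints "{Nathan=1, Mike=3, Jackson=2, Golda=2, Bertels=2}"
        System.out.println(count(residents));
        System.out.println(exclaim("Nathan")); // prints "Nathan!!!"
    }
}
```

The Storm version has to produce the same totals even though the names arrive as an endless stream and the counting is spread over several tasks.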

3.2 Stream Implementation

Because the name is the only data being transmitted, each tuple in the stream carries a single string.

3.3 Spout Implementation

According to the requirements above, the main task of the spout is to pick a name from the array String[] names = new String[]{"Nathan", "Mike", "Jackson", "Golda", "Bertels"}; and send it to the bolt, which does the statistical calculation, appends "!!!", and prints the result. The implementation is as follows.

    package storm.spout;

    import java.util.Map;
    import java.util.Random;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;
    import backtype.storm.utils.Utils;

    public class NamesSpout extends BaseRichSpout {
        SpoutOutputCollector m_collector;

        // Called once at start-up; keep the collector for later emits.
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            m_collector = collector;
        }

        // Called repeatedly; emits one randomly chosen name per call.
        public void nextTuple() {
            final String[] names = new String[]{"Nathan", "Mike", "Jackson", "Golda", "Bertels"};
            final Random rand = new Random();
            final String name = names[rand.nextInt(names.length)];
            // The sleep value was lost from the original listing; 100 ms is an assumed placeholder.
            Utils.sleep(100);
            m_collector.emit(new Values(name));
        }

        // Each output tuple has a single field called "name".
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("name"));
        }
    }

First, a custom spout needs to extend or implement one of Storm's spout types, such as BaseRichSpout or IRichSpout.

Second, the open function initializes resources. Nothing special is needed here; it only stores the collector that the spout will use to emit the stream.

Third, the declareOutputFields function declares the format of the output stream.

Finally, the nextTuple function generates the stream. Here it picks a name at random and sends it with emit; the bolt receives the name and carries out the next step of processing.

At this point, a simple spout is done.

3.4 Bolt Implementation

The bolt's work has two parts: the first is the statistical calculation, and the second is appending "!!!". The bolt also needs to extend a Storm class such as BaseRichBolt, or implement one of the other bolt interfaces. The implementation is as follows.

    package storm.bolt;

    import java.util.HashMap;
    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;

    public class ExclamationBolt extends BaseRichBolt {
        OutputCollector m_collector;
        // Per-instance count of how often each name has been seen.
        public Map<String, Integer> nameCountMap = new HashMap<String, Integer>();

        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            m_collector = collector;
        }

        public void execute(Tuple input) {
            // Step 1: statistical calculation
            Integer value = 0;
            if (nameCountMap.containsKey(input.getString(0))) {
                value = nameCountMap.get(input.getString(0));
            }
            nameCountMap.put(input.getString(0), ++value);

            // Step 2: output
            System.out.println(input.getString(0) + "!!!");
            System.out.println(value);
            m_collector.ack(input);
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("name"));
        }
    }

The initialization function prepare and the output-declaring function declareOutputFields are similar to the spout's corresponding functions, so they are not described again.

A map is defined here to count the occurrences of each name, and the modified name is printed to the console.

The statistical calculation is implemented in the execute function. In more complex cases, the work can be split across several bolts, each handling a different part of the computation.


3.5 Topology Implementation

The two major parts are now implemented, so how do we get the topology to run? A topology can run in two modes: the first is local mode, which is the debugging mode, and the second is remote mode, in which the topology is submitted to the Storm cluster and runs there.

We start with local mode; remote mode can be selected by adding a command-line argument. The implementation is as follows.

    package storm.topology;

    import storm.bolt.ExclamationBolt;
    import storm.spout.NamesSpout;
    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.utils.Utils;

    public class ExclamationTopology {
        public static void main(String[] args) throws Exception {
            // Wire the spout "name" to the bolt "exclaim".
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("name", new NamesSpout(), 5);
            builder.setBolt("exclaim", new ExclamationBolt(), 5).shuffleGrouping("name");

            Config conf = new Config();
            conf.setDebug(false);
            conf.put(Config.TOPOLOGY_DEBUG, false);

            if (args != null && args.length > 0) {
                // Remote mode: the first argument is the topology name.
                conf.setNumWorkers(10);
                StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
            } else {
                // Local mode: run in-process for 10 seconds, then shut down.
                LocalCluster cluster = new LocalCluster();
                cluster.submitTopology("test", conf, builder.createTopology());
                Utils.sleep(10000);
                cluster.killTopology("test");
                cluster.shutdown();
            }
        }
    }

The main method lives in the topology class and creates the topology. Creating a topology means setting up the relationships between spouts and bolts, and those relationships are established by name. For example, to specify which bolt processes the spout's output stream, pass the spout's name "name" to shuffleGrouping on that bolt.

Finally, the configuration is loaded and the topology is run. Local mode and remote mode are distinguished by the command-line arguments: if any argument is given, the topology runs in remote mode; otherwise it runs in local mode.
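The argument check can be isolated into a tiny plain-Java sketch. `ModeSelector` and its methods are hypothetical helpers written for illustration, and "test" is assumed as the local topology name because that is the name used in the local branch of the listing above.

```java
// Hypothetical helper mirroring the argument check; not part of Storm.
public class ModeSelector {
    // Remote mode is chosen whenever at least one argument is supplied.
    public static boolean isRemote(String[] args) {
        return args != null && args.length > 0;
    }

    // In remote mode the first argument names the topology;
    // in local mode a fixed name ("test", assumed) is used.
    public static String topologyName(String[] args) {
        return isRemote(args) ? args[0] : "test";
    }

    public static void main(String[] args) {
        System.out.println(isRemote(new String[]{}));             // prints "false"
        System.out.println(topologyName(new String[]{"demo"}));   // prints "demo"
    }
}
```

One consequence of this design is that running the class from an IDE with no arguments always stays local, so a topology is never submitted to a cluster by accident.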

Once this is done, click the Run button in Eclipse to execute the topology. The messages printed by the bolt appear in the output window.




3.6 Stream Grouping Implementation

Now comes a very interesting part. How exactly is a stream grouping used?

Let's look at the results of the previous run:


As you can see, the count printed for Mike is 3 on two separate occasions, which is obviously wrong. Why is that?

Looking back at the topology implementation, there is this line of code:

    builder.setBolt("exclaim", new ExclamationBolt(), 5).shuffleGrouping("name");

The shuffleGrouping at the end is the stream grouping. The current setting is a random (shuffle) grouping, so tuples carrying the same name can land on any of the five bolt tasks, each of which keeps its own map, and the counts naturally end up scattered. We replace this line of code with the following:

    builder.setBolt("exclaim", new ExclamationBolt(), 5).fieldsGrouping("name", new Fields("name"));

The new results are as follows:

The results are now correct and meet our calculation requirements.
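The difference between the two groupings can be reproduced with a small simulation in plain Java, no Storm required. Everything here is an assumption made for illustration: `GroupingSimulation` is a hypothetical class, each simulated task is just its own map, round-robin stands in for the random shuffle so that the run is deterministic, and the field-based routing is modeled as a hash of the name modulo the number of tasks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation: several bolt tasks, each with a private count map.
public class GroupingSimulation {
    // byField = true routes by hash of the name (same name, same task);
    // byField = false uses round-robin as a stand-in for shuffle.
    public static List<Map<String, Integer>> run(String[] names, int numTasks, boolean byField) {
        List<Map<String, Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new HashMap<>());
        }
        for (int i = 0; i < names.length; i++) {
            int target = byField
                    ? Math.floorMod(names[i].hashCode(), numTasks)
                    : i % numTasks;
            tasks.get(target).merge(names[i], 1, Integer::sum);
        }
        return tasks;
    }

    // Number of tasks that hold a (possibly partial) count for the given name.
    public static int tasksHolding(List<Map<String, Integer>> tasks, String name) {
        int n = 0;
        for (Map<String, Integer> task : tasks) {
            if (task.containsKey(name)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String[] names = {"Nathan", "Mike", "Jackson", "Jackson", "Mike",
                          "Mike", "Golda", "Bertels", "Golda", "Bertels"};
        System.out.println("shuffle-like: Mike is counted on "
                + tasksHolding(run(names, 5, false), "Mike") + " tasks");
        System.out.println("fields-like:  Mike is counted on "
                + tasksHolding(run(names, 5, true), "Mike") + " task(s)");
    }
}
```

In the shuffle-like run, the three occurrences of Mike are spread over three tasks, so no single map ever reaches the true total. In the fields-like run, one task sees all three, which is why the per-name counts become correct.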

Storm has 7 built-in stream grouping methods; see the documentation on the official website for details. You can also define your own grouping.


4 Packaging and execution

Creating and running a topology has now been covered; the next step is to submit the topology to the Storm cluster for execution. For that we need to package it, using the Maven tools we downloaded and installed earlier.

4.1 Packaging with Maven

4.1.1 Installing the Maven plugin for Eclipse

In the Eclipse menu bar, select "Install New Software" under "Help":


Enter the Maven update site in "Work with":

http://download.eclipse.org/technology/m2e/release

Then, after selecting the components to install, click "Next" until the installation is complete, and then restart Eclipse.

4.1.2 Packaging in Eclipse

Right-click the project to be packaged and select "Maven build" under "Run As" in the context menu.


Add the parameter: clean package. When you see success in the output window, the packaging has succeeded; otherwise, fix the problem according to the error message and package again.


4.2 Submitting the jar Package to Storm

Open a console, go to the target directory under the project directory, and use the ls command to list its files:


The Stormdemo-0.0.1-SNAPSHOT.jar file is the jar package we are going to submit to Storm. Use the following command to submit it.

    storm jar Stormdemo-0.0.1-SNAPSHOT.jar storm.topology.ExclamationTopology demo

Here storm.topology.ExclamationTopology is the main class of the jar package, and the trailing demo is the argument mentioned earlier, the one used to distinguish between local mode and remote mode.


4.3 Viewing submissions and execution results

Once the topology is submitted, you can see its execution status on Storm's web page.



5 Further Thoughts

All along I have worked almost entirely in C++. The experiences of this period made me realize that my earlier outlook was too short-sighted. "Perfection" is a very ambitious goal, but in fast-iterating product development it should be set aside for a while. Yet if one keeps looking at flawed products all the time, won't that produce a growing sense of weariness?

Try another mindset: start with "tracer bullets", develop rapidly, iterate rapidly, improve step by step, and move closer to perfection.

Times are changing and technology is exploding. On to the future!

