Open source distributed real-time computing engine iveely Computing WordCount detailed (3)

Last Update:2015-10-09 Source: Internet

Author: User


WordCount is the most commonly used example in distributed computing frameworks such as Hadoop, Storm, and iveely Computing. Once you understand how WordCount runs on iveely Computing, it is easy to write a new distributed program. The previous article showed how to deploy iveely Computing and submit tasks; now we'll dive into the WordCount code.

First, the code structure

Figure 3-1

As figure 3-1 shows, the WordCount class contains two nested classes, WordInput and WordOutput, and a main method. WordCount.java is a topology: it contains at least one input and one output (both are required, otherwise the topology does nothing), as well as the main function, which is still the entry point of the topology.

Now the question is: what exactly are an input, an output, and a topology?

Each topology is a complete chain of tasks and can contain multiple inputs and multiple outputs. An input can pass data only to one or more outputs, and an output can likewise pass data only to one or more other outputs; together these links form a complete topology.
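To make the wiring rule concrete, here is a minimal Java sketch that models a topology as nodes and rejects any link whose target is an input. The `Node`, `Kind`, and `addOutputTo` names are hypothetical and only loosely echo iveely's `toOutput`/`addOutputTo`; this is an illustration of the rule, not the iveely Computing API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the wiring rule; these classes are illustrative only
// and are not part of the iveely Computing API.
public class TopologySketch {

    // A node is either an input (data source) or an output (processing unit).
    enum Kind { INPUT, OUTPUT }

    static final class Node {
        final String name;
        final Kind kind;
        final List<Node> downstream = new ArrayList<>();

        Node(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }

        // Inputs and outputs alike may only send data to outputs.
        void addOutputTo(Node target) {
            if (target.kind != Kind.OUTPUT) {
                throw new IllegalArgumentException(name + " cannot send data to input " + target.name);
            }
            downstream.add(target);
        }
    }

    // Builds the WordCount chain: one input feeding one output.
    static Node wordCount() {
        Node input = new Node("WordInput", Kind.INPUT);
        input.addOutputTo(new Node("WordOutput", Kind.OUTPUT));
        return input;
    }

    public static void main(String[] args) {
        Node input = wordCount();
        System.out.println(input.name + " -> " + input.downstream.get(0).name);
    }
}
```

A topology with more stages would simply chain further `addOutputTo` calls from output nodes to other output nodes.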

Second, Input in depth

An input is the source of the data. Let's use WordInput to see how data is generated and passed to an output.

    // Nested in WordCount; needs java.util.HashMap plus the iveely Computing API
    // classes (IInput, StreamChannel, FieldsDeclarer) imported in WordCount.java.
    public static class WordInput extends IInput {

        /**
         * Output data to collector.
         */
        private StreamChannel _channel;

        /**
         * All sample words.
         */
        private final String[] _words = new String[]{"Welcome", "iveely", "Computing", "0.9.0", "Build", "by",
                "Liufanping", "Thanks", "github.com"};

        private int _index;

        @Override
        public void start(HashMap<String, Object> conf, StreamChannel channel) {
            // The channel must be initialized here.
            _channel = channel;
            _index = _words.length - 1;
        }

        @Override
        public void declareOutputFields(FieldsDeclarer declarer) {
            declarer.declare(new String[]{"word"}, new Integer[]{0});
        }

        @Override
        public void nextTuple() {
            if (_index < 0) {
                // All words sent: signal the end of the stream.
                _channel.emitEnd();
            } else {
                // Emit 100 copies of the current word, then move to the previous word.
                for (int i = 0; i < 100; i++) {
                    _channel.emit(_words[_index]);
                }
                _index--;
            }
        }

        @Override
        public void end() {
            System.out.println(getName() + " finished.");
        }

        @Override
        public void toOutput() {
            // Data flow: everything this input emits goes to WordOutput.
            _channel.addOutputTo(new WordOutput());
        }
    }

Functions explained:

start: called before the input begins executing, for initialization and other setup work, much like a constructor. If the input outputs data, the channel must be initialized here.
declareOutputFields: declares the fields of the data this input outputs.
nextTuple: called repeatedly to produce data; use channel.emit to send data to an output.
end: runs after the input has finished executing, much like a destructor.
toOutput: specifies the output(s) that receive this input's data.
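The call order above can be tried out locally. The sketch below copies WordInput's nextTuple logic but replaces StreamChannel with a hypothetical RecordingChannel, and assumes (the article does not say this) that the runtime calls start once, then nextTuple until emitEnd is signalled, then end.

```java
import java.util.ArrayList;
import java.util.List;

// Local sketch of the input lifecycle. RecordingChannel and the driver loop
// are hypothetical stand-ins for iveely's StreamChannel and runtime; only the
// nextTuple logic mirrors WordInput.
public class InputLifecycleSketch {

    // Records what an input emits instead of sending it anywhere.
    static final class RecordingChannel {
        final List<String> emitted = new ArrayList<>();
        boolean ended = false;

        void emit(String word) { emitted.add(word); }
        void emitEnd() { ended = true; }
    }

    // Same emission logic as WordInput.nextTuple: 100 copies of each word,
    // walking the array from the last word to the first, then emitEnd.
    static final class WordInputLogic {
        private final String[] words = {"Welcome", "iveely", "Computing", "0.9.0", "Build", "by",
                "Liufanping", "Thanks", "github.com"};
        private RecordingChannel channel;
        private int index;

        void start(RecordingChannel channel) {  // like start(): initialize channel and state
            this.channel = channel;
            this.index = words.length - 1;
        }

        void nextTuple() {
            if (index < 0) {
                channel.emitEnd();
            } else {
                for (int i = 0; i < 100; i++) {
                    channel.emit(words[index]);
                }
                index--;
            }
        }

        void end() {  // like end(): cleanup after the input is done
            System.out.println("WordInput finished.");
        }
    }

    // Hypothetical driver: start once, call nextTuple until the end signal, then end.
    static RecordingChannel run() {
        RecordingChannel channel = new RecordingChannel();
        WordInputLogic input = new WordInputLogic();
        input.start(channel);
        while (!channel.ended) {
            input.nextTuple();
        }
        input.end();
        return channel;
    }

    public static void main(String[] args) {
        RecordingChannel channel = run();
        System.out.println(channel.emitted.size() + " words emitted, first: " + channel.emitted.get(0));
    }
}
```

With nine sample words and 100 copies each, the input emits 900 words, starting with github.com because the index walks the array backwards.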

A few points to note about the code above:

2.1 WordInput must extend IInput.

2.2 An input must initialize the channel in start, because an input always produces data.

2.3 An input must specify its data flow (the outputs it sends to) in the toOutput function.

Third, Output in depth

An output is a unit that processes data; it can also generate new data.

    // Nested in WordCount; needs java.util.HashMap, java.util.Iterator and
    // java.util.TreeMap plus the iveely Computing API classes (IOutput,
    // StreamChannel, FieldsDeclarer, Tuple) imported in WordCount.java.
    public static class WordOutput extends IOutput {

        private TreeMap<String, Integer> _map;

        @Override
        public void start(HashMap<String, Object> conf, StreamChannel channel) {
            // This output emits nothing downstream, so the channel is not stored.
            _map = new TreeMap<>();
        }

        @Override
        public void declareOutputFields(FieldsDeclarer declarer) {
            declarer.declare(new String[]{"word", "totalcount"}, null);
        }

        @Override
        public void execute(Tuple tuple) {
            // Count each incoming word.
            String word = tuple.get(0).toString();
            if (_map.containsKey(word)) {
                int currentCount = _map.get(word);
                _map.put(word, currentCount + 1);
            } else {
                _map.put(word, 1);
            }
        }

        @Override
        public void end() {
            // Output map to database or print.
            Iterator<String> it = _map.keySet().iterator();
            while (it.hasNext()) {
                String key = it.next();
                int value = _map.get(key);
                System.out.println(getName() + ":" + key + "," + value);
            }
        }

        @Override
        public void toOutput() {
            // No downstream output.
        }
    }

Compared with an input, an output has no nextTuple function; it has an execute function instead. nextTuple generates data, while execute processes it. If the data produced by execute must be passed on to another output, emit it from execute with channel.emit, and specify the data flow in toOutput.
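As a sketch of an output that both processes and forwards data, the class below keeps a running count in execute and passes (word, total) downstream. A java.util.function.BiConsumer stands in for channel.emit, and the lambda passed in run() stands in for the wiring done in toOutput; both substitutions are hypothetical, chosen so the example runs without iveely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Sketch of an output that processes data and passes results on.
// The BiConsumer is a hypothetical substitute for iveely's channel.emit.
public class ForwardingOutputSketch {

    static final class RunningCountOutput {
        private final Map<String, Integer> counts = new TreeMap<>();
        private BiConsumer<String, Integer> emit;  // set in start(), like initializing the channel

        void start(BiConsumer<String, Integer> emit) {
            this.emit = emit;
        }

        // Like execute(tuple): update the count, then emit (word, current total) downstream.
        void execute(String word) {
            int total = counts.merge(word, 1, Integer::sum);
            emit.accept(word, total);
        }
    }

    // Feeds words through the output and collects what reaches the downstream side.
    static List<String> run(String... words) {
        List<String> downstream = new ArrayList<>();
        RunningCountOutput output = new RunningCountOutput();
        output.start((word, total) -> downstream.add(word + "=" + total));
        for (String w : words) {
            output.execute(w);
        }
        return downstream;
    }

    public static void main(String[] args) {
        System.out.println(run("iveely", "by", "iveely"));  // prints [iveely=1, by=1, iveely=2]
    }
}
```

If start were skipped, emit would be null and execute would fail, which is the situation note 3.1 warns about.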

A few points to note here:

3.1 If an output needs to pass data on, it must initialize the channel in start.

3.2 If an output receives data from different inputs and the data formats differ, you must identify the format yourself; for example, pass an array whose first element is an int that identifies the data format.
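Note 3.2 can be sketched as follows. The format ids 0 and 1 and the two array layouts are hypothetical examples; the point is only that the receiving output reads the leading int before interpreting the rest of the array.

```java
// Sketch of a format-tagged array: element 0 is an int format id.
// The ids and layouts below are hypothetical examples.
public class TaggedTupleSketch {

    static final int FORMAT_WORD = 0;        // hypothetical layout: {0, String word}
    static final int FORMAT_WORD_COUNT = 1;  // hypothetical layout: {1, String word, Integer count}

    // Dispatches on the leading format id, as an execute() method might.
    static String describe(Object[] data) {
        int format = (Integer) data[0];
        switch (format) {
            case FORMAT_WORD:
                return "word " + data[1];
            case FORMAT_WORD_COUNT:
                return "word " + data[1] + " x" + data[2];
            default:
                throw new IllegalArgumentException("Unknown format id: " + format);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new Object[]{FORMAT_WORD, "iveely"}));
        System.out.println(describe(new Object[]{FORMAT_WORD_COUNT, "iveely", 3}));
    }
}
```

Failing loudly on an unknown id makes a mismatch between an input and the output's expectations easy to spot during local-mode testing.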

Fourth, the main function

The main function is still the entry point of the topology. The difference is that it has two execution modes: local and remote. Local mode is used for debugging.

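    public static void main(String[] args) {
        // First argument: true = local mode (for debugging), false = remote mode.
        TopologyBuilder builder = new TopologyBuilder(true, WordCount.class.getName(), "WordCount");
        builder.setInput(new WordInput(), 1);    // 1 input instance (thread)
        builder.setOutput(new WordOutput(), 4);  // 4 output instances (threads)
        builder.setSlave(2);                     // run on 2 nodes (processes)
        TopologySubmitter.submit(builder, args);
    }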

The main function does the following:

4.1 Creates a TopologyBuilder object. The first constructor parameter selects local mode (true) or remote mode (false), the second is the name of the class to execute, and the third is the name of the current topology.

4.2 Sets the input and the output, and the number of instances (threads) of each.

4.3 Sets the number of nodes (processes) to run on.

4.4 Submits the task with TopologySubmitter.

4.5 Note: be sure to change the first parameter of TopologyBuilder to remote mode (false) before building a jar to submit to the server.
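main runs WordOutput with 4 instances, and each instance keeps its own TreeMap. The article does not say how iveely distributes tuples among those instances; if copies of one word could reach two instances, each would print only a partial count. The sketch below assumes routing by a hash of the word, similar in spirit to Storm's fields grouping. Treat this routing rule as an assumption for illustration, not documented iveely behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: route each word to one of N output instances by hash,
// so each instance's map holds complete totals for the words it owns.
// This routing rule is an assumption, not documented iveely behavior.
public class FieldRoutingSketch {

    // Picks an instance for a key; floorMod keeps the result non-negative.
    static int route(String word, int instances) {
        return Math.floorMod(word.hashCode(), instances);
    }

    static List<Map<String, Integer>> countAcross(int instances, String... words) {
        List<Map<String, Integer>> maps = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            maps.add(new TreeMap<>());  // one map per output instance, like WordOutput._map
        }
        for (String w : words) {
            maps.get(route(w, instances)).merge(w, 1, Integer::sum);
        }
        return maps;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> maps = countAcross(4, "iveely", "by", "iveely", "Thanks");
        for (int i = 0; i < maps.size(); i++) {
            System.out.println("instance " + i + ": " + maps.get(i));
        }
    }
}
```

Because every copy of a word hashes to the same instance, each instance's end() output is already a final total for its words; with random routing, the per-instance counts would have to be merged afterwards.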


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
