==== Example requirements analysis
The data source continuously produces a huge stream of English sentences.
We need to compute word frequencies in real time, that is, the top N words, and watch how those frequencies change.
If we imagine that these sentences are operational data on user behavior across different products, we could observe in real time which products users are paying attention to.
==== Comparison with Hadoop
==== Storm programming model
For details, refer to the relevant section of the following article; only a brief introduction is given here.
(1) Message source: the Spout component
The spout is the component through which messages enter from the source. There are usually two ways to implement it:
①, extend the BaseRichSpout class:
This is the simpler method, and fewer methods need to be overridden.
It is simple, yet it still meets basic business needs.
②, implement the IRichSpout interface:
This approach requires implementing a number of predefined methods, some of which overlap with those of BaseRichSpout.
It can meet the needs of more complex data access.
==== WordCount scheme and topology design
To keep the problem simple, the data source lives in memory.
"Message source (RandomSentenceSpout)"
The spout emits built-in English sentences as the message source.
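This is a minimal sketch of what the spout does, written in plain Java with no Storm dependency. The hypothetical class RandomSentenceSource plays the role of nextTuple(): each call returns one randomly chosen in-memory sentence. The sample sentences and the seeded Random are illustrative assumptions, not the example project's code.

```java
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for the spout's nextTuple(); no Storm classes are used.
class RandomSentenceSource {
    // Built-in sentences kept in memory (sample text is illustrative).
    private static final List<String> SENTENCES = List.of(
            "the cow jumped over the moon",
            "an apple a day keeps the doctor away",
            "four score and seven years ago",
            "snow white and the seven dwarfs");

    private final Random random;

    RandomSentenceSource(long seed) {
        this.random = new Random(seed); // seeded so runs are reproducible
    }

    // One call corresponds to one nextTuple() invocation emitting one sentence.
    String nextSentence() {
        return SENTENCES.get(random.nextInt(SENTENCES.size()));
    }

    static List<String> sentences() {
        return SENTENCES;
    }

    public static void main(String[] args) {
        RandomSentenceSource source = new RandomSentenceSource(42L);
        for (int i = 0; i < 3; i++) {
            System.out.println(source.nextSentence());
        }
    }
}
```

In the real spout, the sentence would be emitted through the collector instead of being returned.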
"Data Normalization (Wordnormalizerbolt)"
Next, a bolt normalizes the data (splits the sentences): each sentence is cut into words, and the words are emitted.
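The normalization step can be sketched as a pure function. The hypothetical class WordNormalizer is assumed to split on any run of non-letter characters and lower-case each word. That splitting rule is an assumption, and the example project may normalize differently (for instance, apostrophes are treated here as separators).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the normalization bolt's execute(): sentence -> words.
class WordNormalizer {
    // Splits on runs of non-letters and lower-cases, so "The" and "the" count as one word.
    static List<String> normalize(String sentence) {
        List<String> words = new ArrayList<>();
        for (String token : sentence.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) { // a leading separator yields an empty first token
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // In the topology, each word would be emitted downstream as its own tuple.
        for (String word : normalize("The cow jumped over the Moon.")) {
            System.out.println(word);
        }
    }
}
```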
"Word frequency statistics (Wordcountbolt)"
A second bolt subscribes to the split word tuples and counts the words. It uses fields grouping (tuples are grouped by the word field), sorts the frequencies in real time, and emits the top N as they change.
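The counting and top-N logic of a single counting task might look like the following sketch. The hypothetical class WordCounter keeps running counts in a HashMap and sorts all entries on demand. Sorting the whole map each time is an assumption chosen for clarity; a bounded priority queue would scale better for large vocabularies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the counting bolt: running counts plus an on-demand top N.
class WordCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Called once per incoming word tuple, like execute().
    void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    int count(String word) {
        return counts.getOrDefault(word, 0);
    }

    // Descending by count; ties broken alphabetically so the output is deterministic.
    List<String> topN(int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        WordCounter counter = new WordCounter();
        for (String w : "the cow jumped over the moon the end".split(" ")) {
            counter.add(w);
        }
        System.out.println(counter.topN(2)); // prints [the, cow]
    }
}
```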
"Tool Class (Printbolt)"
Finally, one more bolt prints the results to the log.
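Counting with several parallel tasks needs fields grouping on the word. Without it, occurrences of one word would be spread across tasks and no single task would hold its full count. The sketch below illustrates only the principle: the target task is derived from a hash of the grouping field. The hypothetical method chooseTask is not Storm's actual hashing code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Principle of fields grouping: equal field values always map to the same task index.
// Not Storm's exact implementation.
class FieldsGroupingSketch {
    static int chooseTask(String word, int numTasks) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(word.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 3;
        List<List<String>> perTask = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            perTask.add(new ArrayList<>());
        }
        for (String w : Arrays.asList("the", "cow", "the", "moon", "the")) {
            perTask.get(chooseTask(w, numTasks)).add(w);
        }
        // Every "the" lands in the same task's list, so that task sees its full count.
        System.out.println(perTask);
    }
}
```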
==== WordCount example code
The code can be obtained from the following Git repository; copyright belongs to Geek College.
https://github.com/blogchong/storm-example
==== Related class diagrams
(Based on the original post: http://blog.csdn.net/xeseo/article/details/17750379)
To make the line of thought easier to follow, I have reorganized some of the notes from the original post.
"IComponent Interface"
Spouts and bolts are both components, so Storm defines a common top-level interface called IComponent.
The inheritance hierarchy of IComponent is shown below:
The green part is the most commonly used and relatively simple; the red part is related to transactions.
BaseComponent is a "lazy" class provided by Storm. Why call it that? Because it and its subclasses implement, to a greater or lesser extent, some of the methods defined by their interfaces.
This lets us extend one of these classes directly instead of writing every method each time.
Note, however, that the methods these BaseXxx classes implement are empty or simply return null.
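The "lazy" base-class idea is the classic adapter pattern, sketched below with hypothetical names. LifecycleComponent, BaseLifecycleComponent and ConstantSource are simplified stand-ins, not Storm's real interfaces, which have more methods and different signatures. The base class gives optional methods empty bodies, so a subclass writes only what it needs.

```java
// Hypothetical, simplified lifecycle interface (not Storm's IRichSpout).
interface LifecycleComponent {
    void open();
    void close();
    void activate();
    void deactivate();
    String nextTuple();
}

// The "lazy" base class: optional methods get empty bodies,
// while nextTuple() stays abstract because every source must supply it.
abstract class BaseLifecycleComponent implements LifecycleComponent {
    @Override public void open() { }
    @Override public void close() { }
    @Override public void activate() { }
    @Override public void deactivate() { }
}

// A concrete source writes only the one method it actually needs.
class ConstantSource extends BaseLifecycleComponent {
    @Override public String nextTuple() {
        return "hello storm";
    }

    public static void main(String[] args) {
        LifecycleComponent source = new ConstantSource();
        source.open();
        System.out.println(source.nextTuple()); // prints "hello storm"
        source.close();
    }
}
```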
"Spout"
The class diagram looks like this:
The interface looks like this:
Description of each method:
①, open method:
This is the initialization step. It lets you perform actions while the spout is being initialized; the context is passed in, so it is easy to obtain data from it.
②, close method:
Called before the spout shuts down, but execution is not guaranteed.
A spout runs as a task inside a worker. In cluster mode, the supervisor kills the worker process directly with kill -9, so close cannot run.
In local mode, close is guaranteed to run as long as the stop command is not a kill -9.
③, activate and deactivate methods:
A spout can be temporarily activated and deactivated; these two methods are called at the corresponding moments.
④, nextTuple method:
Responsible for fetching messages and emitting data. It is the most important method of a spout.
⑤, ack(Object) method:
The object passed in is actually an ID that uniquely identifies a tuple. The method is called after the tuple with this ID has been processed successfully.
⑥, fail(Object) method:
Like ack, except that it is called when processing of the tuple fails.
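The ack/fail contract is often used to replay lost messages. Below is a minimal sketch of that bookkeeping in plain Java, assuming a hypothetical ReplayingSource that uses long IDs in place of Storm's Object message IDs. Each emitted message stays pending until ack removes it. A fail queues the same ID for re-emission by the next nextTuple-style call.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical replaying source: pending messages are kept until acked,
// and failed message IDs are re-emitted before any new message.
class ReplayingSource {
    private final Deque<String> toSend = new ArrayDeque<>();
    private final Deque<Long> toReplay = new ArrayDeque<>();
    private final Map<Long, String> pending = new HashMap<>();
    private long nextId = 0;

    ReplayingSource(String... messages) {
        for (String m : messages) {
            toSend.add(m);
        }
    }

    // Like nextTuple(): returns the ID of the emitted message, or -1 if nothing is left.
    long emitNext() {
        if (!toReplay.isEmpty()) {
            return toReplay.poll(); // same ID, payload still held in 'pending'
        }
        String msg = toSend.poll();
        if (msg == null) {
            return -1;
        }
        long id = nextId++;
        pending.put(id, msg);
        return id;
    }

    String payload(long id) { return pending.get(id); }

    // Success: the message is done and can be forgotten.
    void ack(long id) { pending.remove(id); }

    // Failure: schedule the same message for another attempt.
    void fail(long id) {
        if (pending.containsKey(id)) {
            toReplay.add(id);
        }
    }

    int pendingCount() { return pending.size(); }

    public static void main(String[] args) {
        ReplayingSource source = new ReplayingSource("a", "b");
        long first = source.emitNext();
        long second = source.emitNext();
        source.ack(first);
        source.fail(second);
        System.out.println(source.payload(source.emitNext())); // prints "b" (replayed)
    }
}
```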
Our random sentence spout extends BaseRichSpout,
so it does not have to implement close, activate, deactivate, ack, fail, or getComponentConfiguration; only the core methods (open, nextTuple and declareOutputFields) remain.
Conclusion:
Usually (shell and transactional spouts aside), to implement a spout you can implement the IRichSpout interface directly; if you do not want to write redundant code, extend BaseRichSpout instead.
"Bolt"
The class diagram looks like this:
A curious question arises here: why doesn't IBasicBolt extend IBolt? Let's keep this question in mind as we read on.
IBolt defines three methods:
①, prepare method:
IBolt extends java.io.Serializable. After we submit a topology to Nimbus, the bolts we created are serialized and sent to the workers that will run them.
When a worker executes a bolt, it first calls prepare, passing in the current execution context.
②, execute method:
Receives a tuple for processing and reports the result through the OutputCollector passed in to prepare: ack indicates success, fail indicates failure.
③, cleanup method:
Like close in ISpout, it is called before shutdown, and it is likewise not guaranteed to run.
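Because the bolt is serialized on submission and only prepared on the worker, a common pattern follows. Runtime resources such as the collector are kept in transient fields and created in prepare. This plain-Java sketch simulates the round trip with Java serialization. TupleCollector and UpperCaseBolt are hypothetical stand-ins for OutputCollector and a real bolt, and execute reports success or failure explicitly, as an IBolt must.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for Storm's OutputCollector.
interface TupleCollector {
    void ack(String tuple);
    void fail(String tuple);
}

class UpperCaseBolt implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String prefix;            // configuration: travels with the serialized bolt
    private transient TupleCollector collector; // runtime resource: set in prepare()
    private transient List<String> emitted;

    UpperCaseBolt(String prefix) { this.prefix = prefix; }

    // Runs on the worker after deserialization.
    void prepare(TupleCollector collector) {
        this.collector = collector;
        this.emitted = new ArrayList<>();
    }

    // Feedback is explicit: ack on success, fail on failure.
    void execute(String tuple) {
        if (tuple.isEmpty()) {
            collector.fail(tuple);
            return;
        }
        emitted.add(prefix + tuple.toUpperCase(Locale.ROOT));
        collector.ack(tuple);
    }

    // May never run in cluster mode, so nothing important should depend on it.
    void cleanup() { emitted = null; }

    List<String> emitted() { return emitted; }

    // Simulates "submit, then ship to a worker": serialize and deserialize a copy.
    static UpperCaseBolt shipToWorker(UpperCaseBolt bolt) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(bolt);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (UpperCaseBolt) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        UpperCaseBolt worker = shipToWorker(new UpperCaseBolt("> "));
        worker.prepare(new TupleCollector() {
            @Override public void ack(String tuple) { System.out.println("ack " + tuple); }
            @Override public void fail(String tuple) { System.out.println("fail '" + tuple + "'"); }
        });
        worker.execute("storm");
        worker.execute("");
        System.out.println(worker.emitted());
        worker.cleanup();
    }
}
```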
The red part (the execute method) is what needs attention when implementing a bolt.
Storm provides the IBasicBolt interface so that a bolt implementing it does not have to send feedback in its own code; Storm acks the tuple as successful internally.
If you do want to report a failure, throw a FailedException.
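The IBasicBolt idea can be sketched as a wrapper that does the feedback on the business logic's behalf. A normal return is recorded as an ack, and a thrown failure exception becomes a fail; any other exception propagates. In this plain-Java sketch, BasicHandler, AutoAckExecutor and this FailedException class are hypothetical stand-ins, not Storm's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Storm's FailedException.
class FailedException extends RuntimeException {
    FailedException(String message) { super(message); }
}

// Business logic only: no ack/fail calls. Throw FailedException to signal failure.
interface BasicHandler {
    void execute(String tuple);
}

// Wraps a handler and reports the outcome automatically.
class AutoAckExecutor {
    private final BasicHandler handler;
    final List<String> acked = new ArrayList<>();
    final List<String> failed = new ArrayList<>();

    AutoAckExecutor(BasicHandler handler) { this.handler = handler; }

    void execute(String tuple) {
        try {
            handler.execute(tuple);
            acked.add(tuple);   // normal return: success is reported for the handler
        } catch (FailedException e) {
            failed.add(tuple);  // the thrown exception is turned into a fail
        }                       // other exceptions are not caught and propagate
    }

    public static void main(String[] args) {
        AutoAckExecutor executor = new AutoAckExecutor(tuple -> {
            if (tuple.isEmpty()) {
                throw new FailedException("empty tuple");
            }
        });
        executor.execute("hello");
        executor.execute("");
        System.out.println("acked=" + executor.acked + " failed=" + executor.failed);
    }
}
```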
Let's write another bolt that extends BaseRichBolt to replace ExclaimBasicBolt. After the topology is modified and run, the result is the same.
Conclusion: Usually, to implement a bolt you can implement the IRichBolt interface or extend BaseRichBolt. If you do not want to handle result feedback yourself, implement the IBasicBolt interface or extend BaseBasicBolt. This is effectively equivalent to having prepare handled for you, emitted tuples anchored to the input, and collector.ack(inputTuple) called automatically.
--end--