I'm writing this mainly as a record of what I've learned since I started working with Storm. Since it's for work, writing code is of course the main thing, so this post shares the basics of the Java API.
First, take a look at the diagram below (please excuse the rough drawing). It shows the relationships between the most basic interfaces and abstract classes used in the Storm API.
OK, from this we can clearly see that IComponent is the core interface of the API. So what does it consist of?
public interface IComponent extends Serializable {
    /**
     * @param declarer this is used to declare output stream ids, output fields, and whether or not each output stream is a direct stream
     */
    void declareOutputFields(OutputFieldsDeclarer declarer);
    Map<String, Object> getComponentConfiguration();
}
Both methods are simple: declareOutputFields declares the output schema of the streams in the topology (the field names, plus the stream id when one is specified), and getComponentConfiguration returns the configuration specific to this component.
In fact, there are two basic interfaces that I didn't draw in the Visio diagram: ISpout and IBolt. IRichSpout and IRichBolt can be understood as combining each of them with IComponent (through inheritance). Let's start with the spout:
void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
// Called when the spout terminates (not guaranteed to be called)
void close();
// Called when the spout is activated
void activate();
// Called when the spout is deactivated (it may be reactivated later); nextTuple is not invoked while the spout is deactivated
void deactivate();
void nextTuple();
void ack(Object msgId);
void fail(Object msgId);
The easy ones I won't go over here (the comments cover them); let's talk about the more important ones.
Start with open. It is called when the spout is initialized and receives three objects: a configuration object, a topology context object, and an output collector object. The SpoutOutputCollector class deserves special mention, because it controls how the spout emits tuples. The methods to focus on are:
List<Integer> emit(String streamId, List<Object> tuple, Object messageId);
void emitDirect(int taskId, String streamId, List<Object> tuple, Object messageId);
long getPendingCount();
The first two methods both emit a tuple to a stream; the difference is that the latter is a direct emit to a specific task. Its parameters are
int taskId, String streamId, List<Object> tuple, Object messageId
The first three parameters mean exactly what their names say (the naming is well behaved), and messageId is used to track the tuple for the ack mechanism (more on this below).
Next is nextTuple. It is meant to be non-blocking: when there is no tuple to emit, it returns immediately. As far as I know, since version 0.8.1 Storm sleeps for 1 millisecond by default when nextTuple emits nothing (configurable, see SleepSpoutWaitStrategy), mainly so that CPU is not wasted. In short, as long as your topology is alive (apart from some special cases), nextTuple is invoked over and over, always asking for the next tuple, and this is where we call the SpoutOutputCollector's emit method to send data.
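To make the non-blocking idea concrete, here is a minimal sketch using only the JDK. NonBlockingSpoutSketch, offer and the emitted list are names I made up for illustration; the class does not implement Storm's ISpout, and the list merely stands in for a call to SpoutOutputCollector.emit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for a spout; it does not implement Storm's ISpout.
class NonBlockingSpoutSketch {
    // Filled by some other thread, for example a network listener.
    private final Queue<String> incoming = new ConcurrentLinkedQueue<>();
    // Stand-in for SpoutOutputCollector.emit: records what would be emitted.
    final List<String> emitted = new ArrayList<>();

    // Producer side: safe to call from any thread.
    void offer(String message) {
        incoming.offer(message);
    }

    // Mirrors the role of ISpout.nextTuple(): emit at most one tuple, never block.
    void nextTuple() {
        String message = incoming.poll(); // returns null immediately when empty
        if (message == null) {
            return; // nothing to emit; the caller decides how long to wait
        }
        emitted.add(message);
    }

    public static void main(String[] args) {
        NonBlockingSpoutSketch spout = new NonBlockingSpoutSketch();
        spout.nextTuple();      // queue is empty: returns right away
        spout.offer("hello");
        spout.nextTuple();      // emits "hello"
        System.out.println(spout.emitted); // prints [hello]
    }
}
```

The point of poll() over a blocking take() is that the calling thread is handed back immediately, so whatever else runs on that thread is not starved.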
Finally, ack and fail, which go together with messageId. As mentioned, a spout's nextTuple, ack and fail appear to be called on a single thread, which is why they are designed to be non-blocking; I haven't been able to read the underlying implementation myself (JStorm is said to split them across multiple threads). So, depending on your situation, you can move the business work of nextTuple onto a separate thread and hand the results over through a queue. Back to the point: ack is part of Storm's reliability mechanism. Put simply, if a spout emits a tuple with a messageId (don't tell me you've forgotten it already), that tuple is tracked as it flows through the topology until it is either fully processed, in which case ack is called, or it fails, in which case fail is called. As for fail, the usual approach is to put the failed tuple back in the queue and send it again; as far as I can tell Storm itself only calls fail, and the resend is done by the spout implementation. I haven't studied the retry configuration in detail, so if you have, let's compare notes. I also don't understand what getPendingCount is for; advice from anyone who does is welcome. Long live open source.
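The ack and fail bookkeeping can be sketched with plain JDK collections. This is only an illustration under my own assumptions: ReplayingSpoutSketch, inFlight and the emitted list are hypothetical names, the class does not implement Storm's ISpout, and nextTuple returns the message id purely so the demo can ack or fail it (the real method returns void). It also assumes the replay is done by the spout itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for a reliable spout; it does not implement Storm's ISpout.
class ReplayingSpoutSketch {
    private final Queue<String> toSend = new ArrayDeque<>();
    // Payloads emitted with a message id and not yet acked or failed.
    private final Map<Object, String> pending = new HashMap<>();
    // Stand-in for SpoutOutputCollector.emit(tuple, messageId).
    final List<String> emitted = new ArrayList<>();
    private long nextId = 0;

    ReplayingSpoutSketch(List<String> source) {
        toSend.addAll(source);
    }

    // Emits one payload and remembers it under a fresh message id.
    Object nextTuple() {
        String payload = toSend.poll();
        if (payload == null) {
            return null; // nothing to emit
        }
        Object msgId = nextId++;
        pending.put(msgId, payload);
        emitted.add(payload);
        return msgId;
    }

    // The tuple was fully processed: forget the payload.
    void ack(Object msgId) {
        pending.remove(msgId);
    }

    // The tuple failed: queue the payload for another attempt.
    void fail(Object msgId) {
        String payload = pending.remove(msgId);
        if (payload != null) {
            toSend.add(payload);
        }
    }

    int inFlight() {
        return pending.size();
    }

    public static void main(String[] args) {
        ReplayingSpoutSketch spout = new ReplayingSpoutSketch(Arrays.asList("a", "b"));
        Object first = spout.nextTuple();  // emits "a"
        Object second = spout.nextTuple(); // emits "b"
        spout.ack(first);
        spout.fail(second);                // "b" goes back on the queue
        spout.nextTuple();                 // emits "b" again
        System.out.println(spout.emitted); // prints [a, b, b]
    }
}
```

Keeping the payload in a map keyed by message id is what makes the resend possible: fail only receives the id, not the tuple.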
That's it for the spout. Now for the bolt. Same routine as before, source code first:
void prepare(Map stormConf, TopologyContext context, OutputCollector collector);
void execute(Tuple input);
// Called when the bolt terminates (not guaranteed to be called)
void cleanup();
As with the spout, cleanup needs no explanation; here I'll cover prepare and execute. First, prepare:
This is the bolt's initialization method. Of its three arguments, the one that differs from the spout's is the OutputCollector:
List<Integer> emit(String streamId, Collection<Tuple> anchors, List<Object> tuple);
void emitDirect(int taskId, String streamId, Collection<Tuple> anchors, List<Object> tuple);
void ack(Tuple input);
void fail(Tuple input);
void resetTimeout(Tuple input);
In practice, OutputCollector just adds the ack and fail methods plus a timeout reset; otherwise it is used in much the same way as SpoutOutputCollector.
The main focus is execute, which holds the bolt's logic: this is where you receive the tuples coming from the spout and implement whatever your business needs. If you want to pass tuples further downstream, call emit on the OutputCollector that you saved in prepare (whether you anchor depends on whether the business needs the data to be reliable).
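The prepare/execute pattern described above looks roughly like this. It is a sketch with made-up stand-ins: AnchoredBoltSketch and RecordingCollector are hypothetical, tuples are reduced to plain strings, and the collector only logs calls instead of talking to Storm. The doubling of a number is just placeholder business logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; these are not Storm's Tuple, OutputCollector or IBolt.
class AnchoredBoltSketch {
    // Minimal collector that records the calls a bolt makes.
    static class RecordingCollector {
        final List<String> log = new ArrayList<>();

        void emit(String anchor, Object value) {
            log.add("emit " + value + " anchored to " + anchor);
        }

        void ack(String input) {
            log.add("ack " + input);
        }

        void fail(String input) {
            log.add("fail " + input);
        }
    }

    private RecordingCollector collector;

    // Mirrors the role of IBolt.prepare: keep the collector for use in execute.
    void prepare(RecordingCollector collector) {
        this.collector = collector;
    }

    // Mirrors the role of IBolt.execute: do the work, emit anchored, then ack.
    void execute(String input) {
        try {
            int n = Integer.parseInt(input.trim()); // placeholder business logic
            collector.emit(input, n * 2);           // anchored to the input tuple
            collector.ack(input);                   // input handled successfully
        } catch (NumberFormatException e) {
            collector.fail(input);                  // report failure upstream
        }
    }

    public static void main(String[] args) {
        RecordingCollector collector = new RecordingCollector();
        AnchoredBoltSketch bolt = new AnchoredBoltSketch();
        bolt.prepare(collector);
        bolt.execute("21");
        bolt.execute("oops");
        // prints [emit 42 anchored to 21, ack 21, fail oops]
        System.out.println(collector.log);
    }
}
```

Passing the input as the anchor is what ties the new tuple to the one being processed; emitting without an anchor would leave it untracked.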
I just realized I left out something important: Tuple. Let me fill that in:
The Tuple class carries the metadata and the content you transmit, along with the methods to operate on them. Many of its methods are inherited from ITuple (too many to list):
public GlobalStreamId getSourceGlobalStreamId();
public String getSourceComponent();
public int getSourceTask();
public MessageId getMessageId();
/**
 * Returns whether the tuple contains the named field
 */
public boolean contains(String field);
/**
 * Returns the field at position i (dynamically typed)
 */
public Object getValue(int i);
/**
 * Returns the field at position i (as a String)
 */
public String getString(int i);
/**
 * Returns the field with the given name (as a String)
 */
public String getStringByField(String field);
This is only part of Tuple's methods; many of the others work essentially the same way. They return various pieces of context information, and they return fields by position or by name (the names are the ones declared for the stream, as mentioned earlier), either dynamically typed or as a known type. Those fields are the actual data you sent. By the way, the so-called Values is simply a class that extends ArrayList:
public class Values extends ArrayList<Object> {
    public Values() {
    }
    public Values(Object... vals) {
        super(vals.length);
        for (Object o : vals) {
            add(o);
        }
    }
}
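To show how access by position and by name fit together, here is a small sketch. FieldLookupSketch is a hypothetical class of mine, not Storm's Tuple; it only assumes that a tuple pairs a list of declared field names with a list of values, and it borrows the method names listed above for readability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a tuple: declared field names plus a list of values.
class FieldLookupSketch {
    // Same shape as the Values class above: a list of dynamically typed values.
    static class Values extends ArrayList<Object> {
        Values(Object... vals) {
            super(vals.length);
            for (Object o : vals) {
                add(o);
            }
        }
    }

    private final List<String> fields; // names declared for the stream
    private final List<Object> values; // one value per declared field

    FieldLookupSketch(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("fields and values differ in size");
        }
        this.fields = fields;
        this.values = values;
    }

    boolean contains(String field) {
        return fields.contains(field);
    }

    // Positional access, dynamically typed.
    Object getValue(int i) {
        return values.get(i);
    }

    // Positional access with a known type.
    String getString(int i) {
        return (String) values.get(i);
    }

    // Access by name: resolve the name to a position first.
    String getStringByField(String field) {
        int i = fields.indexOf(field);
        if (i < 0) {
            throw new IllegalArgumentException("no such field: " + field);
        }
        return getString(i);
    }

    public static void main(String[] args) {
        FieldLookupSketch tuple = new FieldLookupSketch(
                Arrays.asList("word", "count"), new Values("storm", 3));
        System.out.println(tuple.getStringByField("word")); // prints storm
        System.out.println(tuple.getValue(1));              // prints 3
    }
}
```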
That covers the basic spout and bolt interfaces. Next, an extension of the bolt interface, IBasicBolt:
public interface IBasicBolt extends IComponent {
    void prepare(Map stormConf, TopologyContext context);
    void execute(Tuple input, BasicOutputCollector collector);
    void cleanup();
}
If you followed the earlier sections, this should be easy to understand. The key here is BasicOutputCollector:
public interface IBasicOutputCollector extends IErrorReporter {
    List<Integer> emit(String streamId, List<Object> tuple);
    void emitDirect(int taskId, String streamId, List<Object> tuple);
    void resetTimeout(Tuple tuple);
}
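With IBasicBolt the ack mechanism around emit is handled for you: tuples emitted through the collector are anchored to the input tuple, and the input is acked once execute returns, so you don't need to write any of that yourself. For business logic that needs reliability but isn't complicated, IBasicBolt is very practical.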
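Here is roughly what a wrapper of this kind has to do, as a JDK-only sketch. Everything in it is hypothetical (BasicBoltWrapperSketch, BasicLogic, BasicCollector), tuples are reduced to strings, and catching RuntimeException is my simplification; it is not a copy of Storm's own executor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper that adds anchoring and acking around plain business logic.
class BasicBoltWrapperSketch {
    // Stand-in for IBasicBolt: business logic only, no ack or fail calls.
    interface BasicLogic {
        void execute(String input, BasicCollector collector);
    }

    // Stand-in for BasicOutputCollector: emit takes no anchors.
    static class BasicCollector {
        private final List<String> log;
        private String currentInput; // set by the wrapper before each execute

        BasicCollector(List<String> log) {
            this.log = log;
        }

        void emit(Object value) {
            // The input being processed is used as the anchor automatically.
            log.add("emit " + value + " anchored to " + currentInput);
        }
    }

    final List<String> log = new ArrayList<>();
    private final BasicCollector collector = new BasicCollector(log);
    private final BasicLogic logic;

    BasicBoltWrapperSketch(BasicLogic logic) {
        this.logic = logic;
    }

    void execute(String input) {
        collector.currentInput = input;
        try {
            logic.execute(input, collector);
            log.add("ack " + input);  // acked for you after a normal return
        } catch (RuntimeException e) {
            log.add("fail " + input); // failed for you when the logic throws
        }
    }

    public static void main(String[] args) {
        BasicBoltWrapperSketch wrapper =
                new BasicBoltWrapperSketch((input, c) -> c.emit(input.toUpperCase()));
        wrapper.execute("storm");
        // prints [emit STORM anchored to storm, ack storm]
        System.out.println(wrapper.log);
    }
}
```

The business logic in the lambda never mentions anchors, ack or fail, which is exactly the convenience this style of bolt is meant to give.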
OK, that's all for now. I won't go into the abstract classes (there's nothing much to say). The spout and bolt APIs also come with some functional wrappers, such as ITransactionalSpout and KafkaSpout (used in this project); you can go and read their source. They are really just the methods I've described plus their own features. At most the logic is more complex, and it can still be understood.