Storm's core is written in Clojure, its command-line utilities are developed in Python, and topologies are typically developed in Java.
- On the surface, a Storm cluster resembles a Hadoop cluster. But on Hadoop you run "MapReduce jobs", while on Storm you run "topologies". Jobs and topologies are very different: a key difference is that a MapReduce job eventually finishes, whereas a topology keeps processing messages forever (or until you kill it).
- A Storm cluster has two types of nodes: the control (master) node and the worker nodes.
- The control node runs a daemon called "Nimbus", which resembles Hadoop's "JobTracker". Nimbus is responsible for distributing code within the cluster, assigning tasks to workers, and monitoring for failures.
- Each worker node runs a daemon called the "Supervisor". The Supervisor listens for work assigned to its machine and starts or stops worker processes as Nimbus directs. Each worker process executes a subset of a topology; a running topology consists of many worker processes spread across multiple machines.
- A ZooKeeper cluster handles all coordination between Nimbus and the Supervisors (a full topology may be split into multiple subsets and completed by multiple Supervisors).
In addition, both the Nimbus and Supervisor daemons are fail-fast and stateless; all state is kept in ZooKeeper or on local disk. This means you can kill the Nimbus or Supervisor processes with kill -9 and restart them, and they will recover their state and continue working as if nothing had happened. This design makes Storm extremely stable. In this design the master does not communicate directly with the workers; instead ZooKeeper acts as an intermediary, which decouples the master from the workers and stores state information in the ZooKeeper cluster so that whichever party fails can recover quickly.
Nimbus is the master node, and there is generally only one; Supervisors are the slave nodes, and there can be many.
The Nimbus node receives a topology submission, shards the topology into tasks, and publishes the task assignment information for each Supervisor to the ZooKeeper cluster; each Supervisor then fetches its own tasks from ZooKeeper and notifies its worker processes to carry out the task processing.
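The assignment flow above can be sketched in plain Java. This is a hypothetical illustration, not Storm's actual implementation: a `ConcurrentHashMap` stands in for the ZooKeeper cluster, Nimbus writes per-Supervisor assignments under a path-like key, and each Supervisor polls only for its own entry.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a ConcurrentHashMap stands in for ZooKeeper.
// Nimbus publishes per-supervisor task assignments; each Supervisor
// picks up only the entry addressed to it.
public class AssignmentSketch {
    static final Map<String, List<String>> zk = new ConcurrentHashMap<>();

    // Nimbus side: shard a submitted topology into tasks and publish them.
    static void nimbusAssign(String supervisorId, List<String> tasks) {
        zk.put("/assignments/" + supervisorId, tasks);
    }

    // Supervisor side: fetch the tasks for this node and hand them to workers.
    static List<String> supervisorPoll(String supervisorId) {
        return zk.getOrDefault("/assignments/" + supervisorId, List.of());
    }

    public static void main(String[] args) {
        nimbusAssign("supervisor-1", List.of("spout-task-0", "bolt-task-1"));
        System.out.println(supervisorPoll("supervisor-1")); // tasks for supervisor-1
        System.out.println(supervisorPoll("supervisor-2")); // nothing assigned yet
    }
}
```

Because the coordination state lives in the shared store rather than in the daemons, either side can crash, restart, and re-read its assignments, which mirrors the fail-fast design described earlier.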
The main methods of a spout are:
open(Map conf, TopologyContext context, SpoutOutputCollector collector), close(), nextTuple(), ack(Object msgId), fail(Object msgId)
open(): the initialization method.
close(): called when the spout is shut down, but it is not guaranteed to run, because a Supervisor node in the cluster can kill the worker process with kill -9. Only when Storm is running in local mode and is shut down with a stop command is close() guaranteed to execute.
The declareOutputFields method:
Declares the field names of the tuples to be emitted.
void ack(Object msgId)
The callback invoked when a tuple has been processed successfully; typically the implementation removes the message from the message queue to prevent it from being re-sent.
void fail(Object msgId)
The callback invoked when processing a tuple fails; typically the implementation puts the message back into the message queue so it can be re-sent later.
nextTuple()
The Storm framework calls this method continuously to emit tuples through the OutputCollector. This method should be non-blocking. nextTuple, ack, and fail are all called in the same thread of the spout task.
public void nextTuple() {
    this.collector.emit(new Values(sentences[index]));
    index++;
    if (index >= sentences.length) {
        index = 0;
    }
    Utils.sleep(1);
}
Typically, to implement a spout you can implement the IRichSpout interface directly, or extend BaseRichSpout, which lets you write a little less code.
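The sentence-cycling spout above can be shown as a self-contained sketch. Since the Storm jars are not assumed to be on the classpath here, a hypothetical `RecordingCollector` stands in for Storm's SpoutOutputCollector; in real code you would extend BaseRichSpout and emit through the collector passed to open().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained sketch of the sentence-spout pattern.
// RecordingCollector is a stand-in for Storm's SpoutOutputCollector.
public class SentenceSpoutSketch {
    static class RecordingCollector {
        final List<String> emitted = new ArrayList<>();
        void emit(String value) { emitted.add(value); }
    }

    private final String[] sentences = {
        "the cow jumped over the moon",
        "an apple a day keeps the doctor away"
    };
    private int index = 0;
    private RecordingCollector collector;

    // open(): initialization, receives the collector from the framework.
    public void open(RecordingCollector collector) {
        this.collector = collector;
    }

    // nextTuple(): non-blocking; emit one sentence, then wrap around.
    public void nextTuple() {
        collector.emit(sentences[index]);
        index++;
        if (index >= sentences.length) {
            index = 0;
        }
    }
}
```

Note that nextTuple emits at most one tuple per call and returns immediately, which is what makes it safe for the framework to invoke it in a tight loop on the spout's single thread.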
Bolt
prepare(): this method is similar to open() in a spout, or to the setup method in a Mapper/Reducer. It is called when the task is initialized and provides the bolt's execution environment.
void cleanup(): called before the bolt is shut down, but not guaranteed to be executed.
The execute() method receives a tuple, processes it, and uses the OutputCollector passed in through the prepare method to call ack or fail to report the result.
To implement a bolt, you can implement the IRichBolt interface or extend BaseRichBolt; if you do not want to handle result feedback yourself, you can implement the IBasicBolt interface or extend BaseBasicBolt, which automatically calls collector.ack(inputTuple) for you.
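The execute-then-ack contract can be sketched with a word-splitting bolt. As with the spout sketch, a hypothetical `RecordingCollector` stands in for Storm's OutputCollector, and the class names are illustrative only; a real bolt would extend BaseRichBolt (or BaseBasicBolt, in which case the final ack would be implicit).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a bolt's execute() contract: process a tuple,
// emit results, then ack the input through the collector from prepare().
public class SplitBoltSketch {
    static class RecordingCollector {
        final List<String> emitted = new ArrayList<>();
        final List<String> acked = new ArrayList<>();
        void emit(String word) { emitted.add(word); }
        void ack(String tuple) { acked.add(tuple); }
    }

    private RecordingCollector collector;

    // prepare(): like open() in a spout, supplies the execution environment.
    public void prepare(RecordingCollector collector) {
        this.collector = collector;
    }

    // execute(): split the sentence into words, emit each, ack the input.
    public void execute(String sentence) {
        for (String word : sentence.split(" ")) {
            collector.emit(word);
        }
        collector.ack(sentence); // with BaseBasicBolt this ack is automatic
    }
}
```

Forgetting the ack (when not using a BasicBolt) is a common bug: the upstream spout would eventually see the tuple time out and call fail, causing an unnecessary replay.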
Spouts and Bolts