In this article, let's take a look at how a tuple in Storm travels from one task to another.
When a bolt emits a tuple, it first calls the emit or emitDirect method of OutputCollector.
Both methods ultimately end up in the function returned by mk-transfer-fn in the Clojure code:
```clojure
; worker.clj
(defn mk-transfer-fn [transfer-queue]
  (fn [task ^Tuple tuple]
    (.put ^LinkedBlockingQueue transfer-queue [task tuple])))
```
In fact, this function only adds a new record [task-id, tuple] to a LinkedBlockingQueue.
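To see this behaviour in isolation, here is a minimal sketch that can be pasted into a REPL. It is not Storm code: plain keywords stand in for real Tuple objects, and the names demo-queue and transfer are made up for the example.

```clojure
(import 'java.util.concurrent.LinkedBlockingQueue)

;; Same shape as mk-transfer-fn, minus the Storm-specific ^Tuple type hint.
(defn mk-transfer-fn [^LinkedBlockingQueue transfer-queue]
  (fn [task tuple]
    (.put transfer-queue [task tuple])))

;; demo-queue and transfer are illustrative names, not part of Storm.
(def demo-queue (LinkedBlockingQueue.))
(def transfer (mk-transfer-fn demo-queue))

;; Keywords stand in for real tuples.
(transfer 3 :tuple-a)
(transfer 7 :tuple-b)

(.size demo-queue)   ; number of queued [task tuple] pairs
(.peek demo-queue)   ; the oldest pair, left in the queue
```

Nothing is sent at this point. The emitting side only enqueues work, which keeps emit cheap and leaves the network I/O to another thread.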
The contents in this queue will be processed by the following code:
```clojure
; worker.clj
; What is this socket?
(async-loop
  (fn [^ArrayList drainer ^KryoTupleSerializer serializer]
    ; Take one element from transfer-queue.
    ; Each element is actually a [task tuple] pair.
    (let [felem (.take transfer-queue)]
      (.add drainer felem)
      (.drainTo transfer-queue drainer))
    (read-locked endpoint-socket-lock
      ; Get the mapping from node+port to socket
      (let [node+port->socket @node+port->socket
            ; Get the mapping from task-id to node+port
            task->node+port @task->node+port]
        (doseq [[task ^Tuple tuple] drainer]
          ; Obtain the socket corresponding to the task
          (let [socket (node+port->socket (task->node+port task))
                ; Serialize this tuple
                ser-tuple (.serialize serializer tuple)]
            ; Send this tuple
            (msg/send socket task ser-tuple))))))
; (the rest of the async-loop call is not shown in this excerpt)
```
As the code above shows, the tuple is serialized and then sent to the target task by msg/send through a socket. Note that async-loop creates a separate thread to run this code, so Storm uses a dedicated thread for sending outgoing messages.
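The take-then-drainTo pattern in that loop is worth a closer look: the thread blocks until at least one element arrives, then grabs everything else that is already queued in one go. The following is a simplified sketch of that pattern only. The helper drain-batch! and the sent atom are hypothetical, a plain Thread stands in for the one async-loop starts, and appending to an atom replaces serialization and msg/send.

```clojure
(import '(java.util ArrayList)
        '(java.util.concurrent LinkedBlockingQueue))

;; drain-batch! is a hypothetical helper, not a Storm function.
;; It blocks until one element is available, then moves every element
;; that is already queued into drainer without blocking again.
(defn drain-batch! [^LinkedBlockingQueue queue ^ArrayList drainer]
  (.add drainer (.take queue))
  (.drainTo queue drainer)
  drainer)

(def queue (LinkedBlockingQueue.))
(doseq [pair [[1 :a] [2 :b] [1 :c]]]
  (.put queue pair))

;; Collects what the sender thread "sends".
(def sent (atom []))

;; A plain Thread stands in for the thread that async-loop creates.
(def sender
  (Thread.
    (fn []
      (let [drainer (ArrayList.)]
        (drain-batch! queue drainer)
        (doseq [[task tuple] drainer]
          ;; Stand-in for serialization plus msg/send.
          (swap! sent conj [task tuple]))
        (.clear drainer)))))

(.start sender)
(.join sender)   ; wait until the batch has been processed

@sent            ; all three pairs, in the order they were queued
```

Batching this way means the sending thread takes the lock and dereferences the routing maps once per batch rather than once per tuple.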
Let's take a look at what this socket is. It is initialized in worker.clj:
```clojure
; socket (worker.clj)
(swap! node+port->socket
       merge
       (into {}
         (dofor [[node port :as endpoint] new-connections]
           [endpoint
            (msg/connect
              mq-context
              ((:node->host assignment) node)
              port)])))
```
The code above shows that the socket is created by msg/connect. So what does msg/connect do? It is declared in protocol.clj:
```clojure
(defprotocol Context
  (bind [context virtual-port])
  (connect [context host port])
  (send-local-task-empty [context virtual-port])
  (term [context]))
```
This definition is only a protocol (an interface). The concrete implementation is in zmq.clj, where zmq is short for ZeroMQ. In other words, Storm workers use ZeroMQ to transmit tuples.
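Because the worker depends only on the protocol, any type that implements it could stand in for the ZeroMQ one. The sketch below illustrates that idea and is not Storm code. It declares a cut-down copy of Context with a single method, and RecordingContext, node->host and add-connections! are hypothetical names. It also fills a node+port->socket atom in the same way as the worker.clj snippet, using the core `for` macro in place of Storm's dofor helper.

```clojure
;; A cut-down copy of the Context protocol with only the method used here.
(defprotocol Context
  (connect [context host port]))

;; RecordingContext is a hypothetical stand-in for ZMQContext: instead of
;; opening a socket it returns a map describing the endpoint.
(deftype RecordingContext []
  Context
  (connect [this host port]
    {:host host :port port}))

;; Hypothetical assignment data: node id -> host name.
(def node->host {"node-1" "10.0.0.1"
                 "node-2" "10.0.0.2"})

;; [node port] -> connection, filled the same way as in worker.clj.
(def node+port->socket (atom {}))

(defn add-connections! [context new-connections]
  (swap! node+port->socket
         merge
         (into {}
               (for [[node port :as endpoint] new-connections]
                 [endpoint (connect context (node->host node) port)]))))

(add-connections! (RecordingContext.) [["node-1" 6700] ["node-2" 6701]])

;; Look up the "connection" for one endpoint.
(@node+port->socket ["node-2" 6701])
```

The map is keyed by the whole [node port] pair, so the sending loop can go from a task id to its endpoint and from there to an already open connection.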
ZMQContext in zmq.clj implements the Context protocol:
```clojure
(deftype ZMQContext [context linger-ms ipc?]
  ; Implement the Context Interface
  Context
  ; Pull messages from the given virtual-port
  (bind [this virtual-port]
    (-> context
        (mq/socket mq/pull)
        (mqvp/virtual-bind virtual-port)
        (ZMQConnection.)))
  ; Push messages to the specified host and port
  (connect [this host port]
    (let [url (if ipc?
                (str "ipc://" port "ipc")
                (str "tcp://" host ":" port))]
      (-> context
          (mq/socket mq/push)
          (mq/set-linger linger-ms)
          (mq/connect url)
          (ZMQConnection.))))
  ; Send an empty message to the local virtual-port
  (send-local-task-empty [this virtual-port]
    (let [pusher
          (-> context
              (mq/socket mq/push)
              (mqvp/virtual-connect virtual-port))]
      (mq/send pusher (mq/barr))
      (.close pusher)))
  (term [this]
    (.term context))
  ; Implement the ZMQContextQuery Interface
  ZMQContextQuery
  (zmq-context [this]
    context))
```
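The only branching in connect is the choice of endpoint URL. As a small illustration, the expression can be pulled out into a standalone function; endpoint-url is a hypothetical name, and the string building is copied from the listing above as it appears there.

```clojure
;; endpoint-url is a hypothetical helper that isolates the URL choice
;; made inside connect.
(defn endpoint-url [ipc? host port]
  (if ipc?
    (str "ipc://" port "ipc")
    (str "tcp://" host ":" port)))

(endpoint-url false "10.0.0.1" 6700)   ; TCP endpoint built from host and port
(endpoint-url true  "10.0.0.1" 6700)   ; IPC endpoint, the host is ignored
```

With ipc? set, the host argument plays no part in the address, so that mode can only reach endpoints on the same machine.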
To summarize, this is how Twitter Storm sends a tuple from one task to another:
- A bolt emits a tuple.
- The worker puts the tuple, together with the task-id it should be sent to, on the transfer queue (a LinkedBlockingQueue).
- A separate thread (the one created by async-loop) takes the queued tuples off the transfer queue for processing.
- The worker looks up the ZeroMQ connection for the node and port that host the target task. These connections are created with msg/connect.
- The tuple is serialized and sent through that ZeroMQ connection.