http://storm.apache.org/releases/1.0.1/Lifecycle-of-a-topology.html

The life cycle of a Storm topology. The content of this page is based on the 0.7.1 code; a lot has changed since then. It explains in detail the life cycle of a topology: from submitting the topology, to the supervisor starting and stopping workers. It also explains how Nimbus monitors topologies and how topologies are shut down when they are killed. First, two important notes:
- The topology that actually runs is different from the one the user specifies: the real topology contains implicit streams and an implicit acker bolt that manages the acking framework (which guarantees data processing; see the sketch after this list). The implicit topology is created via the system-topology! function
- system-topology! is used in two places
- when Nimbus creates tasks for the topology
- in the worker, so that it knows where to route messages
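From the user's side, the acking framework that the implicit acker bolt manages shows up as anchoring and acking in a bolt. A minimal sketch against the org.apache.storm 1.x API (the SplitBolt class and its input format are hypothetical):

```java
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        for (String word : input.getString(0).split(" ")) {
            // Anchoring the emit to the input tuple keeps the tuple tree
            // alive in the implicit acker bolt's bookkeeping.
            collector.emit(input, new Values(word));
        }
        // Acking tells the acker that this node of the tuple tree is done.
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
```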
Starting a topology
- The storm jar command executes your class with the specified arguments. The only special thing storm jar does is set the storm.jar environment variable for StormSubmitter to use later
- When you call StormSubmitter.submitTopology, StormSubmitter performs the following actions (see the sketch after this list):
- First: if the jar has not been uploaded before, it uploads the jar
- The jar upload is done via Nimbus's Thrift interface
- beginFileUpload returns the path of a file in Nimbus's inbox
- 15 kilobytes are uploaded at a time via uploadChunk
- finishFileUpload is called when the upload is complete
- These Thrift methods are implemented on the Nimbus side
- Second: StormSubmitter calls submitTopology on the Nimbus Thrift interface
- The topology configuration is serialized as JSON
- Note that the Thrift submitTopology call takes the Nimbus inbox path where the jar was uploaded
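A minimal sketch of this submission path, using the real StormSubmitter and TopologyBuilder APIs (the topology name and SentenceSpout are hypothetical; SplitBolt is the bolt sketched earlier):

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class SubmitExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentences", new SentenceSpout(), 2);  // SentenceSpout is hypothetical
        builder.setBolt("split", new SplitBolt(), 4).shuffleGrouping("sentences");

        Config conf = new Config();
        conf.setNumWorkers(3);

        // Uses the storm.jar variable set by the "storm jar" command to locate
        // the jar, uploads it to Nimbus's inbox (beginFileUpload / uploadChunk /
        // finishFileUpload under the hood), then calls submitTopology on the
        // Nimbus Thrift interface with the inbox path and the JSON-serialized conf.
        StormSubmitter.submitTopology("word-count", conf, builder.createTopology());
    }
}
```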
- Nimbus receives the topology submission
- Nimbus normalizes the configuration of the topology to make sure every task has the same serialization registrations, which is critical for serialization to work correctly (see the sketch after this list)
- Nimbus sets up the static state for the topology
- The jars and configs are kept on the local filesystem because they are too big for Zookeeper; they are copied into {nimbus local dir}/stormdist/{topology id}
- setup-storm-static writes the task -> component mapping into ZK
- setup-heartbeats creates the ZK directory in which tasks can heartbeat
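Those serialization registrations come from the topology config. A minimal sketch using the real Config.registerSerialization helper (TradeRecord is a hypothetical user type):

```java
import org.apache.storm.Config;

public class SerializationConfig {
    // Hypothetical user type that tuples carry between tasks.
    public static class TradeRecord {}

    public static Config topologyConf() {
        Config conf = new Config();
        // Custom Kryo registrations travel in the topology config; Nimbus's
        // normalization step guarantees every task ends up with the identical
        // registration list, so a tuple serialized by one task can be
        // deserialized by any other.
        conf.registerSerialization(TradeRecord.class);
        conf.setFallBackOnJavaSerialization(false);
        return conf;
    }
}
```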
- Nimbus calls mk-assignment to assign tasks to machines
- An assignment contains (illustrated in the sketch after this list):
- master-code-dir: used by supervisors to download the correct jars/configs from Nimbus
- task->node+port: a map from task id to the worker that should run it (a worker is identified by a node/port pair)
- node->host: a map from node id to hostname; this lets workers know which machines to connect to when communicating with other workers. Node ids identify supervisors, so that multiple supervisors can run on one machine
- task->start-time-secs: a map from task id to the timestamp at which Nimbus launched the task. This is used by Nimbus when monitoring the topology
- Once topologies are assigned, they are initially in a deactivated state. start-storm writes data into Zookeeper so that the cluster knows the topology is active and tuples can be emitted from the spouts
- TODO: cluster state diagram
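For illustration only, the assignment written to Zookeeper can be pictured as the following plain-Java value object; the field names mirror the Clojure record described above, but this class is not Storm code:

```java
import java.util.Map;

// Illustrative only: mirrors the fields of Storm's Clojure Assignment record.
public class Assignment {
    String masterCodeDir;                       // where supervisors fetch jars/configs
    Map<Integer, WorkerSlot> taskToNodePort;    // task id -> worker (node id + port)
    Map<String, String> nodeToHost;             // node id -> hostname
    Map<Integer, Integer> taskToStartTimeSecs;  // task id -> launch timestamp
}

// A worker is identified by the supervisor's node id plus a port.
class WorkerSlot {
    String nodeId;
    int port;
}
```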
- The supervisor runs two functions in the background (sketched after this list):
- synchronize-supervisor: called whenever the assignments in Zookeeper change, and also every 10 seconds
- downloads code from Nimbus for topologies assigned to this machine that it does not yet have the code for
- writes to the local filesystem what this node should run: a map from port -> LocalAssignment, where a LocalAssignment contains a topology id and the list of task ids for that worker
- sync-processes: reads from the local filesystem what the previous function wrote and compares it to what is actually running on the machine, then starts/stops worker processes as necessary to synchronize
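Putting the two functions together, the supervisor's sync logic amounts to something like this hypothetical sketch (the real implementation is Clojure inside Storm; every name here is illustrative):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SupervisorSketch {
    static class LocalAssignment {
        String topologyId;
        Set<Integer> taskIds = new HashSet<>();
    }

    Map<Integer, LocalAssignment> zkAssignments = new HashMap<>();  // what ZK says to run, by port
    Map<Integer, LocalAssignment> localState = new HashMap<>();     // last write to local filesystem
    Set<Integer> runningWorkers = new HashSet<>();                  // ports with a live worker
    Set<String> downloadedTopologies = new HashSet<>();

    // synchronize-supervisor: runs on ZK assignment changes and every 10 seconds.
    void synchronizeSupervisor() {
        for (LocalAssignment a : zkAssignments.values()) {
            if (downloadedTopologies.add(a.topologyId)) {
                System.out.println("download jars/configs for " + a.topologyId);
            }
        }
        localState = new HashMap<>(zkAssignments);  // write port -> LocalAssignment
    }

    // sync-processes: compares desired state (local filesystem) with running workers.
    void syncProcesses() {
        for (Integer port : localState.keySet()) {
            if (runningWorkers.add(port)) {
                System.out.println("start worker on port " + port);
            }
        }
        runningWorkers.removeIf(port -> {
            boolean stale = !localState.containsKey(port);
            if (stale) System.out.println("kill worker on port " + port);
            return stale;
        });
    }
}
```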
- Worker processes start up through the mk-worker function
- The worker connects to the other workers and starts a thread to monitor for changes, so if a worker is reassigned, this worker will automatically reconnect to the other worker's new location
- The worker monitors whether or not the topology is active and stores that state in the storm-active-atom variable. Tasks use this variable to decide whether or not to call nextTuple on the spouts (see the sketch after this list)
- The worker launches the actual tasks as threads within itself
- Tasks are set up through the mk-task function
- Tasks set up the routing function, which takes in a stream and an output tuple and returns the list of task ids to send the tuple to
- Tasks set up the spout-specific or bolt-specific code
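The storm-active-atom check can be pictured as a shared flag consulted by each spout task loop. A hypothetical Java analogue (the real atom lives in the Clojure worker code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical analogue of storm-active-atom: the worker's monitor thread
// flips the flag when the topology is activated/deactivated in ZK, and each
// spout task consults it before calling nextTuple.
public class SpoutTaskLoop {
    private final AtomicBoolean stormActive = new AtomicBoolean(false);

    void monitorSawActivation(boolean active) {
        stormActive.set(active);
    }

    void runOnce(ISpoutLike spout) {
        if (stormActive.get()) {
            spout.nextTuple();  // only emit while the topology is active
        }
    }

    interface ISpoutLike {
        void nextTuple();
    }
}
```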
Topology monitoring
- Nimbus monitors the topology throughout its life cycle
- It schedules recurring tasks on the timer thread to check the topologies (sketched after this list)
- Nimbus's behavior is represented as a finite state machine
- The "monitor" event is called on a topology every "nimbus.monitor.freq.secs", which calls reassign-topology through reassign-transition
- reassign-topology calls mk-assignments, the same function used to assign the topology in the first place; mk-assignments is also capable of incrementally updating a topology
- mk-assignments checks heartbeats and reassigns tasks as necessary
- Any reassignment changes the state in ZK, which triggers supervisors to synchronize and start/stop workers
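A hypothetical sketch of that recurring monitor event (names are illustrative; the real scheduling uses Nimbus's timer and transition functions in Clojure):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative only: fires a "monitor" event per topology every
// nimbus.monitor.freq.secs, which re-runs assignment (mk-assignments).
public class NimbusMonitorSketch {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();

    void startMonitoring(String topologyId, long monitorFreqSecs) {
        timer.scheduleAtFixedRate(
            () -> reassignTopology(topologyId),  // reassign-transition -> reassign-topology
            monitorFreqSecs, monitorFreqSecs, TimeUnit.SECONDS);
    }

    void reassignTopology(String topologyId) {
        // mk-assignments: check heartbeats, reassign dead tasks to live slots,
        // and write any changed assignment back to Zookeeper, which in turn
        // makes supervisors synchronize and start/stop workers.
        System.out.println("mk-assignments for " + topologyId);
    }
}
```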
Killing a topology
- The storm kill command calls the kill function on the Nimbus Thrift interface for a topology (see the sketch after this list)
- Nimbus receives the kill command and applies the "kill" transition to the topology
- The kill transition function changes the status of the topology to "killed" and schedules the "remove" event to run "wait time" seconds in the future
- The wait time causes the topology to be deactivated before the real shutdown, giving it a chance to finish processing what it is currently working on before the workers are shut down
- Changing the status within the kill transition makes the kill protocol fault-tolerant to Nimbus crashing: on startup, if the status of a topology is "killed", Nimbus schedules the remove event to run "wait time" seconds in the future
- The remove event removes the topology: it cleans up the assignment and static information from ZK
- A separate cleanup thread runs the do-cleanup function, which cleans up the local heartbeat dir and the jars/configs stored locally
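What storm kill boils down to, sketched against the 1.x Java client (killTopologyWithOpts and KillOptions come from the Thrift IDL; treat the exact client calls as an assumption for other Storm versions):

```java
import java.util.Map;
import org.apache.storm.generated.KillOptions;
import org.apache.storm.generated.Nimbus;
import org.apache.storm.utils.NimbusClient;
import org.apache.storm.utils.Utils;

public class KillExample {
    public static void main(String[] args) throws Exception {
        Map conf = Utils.readStormConfig();
        Nimbus.Client client = NimbusClient.getConfiguredClient(conf).getClient();

        KillOptions opts = new KillOptions();
        opts.set_wait_secs(30);  // wait time before the workers are torn down
        client.killTopologyWithOpts("word-count", opts);  // topology name is hypothetical
    }
}
```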