Storm is an open source distributed, fault-tolerant real-time computation system, originally developed at Twitter. Official website: http://storm.apache.org/
A Storm cluster is superficially similar to a Hadoop cluster: where Hadoop runs MapReduce jobs, Storm runs topologies. The key difference is that a MapReduce job eventually finishes, while a topology processes data forever, in principle, unless you kill it.
1. A Storm cluster contains two kinds of nodes: the master node and the worker nodes. Their respective roles are as follows:
1) Nimbus (master node): responsible for distributing code within the Storm cluster, assigning tasks to worker machines, and monitoring the running state of the cluster; Nimbus plays a role similar to the JobTracker in Hadoop.
2) Supervisor (worker node): listens for the tasks Nimbus assigns to its machine, and starts or stops the worker processes that execute those tasks.
3) All coordination between Nimbus and the Supervisors is done through a ZooKeeper cluster.
- Both the Nimbus and Supervisor processes are fail-fast and stateless;
- All state of the Storm cluster is kept either in the ZooKeeper cluster or on local disk.
This means you can use kill -9 on the Nimbus and Supervisor processes and they will resume work after a restart. This design gives the Storm cluster remarkable stability.
2. To do real-time computation on a Storm cluster, you create a topology. Running a topology is simple: first, package all your code and its dependencies into a single jar; then run the following command:
```
storm jar all-my-code.jar backtype.storm.MyTopology arg1 arg2
```

This runs the class backtype.storm.MyTopology with the two arguments arg1 and arg2; its main method defines the topology and submits it to Nimbus. The `storm jar` part connects to Nimbus and uploads the jar package to the cluster. To kill a running topology:

```
storm kill {stormname}
```
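For context, here is a minimal sketch of what such a main method might look like. The component names and the spout/bolt classes (RandomSentenceSpout, SplitSentenceBolt) are hypothetical placeholders, not part of the original text; the TopologyBuilder, Config, and StormSubmitter calls are the backtype.storm API used in Storm 0.9.x.

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class MyTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // Hypothetical spout/bolt classes -- substitute your own components.
        builder.setSpout("sentences", new RandomSentenceSpout(), 2);
        builder.setBolt("splitter", new SplitSentenceBolt(), 4)
               .shuffleGrouping("sentences");

        Config conf = new Config();
        conf.setNumWorkers(2); // request two worker processes

        // args[0] is the topology name passed on the command line;
        // StormSubmitter uploads the jar and submits the topology to Nimbus.
        StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    }
}
```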
The main features of Storm are as follows:
- A simple programming model. Just as MapReduce reduces the complexity of parallel batch processing, Storm reduces the complexity of real-time processing.
- Support for multiple programming languages. You can use a variety of programming languages on top of Storm; Clojure, Java, Ruby, and Python are supported by default. To add support for another language, you only need to implement Storm's simple communication protocol.
- Fault tolerance. Storm manages the failure of worker processes and nodes.
- Horizontal scalability. Computations run in parallel across multiple threads, processes, and servers.
- Reliable message processing. Storm guarantees that each message is processed at least once; when a task fails, it retries the message from the message source.
- Speed. The system is designed so that messages are processed quickly, using ZeroMQ as the underlying message transport.
- Local mode. Storm has a "local mode" that fully simulates a Storm cluster in-process, which enables rapid development and unit testing (see the sketch after this list).
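As an illustration of local mode, here is a minimal sketch. LocalCluster, TestWordSpout, and Utils are classes shipped with Storm 0.9.x (backtype.storm packages); SplitSentenceBolt is the hypothetical bolt named in the submission sketch above.

```java
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.utils.Utils;

public class LocalModeExample {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout()); // test spout bundled with storm-core
        builder.setBolt("splitter", new SplitSentenceBolt()) // hypothetical bolt from the earlier sketch
               .shuffleGrouping("words");

        LocalCluster cluster = new LocalCluster(); // in-process simulated cluster
        cluster.submitTopology("test", new Config(), builder.createTopology());
        Utils.sleep(10000);                        // let the topology run for ten seconds
        cluster.killTopology("test");
        cluster.shutdown();
    }
}
```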
Storm's terminology:
Stream, Spout, Bolt, Task, Worker, stream grouping, and topology
- A stream is the data being processed: an unbounded sequence of tuples.
- A spout is the source of a stream.
- A bolt processes data and may emit new streams (a minimal bolt sketch follows this list).
- A task is a thread of execution that runs inside a spout or bolt.
- A worker is a process that runs these threads.
- A stream grouping specifies which streams a bolt receives as input and how the tuples are distributed among its tasks.
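To make the bolt term concrete, here is a sketch of the hypothetical SplitSentenceBolt used in the earlier examples; the class itself is invented for illustration, but BaseBasicBolt and the method signatures are the real backtype.storm API.

```java
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Illustrative bolt: splits each incoming sentence into words.
public class SplitSentenceBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // BaseBasicBolt acks each tuple automatically after execute() returns,
        // which gives the at-least-once guarantee described above.
        for (String word : input.getString(0).split(" ")) {
            collector.emit(new Values(word));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word")); // one output field named "word"
    }
}
```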
Worker
- The Supervisor listens for the work assigned to its machine and starts or stops worker processes as needed.
- Each worker occupies one port on its node; the available ports can be configured in storm.yaml (see the snippet after this list).
- A topology may execute across one or more worker processes, each of which runs a part of the topology, so a running topology consists of many worker processes spread across many machines.
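For reference, worker ports are controlled by the supervisor.slots.ports setting in storm.yaml; the snippet below shows Storm's default four slots, which means each Supervisor node can run at most four workers.

```yaml
# storm.yaml (per-node configuration)
# Each listed port is one worker slot on this supervisor node.
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
```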
Task
- Each spout and bolt executes as many tasks spread across the cluster.
- By default, each task corresponds to one thread (an executor), which runs that task.
- A stream grouping defines how tuples are emitted from one set of tasks to another set of tasks (see the sketch below).
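As a hedged illustration (the component classes RandomSentenceSpout, SplitSentenceBolt, and WordCountBolt and all the parallelism numbers are invented for this example; the TopologyBuilder calls are real API), setting parallelism and groupings looks like this:

```java
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class GroupingExample {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // Run the spout as 4 parallel tasks (one executor thread each by default).
        builder.setSpout("sentences", new RandomSentenceSpout(), 4);
        // shuffleGrouping: tuples are distributed randomly and evenly
        // across the 8 splitter tasks.
        builder.setBolt("splitter", new SplitSentenceBolt(), 8)
               .shuffleGrouping("sentences");
        // fieldsGrouping: tuples with the same "word" value always go to the
        // same counter task, so per-word state stays on one thread.
        builder.setBolt("counter", new WordCountBolt(), 12)
               .fieldsGrouping("splitter", new Fields("word"));
    }
}
```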