What Storm is:
If you had to describe Storm in one phrase, it would be this: a distributed real-time computation system. According to its authors, Storm means for real-time computation what Hadoop means for batch processing. Hadoop, which is based on Google's MapReduce, gives us the map and reduce primitives, and these make batch processing simple and elegant. Where Hadoop does batch processing, Storm is a real-time, distributed, and highly fault-tolerant computation system. Like Hadoop, Storm can handle large volumes of data, but it processes that data with much lower latency while remaining highly reliable, meaning that every message is processed. Storm scales out across many machines to handle large volumes of data, and it has other features as well.
Storm's architecture:
A Storm cluster consists of one master node and multiple worker nodes. The master node runs a daemon called "Nimbus", which distributes code, assigns tasks, and detects failures. Each worker node runs a daemon called "Supervisor", which listens for assigned work and starts and stops worker processes. Both Nimbus and Supervisor are fail-fast and stateless, which makes them very robust, and coordination between the two is handled through ZooKeeper. ZooKeeper is also used to manage the different components in the cluster. ZeroMQ is the internal messaging system, and JZMQ is the Java binding for ZeroMQ. There is also a subproject named storm-deploy that can deploy a Storm cluster on AWS with one click.
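The division of labor described above can be pictured with a small sketch. This is a toy model in Python, not Storm's actual scheduler: the function names, the round-robin policy, and the dictionary standing in for the state kept in ZooKeeper are all assumptions made for illustration.

```python
from itertools import cycle


def assign_tasks(task_ids, supervisors):
    """Spread tasks over supervisors round-robin (toy stand-in for Nimbus)."""
    if not supervisors:
        raise ValueError("at least one supervisor is required")
    assignment = {supervisor: [] for supervisor in supervisors}
    for task_id, supervisor in zip(task_ids, cycle(supervisors)):
        assignment[supervisor].append(task_id)
    return assignment


def reassign_after_failure(assignment, failed_supervisor):
    """Move the tasks of a failed supervisor onto the surviving ones."""
    survivors = [s for s in assignment if s != failed_supervisor]
    if not survivors:
        raise RuntimeError("no surviving supervisors")
    orphaned = assignment.get(failed_supervisor, [])
    new_assignment = {s: list(assignment[s]) for s in survivors}
    for task_id, supervisor in zip(orphaned, cycle(survivors)):
        new_assignment[supervisor].append(task_id)
    return new_assignment


# This dict plays the role of the coordination state kept in ZooKeeper.
coordination_state = assign_tasks(
    range(6), ["supervisor-1", "supervisor-2", "supervisor-3"]
)
# Hypothetical failure of one worker node: its tasks move elsewhere.
after_failure = reassign_after_failure(coordination_state, "supervisor-2")
```

Because the assignment lives in the shared state rather than inside the daemons, a restarted daemon in this model can simply read it back, which is the point of keeping Nimbus and Supervisor stateless.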
Storm's advantages:
A. A simple programming model. Just as MapReduce reduces the complexity of parallel batch processing, Storm reduces the complexity of real-time processing.
B. Service-oriented. Storm is a service framework that supports hot deployment, so applications can be brought online or taken offline immediately.
C. Support for many programming languages. You can use a variety of programming languages on top of Storm. Clojure, Java, Ruby, and Python are supported by default. To add support for another language, you only need to implement a simple Storm communication protocol.
D. Fault tolerance. Storm manages failures of worker processes and nodes.
E. Horizontal scalability. Computation runs in parallel across multiple threads, processes, and servers.
F. Reliable message processing. Storm guarantees that each message is processed at least once. When a task fails, Storm replays the message from the message source.
G. Fast. The system is designed so that messages are processed quickly, using ZeroMQ as the underlying message queue.
H. Local mode. Storm has a "local mode" that fully simulates a Storm cluster in-process. This lets you develop and unit test quickly.
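The at-least-once guarantee in point F can be sketched as a message source that remembers what it has emitted and replays anything that fails. This is a minimal Python illustration, not Storm's API: the class `ReplayingSource`, the `run` loop, and the flaky handler are hypothetical names invented for this example.

```python
from collections import deque


class ReplayingSource:
    """Toy message source that re-emits any message whose processing failed."""

    def __init__(self, messages):
        self._queue = deque(enumerate(messages))  # (message id, payload)
        self._pending = {}                        # emitted but not yet acked

    def next_message(self):
        if not self._queue:
            return None
        msg_id, payload = self._queue.popleft()
        self._pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        # Processing succeeded: the message can be forgotten.
        self._pending.pop(msg_id, None)

    def fail(self, msg_id):
        # Processing failed: queue the message so it is emitted again.
        self._queue.append((msg_id, self._pending.pop(msg_id)))


def run(source, process):
    """Drive the source until every message has been acked."""
    while (item := source.next_message()) is not None:
        msg_id, payload = item
        try:
            process(payload)
        except Exception:
            source.fail(msg_id)
        else:
            source.ack(msg_id)


seen = []
failed_once = set()


def flaky_process(payload):
    # Hypothetical handler: fails the first time it sees "b", then succeeds.
    if payload == "b" and payload not in failed_once:
        failed_once.add(payload)
        raise RuntimeError("transient failure")
    seen.append(payload)


run(ReplayingSource(["a", "b", "c"]), flaky_process)
```

Every message ends up processed, but the replayed one arrives out of order, and a handler that failed halfway through its work would see the same message twice. That is why "at least once" usually calls for idempotent processing.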
Problems with Storm:
A. The current open-source version has only a single Nimbus node. If it goes down, all it can do is restart automatically, so a dual-Nimbus layout is worth considering.
B. Clojure is a dynamic functional programming language that runs on the JVM, and its strength lies in flow-style computation. Part of Storm's core is written in Clojure. This improves performance considerably, but it also raises the maintenance cost.
Storm's application scenarios:
Stream data processing. Storm can be used to process messages as they arrive and write the results to a data store.
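As a rough picture of this scenario, the sketch below chains a message source, a processing step, and a store using Python generators. It does not use Storm itself. The function names and the in-memory dictionary standing in for the data store are assumptions for illustration only.

```python
def sentence_source():
    """Hypothetical stand-in for an unbounded stream of incoming messages."""
    yield "storm processes streams"
    yield "hadoop processes batches"


def split_words(sentences):
    """Processing step: break each incoming sentence into words."""
    for sentence in sentences:
        for word in sentence.split():
            yield word


def count_into(store, words):
    """Update the store after every word, so results are visible immediately."""
    for word in words:
        store[word] = store.get(word, 0) + 1
        yield word, store[word]


store = {}  # stands in for a database or key-value store
updates = list(count_into(store, split_words(sentence_source())))
```

Unlike a batch job, the counts in `store` are current after every single message, without waiting for the whole input to be read.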
Distributed RPC. Because Storm's processing components are distributed and processing latency is extremely low, Storm can be used as a general-purpose distributed RPC framework. Indeed, our search engine is itself a distributed RPC system.
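The idea behind distributed RPC can be sketched as a call whose work is split across several workers and merged before the answer is returned to the caller. The Python below is only an analogy: threads stand in for remote workers, and `drpc_call` and `partial_sum` are hypothetical names, not part of Storm.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one worker: sum the squares of its share of the input."""
    return sum(x * x for x in chunk)


def drpc_call(function, args, workers=3):
    """Split args over workers, run them in parallel, and add up the results."""
    chunks = [args[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(function, chunks))
    return sum(partials)


# The caller sees an ordinary function call; the fan-out is hidden inside.
result = drpc_call(partial_sum, [1, 2, 3, 4, 5, 6])
```

From the caller's side this looks like a normal synchronous function, which is what makes the pattern usable as a general RPC mechanism when the workers are fast enough.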