Learn Storm with Me: Storm's Basic Architecture

Source: Internet
Author: User

A Storm cluster is superficially similar to a Hadoop cluster. However, on Hadoop you run "MapReduce jobs", while on Storm you run "topologies". Jobs and topologies are themselves very different. One key difference is that a MapReduce job eventually finishes, whereas a topology processes messages forever (or until you kill it). A Storm cluster has two kinds of nodes: the master node and the worker nodes. The master node runs a daemon called "Nimbus", which plays a role similar to Hadoop's "JobTracker". Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.

Each worker node runs a daemon called the "Supervisor". The Supervisor listens for work assigned to its machine and starts and stops worker processes as necessary, based on what Nimbus has assigned to it. Each worker process executes a subset of a topology, so a running topology consists of many worker processes spread across many machines.
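The following toy Python sketch makes the split between Nimbus, Supervisors and worker processes concrete. It spreads a topology's tasks over worker slots, then shows which of those slots one Supervisor would start. It is a simplified round-robin under our own assumptions: `assign_tasks`, `workers_for` and the (supervisor, port) slot layout are hypothetical. Starting the ports at 6700 only mirrors Storm's usual default slot ports. Nimbus's real scheduler does more than this.

```python
from collections import defaultdict
from itertools import cycle
from typing import Dict, Iterable, List, Tuple

Slot = Tuple[str, int]  # (supervisor host, worker port): one worker process

def assign_tasks(task_ids: Iterable[int], supervisors: List[str],
                 slots_per_supervisor: int) -> Dict[Slot, List[int]]:
    """Hypothetical master-side step: spread tasks round-robin over worker slots.

    Slots are interleaved across supervisors so the load is spread over machines.
    """
    slots = [(sup, 6700 + i)
             for i in range(slots_per_supervisor) for sup in supervisors]
    assignment: Dict[Slot, List[int]] = defaultdict(list)
    for task, slot in zip(task_ids, cycle(slots)):
        assignment[slot].append(task)
    return dict(assignment)

def workers_for(assignment: Dict[Slot, List[int]],
                supervisor: str) -> Dict[int, List[int]]:
    """Supervisor-side view: the worker ports (and their tasks) it must run."""
    return {port: tasks for (sup, port), tasks in assignment.items()
            if sup == supervisor}
```

Consider six tasks spread over two Supervisors with two slots each. `sup-a` ends up running two worker processes: one holding tasks 0 and 4, and one holding task 2.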

  

Figure 1 Storm architecture

All coordination between Nimbus and the Supervisors is done through a ZooKeeper cluster. In addition, the Nimbus daemon and the Supervisor daemons are fail-fast and stateless: all state is kept in ZooKeeper or on local disk. This means you can kill -9 Nimbus or the Supervisors, and they will start back up as if nothing had happened, so the daemons themselves need no backup. This design makes Storm clusters incredibly stable.
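The "stateless daemon" idea can be illustrated with a small Python sketch. A dictionary stands in for ZooKeeper, and the Supervisor object keeps nothing that matters in its own memory. A brand-new instance, standing in for a restart after kill -9, therefore sees exactly the same assignment. `FakeZooKeeper`, the `/assignments/...` and `/heartbeats/...` paths and all the methods are hypothetical. They are not a real ZooKeeper client or Storm's actual znode layout.

```python
from typing import Any, Dict, List

class FakeZooKeeper:
    """In-memory stand-in for a ZooKeeper ensemble: a map of paths to values.

    Hypothetical; a real cluster would use a ZooKeeper client library.
    """

    def __init__(self) -> None:
        self._nodes: Dict[str, Any] = {}

    def set(self, path: str, value: Any) -> None:
        self._nodes[path] = value

    def get(self, path: str, default: Any = None) -> Any:
        return self._nodes.get(path, default)

class Supervisor:
    """A daemon that keeps no durable state of its own; it all lives in the store."""

    def __init__(self, node_id: str, zk: FakeZooKeeper) -> None:
        self.node_id = node_id
        self.zk = zk

    def current_assignment(self) -> List[int]:
        # Always re-read from the store, so a freshly restarted process
        # sees exactly what the previous one saw.
        return self.zk.get(f"/assignments/{self.node_id}", [])

    def heartbeat(self, timestamp: float) -> None:
        # Liveness is also published to the store, where the master can watch it.
        self.zk.set(f"/heartbeats/{self.node_id}", timestamp)
```

In this sketch, killing the process amounts to discarding the `Supervisor` object. Constructing a new one with the same `node_id` recovers its assignment from the store.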

Storm implements a data-flow model in which data flows continuously through a network of transformation entities. The abstraction for a flow of data is called a stream, which is an unbounded sequence of tuples. A tuple is like a structure: it can represent standard data types (such as integers, floats, and byte arrays) or, with some additional serialization code, user-defined types. Each stream is identified by a unique ID, which can be used to build a topology of data sources and sinks. A stream originates at a spout, which feeds data from an external source into the Storm topology.
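Here is a minimal Python sketch of this data model. A tuple is a named list of values tagged with its stream ID, and a spout is a generator that pulls records from some external source. The class and function names are hypothetical. Only the stream ID `"default"` echoes the name Storm gives a component's default stream.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, Tuple

@dataclass(frozen=True)
class StreamTuple:
    """A named list of values tagged with the ID of the stream it belongs to."""
    stream_id: str
    fields: Tuple[str, ...]
    values: Tuple[Any, ...]

    def get(self, field: str) -> Any:
        """Look up a value by field name, like reading a field of a struct."""
        return self.values[self.fields.index(field)]

def sentence_spout(source: Iterable[str],
                   stream_id: str = "default") -> Iterator[StreamTuple]:
    """Hypothetical spout: pull records from an external source, emit tuples.

    A Storm stream is unbounded, so `source` may be an endless iterator.
    """
    for line in source:
        yield StreamTuple(stream_id, ("sentence",), (line,))
```

Because the spout is a generator, the same code works with a finite list or an endless feed. For example, `itertools.islice(sentence_spout(itertools.cycle(["a b"])), 3)` takes three tuples from a never-ending source.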

  

Figure 2 The topology of Storm

The sinks (the entities that perform transformations) are called bolts. A bolt implements a single transformation on a stream, and all processing in a Storm topology takes place in bolts. Bolts can implement traditional MapReduce-style functions or more complex operations (single-step functions), such as filtering, aggregation, or communication with external entities such as databases. A typical Storm topology implements multiple transformations and therefore requires multiple bolts with independent tuple streams. Both spouts and bolts are executed as one or more tasks in a Linux system.
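The following Python sketch of a tiny linear topology shows how a chain of bolts applies one transformation each. A stand-in spout feeds sentences to a splitting bolt, a filtering bolt drops stop words, and a counting bolt aggregates the results. All names are hypothetical, and everything runs in one process. In Storm, each bolt would run as tasks spread across workers, with tuples moving between them over the network.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Iterator, Tuple

# Each bolt below is a hypothetical Python stand-in that consumes a stream of
# tuples and applies exactly one transformation to it.

def split_bolt(sentences: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Function-style bolt: turn each ("sentence",) tuple into ("word",) tuples."""
    for (sentence,) in sentences:
        for word in sentence.split():
            yield (word.lower(),)

def filter_bolt(words: Iterable[Tuple[str]],
                stopwords: FrozenSet[str]) -> Iterator[Tuple[str]]:
    """Filtering bolt: pass through only the words we care about."""
    for (word,) in words:
        if word not in stopwords:
            yield (word,)

class CountBolt:
    """Aggregating bolt: keeps a running count per word (a stateful bolt)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def execute(self, tup: Tuple[str]) -> Tuple[str, int]:
        (word,) = tup
        self.counts[word] += 1
        return (word, self.counts[word])  # emitted downstream as (word, count)

def run_topology(sentences: Iterable[str]) -> Dict[str, int]:
    """Wire spout -> split -> filter -> count, a linear topology in one process."""
    spout = ((s,) for s in sentences)  # stand-in spout emitting ("sentence",) tuples
    counter = CountBolt()
    for tup in filter_bolt(split_bolt(spout), frozenset({"the", "a"})):
        counter.execute(tup)
    return dict(counter.counts)
```

Given "The cow jumped" and "the cow", the topology counts `cow` twice and `jumped` once. The filtering bolt drops both occurrences of `the`.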

One of the most interesting features of Storm's architecture is guaranteed message processing. Storm guarantees that every tuple emitted by a spout will be processed. If a tuple is not fully processed within the timeout period, Storm replays it from that spout. This requires some clever tricks to track tuples as they move through the topology, and it is one of Storm's most important value-adds.
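The tracking trick can be sketched in Python. In the approach described in Storm's documentation, every tuple gets a random 64-bit ID. For each spout tuple, Storm keeps a single 64-bit value, and each ID is XORed into it twice: once when the tuple is created and once when it is acked. Since x ^ x == 0, the value returns to zero only when the whole tuple tree has been processed.

The sketch below is a simplified, single-process version of that idea with hypothetical names (`Acker`, `emit`, `ack`, `expired`). It is not Storm's acker code. The 30-second timeout is our assumption for the example, matching Storm's usual default message timeout.

```python
import random
import time
from typing import Callable, Dict, List

class Acker:
    """Simplified tracker for tuple trees using the XOR trick (illustrative only)."""

    def __init__(self, timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        # root (spout tuple) ID -> [xor of outstanding tuple IDs, creation time]
        self.pending: Dict[int, list] = {}

    @staticmethod
    def new_id() -> int:
        """Random 64-bit tuple ID; collisions are possible but vanishingly rare."""
        return random.getrandbits(64)

    def emit(self, root_id: int, tuple_id: int) -> None:
        """Record a new tuple in root_id's tree (the spout tuple itself included).

        Children must be emitted before their parent is acked; otherwise the
        XOR value could reach 0 while work is still outstanding.
        """
        entry = self.pending.setdefault(root_id, [0, self.clock()])
        entry[0] ^= tuple_id

    def ack(self, root_id: int, tuple_id: int) -> bool:
        """Record that tuple_id finished; return True once the whole tree is done."""
        entry = self.pending[root_id]
        entry[0] ^= tuple_id
        if entry[0] == 0:
            del self.pending[root_id]  # fully processed: the spout can forget it
            return True
        return False

    def expired(self) -> List[int]:
        """Roots not completed within the timeout; the spout would re-emit these."""
        now = self.clock()
        return [root for root, (_, start) in self.pending.items()
                if now - start > self.timeout_s]
```

In use, a spout would call `emit(root, root)` for its tuple. Each bolt would `emit` any child tuples it anchors before acking its input, and the spout would replay whatever `expired()` returns. Tracking a whole tree in one fixed-size integer keeps the tracker's memory use constant per spout tuple, however far the tree fans out.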

In addition to supporting reliable messaging, Storm uses ØMQ (ZeroMQ) to maximize messaging performance: it removes intermediate queueing so that messages pass directly between tasks. ØMQ incorporates congestion detection and adapts its communication to make the best use of the available bandwidth.

The first highlight of Storm 0.9.0.1 was the introduction of the Netty transport. Storm's network transport mechanism is now pluggable and currently offers two implementations: the original ØMQ transport and the new Netty implementation. Earlier versions (before 0.9.x) supported only the ØMQ transport. Because ØMQ is a native library, it is highly platform-dependent and can be challenging to install correctly, and its versions differ considerably from one another. The Netty transport provides a pure-Java alternative that eliminates Storm's native-library dependencies and delivers more than twice the network transfer performance of ØMQ.
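A pluggable transport follows a simple pattern, sketched here in Python. Callers depend only on an abstract `Transport`, and the concrete implementation is picked by name from configuration. The interface, the `"fifo"` and `"zlib"` names and the `"transport"` config key are all hypothetical. The two toy implementations merely stand in for ØMQ and Netty, whereas Storm's real plug-in point is a Java class named in its configuration.

```python
import zlib
from abc import ABC, abstractmethod
from collections import deque
from typing import Callable, Deque, Dict

class Transport(ABC):
    """Hypothetical messaging interface that every transport plug-in implements."""

    @abstractmethod
    def send(self, task_id: int, payload: bytes) -> None: ...

    @abstractmethod
    def recv(self, task_id: int) -> bytes: ...

# Registry of available implementations, keyed by the name used in config.
TRANSPORTS: Dict[str, Callable[[], Transport]] = {}

def register(name: str) -> Callable[[type], type]:
    """Class decorator that makes an implementation selectable by name."""
    def decorator(cls: type) -> type:
        TRANSPORTS[name] = cls
        return cls
    return decorator

@register("fifo")
class FifoTransport(Transport):
    """Toy in-process transport: one FIFO queue per destination task."""

    def __init__(self) -> None:
        self.queues: Dict[int, Deque[bytes]] = {}

    def send(self, task_id: int, payload: bytes) -> None:
        self.queues.setdefault(task_id, deque()).append(payload)

    def recv(self, task_id: int) -> bytes:
        return self.queues[task_id].popleft()

@register("zlib")
class CompressedTransport(FifoTransport):
    """Same delivery semantics, different wire format: payloads are compressed."""

    def send(self, task_id: int, payload: bytes) -> None:
        super().send(task_id, zlib.compress(payload))

    def recv(self, task_id: int) -> bytes:
        return zlib.decompress(super().recv(task_id))

def make_transport(config: Dict[str, str]) -> Transport:
    """Instantiate whichever implementation the config names."""
    return TRANSPORTS[config["transport"]]()
```

Swapping `"fifo"` for `"zlib"` changes nothing for the caller. That same property lets Storm offer Netty alongside ØMQ.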
