Learn Storm with me: Storm basic architecture

Last Update:2015-09-21 Source: Internet

Author: User


A Storm cluster is superficially similar to a Hadoop cluster. Whereas on Hadoop you run "MapReduce jobs", on Storm you run "topologies". Jobs and topologies are very different; one key difference is that a MapReduce job eventually finishes, whereas a topology processes messages forever (or until you kill it). A Storm cluster has two kinds of nodes: the master node and the worker nodes. The master node runs a daemon called "Nimbus", which is similar to Hadoop's "JobTracker". Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.

Each worker node runs a daemon called the "Supervisor". The Supervisor listens for work assigned to its machine and starts and stops worker processes as necessary, based on what Nimbus has assigned to it. Each worker process executes a subset of a topology; a running topology consists of many worker processes spread across many machines.
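The division of labour described above can be sketched as a toy model. This is not Storm's scheduler: the function name `assign_workers`, the task names, and the round-robin placement are all assumptions made for illustration. It only shows the idea that a topology's tasks are split into worker processes, and those workers are spread across Supervisor machines.

```python
from collections import defaultdict
from itertools import cycle


def assign_workers(topology_tasks, supervisors, num_workers):
    """Toy stand-in for Nimbus: split tasks into workers, place workers on machines."""
    # Each worker process executes a subset of the topology's tasks.
    workers = [topology_tasks[i::num_workers] for i in range(num_workers)]
    # Place the workers on Supervisor machines round-robin (an assumption,
    # not Storm's actual scheduling policy).
    assignment = defaultdict(list)
    for supervisor, worker_tasks in zip(cycle(supervisors), workers):
        assignment[supervisor].append(worker_tasks)
    return dict(assignment)


# Hypothetical task and machine names.
tasks = ["spout-0", "spout-1", "split-0", "split-1", "count-0", "count-1"]
plan = assign_workers(tasks, ["supervisor-a", "supervisor-b"], num_workers=3)
for machine, workers in plan.items():
    print(machine, workers)
```

Each Supervisor in this model would only need to look up its own entry in `plan` to know which worker processes to start or stop.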

Figure 1 Storm architecture

All coordination between Nimbus and the Supervisors is done through a ZooKeeper cluster. In addition, the Nimbus daemon and the Supervisor daemons are fail-fast and stateless; all state is kept in ZooKeeper or on local disk. This means that you can kill -9 the Nimbus or Supervisor processes and they will start back up as if nothing had happened. This design makes Storm clusters remarkably stable.
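A minimal sketch of why keeping state outside the daemons makes them safe to kill, assuming an in-memory dictionary in place of a real ZooKeeper cluster. The class names and the `/assignments/...` path are hypothetical and do not reflect Storm's actual ZooKeeper layout.

```python
class FakeZooKeeper:
    """In-memory stand-in for a ZooKeeper cluster (assumption: a plain dict)."""

    def __init__(self):
        self.nodes = {}

    def write(self, path, value):
        self.nodes[path] = value

    def read(self, path, default=None):
        return self.nodes.get(path, default)


class Supervisor:
    """Holds no durable state of its own; everything is re-read from the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def sync(self):
        # Return the worker assignments recorded for this machine.
        return self.store.read(f"/assignments/{self.name}", [])


zk = FakeZooKeeper()
# In this model, "Nimbus" records the assignment in the shared store.
zk.write("/assignments/supervisor-a", ["worker-1", "worker-2"])

sup = Supervisor("supervisor-a", zk)
before = sup.sync()
del sup                               # simulate kill -9: the process vanishes
sup = Supervisor("supervisor-a", zk)  # a fresh process takes its place
after = sup.sync()
print(before == after)
```

Because the restarted object rebuilds its view entirely from the store, nothing is lost when the old one disappears.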

Storm implements a data flow model in which data flows continuously through a network of transformation entities. The abstraction for a data flow is called a stream, which is an unbounded sequence of tuples. A tuple is like a structure that can represent standard data types (such as integers, floats, and byte arrays) or, with some additional serialization code, user-defined types. Each stream is defined by a unique ID that can be used to build topologies of data sources and sinks. Streams originate from spouts, which feed data from external sources into the Storm topology.

Figure 2 The topology of Storm

The sinks (or the entities that provide transformations) are called bolts. A bolt implements a single transformation on a stream, and bolts carry out all processing in a Storm topology. Bolts can implement traditional functionality such as MapReduce, or more complex operations (single-step functions) such as filtering, aggregation, or communication with external entities such as databases. A typical Storm topology implements multiple transformations and therefore requires multiple bolts with independent tuple streams. Both spouts and bolts are implemented as one or more tasks in a Linux system.
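The spout-and-bolt model can be illustrated with plain Python generators. This is only a single-process analogy and does not use Storm's API; the function names and the word-count example are assumptions chosen for illustration. The spout produces an unbounded stream of tuples, and each bolt applies one transformation to the stream it receives.

```python
from collections import Counter
from itertools import islice


def sentence_spout(sentences):
    """Spout: feeds one-field tuples from an external source into the topology, forever."""
    while True:
        for sentence in sentences:
            yield (sentence,)


def split_bolt(stream):
    """Bolt: a single transformation, one sentence tuple in, several word tuples out."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt: a running aggregation that emits (word, count) tuples."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


source = ["the cow jumped", "the moon"]
topology = count_bolt(split_bolt(sentence_spout(source)))

# The stream never ends, so take only the first five output tuples.
first_five = list(islice(topology, 5))
print(first_five)
```

Chaining the generators mirrors wiring a spout to two bolts; in a real cluster each stage would run as separate tasks, possibly on different machines.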

One of the most interesting features of the Storm architecture, however, is guaranteed message processing. Storm can guarantee that every tuple emitted by a spout will be processed; if a tuple is not processed within the timeout period, Storm replays it from the spout. This feature requires some clever tricks to track tuples through the topology, and it is one of Storm's key added values.
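One tracking trick, described in Storm's own documentation on guaranteed processing, is to keep a single XOR checksum per spout tuple: every tuple id is XORed in once when the tuple is created and once when it is acknowledged, so the checksum returns to zero exactly when the whole tree of derived tuples has been processed. The sketch below is a simplified model of that idea, not Storm's code. The `Acker` class, its method names, and the small fixed ids are assumptions for illustration (Storm uses random 64-bit ids).

```python
class Acker:
    """Tracks the tuple tree of each spout tuple with one XOR checksum."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.pending = {}  # root id -> [checksum, emit time]

    def track(self, root_id, tuple_id, now):
        # The spout emitted a tuple: its id is XORed in for the first time.
        self.pending[root_id] = [tuple_id, now]

    def ack(self, root_id, acked_id, new_child_ids=()):
        # XOR in the acked id (second time, cancelling it) and the ids of any
        # tuples emitted while processing it (first time).
        entry = self.pending[root_id]
        entry[0] ^= acked_id
        for child in new_child_ids:
            entry[0] ^= child
        if entry[0] == 0:
            del self.pending[root_id]  # the whole tree is fully processed
            return True
        return False

    def expired(self, now):
        # Roots whose trees did not complete in time; the spout should replay them.
        return [r for r, (_, t) in self.pending.items() if now - t >= self.timeout]


acker = Acker(timeout=30)

# Fixed small ids keep the demo deterministic.
acker.track("msg-1", 0x1A, now=0)
step1 = acker.ack("msg-1", 0x1A, new_child_ids=[0x2B, 0x3C])  # bolt emits two tuples
step2 = acker.ack("msg-1", 0x2B)
step3 = acker.ack("msg-1", 0x3C)

acker.track("msg-2", 0x4D, now=0)  # never acknowledged
to_replay = acker.expired(now=30)
print(step1, step2, step3, to_replay)
```

The appeal of this design is that memory per spout tuple stays constant no matter how many tuples it fans out into.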

In addition to supporting reliable messaging, Storm uses ØMQ (ZeroMQ) to maximize messaging performance (removing intermediate queueing so that messages pass directly between tasks). ØMQ incorporates congestion detection and adjusts its communication to make the best use of the available bandwidth.

The first highlight of Storm 0.9.0.1 was the introduction of the Netty transport. Storm's network transport mechanism is now pluggable and currently comes with two implementations: the original ØMQ transport and the new Netty implementation. Earlier versions (before 0.9.x) supported only ØMQ. Because ØMQ is a native library, it is highly platform-dependent and remains challenging to install correctly, and its behavior also differs considerably between versions. The Netty transport provides a pure Java alternative that removes Storm's dependency on native libraries and is reported to be more than twice as fast as ØMQ in network transfer performance.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
