Real-Time Computing: Storm Process Architecture Summary

Source: Internet
Author: User

Hadoop is generally used for offline analysis and computation. Storm, by contrast, targets real-time stream computing and is widely used for real-time log processing, real-time statistics, real-time risk control, and similar scenarios. It can also process data in real time and store the results in a distributed database such as HBase to facilitate subsequent queries.

For real-time computation over large volumes of data, Storm provides a scalable, low-latency, reliable, and fault-tolerant distributed computing platform.

1. Introduction to the core objects

Tuple: The basic processing unit in a stream. A tuple can contain multiple fields, and each field represents an attribute.

Topology: A graph of compute nodes. Each node carries processing logic, and the connections between nodes indicate the direction of data flow.

Spout: The source of a stream; it produces tuples.

Bolt: Processes input streams and can produce multiple output streams. A single bolt can do simple data transformations and calculations; complex stream processing typically requires more than one bolt.

Nimbus: The master node, responsible for distributing code across the cluster, assigning work to machines, and monitoring status.

Supervisor: A worker node (one machine). It listens for assigned work and starts and stops worker processes as needed.

Worker: A process that executes a topology and spawns its tasks.

Task: Each spout and bolt runs as tasks in Storm; a task corresponds to a thread.

[Figure: composition of a Storm topology]
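To make the relationship between tuples, spouts, bolts, and a topology more concrete, here is a minimal in-process sketch in plain Python. It does not use the Storm API; the word-count logic, the function names, and the field names (`sentence`, `word`) are all assumptions chosen for illustration.

```python
from collections import Counter, namedtuple

# A tuple is modeled as a record with named fields (hypothetical field names).
SentenceTuple = namedtuple("SentenceTuple", ["sentence"])
WordTuple = namedtuple("WordTuple", ["word"])


def sentence_spout(lines):
    """Spout: the source of the stream; emits one tuple per input line."""
    for line in lines:
        yield SentenceTuple(sentence=line)


def split_bolt(stream):
    """Bolt: consumes sentence tuples and emits one word tuple per word."""
    for tup in stream:
        for word in tup.sentence.split():
            yield WordTuple(word=word.lower())


def count_bolt(stream):
    """Bolt: consumes word tuples and keeps a count per word."""
    counts = Counter()
    for tup in stream:
        counts[tup.word] += 1
    return counts


def run_topology(lines):
    """Topology: wires spout -> split bolt -> count bolt."""
    return count_bolt(split_bolt(sentence_spout(lines)))


if __name__ == "__main__":
    print(run_topology(["storm processes streams", "streams of tuples"]))
```

The splitting and counting are deliberately placed in separate bolts, mirroring the point that more complex stream processing is usually spread across more than one bolt.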

2. Overall architecture

The client submits the topology to Nimbus.

Nimbus creates a local directory for the topology, computes the tasks according to the topology configuration, and assigns them. It creates an assignments node in ZooKeeper that stores the mapping between tasks and the workers on the supervisor machine nodes.
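The assignment step can be pictured with a small sketch. The text does not describe how Nimbus actually distributes tasks, so the round-robin policy below is an assumption, as are the function names, the shape of the `parallelism` config, and the host names and ports.

```python
from collections import defaultdict


def compute_tasks(parallelism):
    """Give every component instance a unique task id.

    `parallelism` maps component name -> number of tasks (hypothetical shape).
    """
    tasks = {}
    next_id = 1
    for component, count in parallelism.items():
        for _ in range(count):
            tasks[next_id] = component
            next_id += 1
    return tasks


def assign_tasks(task_ids, worker_slots):
    """Spread task ids over worker slots round-robin (an assumed policy).

    A slot is a (supervisor_host, port) pair; the returned mapping stands in
    for what would be stored under the assignments node.
    """
    if not worker_slots:
        raise ValueError("no worker slots available")
    assignment = defaultdict(list)
    for i, task_id in enumerate(sorted(task_ids)):
        assignment[worker_slots[i % len(worker_slots)]].append(task_id)
    return dict(assignment)


if __name__ == "__main__":
    tasks = compute_tasks({"spout": 2, "split": 3, "count": 2})
    # Illustrative slots only; real hosts and ports come from cluster config.
    slots = [("supervisor-a", 6700), ("supervisor-a", 6701), ("supervisor-b", 6700)]
    for slot, ids in assign_tasks(tasks, slots).items():
        print(slot, ids)
```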

Nimbus then creates a taskbeats node in ZooKeeper to monitor task heartbeats, and starts the topology.
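Heartbeat monitoring of this kind can be sketched as follows. This is an in-memory illustration only: the `HeartbeatMonitor` class, its method names, and the timeout value are hypothetical, and a plain dictionary stands in for the heartbeat data kept in ZooKeeper.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per task and reports tasks that went silent."""

    def __init__(self, timeout_secs, clock=time.monotonic):
        self.timeout_secs = timeout_secs
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_beat = {}  # task id -> time of the most recent heartbeat

    def beat(self, task_id):
        """Record a heartbeat for the given task."""
        self.last_beat[task_id] = self.clock()

    def dead_tasks(self):
        """Return sorted ids of tasks whose last heartbeat exceeds the timeout."""
        now = self.clock()
        return sorted(
            task_id
            for task_id, seen in self.last_beat.items()
            if now - seen > self.timeout_secs
        )


if __name__ == "__main__":
    fake_now = [0.0]
    monitor = HeartbeatMonitor(timeout_secs=30, clock=lambda: fake_now[0])
    monitor.beat(1)
    monitor.beat(2)
    fake_now[0] = 20.0
    monitor.beat(2)       # task 2 keeps reporting, task 1 goes silent
    fake_now[0] = 40.0
    print(monitor.dead_tasks())
```

A master that finds a task in the dead list could then reassign it, which is one way the monitoring duty described above might be acted on.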

Each supervisor fetches its assigned tasks from ZooKeeper and starts multiple workers. Each worker spawns tasks, one thread per task, and initializes the connections between tasks based on the topology information. Communication between tasks is handled through ZeroMQ. At that point the entire topology is running.
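The "one task, one thread" model with connections between tasks can be sketched in a single process. This is only an analogy under stated assumptions: `queue.Queue` stands in for the ZeroMQ messaging layer, both tasks live in one process rather than in workers on different machines, and the names `task_loop`, `run_pipeline`, and `STOP` are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel that tells a task thread to shut down


def task_loop(inbox, outbox, transform):
    """Run one task: read items from inbox, apply transform, forward results."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the downstream task
            return
        outbox.put(transform(item))


def run_pipeline(values):
    """Start two task threads connected by queues and push values through them."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=task_loop, args=(q_in, q_mid, lambda x: x * 2)),
        threading.Thread(target=task_loop, args=(q_mid, q_out, lambda x: x + 1)),
    ]
    for t in threads:
        t.start()
    for v in values:
        q_in.put(v)
    q_in.put(STOP)
    for t in threads:
        t.join()

    # Drain the final queue up to the shutdown sentinel.
    results = []
    while True:
        item = q_out.get()
        if item is STOP:
            break
        results.append(item)
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

Because each stage here is a single thread reading from a FIFO queue, items leave the pipeline in the order they entered; with several tasks per stage that ordering would no longer be guaranteed.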
