Storm principles and conceptual explanations

Source: Internet
Author: User

Storm's cluster structure

Comparison of Storm and Hadoop architectures

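Storm: the master node runs Nimbus, the worker nodes run Supervisors, and the jobs are topologies, which run in an endless loop until they are killed.

Hadoop: the master node runs the JobTracker, the worker nodes run TaskTrackers, and the jobs are MapReduce jobs, which end as soon as execution completes.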

Architecture diagram

All coordination between Nimbus and the Supervisors is done through a ZooKeeper cluster.

The Nimbus and Supervisor processes do not communicate with each other directly, and both are stateless and fail-fast: all state is kept in ZooKeeper or on local disk.

This means you can kill -9 the Nimbus or Supervisor processes and simply restart them, with no backup needed. A restarted process recovers its state from ZooKeeper and local disk and carries on as if nothing had happened.

Because the components are decoupled in this way, a Storm cluster is remarkably stable.
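This decoupling can be illustrated with a small Python model. It is a sketch under stated assumptions, not Storm code: `CoordinationStore` is a hypothetical in-memory dict standing in for the ZooKeeper cluster, `Supervisor` is a toy class, and the paths and worker names are made up. What it shows is that a daemon keeping all of its state in the shared store can be destroyed and recreated without losing anything.

```python
from typing import Any, Dict


class CoordinationStore:
    """Hypothetical in-memory stand-in for the ZooKeeper cluster (not ZooKeeper's API)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def write(self, path: str, value: Any) -> None:
        self._data[path] = value

    def read(self, path: str, default: Any = None) -> Any:
        return self._data.get(path, default)


class Supervisor:
    """Hypothetical stateless supervisor: it caches nothing and re-reads the store."""

    def __init__(self, node_id: str, store: CoordinationStore) -> None:
        self.node_id = node_id
        self.store = store

    def current_assignment(self) -> list:
        # The assignment lives only in the shared store, never in process memory.
        return self.store.read(f"/assignments/{self.node_id}", [])


store = CoordinationStore()
# Nimbus (not modeled here) records an assignment for node "sup-1".
store.write("/assignments/sup-1", ["worker-6700", "worker-6701"])

sup = Supervisor("sup-1", store)
before = sup.current_assignment()

del sup                                 # simulate kill -9: the process and its memory are gone
restarted = Supervisor("sup-1", store)  # a fresh process on the same node
after = restarted.current_assignment()
print(before == after)  # True
```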

How Storm Works

Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures. Topologies can only be submitted through Nimbus.

Each Supervisor listens for work assigned to its machine and starts and stops worker processes as necessary, based on what Nimbus has assigned to it. Each worker process executes a subset of a topology; a running topology consists of many worker processes spread across many machines.
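How Nimbus might spread a topology's work over the worker slots that Supervisors offer can be sketched as a simple round-robin. This is an assumption made for illustration: Storm's real schedulers are more involved, and the executor names and (node, port) slots below are hypothetical.

```python
from itertools import cycle
from typing import Dict, List, Tuple

Slot = Tuple[str, int]  # (node, port): one slot corresponds to one worker process


def assign(executors: List[str], slots: List[Slot]) -> Dict[Slot, List[str]]:
    """Round-robin executors over worker slots.

    A simplified, hypothetical stand-in for Nimbus's scheduling step.
    """
    if not slots:
        raise ValueError("no worker slots available")
    assignment: Dict[Slot, List[str]] = {slot: [] for slot in slots}
    # zip stops when the executors run out; cycle() repeats the slots as needed.
    for executor, slot in zip(executors, cycle(slots)):
        assignment[slot].append(executor)
    return assignment


slots = [("node-a", 6700), ("node-a", 6701), ("node-b", 6700)]
executors = ["spout-1", "spout-2", "split-1", "split-2", "count-1"]
plan = assign(executors, slots)
for slot, assigned in plan.items():
    print(slot, assigned)
# ('node-a', 6700) ['spout-1', 'split-2']
# ('node-a', 6701) ['spout-2', 'count-1']
# ('node-b', 6700) ['split-1']
```

Here node-a runs two worker processes and node-b runs one, and each worker executes only a subset of the topology.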

Storm is built around the abstraction of a stream: a continuous, unbounded sequence of tuples. Storm focuses on modeling event flows, and it represents each event in a stream as a tuple.

In Storm, every stream has a source that produces the original tuples; this source is called a spout.


The processing of tuples in a stream is abstracted as a bolt. A bolt can consume any number of input streams, as long as those streams are directed to it, and it can also emit new streams for other bolts to consume. To build a pipeline, then, you open a particular spout and direct the tuples flowing out of it to a particular bolt; that bolt processes the incoming stream and passes its results on to other bolts or to a final destination.
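The spout-to-bolt flow just described can be modeled in plain Python as a linear chain. This is a single-threaded sketch, not Storm's API: the spout is a generator, each bolt is a function that takes one tuple and returns the tuples it emits, and `run_pipeline` is a hypothetical helper that pushes the stream through the bolts in order.

```python
from typing import Callable, Iterator, List


def sentence_spout() -> Iterator[tuple]:
    """Hypothetical spout: the source that produces the original tuples."""
    for sentence in ["the cow jumped", "the moon"]:
        yield (sentence,)


def split_bolt(tup: tuple) -> List[tuple]:
    """First bolt: consumes sentence tuples and emits one tuple per word."""
    (sentence,) = tup
    return [(word,) for word in sentence.split()]


def upper_bolt(tup: tuple) -> List[tuple]:
    """Second bolt: consumes word tuples and emits them upper-cased."""
    (word,) = tup
    return [(word.upper(),)]


def run_pipeline(spout: Callable[[], Iterator[tuple]],
                 bolts: List[Callable[[tuple], List[tuple]]]) -> List[tuple]:
    """Direct the spout's stream into the first bolt, its output into the next, and so on."""
    stream = list(spout())
    for bolt in bolts:
        stream = [out for tup in stream for out in bolt(tup)]
    return stream


result = run_pipeline(sentence_spout, [split_bolt, upper_bolt])
print(result)
# [('THE',), ('COW',), ('JUMPED',), ('THE',), ('MOON',)]
```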

You can think of a spout as a faucet. Each faucet delivers a different kind of water, so you turn on the faucet that supplies the water you want and use a pipe to lead it to a water-treatment device (a bolt). That device, in turn, pipes its output to another device or into a container.

To treat water faster, it is natural to connect several taps to the same water source and run several water processors in parallel. Storm does the same by running multiple instances of a spout or bolt in parallel.

Storm abstracts this directed graph, normally an acyclic one (a DAG), as a topology. The topology is Storm's abstraction of a job: a graph of stream transformations.
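Because a topology is a directed graph of spouts and bolts, ordinary graph algorithms apply to it. The sketch below, assuming the topology is meant to be acyclic, uses Kahn's algorithm to detect cycles and to list the components so that each one appears after everything upstream of it. The component names and the dict-of-edges representation are hypothetical; in Storm the graph is built with its own `TopologyBuilder` instead.

```python
from collections import deque
from typing import Dict, List


def topo_order(edges: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm over a component graph given as {upstream: [downstream, ...]}.

    Returns the components so that each appears after all of its upstream
    components; raises ValueError if the graph contains a cycle.
    """
    nodes = set(edges)
    for downstream in edges.values():
        nodes.update(downstream)
    indegree = {n: 0 for n in nodes}
    for downstream in edges.values():
        for d in downstream:
            indegree[d] += 1
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))  # spouts have no inputs
    order: List[str] = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for d in sorted(edges.get(n, [])):
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) != len(nodes):
        raise ValueError("topology contains a cycle")
    return order


# Hypothetical word-count topology: spout -> split -> count -> report,
# plus an "audit" bolt that also subscribes to the spout's stream.
topology = {
    "sentence-spout": ["split", "audit"],
    "split": ["count"],
    "count": ["report"],
}
print(topo_order(topology))
# ['sentence-spout', 'audit', 'split', 'count', 'report']
```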

Each node in the graph is a spout or a bolt. When a spout or bolt emits a tuple on a stream, every bolt that subscribes to that stream receives it.

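Between a spout (or bolt) and each bolt that subscribes to it, a stream grouping decides which of the bolt's parallel tasks receives each tuple. Storm provides several groupings, among them shuffle, fields, all, global, none, and direct grouping.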
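The groupings differ only in how they choose target tasks for each tuple. The functions below model four of them in Python; each takes a tuple and the number of tasks of the receiving bolt and returns the indices of the tasks that should get it. They are illustrative assumptions, not Storm's implementation: in particular, Storm hashes the grouping fields itself, whereas this sketch uses `zlib.crc32` so the result is stable across Python runs (Python's built-in `hash` for strings is randomized per process). None and direct grouping are not modeled.

```python
import random
import zlib
from typing import List


def shuffle_grouping(tup: tuple, n_tasks: int, rng: random.Random) -> List[int]:
    """Pick one task at random, spreading load evenly on average."""
    return [rng.randrange(n_tasks)]


def fields_grouping(tup: tuple, n_tasks: int, field_index: int) -> List[int]:
    """Equal values in the grouping field always go to the same task."""
    key = repr(tup[field_index]).encode("utf-8")
    return [zlib.crc32(key) % n_tasks]


def all_grouping(tup: tuple, n_tasks: int) -> List[int]:
    """Replicate the tuple to every task."""
    return list(range(n_tasks))


def global_grouping(tup: tuple, n_tasks: int) -> List[int]:
    """Send the entire stream to a single task (here, task 0)."""
    return [0]


rng = random.Random(42)
for word in ["apple", "pear", "apple", "fig"]:
    tup = (word,)
    print(word,
          "shuffle:", shuffle_grouping(tup, 3, rng),
          "fields:", fields_grouping(tup, 3, 0),
          "all:", all_grouping(tup, 3),
          "global:", global_grouping(tup, 3))
```

Under fields grouping both "apple" tuples land on the same task, which is what makes per-key aggregation such as word counting possible; shuffle grouping gives no such guarantee.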

Topology

Storm abstracts the elements of a stream as tuples. A tuple is a list of values, each value in the list has a name, and a value can be of any serializable type. Each node in the topology declares the field names of the tuples it emits, so other nodes only need to subscribe to it and can then read each value by name.
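The relationship between declared field names and tuple values can be sketched in Python. `Fields` and `Tuple` here are simplified models loosely named after Storm's Java `Fields` class and `Tuple` interface (the lookup method mirrors `getValueByField`); they are not the real classes, and the field names are hypothetical.

```python
from typing import Any, Dict, List, Sequence


class Fields:
    """The field names a component declares for the tuples it emits."""

    def __init__(self, names: Sequence[str]) -> None:
        if len(set(names)) != len(names):
            raise ValueError("field names must be unique")
        self.names: List[str] = list(names)
        self._index: Dict[str, int] = {name: i for i, name in enumerate(self.names)}

    def index_of(self, name: str) -> int:
        return self._index[name]


class Tuple:
    """A list of values whose positions are named by the emitter's Fields."""

    def __init__(self, fields: Fields, values: Sequence[Any]) -> None:
        if len(values) != len(fields.names):
            raise ValueError("number of values must match number of fields")
        self.fields = fields
        self.values: List[Any] = list(values)

    def get_value_by_field(self, name: str) -> Any:
        # Subscribers look values up by name, not by position.
        return self.values[self.fields.index_of(name)]


declared = Fields(["word", "count"])  # what a counting bolt might declare
t = Tuple(declared, ["storm", 3])
print(t.get_value_by_field("word"), t.get_value_by_field("count"))  # storm 3
```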

Concepts

Streams: Message Flow

A stream is an unbounded sequence of tuples that are created and processed in parallel in a distributed fashion. Each tuple can contain multiple fields, whose types can be integer, long, short, byte, string, double, float, boolean, or byte array. You can also use custom types, as long as you implement a serializer for them.
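Storm serializes tuple values with Kryo and lets you register serializers for custom types in the topology configuration. The Python sketch below only illustrates the idea of such a registry: `SerializerRegistry`, its name-prefixed wire format, and the `GeoPoint` type are all hypothetical, and JSON is used purely for convenience.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class GeoPoint:
    """Hypothetical custom field type."""
    lat: float
    lon: float


class SerializerRegistry:
    """Maps a custom type to an encoder/decoder pair (illustrative only)."""

    def __init__(self) -> None:
        self._by_type: Dict[type, Tuple[str, Callable[[Any], bytes]]] = {}
        self._by_name: Dict[str, Callable[[bytes], Any]] = {}

    def register(self, cls: type, encode: Callable[[Any], bytes],
                 decode: Callable[[bytes], Any]) -> None:
        self._by_type[cls] = (cls.__name__, encode)
        self._by_name[cls.__name__] = decode

    def serialize(self, value: Any) -> bytes:
        try:
            name, encode = self._by_type[type(value)]
        except KeyError:
            raise TypeError(f"no serializer registered for {type(value).__name__}") from None
        # Wire format: type name, a NUL separator, then the encoded payload.
        return name.encode("utf-8") + b"\x00" + encode(value)

    def deserialize(self, data: bytes) -> Any:
        name, _, payload = data.partition(b"\x00")
        return self._by_name[name.decode("utf-8")](payload)


registry = SerializerRegistry()
registry.register(
    GeoPoint,
    encode=lambda p: json.dumps(asdict(p)).encode("utf-8"),
    decode=lambda b: GeoPoint(**json.loads(b)),
)
wire = registry.serialize(GeoPoint(31.23, 121.47))
print(registry.deserialize(wire))  # GeoPoint(lat=31.23, lon=121.47)
```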

Spouts: Message Source

Spouts are the producers of messages in a topology. A spout reads data from an external source (such as a message queue) and emits tuples into the topology. Spouts can be reliable or unreliable: a reliable spout can re-emit a tuple that failed to be processed, while an unreliable spout cannot.

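Storm calls the spout's nextTuple method over and over, and each call can emit a tuple into the topology. When Storm detects that a tuple has been successfully processed by the entire topology, it calls the spout's ack method for that tuple; otherwise (for example, if processing failed or timed out) it calls fail.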
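Storm detects that a tuple has been processed by the entire topology with acker tasks, which track each spout tuple's tree of descendant tuples by XORing random 64-bit tuple ids. The sketch below is a simplified single-process model of that idea, assuming each bolt reports, in one message, the id of the tuple it acked XORed with the ids of the tuples it emitted anchored to it; the class and method names are hypothetical. Every id is XORed in twice (once when created, once when acked), so the running value returns to zero when the whole tree has been acked; a premature zero would need an id collision, which is very unlikely with 64-bit ids.

```python
import random
from functools import reduce
from operator import xor
from typing import Dict, Iterable


def new_tuple_id() -> int:
    """Random 64-bit id for each tuple in a tuple tree."""
    return random.getrandbits(64)


class Acker:
    """Simplified model of XOR-based tuple-tree tracking."""

    def __init__(self) -> None:
        self.pending: Dict[str, int] = {}  # spout message id -> running XOR

    def init(self, msg_id: str, root_id: int) -> None:
        """The spout emitted a root tuple with id root_id."""
        self.pending[msg_id] = root_id

    def ack(self, msg_id: str, acked_id: int, child_ids: Iterable[int] = ()) -> bool:
        """A bolt acked one tuple and reports the children it anchored to it.

        Returns True when the tree is complete, i.e. when Storm would call
        ack on the spout for msg_id.
        """
        self.pending[msg_id] ^= acked_id ^ reduce(xor, child_ids, 0)
        if self.pending[msg_id] == 0:
            del self.pending[msg_id]
            return True
        return False


acker = Acker()
root, word1, word2 = new_tuple_id(), new_tuple_id(), new_tuple_id()
acker.init("sentence-1", root)
print(acker.ack("sentence-1", root, [word1, word2]))  # split bolt acks root, emits two words
print(acker.ack("sentence-1", word1))                 # count bolt acks the first word
print(acker.ack("sentence-1", word2))                 # last ack completes the tree
```

Barring an id collision, the first two calls print False and the last prints True.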

Storm calls ack and fail only for reliable spouts, that is, for tuples emitted with a message ID.
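A reliable spout has to remember what it emitted until Storm calls ack or fail. The Python class below sketches that bookkeeping, assuming a hypothetical in-memory list as the external source. Its `next_tuple`, `ack`, and `fail` methods are named after Storm's spout methods (`nextTuple`, `ack`, `fail`) but are not Storm's API. An unreliable spout would simply emit without a message id and keep no `pending` map.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Tuple


class ReliableSpout:
    """Every emitted tuple carries a message id and stays in `pending`
    until it is acked (forgotten) or failed (queued for replay)."""

    def __init__(self, source: Iterable[tuple]) -> None:
        self.source = deque(source)          # stand-in for an external queue
        self.pending: Dict[int, tuple] = {}  # msg_id -> tuple awaiting ack/fail
        self.replay: deque = deque()         # msg_ids that failed and must be re-emitted
        self.next_id = 0

    def next_tuple(self) -> Optional[Tuple[int, tuple]]:
        """Return (msg_id, tuple) to emit, or None if there is nothing to emit."""
        if self.replay:
            msg_id = self.replay.popleft()
            return msg_id, self.pending[msg_id]
        if self.source:
            msg_id = self.next_id
            self.next_id += 1
            self.pending[msg_id] = self.source.popleft()
            return msg_id, self.pending[msg_id]
        return None

    def ack(self, msg_id: int) -> None:
        self.pending.pop(msg_id, None)

    def fail(self, msg_id: int) -> None:
        if msg_id in self.pending:
            self.replay.append(msg_id)


spout = ReliableSpout([("a",), ("b",)])
print(spout.next_tuple())  # (0, ('a',))
print(spout.next_tuple())  # (1, ('b',))
spout.ack(0)               # 'a' was fully processed: forget it
spout.fail(1)              # 'b' failed somewhere downstream
print(spout.next_tuple())  # (1, ('b',)) replayed
print(spout.next_tuple())  # None, and 'b' stays pending until it is acked
```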

Bolts: Message Processor

Bolts encapsulate the message-processing logic. They can do many things: filtering, aggregation, database queries, and so on.

A bolt can simply pass a stream along. Complex stream processing usually takes many steps and therefore many bolts: the output of one level of bolts can serve as the input of the next. Spouts, by contrast, cannot be chained this way; a spout only emits streams and never consumes one.
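Filtering and aggregation, chained across two levels of bolts, can be sketched in Python as follows. The classes and the `execute(tup, emit)` signature are hypothetical simplifications; in Storm, a bolt's `execute` receives a tuple and emits through an `OutputCollector` rather than a callback. The first bolt drops stop words and the second keeps a running count per word, so the filter's output is the counter's input.

```python
from collections import Counter
from typing import Callable, Iterable

Emit = Callable[[tuple], None]


class FilterBolt:
    """First level: drops stop words and passes every other word through."""

    def __init__(self, stop_words: Iterable[str]) -> None:
        self.stop_words = set(stop_words)

    def execute(self, tup: tuple, emit: Emit) -> None:
        (word,) = tup
        if word not in self.stop_words:
            emit((word,))


class CountBolt:
    """Second level: aggregates a running count per word and emits (word, count)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def execute(self, tup: tuple, emit: Emit) -> None:
        (word,) = tup
        self.counts[word] += 1
        emit((word, self.counts[word]))


out = []
filt, count = FilterBolt({"the", "a"}), CountBolt()
for word in ["the", "cow", "a", "cow", "moon"]:
    # The filter's output is wired straight into the counter's input.
    filt.execute((word,), lambda t: count.execute(t, out.append))
print(out)  # [('cow', 1), ('cow', 2), ('moon', 1)]
```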

The main method of a bolt is execute, which Storm calls once for each incoming tuple, so tuples are processed one at a time. After successfully processing a tuple, the bolt calls the ack method of its OutputCollector to notify Storm that the tuple has been processed. If processing fails, it can call fail instead, which notifies the spout side that the tuple can be re-emitted.
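The ack-or-fail pattern of `execute` can be sketched in Python with a recording stand-in for the collector. `Collector` and `ParseIntBolt` are hypothetical; their `emit(anchor, values)`, `ack(tuple)`, and `fail(tuple)` calls mirror the shape of Storm's `OutputCollector` methods, where passing the input tuple as the anchor ties the new tuple into the same tuple tree.

```python
from typing import List


class Collector:
    """Hypothetical stand-in for Storm's OutputCollector that records each call."""

    def __init__(self) -> None:
        self.emitted: List[tuple] = []
        self.acked: List[tuple] = []
        self.failed: List[tuple] = []

    def emit(self, anchor: tuple, values: tuple) -> None:
        self.emitted.append((anchor, values))

    def ack(self, tup: tuple) -> None:
        self.acked.append(tup)

    def fail(self, tup: tuple) -> None:
        self.failed.append(tup)


class ParseIntBolt:
    """Explicitly acks tuples it handles and fails those it cannot parse."""

    def __init__(self, collector: Collector) -> None:
        self.collector = collector

    def execute(self, tup: tuple) -> None:
        try:
            value = int(tup[0])
        except ValueError:
            self.collector.fail(tup)                # the spout may replay this tuple
            return
        self.collector.emit(tup, (value * 2,))      # anchored to the input tuple
        self.collector.ack(tup)                     # ack only after emitting


c = Collector()
bolt = ParseIntBolt(c)
for t in [("21",), ("oops",)]:
    bolt.execute(t)
print(c.emitted)  # [(('21',), (42,))]
print(c.acked)    # [('21',)]
print(c.failed)   # [('oops',)]
```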

In short, a bolt processes an input tuple and then calls ack to tell Storm it has finished with that tuple. Storm also provides the IBasicBolt interface, which calls ack automatically.
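What IBasicBolt automates can be sketched as a wrapper: it anchors every emit to the input tuple, acks once the processing function returns, and fails the tuple if the function raises `FailedException` (Storm has an exception class of that name for this purpose). The wrapper, the collector, and the `upper` function are hypothetical Python models, not Storm's classes; other exceptions are deliberately left to propagate.

```python
from typing import Callable, List

Emit = Callable[[tuple], None]


class FailedException(Exception):
    """Raised by processing code to request a fail instead of an ack."""


class RecordingCollector:
    """Hypothetical stand-in for Storm's OutputCollector."""

    def __init__(self) -> None:
        self.emitted: List[tuple] = []
        self.acked: List[tuple] = []
        self.failed: List[tuple] = []

    def emit(self, anchor: tuple, values: tuple) -> None:
        self.emitted.append((anchor, values))

    def ack(self, tup: tuple) -> None:
        self.acked.append(tup)

    def fail(self, tup: tuple) -> None:
        self.failed.append(tup)


class BasicBoltAdapter:
    """Runs `process(tup, emit)` and handles anchoring, ack, and fail for it."""

    def __init__(self, process: Callable[[tuple, Emit], None],
                 collector: RecordingCollector) -> None:
        self.process = process
        self.collector = collector

    def execute(self, tup: tuple) -> None:
        def emit(values: tuple) -> None:
            self.collector.emit(tup, values)  # anchored to the input automatically

        try:
            self.process(tup, emit)
        except FailedException:
            self.collector.fail(tup)
            return
        self.collector.ack(tup)               # acked automatically on success


def upper(tup: tuple, emit: Emit) -> None:
    if not tup[0]:
        raise FailedException("empty word")
    emit((tup[0].upper(),))


collector = RecordingCollector()
bolt = BasicBoltAdapter(upper, collector)
bolt.execute(("storm",))
bolt.execute(("",))
print(collector.emitted, collector.acked, collector.failed)
# [(('storm',), ('STORM',))] [('storm',)] [('',)]
```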

Bolts use an OutputCollector to emit tuples to the next level of bolts.
