Analysis of the Internal Principles of Apache Storm

This article is reposted from [Shiyan's personal blog].

This article is a personal summary of my use of and study of Storm. Because I do not know the Clojure language, I cannot go very deep into the source code; instead it draws on the official website, articles by many other people, and the book "Storm Applied: Strategies for Real-Time Event Processing", combined with my own experience of using Storm. I hope it helps readers who want to dig deeper into the principles of Storm; where it falls short, corrections and discussion are welcome.

Storm cluster architecture

A Storm cluster uses a master/slave architecture: the master node runs Nimbus and the slave nodes run Supervisors, while scheduling-related metadata is stored in a ZooKeeper cluster. The architecture is shown in the following diagram:

Specifically, the components are described as follows:

Nimbus

The master node of a Storm cluster. It is responsible for distributing user code and for assigning work, that is, deciding which worker processes on which Supervisor nodes will run the tasks of each topology component (spout/bolt).

Supervisor

A slave node of a Storm cluster. It manages the starting and stopping of each worker process running on that node. Through the supervisor.slots.ports item in Storm's configuration file you can specify the maximum number of slots allowed on a single Supervisor; each slot is uniquely identified by a port number, and one port number corresponds to one worker process (if that worker process is started). A sample configuration is shown below.

ZooKeeper

Used to coordinate Nimbus and the Supervisors. If a Supervisor cannot run its part of a topology because of a failure, Nimbus is the first to detect it and reassigns the topology to another available Supervisor.
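
For reference, the slots on a Supervisor node are declared with the supervisor.slots.ports list in storm.yaml. The fragment below uses Storm's default ports and is shown only to illustrate the format:

supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703

With this configuration the Supervisor can start at most four worker processes, one per listed port.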

Stream groupings

One of the most important abstractions in Storm is the stream grouping, which controls how the tasks of a spout/bolt distribute their tuples, that is, to which task of the target bolt each emitted tuple is sent, as shown in the following figure:

Currently, Storm supports the following stream groupings (a short example of declaring some of them in code follows the list):

Shuffle Grouping: random grouping. Tuples are distributed randomly across the tasks of the target bolt, so each task receives roughly the same number of tuples, and no tuple is delivered twice.

Fields Grouping: grouping by the specified field(s). Tuples with the same value for the grouping field are always emitted to the same task.

Partial Key Grouping: like Fields Grouping the stream is partitioned by the specified fields, but the load is balanced between two downstream tasks per key, which gives better resource utilization, especially when the incoming data is skewed.

All Grouping: every task of the target bolt receives a copy of each tuple (a broadcast).

Global Grouping: the entire stream goes to a single task of the target bolt (the one with the smallest task id).

None Grouping: you do not care how the stream is grouped; currently this is equivalent to Shuffle Grouping.

Direct Grouping: the producer of a tuple decides which task of the downstream bolt receives it; this is controlled precisely in the bolt code during development.

Local or Shuffle Grouping: if the target bolt has one or more tasks in the same worker process (JVM instance), the tuple is sent only to those tasks; otherwise it behaves like Shuffle Grouping.
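
To make the list above concrete, the fragment below sketches how a few of these groupings are declared when wiring components together with TopologyBuilder. It is only a sketch: the spout/bolt classes (SentenceSpout, SplitSentenceBolt, WordCountBolt, ReportBolt) and the component ids are made up for illustration.

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentence-spout", new SentenceSpout(), 2);

builder.setBolt("split-bolt", new SplitSentenceBolt(), 4)
       .shuffleGrouping("sentence-spout");                   // Shuffle Grouping

builder.setBolt("count-bolt", new WordCountBolt(), 4)
       .fieldsGrouping("split-bolt", new Fields("word"));    // Fields Grouping on the "word" field

builder.setBolt("report-bolt", new ReportBolt(), 1)
       .globalGrouping("count-bolt");                        // Global Grouping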

In addition, Storm provides an interface for user-defined stream groupings. If the groupings above cannot meet the actual business needs, you can implement your own by implementing the backtype.storm.grouping.CustomStreamGrouping interface, whose key method is the following:

List<Integer> chooseTasks(int taskId, List<Object> values)

Of these built-in stream groupings, the most commonly used are probably Shuffle Grouping, Fields Grouping and Direct Grouping; the others can also be used to meet specific application needs.
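
As an illustration, here is a minimal sketch of a custom grouping for the backtype.storm API used elsewhere in this article. Besides chooseTasks, the interface also provides a prepare callback that hands over the ids of the target tasks; the class name and the routing rule (hash of the first field) are made up for this example.

import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import backtype.storm.generated.GlobalStreamId;
import backtype.storm.grouping.CustomStreamGrouping;
import backtype.storm.task.WorkerTopologyContext;

// Sketch: route every tuple by the hash of its first field, so tuples with the
// same first value always reach the same downstream task (similar in spirit to
// a fields grouping on field 0).
public class FirstFieldHashGrouping implements CustomStreamGrouping, Serializable {

    private List<Integer> targetTasks;

    @Override
    public void prepare(WorkerTopologyContext context, GlobalStreamId stream,
                        List<Integer> targetTasks) {
        // Storm passes in the task ids of the downstream component once, up front.
        this.targetTasks = targetTasks;
    }

    @Override
    public List<Integer> chooseTasks(int taskId, List<Object> values) {
        // Pick exactly one downstream task for this tuple.
        int index = (values.get(0).hashCode() & Integer.MAX_VALUE) % targetTasks.size();
        return Arrays.asList(targetTasks.get(index));
    }
}

Such a grouping would then be attached when declaring the bolt's input, for example with .customGrouping("upstream-component", new FirstFieldHashGrouping()).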

Acker principle

First, let's understand the concept of a tuple tree. Taking as an example the counting of the number of occurrences of each letter in an English sentence, the resulting tuple tree is shown in the following figure:

In other words, for this example every English sentence corresponds at run time to a tuple tree; a tuple tree may be large or small, depending on the specific business requirements.
The Acker is itself a bolt component, but when we implement our business logic we do not need to care about its implementation: after a topology is submitted to the Storm cluster, the system automatically adds the Acker bolt to it during topology initialization. Its main function is to track the relationships between the tuples processed by each spout/bolt in our topology, or in other words, to track the processing progress of each tuple tree.
Below, we describe the mechanism of the Acker (a toy sketch of the XOR bookkeeping follows this description):

When a task of a spout creates a tuple, that is, when the processing logic implemented in the spout's nextTuple() method reads data from a particular data source, it notifies the Acker by sending it a message, and the Acker records the corresponding information {:spout-task task-id :val ack-val}.

When a bolt emits new child tuples, the relationship between the child tuples and their parent tuple is recorded (anchoring). When the bolt acks the parent tuple, it computes a single XOR value over the parent tuple and all the child tuples newly generated from it, and sends that value to the Acker (the XOR value is tuple-id ^ child-tuple-id1 ^ child-tuple-id2 ^ ... ^ child-tuple-idN). As you can see, the bolt does not send all the generated child tuples to the Acker, which would be far more expensive; sending one XOR value greatly reduces the network traffic between the bolts and the Acker.

The Acker XORs each received value into the ack-val it keeps for that spout tuple. Because every tuple id enters the calculation exactly twice, once when it is created/anchored and once when it is acked, the ack-val returns to 0 only after every tuple in the tree has been acked; a final ack-val of 0 therefore means the entire tuple tree has been processed successfully.

Whether processing succeeds or fails, the entry is eventually removed from the map the Acker maintains, and the Acker notifies the spout task that produced the original root tuple by calling its ack or fail method. If we override ack and fail when implementing the spout, our callback logic is executed there.
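
The following is only a toy, single-class illustration of the XOR bookkeeping described above. Storm's real Acker is a Clojure bolt that receives these values over streams; nothing here is Storm code, and the ids are just random longs.

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Toy illustration of the Acker's XOR bookkeeping (not Storm's real Clojure Acker).
public class AckerXorDemo {

    // One ack-val per spout (root) tuple, keyed by the root tuple id.
    private final Map<Long, Long> ackVals = new HashMap<>();

    // The spout emits a root tuple: its id is XORed into the ack-val.
    void spoutEmit(long rootId, long tupleId) {
        ackVals.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    // A bolt acks an input tuple after anchoring new child tuples to it: it sends a
    // single value, inputId ^ child1 ^ child2 ^ ... ^ childN, which is XORed in here.
    void boltAck(long rootId, long inputTupleId, long... childTupleIds) {
        long xor = inputTupleId;
        for (long child : childTupleIds) {
            xor ^= child;
        }
        ackVals.merge(rootId, xor, (a, b) -> a ^ b);
    }

    // Every tuple id enters the ack-val exactly twice (once when created/anchored,
    // once when acked), so the value returns to 0 only when the whole tree is done.
    boolean isFullyAcked(long rootId) {
        return ackVals.getOrDefault(rootId, 0L) == 0L;
    }

    public static void main(String[] args) {
        Random random = new Random();
        AckerXorDemo acker = new AckerXorDemo();

        long root = random.nextLong();             // id of the spout tuple
        acker.spoutEmit(root, root);

        long child1 = random.nextLong();           // ids of two tuples emitted by the first bolt
        long child2 = random.nextLong();
        acker.boltAck(root, root, child1, child2); // the first bolt acks the root tuple
        acker.boltAck(root, child1);               // downstream bolts ack the children
        acker.boltAck(root, child2);

        System.out.println("tuple tree fully processed: " + acker.isFullyAcked(root)); // true
    }
}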

Storm Design: Component abstraction

After we write a topology containing our business logic and submit it to the Storm cluster, task scheduling and resource allocation take place, which brings the various components of Storm's design into play. Let's first look at how a topology submitted to a Storm cluster is deployed at run time, as shown in the following figure:

As we can see from the figure above, the multiple tasks of a topology's spouts/bolts may be distributed across multiple workers on multiple Supervisors. Each worker contains multiple executors, whose number is calculated and assigned at run time based on the topology's actual configuration.
To get from the Supervisor node that runs a topology down to the final run-time task objects, we need to understand a few of Storm's abstractions. Because they are relatively simple, I will explain them only briefly (a minimal spout/bolt skeleton follows these definitions):

Topology: Storm's abstraction of a distributed computing application. The goal is to accomplish one thing (from a business point of view) by implementing a topology. A topology consists of a set of static program components (spouts/bolts) plus the stream groupings that describe the relationships between those components.

Spout: describes how data enters the Storm cluster from an external system (or is produced directly inside the component) and is handed to the topology the spout belongs to for processing. A spout usually just reads from a data source or does some simple preprocessing; in order not to hinder the continuous, real-time, fast entry of data into the system, it is usually not recommended to place complex processing logic here.

Bolt: describes the business-related processing logic.
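
As a concrete illustration of these two static components, here is a minimal spout and bolt skeleton using the backtype.storm API that also appears in the code later in this article; the class names, field names and the sample sentence are made up.

import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Skeleton spout: reads sentences from some external source and emits them as tuples.
class SentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // A real spout would read from a queue, a log, etc.; kept trivial here.
        collector.emit(new Values("the quick brown fox"));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}

// Skeleton bolt: splits each incoming sentence into words, one output tuple per word.
class SplitSentenceBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        for (String word : input.getStringByField("sentence").split(" ")) {
            collector.emit(input, new Values(word));   // anchor the new tuple to its parent
        }
        collector.ack(input);                          // ack the input so the Acker can track it
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}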

The above are concepts that describe static things (components); after we have written a topology, these components exist in a static form. Now let's look at the dynamic components (concepts) that come into being when the topology is submitted and run:

Task: the entity that a spout/bolt becomes at run time. A spout/bolt may correspond to one or more spout tasks/bolt tasks at run time, depending on the configuration used when writing the topology.

Worker: the first-level container in which run-time tasks live. Executors run inside workers, and one worker corresponds to one JVM instance created on a Supervisor node.

Executor: the direct container in which run-time tasks live; the task's processing logic is executed inside the executor. One or more executors can run within the same worker process, and one or more tasks can run within the same executor. Executors thus provide parallelism within a worker process, and tasks provide a further level of parallelism within an executor.

Topology degree of parallelism calculation

For the calculation of a topology's parallelism there is an article on the official website (linked in the references below); we explain it in some detail here because it also helps in understanding some of the statistics shown on the Storm UI. The parallelism set in code is only a hint: Storm computes the actual run-time parallelism from this hint together with some other configuration parameters (the task count and the number of workers). What this really describes is how the run-time entities, the tasks of the spouts/bolts that make up a topology, are distributed, so it is best looked at from the perspective of a whole topology: given the parallelism configured for each spout/bolt, how are the corresponding run-time tasks spread over the worker processes and executors of the cluster?
Let's take the following example topology, shown in the figure below:

The example topology is configured with 2 workers, and the corresponding code example looks like this:

topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2);        // parallelism hint 2, task count 2*1 = 2
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
               .setNumTasks(4)
               .shuffleGrouping("blue-spout");                      // parallelism hint 2, task count explicitly set to 4
topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
               .shuffleGrouping("green-bolt");                      // parallelism hint 6, task count 6*1 = 6
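
For completeness, the 2 workers assumed in this example are configured on the topology when it is submitted. A minimal submission sketch might look like this (the topology name is made up):

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class ParallelismExampleTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder topologyBuilder = new TopologyBuilder();
        // ... the setSpout/setBolt calls shown above ...

        Config conf = new Config();
        conf.setNumWorkers(2);   // the 2 worker processes assumed in this example

        StormSubmitter.submitTopology("parallelism-example", conf, topologyBuilder.createTopology());
    }
}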

So, let's see how Storm calculates the run-time parallelism of this topology and allocates it to the 2 workers:

Total number of tasks: 2*1 + 4 + 6*1 = 12 task instances.
Total number of executors: 2 + 2 + 6 = 10, so each of the 2 workers gets 10/2 = 5 executors.
The 12 tasks are assigned to the 2*5 = 10 executors: 5 executors and 6 tasks per worker.
Each worker is therefore assigned 6 tasks: 3 yellow tasks, 2 green tasks and 1 blue task.
Storm also applies an internal optimization: tasks of the same type are placed in the same executor as far as possible.
The allocation then proceeds from the component with the fewest tasks: the 1 blue task can only occupy one executor (1 executor used so far); the 2 green tasks can be placed together in one executor (2 executors used so far); finally the remaining 3 yellow tasks are spread over the remaining 5 - 2 = 3 executors, one yellow task per executor.

Intuitively, there are many possible ways to distribute the tasks over the executors while still keeping tasks of the same type in the same executor as far as possible; for how Storm actually computes the assignment you can refer to the source code.
When the example topology above runs, the resulting distribution of its tasks across the cluster is shown in the following figure:

The internal principles of Storm

When a topology is submitted to the Storm cluster, the concrete processing that follows is subtle and somewhat complex. First, let's outline the main points:

Each executor may have an incoming queue and an outgoing queue; both are LMAX Disruptor queues (which you can read about separately). On either side of these two queues there are threads that put tuples in and take tuples out.

Each executor may have a send thread, which takes the newly produced tuples from the executor's outgoing queue once processing has completed.

Each executor must have a main thread, which handles the tuples in the incoming queue for the executor's spout tasks/bolt tasks and puts newly produced tuples into the outgoing queue.

Each worker process may have a receive thread, which receives tuples sent by the transfer thread of an upstream worker.

Each worker process has an outgoing (transfer) queue that is shared by all the executors inside it and holds tuples that need to be transported across workers; the worker's transfer thread reads tuples from this queue for transmission.

Each worker process may have a transfer thread, which sends tuples that need to cross worker boundaries to the downstream worker.

Above, I used the word "may" in many places; in fact, in most cases things are exactly like this, so keep that in mind. Below, we illustrate the different situations according to how spout tasks and bolt tasks are distributed at run time:

How a spout task runs inside an executor

A spout task running inside an executor differs slightly from a bolt task. If no bolt task shares the executor with the spout task, the executor has only an outgoing queue, which stores the tuples to be transferred to downstream bolt tasks; no incoming queue is needed, because the spout task itself continuously pulls data from the given data source. The execution flow of a spout task and its related components inside an executor is shown in the following figure:

The data flow shown in the figure above is as follows: the spout task reads a message or event from the external data source inside the nextTuple() method of the spout implemented in our topology, converts it into a tuple object and emit()s it; the main thread then puts the tuple into the executor's outgoing queue; finally, the send thread belonging to the executor reads tuples from the outgoing queue and transfers them to one or more downstream bolt tasks for processing.

How a bolt task runs inside an executor

As mentioned earlier, a bolt task running inside an executor differs slightly from a spout task. The executor in which a bolt task lives has both an incoming queue and an outgoing queue: the incoming queue stores the tuples sent over by the components upstream of the bolt task in the data flow (possibly one or more spout tasks/bolt tasks), and the outgoing queue stores the tuples to be transferred to downstream bolt tasks. If the bolt task is the last component in the data flow and its execute() method does not emit() any newly created tuples, then its executor has no outgoing queue. Inside an executor, a bolt task connects the upstream components (spout tasks/bolt tasks) with the downstream ones (bolt tasks); the execution flow of a bolt task and its related components within its executor is shown in the following figure:

The data flow shown in the figure above is as follows: an upstream spout task/bolt task transfers a tuple into the incoming queue of the executor where the downstream bolt task lives; the main thread takes the tuple from the incoming queue and hands it to the bolt task; the bolt task processes the tuple in its execute() method and, if new tuples are produced, calls emit() to send them on to the next bolt task (in fact, the main thread puts the newly produced tuples into the executor's outgoing queue); finally, the send thread belonging to the executor reads tuples from the outgoing queue and transfers them to one or more downstream bolt tasks for processing. A toy sketch of this queue-and-thread structure follows:
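
The following is only a toy, single-JVM model of the incoming-queue / main-thread / outgoing-queue / send-thread structure described above. It uses plain BlockingQueues instead of Storm's LMAX Disruptor queues, and strings instead of tuples; none of it is Storm code.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model of a bolt executor: the main thread drains the incoming queue, "executes"
// the bolt logic, and puts new tuples on the outgoing queue; the send thread drains the
// outgoing queue and hands tuples to the (worker-level) transfer machinery.
public class ToyBoltExecutor {

    private final BlockingQueue<String> incoming = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> outgoing = new LinkedBlockingQueue<>();

    void start() {
        Thread mainThread = new Thread(() -> {
            try {
                while (true) {
                    String tuple = incoming.take();        // tuple delivered by an upstream task
                    String newTuple = tuple.toUpperCase(); // stand-in for Bolt.execute()
                    outgoing.put(newTuple);                // "emit" the newly produced tuple
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "executor-main-thread");

        Thread sendThread = new Thread(() -> {
            try {
                while (true) {
                    String tuple = outgoing.take();
                    System.out.println("transfer downstream: " + tuple); // stand-in for the send/transfer path
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "executor-send-thread");

        mainThread.setDaemon(true);
        sendThread.setDaemon(true);
        mainThread.start();
        sendThread.start();
    }

    public static void main(String[] args) throws InterruptedException {
        ToyBoltExecutor executor = new ToyBoltExecutor();
        executor.start();
        executor.incoming.put("a tuple from upstream");  // simulate an upstream task delivering a tuple
        Thread.sleep(500);                               // give the toy threads time to print before exiting
    }
}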

Transfer of tuples between 2 spout/bolt tasks within the same worker

Within the same worker (JVM instance), multiple executor instances may be created, so when we look at how a tuple is transferred between two tasks there are 4 possible cases. Within the same executor there are 2 of them: a spout task and a bolt task in the same executor, and 2 bolt tasks in the same executor.

The other 2 cases involve 2 different executors within the same worker, shown in the following figures. A spout task and a bolt task in 2 different executors:


2 bolt tasks in 2 different executors:


Combining the earlier descriptions of how a spout task and a bolt task run inside an executor, these two cases are easy to understand, so they are not described further.

Transfer of tuples between 2 executors in different workers

If the two tasks are in different worker processes, that is, in two isolated JVM instances, the tuple-transfer logic is the same regardless of whether or not the workers are on the same Supervisor node. Here we take a spout task and a bolt task running in two different worker processes as an example, as shown in the following figure:

The process is similar to the previous ones, except that if the two worker processes are on two different Supervisor nodes, the transfer thread transports the tuple over the network rather than locally.

How a tuple is routed between tasks

Below, we look at how each tuple is transferred between the tasks of the various bolts, that is, how a tuple is routed to the task(s) of the downstream bolt that should process it.
First, we should understand the representation of the message transmitted between tasks; the TaskMessage class is defined as follows:

package backtype.storm.messaging;

import java.nio.ByteBuffer;

public class TaskMessage {
    private int _task;
    private byte[] _message;

    public TaskMessage(int task, byte[] message) {
        _task = task;
        _message = message;
    }

    public int task() { return _task; }
    public byte[] message() { return _message; }

    public ByteBuffer serialize() {
        ByteBuffer bb = ByteBuffer.allocate(_message.length + 2);
        bb.putShort((short) _task);
        bb.put(_message);
        return bb;
    }

    public void deserialize(ByteBuffer packet) {
        if (packet == null) return;
        _task = packet.getShort();
        _message = new byte[packet.limit() - 2];
        packet.get(_message);
    }
}
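
As a small, purely illustrative round-trip of this class (the task id 5 and the payload are made up), note that the buffer returned by serialize() has to be flipped before it can be read back:

import java.nio.ByteBuffer;

import backtype.storm.messaging.TaskMessage;

public class TaskMessageRoundTrip {
    public static void main(String[] args) {
        byte[] payload = "serialized tuple bytes".getBytes();
        TaskMessage out = new TaskMessage(5, payload);    // 5 = target task id (made up)

        ByteBuffer wire = out.serialize();
        wire.flip();                                      // switch the buffer from writing to reading

        TaskMessage in = new TaskMessage(0, null);
        in.deserialize(wire);
        System.out.println(in.task());                    // prints 5
    }
}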

It can be seen that each task is given a number that is unique within a topology, and with this number Storm can route any tuple correctly to the downstream bolt task that needs to process it.
Suppose there is a topology consisting of 3 bolts, BOLT1, BOLT2 and BOLT3, connected in numbered order, where BOLT1 has 2 tasks, BOLT2 has 2 tasks and BOLT3 has 2 tasks. Here we only care about the flow of data between the BOLT1 tasks and the BOLT2 tasks. The specific routing process is shown in the following figure:

In the figure above, the two tasks of BOLT2 are distributed across two worker processes, so when the 2 tasks of the upstream BOLT1 finish processing their input tuples and generate new tuples, those tuples are handled differently depending on the task number of the target BOLT2 task:

BOLT2's Task4 is located in the first worker process: a new tuple produced by BOLT1 is put directly by the executor's send thread into the incoming queue of the other executor within the first worker.

BOLT2's Task5 is located in the second worker process: a new tuple produced by BOLT1 is put by the executor's send thread into the first worker's outgoing (transfer) queue, and is then sent by the first worker's transfer thread to the other worker, where it is finally routed to BOLT2's Task5 for processing.

As can be seen from the above process, the mapping between tasks and executors must be maintained so that each tuple can be transferred correctly to the destination bolt task for processing.

References

http://storm.apache.org/
http://storm.apache.org/documentation.html
http://storm.apache.org/documentation/guaranteeing-message-processing.html
http://storm.apache.org/documentation/understanding-the-parallelism-of-a-storm-topology.html
http://xumingming.sinaapp.com/410/twitter-storm-code-analysis-acker-merchanism/
Storm Applied: Strategies for Real-Time Event Processing
