Storm Starter Tutorial, Part 4: Reliable Handling of Messages

4.1 Introduction

Storm can guarantee that every message emitted by a spout is fully processed. This chapter describes how Storm achieves this goal and explains in detail how developers should use Storm's mechanisms to obtain reliable data processing.

4.2 Understanding what it means for a message to be fully processed

A message (tuple) emitted by a spout may lead to the creation of hundreds or thousands of further messages derived from it.

Let's consider a "word count" topology as an example:

The Storm task reads a complete English sentence from the data source (a Kestrel queue), breaks the sentence into individual words, and finally emits, in real time, each word together with the number of times it has appeared.

In this case, each message emitted by the spout (each English sentence) triggers the creation of many new messages: the words split out of the sentence are the newly created messages.

These messages form a tree structure that we call a "tuple tree", which looks like Figure 1:

Figure 1 Example of a tuple tree

Under what conditions does Storm consider a message emitted by a spout to be fully processed? The answer is that both of the following conditions are met:

    • The tuple tree no longer grows
    • Every message in the tree has been marked as "processed"

If the tuple tree derived from a message is not fully processed within a specified period of time, the message is considered not fully processed. This timeout can be configured with the topology-level parameter Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS and defaults to 30 seconds.
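
For instance, with the classic Java configuration API this could be set as follows (a minimal sketch, assuming the backtype.storm Config helper for this parameter):

    Config conf = new Config();
    // Equivalent to setting Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS directly.
    conf.setMessageTimeoutSecs(30);  // 30 seconds is also the default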

4.3 Life cycle of messages

What does Storm do when a message is fully processed, or fails to be fully processed? To figure this out, let's follow the life cycle of a message emitted by a spout, starting with the interface that a spout must implement:
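
A sketch of that interface as it appeared in the classic backtype.storm API:

    // From the backtype.storm.spout package (classic API).
    public interface ISpout extends Serializable {
        void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
        void close();
        void nextTuple();
        void ack(Object msgId);
        void fail(Object msgId);
    }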

First, Storm requests a message (tuple) from the spout by calling the spout instance's nextTuple() method. In response, the spout uses the SpoutOutputCollector provided in the open method to emit one or more messages to its output streams. For each message it emits, the spout provides a messageId, which will later be used to identify that message.

Assuming we are reading messages from a Kestrel queue, the spout sets the ID that Kestrel assigned to the queued message as the messageId of the emitted message. Emitting a message to the SpoutOutputCollector looks like this:
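
A sketch, assuming the standard collector API (the field values here are purely illustrative):

    _collector.emit(new Values("field1", "field2", 3), msgId);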

Next, these messages are sent to bolts for downstream processing, and Storm tracks all the new messages generated from them. When Storm detects that the tuple tree derived from a message has been fully processed, it invokes the spout's ack method, passing the message's messageId as the parameter. Likewise, if processing of a message times out, the spout's fail method is invoked for that message, again with the messageId as the parameter.

Note: a message's ack or fail is invoked only by the spout task that emitted it. If a spout in the topology is run as multiple tasks, a message is answered (acked or failed) only by the task that created it, never by another spout task.

We will continue with the example of reading messages from the Kestrel queue to illustrate what a spout needs to do to provide high reliability (call this spout KestrelSpout).

Let's start by outlining the Kestrel message queue:

When KestrelSpout reads a message from the Kestrel queue, it "opens" the message. This means the message is not actually deleted from the queue; instead it is put into a "pending" state, where it waits for an acknowledgement from the client, and it is truly removed from the queue only after being acknowledged. Messages in the pending state are invisible to other clients. Furthermore, if a client disconnects unexpectedly, all messages opened by that client are put back into the queue. When a message is opened, Kestrel also provides the client with a unique identifier for it.

KestrelSpout uses this unique identifier as the tuple's messageId. Later, when ack or fail is invoked, KestrelSpout sends the ack or fail together with the messageId back to the Kestrel queue, so that Kestrel either actually deletes the message from the queue or puts it back.

4.4 Reliability-related APIs

To use the reliable processing features that Storm provides, we need to do two things:

    1. Whenever a new node is created in the tuple tree, we must explicitly notify Storm;
    2. Whenever we finish processing an individual message, we must tell Storm how the tuple tree has changed.

With these two steps, Storm can detect when a tuple tree is fully processed and invoke the corresponding ack or fail method. Storm provides simple and straightforward ways to accomplish both steps.

Adding a new node to a specified node in the tuple tree is called anchoring. Anchoring happens at the same time the message is emitted. To make this easier to explain, consider the following code as an example. The bolt in this example splits a message containing a whole sentence into a series of sub-messages, each containing a single word.
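
A sketch of such a bolt, assuming the classic backtype.storm API (this mirrors the well-known SplitSentence example from the Storm documentation):

    import java.util.Map;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class SplitSentence extends BaseRichBolt {
        OutputCollector _collector;

        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            _collector = collector;
        }

        public void execute(Tuple tuple) {
            String sentence = tuple.getString(0);
            for (String word : sentence.split(" ")) {
                // Anchoring: the input tuple is passed as the first argument to emit.
                _collector.emit(tuple, new Values(word));
            }
            // Acknowledge the input sentence once all word tuples have been emitted.
            _collector.ack(tuple);
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }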

As the code shows, each message is anchored by passing the input message as the first argument to emit. Because each word message is anchored to the input message, and the input message is the root node of the tuple tree emitted by the spout, if any word message fails to process, the spout message from which this tuple tree was derived will be re-emitted.

By contrast, let's look at what Storm does with a message emitted the following way:
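
That is, an emit that omits the input tuple (same hypothetical collector as above):

    _collector.emit(new Values(word));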

A message emitted this way is not anchored. If a message downstream in the tuple tree fails to process, the root message of that tree will not be re-emitted. Depending on the fault-tolerance requirements of the topology, emitting unanchored messages is sometimes appropriate.

An output message can be anchored to one or more input messages, which is useful when performing a join or aggregation. If a multi-anchored message fails to process, all the spout messages associated with it will be re-emitted. Multi-anchoring is done by passing multiple input messages to emit:
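
A sketch of multi-anchoring with the classic OutputCollector API, where tuple1 and tuple2 stand for two previously received input tuples:

    List<Tuple> anchors = new ArrayList<Tuple>();
    anchors.add(tuple1);
    anchors.add(tuple2);
    // The emitted tuple is anchored to both inputs at once.
    _collector.emit(anchors, new Values(1, 2, 3));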

Multi-anchoring adds the output message to more than one tuple tree.

Note: multi-anchoring breaks the strict tree structure and instead forms a DAG (directed acyclic graph), as shown in Figure 2:

Figure 2 diamond-shaped structure with multiple anchors

Storm's implementation handles DAGs the same way it handles trees.

Anchoring shows how to add a message to a tuple tree; the next part of the reliability API describes what to do when we finish processing an individual message in the tree. This is done through the OutputCollector's ack and fail methods. Looking back at the SplitSentence example, you can see that the input message representing the sentence is acked only after all the word messages have been emitted.

Every processed message must be acked or failed. Storm uses memory to track the handling of each message, so if processed messages are never answered, that memory will sooner or later be exhausted.

Many bolts follow a common pattern: read an input message, emit the sub-messages derived from it, and ack the input at the end of execute. Filters and simple transformation functions fall into this category. Storm provides an interface called IBasicBolt that encapsulates this pattern. The SplitSentence example can be rewritten using IBasicBolt:
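
A sketch of the rewrite, assuming the classic API (BaseBasicBolt is the convenience base class that implements IBasicBolt):

    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class SplitSentence extends BaseBasicBolt {
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String sentence = tuple.getString(0);
            for (String word : sentence.split(" ")) {
                // Anchoring to the input tuple happens automatically here.
                collector.emit(new Values(word));
            }
            // The input tuple is acked automatically when execute returns.
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }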

Written this way, the code is a little simpler than before, but functionally identical. Messages emitted to the BasicOutputCollector are automatically anchored to the input message, and when execute completes, the input message is automatically acked.

In many cases, however, a message needs a deferred acknowledgement, as in aggregations or joins: the earlier input messages are acked only after a result has been computed from a whole set of inputs. Aggregations and joins also usually multi-anchor their output messages. These patterns are beyond what IBasicBolt can handle.

4.5 Efficient implementation of the tuple tree

The Storm system contains a set of special tasks called "ackers", which are responsible for tracking every message in the DAG (directed acyclic graph). Whenever an acker finds that a DAG has been fully processed, it sends a signal to the spout task that created the root message. The parallelism of acker tasks in a topology is set by the configuration parameter Config.TOPOLOGY_ACKERS and defaults to 1. When the system carries a large volume of messages, the acker parallelism should be raised accordingly.

To understand Storm's reliability mechanism, we start from the life cycle of a message and the management of a tuple tree. When a message is created (whether in a spout or a bolt), the system assigns it a random 64-bit value as its ID. The ackers use these random IDs to track the tuple tree derived from each spout message.

Every message knows the ID of the root message of the tuple tree it belongs to. Whenever a bolt emits a new message, the messageId of the tree's root message is copied into it. When a message is acked, it sends information about how the tuple tree has changed to the acker tracking that tree. In effect it tells the acker: "this message is now processed, but I have derived some new messages; please keep tracking them."

For example, suppose messages D and E are derived from message C. When C is acked, D and E are added to the tuple tree at the same moment that C is marked as processed.

Because D and E are added to the tuple tree exactly when C is removed from it, the tree can never prematurely be considered fully processed.

Let's delve into how Storm tracks tuple trees. As mentioned earlier, a system may contain any number of acker tasks. So whenever a message is created or acked, how does it know which acker to notify?

The system uses a hash of the spout message's messageId to decide which acker tracks the tuple tree derived from that message. Because every message knows the messageId of its root message, it knows which acker to communicate with.

When a spout emits a message, it notifies the corresponding acker that a new root message has been created, and the acker creates a new tuple tree for it. When the acker finds that the tree has been fully processed, it notifies the corresponding spout task.

How are tuple trees tracked? There are thousands of messages in flight in the system; if an explicit tree were built for every message the spout emits, memory would soon be exhausted, so a different strategy is needed. Storm's tracking algorithm needs only a fixed amount of memory (about 20 bytes) per tree. This algorithm is at the core of Storm's correct operation and is one of its major breakthroughs.

An acker task stores a map from a spout message ID to a pair of values. The first value is the task ID of the spout that emitted the root message; through this ID the acker knows which spout task to notify when processing completes. The second value is a 64-bit number called the "ack val". The ack val represents the state of the entire tree, no matter how large the tree is: it is the XOR of the random IDs of all messages in the tree, so only this fixed-size number is needed to track the whole tree. When a message is created, and again when it is acked, the same random message ID is XORed into the ack val.

Whenever an acker finds that a tree's ack val has become 0, it knows the tree has been completely processed. Because message IDs are random 64-bit values, the probability that the ack val accidentally becomes 0 before the tree is actually processed is vanishingly small: at 10,000 acks per second, it would take on the order of 50,000,000 years before a mistake occurred. And even then, data would be lost only if the message also happened to fail during processing.
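
To make the XOR bookkeeping concrete, here is a hypothetical sketch in Java. It is illustrative only, not Storm's actual acker code, and every name in it is invented:

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative model of an acker's bookkeeping; not Storm's real implementation.
    public class AckerSketch {
        // Maps a spout message ID to { spout task ID, ack val }.
        private final Map<Long, long[]> pending = new HashMap<Long, long[]>();

        // A spout emitted a new root tuple: start tracking its tree.
        void initRoot(long spoutMsgId, long spoutTaskId, long rootTupleRandomId) {
            pending.put(spoutMsgId, new long[] { spoutTaskId, rootTupleRandomId });
        }

        // A tuple in the tree was acked: XOR out its random ID and XOR in the
        // random IDs of the children it created, in one atomic update (this is
        // why D and E join the tree at the same moment C leaves it).
        void ack(long spoutMsgId, long ackedTupleId, long[] childIds) {
            long update = ackedTupleId;
            for (long child : childIds) {
                update ^= child;
            }
            long[] entry = pending.get(spoutMsgId);
            entry[1] ^= update;
            if (entry[1] == 0L) {
                // Every ID XORed in has also been XORed out: fully processed.
                notifySpoutTask(entry[0], spoutMsgId);
                pending.remove(spoutMsgId);
            }
        }

        private void notifySpoutTask(long spoutTaskId, long spoutMsgId) {
            // In Storm, this would send an ack back to the owning spout task.
        }
    }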

4.6 Choosing the right level of reliability

Acker tasks are lightweight, so a topology does not need many of them. You can observe the throughput of the acker tasks in the Storm UI; if it looks insufficient, add more acker tasks.

If you do not require every message to be processed (that is, you can tolerate losing some messages), you can turn off reliable processing in exchange for better performance. Turning it off roughly halves the number of messages in the system, since no ack message needs to be sent for each tuple. It also shrinks each message (no root ID needs to be carried in every tuple), saving bandwidth.

There are three ways to turn off reliable message processing (a combined sketch follows the list):

    • Set the parameter Config.TOPOLOGY_ACKERS to 0. With no acker tasks, the spout's ack method is invoked immediately whenever it emits a message;
    • The second method is to omit the messageId when the spout emits a message. Use this when you want to turn off reliability for particular messages;
    • Finally, if you do not care about the reliability of the descendants derived from a particular message, emit those descendants without anchoring, i.e., do not pass the input message to emit. Because such descendants are not anchored to any tuple tree, their failure does not cause any spout message to be re-emitted.
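
A combined sketch of the three approaches, assuming the classic backtype.storm API:

    // 1) No acker tasks at all: every spout tuple is acked immediately on emit.
    Config conf = new Config();
    conf.setNumAckers(0);  // i.e., Config.TOPOLOGY_ACKERS = 0

    // 2) Inside a spout: emit without a messageId, so the tuple is never tracked.
    _collector.emit(new Values(sentence));
    // ...instead of: _collector.emit(new Values(sentence), msgId);

    // 3) Inside a bolt: emit without anchoring, so descendants belong to no tree.
    _collector.emit(new Values(word));
    // ...instead of: _collector.emit(tuple, new Values(word));
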
4.7 Fault tolerance at all levels of the cluster

So far, you have seen Storm's reliability mechanisms and learned how to choose a reliability level that meets your needs. Next, let's look at how Storm ensures that data is not lost under various failure conditions.

4.7.1 Task-level failure
    • A bolt task crashes, so a message is never acked. In this case, all messages associated with that bolt task in the acker will fail by timeout, and the spout's fail method will be invoked.
    • An acker task fails. All messages it was tracking when it failed will fail by timeout, and the spout's fail method will be invoked.
    • A spout task fails. In this case, the external device the spout reads from (such as a message queue) is responsible for message integrity. For example, when a client disconnects abnormally, the Kestrel queue puts all of that client's pending messages back into the queue.
4.7.2 Task slot (slot) failure
    • A worker fails. Each worker contains several bolt (or spout) tasks. The supervisor monitors these tasks, and when a worker fails, the supervisor tries to restart it on the local machine.
    • A supervisor fails. The supervisor is stateless, so its failure does not affect currently running tasks, provided it is restarted promptly. The supervisor does not restart itself, so external monitoring is needed to restart it in time.
    • Nimbus fails. Nimbus is stateless, so its failure does not affect currently running tasks (although new topologies cannot be submitted while it is down), provided it is restarted promptly. Nimbus does not restart itself, so external monitoring is needed to restart it in time.
4.7.3 Cluster node (machine) failure
    • A node in the Storm cluster fails. Nimbus moves all tasks that were running on that machine to other available machines.
    • A node in the ZooKeeper cluster fails. ZooKeeper remains operational as long as fewer than half of its machines are down, so a failed machine just needs to be repaired promptly.
4.8 Summary

This chapter described how a Storm cluster processes data reliably. With its innovative tuple-tree tracking technique, Storm uses an efficient acknowledgement mechanism to ensure that data is not lost.

Apart from Nimbus, there is no single point of failure in a Storm cluster: any node can fail without losing data. Nimbus itself is designed to be stateless, so as long as it is restarted promptly, running tasks are not affected.
