RabbitMQ clustering and failure handling

Last Update:2016-06-26 Source: Internet

Author: User

Tags rabbitmq


RabbitMQ's built-in clustering is designed to accomplish two goals: to let consumers and producers keep running when a RabbitMQ node crashes, and to scale message throughput linearly by adding more nodes. When a RabbitMQ node is lost, clients can connect to any other node in the cluster and continue to produce or consume messages. Similarly, if the cluster is struggling to cope with a large volume of message traffic, adding more nodes increases its capacity.
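The failover behaviour described above can be sketched on the client side as a loop over the known cluster nodes. This is a minimal illustration, not a real client: the node names and the `fake_connect` function are hypothetical stand-ins for whatever connect call your AMQP client library provides.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def connect_to_any(nodes: Iterable[str], connect: Callable[[str], T]) -> T:
    """Return the first connection that succeeds, trying nodes in order."""
    last_error = None
    for node in nodes:
        try:
            return connect(node)
        except ConnectionError as exc:
            last_error = exc  # this node is unreachable, try the next one
    raise ConnectionError("no cluster node reachable") from last_error


# Hypothetical stand-in for a real client library's connect call.
def fake_connect(node: str) -> str:
    if node == "rabbit@node1":
        raise ConnectionError(f"{node} is down")
    return f"connected to {node}"


result = connect_to_any(["rabbit@node1", "rabbit@node2"], fake_connect)
print(result)
```

In a real application the same loop would also be used to reconnect after an established connection drops.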

A RabbitMQ cluster does not guarantee that messages are safe from loss, because by default RabbitMQ does not replicate the contents of a queue across the cluster. Without special configuration, messages exist only on the node that owns the queue.

RabbitMQ cluster architecture

RabbitMQ always keeps track of the following four types of internal metadata:

Queue metadata: queue names and their properties
Exchange metadata: exchange name, type, and properties
Binding metadata: a simple table showing how to route messages to queues
Vhost metadata: namespaces and security attributes for the queues, exchanges, and bindings within a vhost

On a single node, RabbitMQ keeps this metadata in memory and also writes to disk the queues and exchanges (and their bindings) that are marked as durable, so that they can be rebuilt after RabbitMQ restarts. Once clustering is introduced, RabbitMQ has to keep track of a new type of metadata: the location of the cluster nodes, and the relationship of each node to the other types of metadata already recorded.

Queues in a cluster

In a RabbitMQ cluster, not every node has a full copy of every queue. When you create a queue in a cluster, the cluster creates the complete queue information (metadata, state, and contents) on a single node rather than on all nodes. As a result, only the owner node of the queue knows everything about it. All other nodes know only the queue's metadata and a pointer to the node where the queue lives. So when a cluster node crashes, that node's queues and their bindings disappear. Consumers attached to those queues lose their subscriptions, and any new messages that would have matched the queues' bindings are lost. You can re-create a queue by having its consumers reconnect to the cluster and declare it again. However, this only works if the queue was not declared as durable in the first place.
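The ownership rule above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: `ToyCluster`, `Node`, and the node and queue names are invented for illustration and do not reflect how RabbitMQ is implemented internally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    # Full queue contents live only on the owner node.
    queues: dict = field(default_factory=dict)


class ToyCluster:
    def __init__(self, node_names):
        self.nodes = {n: Node(n) for n in node_names}
        # Metadata known to every node: queue name -> owner node name.
        self.owner_of = {}

    def declare_queue(self, queue, on_node):
        self.nodes[on_node].queues[queue] = []
        self.owner_of[queue] = on_node

    def publish(self, queue, message):
        """Return True if the message was stored, False if the owner is down."""
        owner = self.nodes[self.owner_of[queue]]
        if not owner.up:
            return False  # lost: no other node holds a copy of the queue
        owner.queues[queue].append(message)
        return True

    def crash(self, node_name):
        self.nodes[node_name].up = False


cluster = ToyCluster(["node1", "node2"])
cluster.declare_queue("orders", on_node="node1")
stored_before = cluster.publish("orders", "m1")
cluster.crash("node1")
stored_after = cluster.publish("orders", "m2")
print(stored_before, stored_after)
```

Note that `node2` never holds any queue contents in this model; it only knows, through `owner_of`, where the queue lives.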

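Why does RabbitMQ not copy queue contents and state to all nodes by default?

Storage space: if every node held a full copy of every queue, adding nodes would not add storage capacity.
Performance: replicating every message to every node would increase network and disk load.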

Distributed exchanges

An exchange is simply a name and a list of queue bindings. When you publish a message to an exchange, it is actually the channel you are connected to that compares the message's routing key against the exchange's bindings and then routes the message.

When a new exchange is created, RabbitMQ adds this lookup table to every node in the cluster.
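To make the "name plus lookup table" idea concrete, here is a minimal sketch of an exchange as a binding table. It assumes exact-match routing (as a direct exchange would do); the class name `ToyDirectExchange` and the queue and key names are hypothetical.

```python
from collections import defaultdict


class ToyDirectExchange:
    """An exchange modelled as a name plus a routing-key lookup table."""

    def __init__(self, name):
        self.name = name
        self.bindings = defaultdict(list)  # routing key -> list of queue names

    def bind(self, queue, routing_key):
        self.bindings[routing_key].append(queue)

    def route(self, routing_key):
        # Exact-match lookup; an unmatched key routes to no queue at all.
        return list(self.bindings.get(routing_key, []))


exchange = ToyDirectExchange("events")
exchange.bind("orders", "order.created")
exchange.bind("audit", "order.created")

matched = exchange.route("order.created")
unmatched = exchange.route("order.deleted")
print(matched, unmatched)
```

Because the table is small and contains no message data, copying it to every node is cheap, unlike copying queue contents.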

What happens if the message has been published to the channel, but the node fails before the message is routed?

The AMQP basic.publish command does not return the status of the message, so in this situation the message is lost. One solution is to use an AMQP transaction, which blocks until the message has been routed to a queue. The other is to use publisher confirms (acknowledgements sent back to the publisher) to keep track of which messages had not yet been acknowledged when the connection was interrupted.
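The bookkeeping a publisher needs for the confirm approach can be sketched as follows. This is a toy model that does not talk to a broker: `ConfirmTracker` and its method names are hypothetical, and the `multiple` flag mirrors the flag of the same name on AMQP's basic.ack.

```python
class ConfirmTracker:
    """Tracks published messages until the broker acknowledges them."""

    def __init__(self):
        self.next_seq = 1
        self.unconfirmed = {}  # sequence number -> message body

    def publish(self, body):
        seq = self.next_seq
        self.unconfirmed[seq] = body
        self.next_seq += 1
        return seq

    def on_ack(self, seq, multiple=False):
        if multiple:
            # The ack covers every message up to and including seq.
            for s in [s for s in self.unconfirmed if s <= seq]:
                del self.unconfirmed[s]
        else:
            self.unconfirmed.pop(seq, None)

    def on_connection_lost(self):
        """Return the bodies that were never acknowledged, in publish order."""
        return [self.unconfirmed[s] for s in sorted(self.unconfirmed)]


tracker = ConfirmTracker()
for body in ["a", "b", "c"]:
    tracker.publish(body)
tracker.on_ack(2, multiple=True)  # broker confirmed messages 1 and 2
to_republish = tracker.on_connection_lost()
print(to_republish)
```

After reconnecting to another node, the publisher would send the returned messages again.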

Memory nodes and disk nodes

A memory (RAM) node keeps the metadata definitions for all queues, exchanges, bindings, users, permissions, and vhosts in memory only. A disk node stores the metadata on disk. A single-node system may only be a disk node; otherwise, all of the system's configuration would be lost every time RabbitMQ restarted.

RabbitMQ only requires that a cluster contain at least one disk node; all other nodes can be memory nodes. When a node joins or leaves the cluster, it must notify at least one disk node of the change. If there is only one disk node and it crashes, the cluster can continue to route messages (that is, it keeps running), but no metadata can be changed until that node recovers. For this reason, you would typically set up two disk nodes in a cluster.
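The rule in the paragraph above can be expressed as a short toy model. It is a sketch under simplifying assumptions: `ToyMembership` and the node names are invented for illustration, and "routing" is reduced to "at least one node is up".

```python
class ToyMembership:
    """Tracks which nodes are up and whether metadata changes are allowed."""

    def __init__(self, disk_nodes, ram_nodes):
        self.kind = {n: "disk" for n in disk_nodes}
        self.kind.update({n: "ram" for n in ram_nodes})
        self.up = set(self.kind)

    def crash(self, node):
        self.up.discard(node)

    def recover(self, node):
        self.up.add(node)

    def can_route(self):
        # Routing continues as long as any node is still running.
        return bool(self.up)

    def can_change_metadata(self):
        # Changes must be recorded by at least one running disk node.
        return any(self.kind[n] == "disk" for n in self.up)


members = ToyMembership(disk_nodes=["d1"], ram_nodes=["r1", "r2"])
members.crash("d1")
status_while_down = (members.can_route(), members.can_change_metadata())
members.recover("d1")
status_after_recovery = (members.can_route(), members.can_change_metadata())
print(status_while_down, status_after_recovery)
```

With two disk nodes instead of one, `can_change_metadata()` would stay true after a single disk node crash, which is the motivation for the two-disk-node recommendation.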
