High Availability Test of a RabbitMQ Cluster

Last Update: 2014-08-11 Source: Internet

Author: User

Tags: haproxy


High Availability of RabbitMQ Clusters

RabbitMQ is written in Erlang, and because Erlang is inherently a distributed language, clustering RabbitMQ is very convenient. However, RabbitMQ itself does not provide load balancing.

RabbitMQ can run in three modes: single mode, normal mode, and mirror mode.

Single mode: the simplest mode; it is not a cluster.

There is nothing more to say about it.

Normal mode: the default cluster mode.

Consider two nodes, A and B. For a given queue, the message bodies exist on only one of the nodes; A and B share only the same metadata, that is, the queue structure.

When a message enters the queue on node A and a consumer pulls from node B, RabbitMQ transfers the message between A and B on the fly: it fetches the message body from A and delivers it to the consumer through B.

Consumers should therefore connect to every node and fetch messages from each of them. In other words, for one logical queue, physical queues should be created on multiple nodes. Otherwise, whether a consumer connects to A or B, every message leaves through A, which becomes a bottleneck.

In this mode, when node A fails, node B cannot obtain the unconsumed message bodies stored on node A.

If the messages were persisted, they cannot be consumed until node A is restored. If they were not persisted, they are simply gone.
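To make the normal-mode behaviour described above concrete, here is a small Python toy model. It is not RabbitMQ code, and the class and method names are hypothetical. It only captures three points from the text: message bodies live on the queue's owner node, a consumer on another node is served by relaying from the owner, and when the owner fails, persistent messages (assumed to be on a durable queue) wait for the node to return while transient ones are lost.

```python
class NormalModeCluster:
    """Toy model of RabbitMQ's default (non-mirrored) cluster mode.

    Every node knows every queue's metadata, but message bodies live only
    on the node that owns the queue. Illustration only, not RabbitMQ code.
    """

    def __init__(self, nodes):
        self.up = {node: True for node in nodes}
        self.owner = {}     # queue name -> node holding the message bodies
        self.messages = {}  # queue name -> list of (body, persistent) tuples

    def declare(self, queue, on_node):
        # Metadata is cluster-wide; bodies will be stored on `on_node` only.
        self.owner[queue] = on_node
        self.messages[queue] = []

    def publish(self, queue, body, persistent=False):
        self.messages[queue].append((body, persistent))

    def consume(self, queue, via_node):
        if not self.up[via_node]:
            raise ConnectionError(f"{via_node} is down")
        owner = self.owner[queue]
        if not self.up[owner]:
            # Node B cannot reach message bodies stored on a failed node A.
            raise LookupError(f"queue {queue!r} lives on {owner}, which is down")
        body, _persistent = self.messages[queue].pop(0)
        return body  # fetched from the owner and relayed through via_node

    def crash(self, node):
        self.up[node] = False
        for queue, owner in self.owner.items():
            if owner == node:
                # Transient messages are gone; persistent ones survive on disk.
                self.messages[queue] = [m for m in self.messages[queue] if m[1]]

    def restart(self, node):
        self.up[node] = True
```

For example, declare a queue on "A", publish one persistent and one transient message, crash "A": consuming through "B" raises LookupError until "A" is restarted, after which only the persistent message is delivered.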

Mirror mode: the queue is mirrored across multiple nodes. This is RabbitMQ's HA solution.

This mode solves the problem above. It differs from normal mode in that message bodies are actively synchronized between the mirror nodes, rather than fetched on demand when a consumer reads them.

This mode also has obvious side effects. Besides reducing system performance, if there are many mirrored queues and a large volume of incoming messages, this synchronization traffic consumes much of the network bandwidth inside the cluster.

It is therefore suited to scenarios with high reliability requirements. (This mode is described in detail later; the environment set up in this article uses it.)

Basic concepts of a cluster:

RabbitMQ cluster nodes are either memory nodes or disk nodes. As the names imply, a memory node keeps all its data in memory, and a disk node stores its data on disk. However, as mentioned above, if persistence is enabled when messages are published, the data is safely stored on disk even on a memory node.

A RabbitMQ cluster shares users, vhosts, queues, exchanges, and so on, and all of this data and state is replicated to every node. The exception is message queues: a queue currently lives only on the node that created it, although it is visible to and readable from all nodes. Nodes can be added to the cluster dynamically, and the cluster can perform basic load balancing.
The cluster has two types of nodes:
1. Memory node: keeps its state only in memory (the exception is the persistent content of durable queues, which is saved to disk).
2. Disk node: keeps its state both in memory and on disk.
Because a memory node does not write to disk, it performs better than a disk node. A cluster needs only one disk node to persist its state.
If a cluster has only memory nodes, you must not stop them all; otherwise all state and messages are lost.

Approach:

So how can RabbitMQ be made highly available? First, set up a cluster in normal mode, then configure mirror mode for high availability, and finally put a reverse proxy in front of the RabbitMQ cluster so that producers and consumers access the cluster through the proxy.

Architecture diagram (image from http://www.nsbeta.info)

In the example in that diagram, three RabbitMQ instances run on the same host on different service ports. In production, of course, the RabbitMQ instances must run on different physical servers; otherwise high availability is meaningless.

Cluster mode configuration

The architecture can be designed as follows. There are four machines. Three of them form the RabbitMQ cluster: one disk node and two memory nodes. The two memory nodes are faster, so clients (consumers and producers) connect to them. Because disk I/O is relatively slow, the disk node is used only for data backup. The fourth machine acts as the reverse proxy.

The hostnames of the four servers are queue, panyuntao1, panyuntao2, and panyuntao3; panyuntao3 (IP: 172.16.3.110) is the reverse proxy.

Configuring a RabbitMQ cluster is easy and takes only a few commands. The steps are as follows:

Step 1: Install rabbitmq-server on queue, panyuntao1, and panyuntao2, the three cluster nodes, and then start rabbitmq-server on each of them.

Start command: # rabbitmq-server start. For installation and startup commands, see http://www.cnblogs.com/flat_peach/archive/2013/03/04/2943574.html

Step 2: On the three node servers, edit /etc/hosts and map the hostnames queue, panyuntao1, and panyuntao2, for example:

172.16.3.32 queue

172.16.3.107 panyuntao1

172.16.3.108 panyuntao2

Each machine's hostname must also be set correctly to queue, panyuntao1, and panyuntao2 respectively. If you need to change a hostname, we recommend doing so before installing RabbitMQ.

Note that the RabbitMQ cluster nodes must be on the same network segment; clustering across a WAN performs poorly.

Step 3: Set each node's cookie

A RabbitMQ cluster relies on an Erlang cluster, so an Erlang cluster environment must be set up first. The nodes of an Erlang cluster authenticate one another with a magic cookie, which is stored in /var/lib/rabbitmq/.erlang.cookie. The cookie must therefore be identical on every node; otherwise the nodes cannot communicate.

-r--------. 1 rabbitmq 20 00:00 /var/lib/rabbitmq/.erlang.cookie

Copy the .erlang.cookie value from one node and save it on the other nodes. You can also use scp, but pay attention to the file's permissions, owner, and group. Here the cookie from queue is copied to panyuntao1 and panyuntao2. First change the permissions of .erlang.cookie on panyuntao1 and panyuntao2:

# chmod 777 /var/lib/rabbitmq/.erlang.cookie

Then copy /var/lib/rabbitmq/.erlang.cookie from queue to the same location on panyuntao1 and panyuntao2 (or the other way round). This file is the authentication key for communication between cluster nodes, and it must be identical on all nodes. After copying, do not forget to restore the permissions of .erlang.cookie; otherwise an error may occur:

# chmod 400 /var/lib/rabbitmq/.erlang.cookie

After setting the cookie, restart RabbitMQ on all three nodes:

# rabbitmqctl stop
# rabbitmq-server start
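Because a wrong cookie or wrong permissions are a common reason nodes refuse to talk to each other, a quick pre-flight check can help. The following is a minimal Python sketch, assuming you run it as root (the file is owned by rabbitmq with mode 400) and, for the comparison, that you have first gathered copies of each node's cookie locally, for example with scp. The function names are hypothetical.

```python
import os
import stat


def cookie_problems(path: str) -> list[str]:
    """Return problems found with one Erlang cookie file (empty list = OK)."""
    problems = []
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Erlang rejects a cookie file that group or others can access,
    # which is why the article restores mode 400 after copying.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"mode {oct(mode)} grants group/other access")
    with open(path, "rb") as f:
        if not f.read():
            problems.append("cookie file is empty")
    return problems


def cookies_match(paths: list[str]) -> bool:
    """True if every copy is byte-for-byte identical, as the cluster requires."""
    contents = set()
    for path in paths:
        with open(path, "rb") as f:
            contents.add(f.read())
    return len(contents) == 1
```

On a node, cookie_problems("/var/lib/rabbitmq/.erlang.cookie") should return an empty list once the chmod 400 step has been applied.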

Step 4: This step is critical. Stop the RabbitMQ service on all nodes, then start each node independently in the background with the -detached parameter. This sequence is also worth following when a node cannot be started again after it has been stopped.

queue # rabbitmqctl stop
panyuntao1 # rabbitmqctl stop
panyuntao2 # rabbitmqctl stop

queue # rabbitmq-server -detached
panyuntao1 # rabbitmq-server -detached
panyuntao2 # rabbitmq-server -detached

Check each node separately:

queue # rabbitmqctl cluster_status
Cluster status of node rabbit@queue ...
[{nodes,[{disc,[rabbit@queue]}]},
 {running_nodes,[rabbit@queue]},
 {partitions,[]}]
...done.

panyuntao1 # rabbitmqctl cluster_status
Cluster status of node rabbit@panyuntao1 ...
[{nodes,[{disc,[rabbit@panyuntao1]}]},
 {running_nodes,[rabbit@panyuntao1]},
 {partitions,[]}]
...done.

panyuntao2 # rabbitmqctl cluster_status
Cluster status of node rabbit@panyuntao2 ...
[{nodes,[{disc,[rabbit@panyuntao2]}]},
 {running_nodes,[rabbit@panyuntao2]},
 {partitions,[]}]
...done.

Step 4: Join panyuntao1 and panyuntao2 to queue as memory nodes. On panyuntao1, run the following commands:

panyuntao1 # rabbitmqctl stop_app
panyuntao1 # rabbitmqctl join_cluster --ram rabbit@queue
panyuntao1 # rabbitmqctl start_app

Then do the same on panyuntao2:

panyuntao2 # rabbitmqctl stop_app
panyuntao2 # rabbitmqctl join_cluster --ram rabbit@queue
panyuntao2 # rabbitmqctl start_app

These commands first stop the RabbitMQ application, then call the cluster command to join the node to queue so that they form a cluster, and finally restart the RabbitMQ application. With these commands, panyuntao1 and panyuntao2 are memory nodes, while queue is a disk node (a RabbitMQ node is a disk node by default after it starts). To make panyuntao1 or panyuntao2 a disk node in the cluster, run join_cluster without the --ram parameter:

# rabbitmqctl join_cluster rabbit@queue

(With the older rabbitmqctl cluster command, a node became a disk node if it included itself in the node list.) A RabbitMQ cluster must contain at least one disk node.
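If you script the join step for several nodes, it helps to generate the exact rabbitmqctl calls and to refuse a layout without a disk node. This is a minimal Python sketch; the helper names are hypothetical, and the node name rabbit@queue assumes RabbitMQ's default rabbit@<hostname> naming. Only the rabbitmqctl subcommands and the --ram flag used in the steps above are emitted.

```python
import subprocess


def join_commands(seed_node: str, ram: bool) -> list[list[str]]:
    """rabbitmqctl invocations that join the local node to seed_node's cluster."""
    join = ["rabbitmqctl", "join_cluster"]
    if ram:
        join.append("--ram")  # omit --ram to join as a disk node
    join.append(seed_node)
    return [["rabbitmqctl", "stop_app"], join, ["rabbitmqctl", "start_app"]]


def check_plan(node_types: dict[str, str]) -> None:
    """Reject a cluster layout that has no disk ("disc") node."""
    if "disc" not in node_types.values():
        raise ValueError("a RabbitMQ cluster needs at least one disk node")


def run_join(seed_node: str, ram: bool, dry_run: bool = True) -> list[str]:
    """Return the command lines; execute them in order unless dry_run is set."""
    lines = []
    for cmd in join_commands(seed_node, ram):
        lines.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return lines
```

A dry run such as run_join("rabbit@queue", ram=True) returns the three command lines used on panyuntao1 and panyuntao2 without touching the broker; check_plan({"rabbit@queue": "disc", "rabbit@panyuntao1": "ram", "rabbit@panyuntao2": "ram"}) accepts this article's layout.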

Step 5: Run cluster_status on queue, panyuntao1, and panyuntao2 to view the cluster status:

queue # rabbitmqctl cluster_status
Cluster status of node rabbit@queue ...
[{nodes,[{disc,[rabbit@queue]},{ram,[rabbit@panyuntao2,rabbit@panyuntao1]}]},
 {running_nodes,[rabbit@panyuntao2,rabbit@panyuntao1,rabbit@queue]},
 {partitions,[]}]
...done.

panyuntao1 # rabbitmqctl cluster_status
Cluster status of node rabbit@panyuntao1 ...
[{nodes,[{disc,[rabbit@queue]},{ram,[rabbit@panyuntao2,rabbit@panyuntao1]}]},
 {running_nodes,[rabbit@panyuntao2,rabbit@queue,rabbit@panyuntao1]},
 {partitions,[]}]
...done.

panyuntao2 # rabbitmqctl cluster_status
Cluster status of node rabbit@panyuntao2 ...
[{nodes,[{disc,[rabbit@queue]},{ram,[rabbit@panyuntao2,rabbit@panyuntao1]}]},
 {running_nodes,[rabbit@panyuntao1,rabbit@queue,rabbit@panyuntao2]},
 {partitions,[]}]
...done.

Now we can see the cluster information on each node: two memory nodes and one disk node.
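Reading this output by eye works for three nodes, but a monitoring script may want to check it automatically. The Python sketch below assumes the Erlang-term output format shown in this step (RabbitMQ releases of this era); newer releases print cluster_status in a different layout, so the regular expression is tied to this format. The function names are hypothetical.

```python
import re

KEYS = ("disc", "ram", "running_nodes")


def parse_cluster_status(text: str) -> dict[str, list[str]]:
    """Extract node lists from classic `rabbitmqctl cluster_status` output."""
    status = {}
    for key in KEYS:
        match = re.search(r"\{" + key + r",\s*\[([^\]]*)\]", text)
        names = match.group(1).split(",") if match else []
        # Node names containing '-' are printed as quoted Erlang atoms.
        status[key] = [n.strip().strip("'") for n in names if n.strip()]
    return status


def is_healthy(status: dict[str, list[str]]) -> bool:
    """At least one disk node, and every member node is running."""
    members = set(status["disc"]) | set(status["ram"])
    return bool(status["disc"]) and members == set(status["running_nodes"])
```

Fed the output from queue above, the parser reports rabbit@queue as the only disc node and both panyuntao nodes as ram nodes, and is_healthy returns True; if a memory node drops out of running_nodes, it returns False.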

Step 6: A queue written on any cluster node is visible from the other nodes, and every node reports the same message count (for how to send messages, see http://www.cnblogs.com/flat_peach/archive/2013/03/04/2943574.html):

queue # rabbitmqctl list_queues -p hrsystem
Listing queues ...
test_queue	10000
...done.
panyuntao1 # rabbitmqctl list_queues -p hrsystem
Listing queues ...
test_queue	10000
...done.
panyuntao2 # rabbitmqctl list_queues -p hrsystem
Listing queues ...
test_queue	10000
...done.

The -p parameter specifies the vhost name.
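The same consistency check can be automated by collecting the list_queues output from each node and comparing the parsed counts. This is a minimal Python sketch; how you collect the text (for example over ssh) is left out, and the function names are hypothetical. It assumes the two-column "name count" lines shown in this step.

```python
def parse_list_queues(text: str) -> dict[str, int]:
    """Map queue name -> message count from `rabbitmqctl list_queues` output."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the "Listing queues ..." banner and the "...done." trailer.
        if len(fields) == 2 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


def consistent(outputs_by_node: dict[str, str]) -> bool:
    """True if every node reports the same queues with the same counts."""
    parsed = [parse_list_queues(out) for out in outputs_by_node.values()]
    return all(p == parsed[0] for p in parsed)
```

With the output above from queue, panyuntao1 and panyuntao2, each node parses to {"test_queue": 10000} and consistent(...) returns True.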

The RabbitMQ cluster now works normally. This mode is better suited to non-durable queues: only when a queue is non-durable can a client reconnect to another node in the cluster and re-create the queue after a failure. If the queue is durable, the only option is to restore the failed node. Why doesn't RabbitMQ copy every queue to every node in the cluster? Because that conflicts with the design intent of clustering, which is to let performance (CPU, memory) and capacity (memory, disk) grow linearly as nodes are added. The reasons are as follows:

1. Storage space: if every cluster node had a full copy of every queue, adding nodes wouldn't give you more storage capacity. For example, if one node could store 1 GB of messages, adding two more nodes would simply give you two more copies of the same 1 GB of messages.

2. Performance: publishing messages would require replicating those messages to every cluster node. For durable messages, that would require triggering disk activity on all nodes for every message. Your network and disk load would increase every time you added a node, keeping the performance of the cluster the same (or possibly worse).

Newer versions of RabbitMQ do support queue replication (it is controlled by a configuration option). For example, in a five-node cluster you can specify that a queue's contents are stored on two nodes, striking a balance between performance and high availability.
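The storage and performance arguments reduce to simple arithmetic. The Python sketch below uses an idealized model (it ignores metadata overhead and assumes queues spread evenly across nodes); the function names are hypothetical. With 3 nodes of 1 GB each, full replication leaves 1 GB of usable message capacity, no replication gives 3 GB, and the five-node, two-copy example above gives 2.5 GB.

```python
def usable_capacity_gb(nodes: int, per_node_gb: float, copies: int) -> float:
    """Idealized message capacity when each queue is stored on `copies` nodes."""
    copies = min(copies, nodes)  # cannot keep more copies than there are nodes
    return nodes * per_node_gb / copies


def replication_bytes(message_bytes: int, copies: int) -> int:
    """Intra-cluster traffic needed to copy one message to `copies - 1` other nodes."""
    return message_bytes * (copies - 1)
```

The second function shows why full mirroring hurts as the cluster grows: every extra copy adds one more transfer (and, for durable messages, one more disk write) per published message.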

Mirror mode configuration

The configuration above is RabbitMQ's default cluster mode, which does not guarantee high availability of queues. Exchanges and bindings are replicated to every node in the cluster, but queue contents are not. This mode relieves the load on individual nodes, but if the node hosting a queue goes down, the queue becomes unavailable and can only be used again after that node is restored. To replicate queue contents to every node in the cluster, you must create a mirrored queue. Let's look at how mirror mode solves this replication problem and improves availability.

Step 1: Add a load balancer

Among load balancers, commercial hardware products such as F5 BIG-IP and Radware AppDirector deliver high processing capacity, but their high prices put many people off, so software load-balancing solutions are also available. Software load balancers commonly used by Internet companies include LVS, HAProxy, and nginx. LVS is a kernel-level product that mainly does layer-4 packet forwarding and is relatively complex to use. HAProxy and nginx are application-layer products, but nginx is mainly used for HTTP. HAProxy was therefore chosen as the load balancer in front of RabbitMQ.

HAProxy is easy to install and use. On CentOS, just run yum install haproxy and edit /etc/haproxy/haproxy.cfg. The file content is roughly as follows:

#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option                  http-server-close
    option                  forwardfor except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

# AMQP is plain TCP, so this section switches to TCP mode and TCP logging.
# Idle AMQP connections are closed after "timeout client"/"timeout server"
# (1m here) unless client heartbeats fire more often than that.
listen rabbitmq_cluster 0.0.0.0:5672
    mode    tcp
    option  tcplog
    balance roundrobin
    server  rqslave1 172.16.3.107:5672 check inter 2000 rise 2 fall 3
    server  rqslave2 172.16.3.108:5672 check inter 2000 rise 2 fall 3
#   server  rqmaster 172.16.3.32:5672 check inter 2000 rise 2 fall 3
#---------------------------------------------------------------------
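The check inter 2000 rise 2 fall 3 options on each server line decide when a memory node leaves or rejoins the rotation: HAProxy checks every 2000 ms, marks a server DOWN after 3 consecutive failed checks (roughly 6 seconds) and UP again after 2 consecutive successful ones. The Python sketch below models only that counter logic plus plain round robin; it is not HAProxy code, ignores weights and slow-start, and assumes servers start in the UP state. The class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """One `server` line: health-checked with HAProxy-style rise/fall counters."""
    name: str
    rise: int = 2     # consecutive passing checks needed to come back UP
    fall: int = 3     # consecutive failing checks needed to go DOWN
    up: bool = True   # assumption: the server starts in the UP state
    streak: int = field(default=0, repr=False)  # checks disagreeing with `up`

    def record_check(self, ok: bool) -> None:
        if ok == self.up:
            self.streak = 0  # result agrees with the current state
            return
        self.streak += 1
        if self.up and self.streak >= self.fall:
            self.up, self.streak = False, 0
        elif not self.up and self.streak >= self.rise:
            self.up, self.streak = True, 0


class RoundRobin:
    """`balance roundrobin`: hand out UP servers in turn, skipping DOWN ones."""

    def __init__(self, servers: list[Server]):
        self.servers = servers
        self.next_index = 0

    def pick(self) -> Server:
        for _ in range(len(self.servers)):
            server = self.servers[self.next_index % len(self.servers)]
            self.next_index += 1
            if server.up:
                return server
        raise RuntimeError("no server is UP")
```

With rqslave1 and rqslave2 both UP, new connections alternate between them; after three failed checks on rqslave2, every new connection goes to rqslave1 until rqslave2 passes two checks in a row.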

The load balancer listens on port 5672 and distributes connections round-robin to port 5672 on the two memory nodes, 172.16.3.107 and 172.16.3.108. 172.16.3.32 is the disk node; it is used only for backup and is not exposed to producers or consumers. If server resources allow, you can also configure multiple disk nodes, so that the disk-node data is unaffected unless all of them fail at the same time.

Step 2: Configure the policy

RabbitMQ's mirroring feature is implemented through RabbitMQ policies. A policy controls and modifies the behavior of queues and exchanges in a vhost across the whole cluster.

When a policy is enabled on any node in the cluster, it is automatically synchronized to all the other cluster nodes.

# rabbitmqctl set_policy -p hrsystem ha-allqueue "^" '{"ha-mode":"all"}'

This command creates a policy named ha-allqueue in the vhost hrsystem. Its ha-mode is "all", that is, queues are mirrored to all nodes, including nodes added later.

The regular expression "^" matches all queue names.

For example:

# rabbitmqctl set_policy -p hrsystem ha-allqueue "^message" '{"ha-mode":"all"}'

Note: adapt the "^message" pattern to your own needs. It mirrors queues whose names start with "message". In our configuration all queues are mirrored, so the pattern is "^".
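Before applying a policy, it can be useful to preview which queues its pattern will catch. RabbitMQ matches the pattern as a regular expression against queue names, and when several policies match one queue, only the one with the highest priority applies. The Python sketch below uses Python's re module as a stand-in for RabbitMQ's regex engine (equivalent for simple patterns like these) and does not model priority ties; the function names are hypothetical.

```python
import re
from typing import Optional


def matching_queues(pattern: str, queue_names: list[str]) -> list[str]:
    """Queues whose names a policy pattern would match (unanchored regex search)."""
    regex = re.compile(pattern)
    return [name for name in queue_names if regex.search(name)]


def effective_policy(queue: str, policies: list[tuple[str, str, int]]) -> Optional[str]:
    """Among (name, pattern, priority) policies, return the highest-priority match."""
    matches = [p for p in policies if re.search(p[1], queue)]
    return max(matches, key=lambda p: p[2])[0] if matches else None
```

Note the difference between "^message", which only matches names starting with "message", and a bare "message", which also matches a name such as "old_message" because the search is unanchored.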

For more information about set_policy, see http://www.rabbitmq.com/man/rabbitmqctl.1.man.html:

set_policy [-p vhostpath] {name} {pattern} {definition} [priority]

ha-mode:

ha-mode	ha-params	Result
all	(absent)	Queue is mirrored across all nodes in the cluster. When a new node is added to the cluster, the queue will be mirrored to that node.
exactly	count	Queue is mirrored to count nodes in the cluster. If there are fewer than count nodes in the cluster, the queue is mirrored to all nodes. If there are more than count nodes in the cluster, and a node containing a mirror goes down, then a new mirror will not be created on another node. (This is to prevent queues migrating across a cluster as it is brought down.)
nodes	node names	Queue is mirrored to the nodes listed in node names. If any of those node names are not a part of the cluster, this does not constitute an error. If none of the nodes in the list are online at the time when the queue is declared, then the queue will be created on the node that the declaring client is connected to.
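The three ha-mode rules in the table can be expressed as a small placement function. This Python sketch is an approximation for illustration only: it assumes the queue is first placed on the node the declaring client is connected to and that any extra mirrors are taken from the online nodes in list order (the table does not say which nodes are chosen), it treats "part of the cluster" and "online" as the same thing, and it does not model the rule that no replacement mirror is created after a failure in exactly mode. The function and parameter names are hypothetical.

```python
def mirror_nodes(ha_mode, ha_params, online_nodes, declaring_node):
    """Nodes that would hold a copy of the queue, following the ha-mode table.

    Assumption (not stated in the table): the queue is first placed on the
    declaring node, and extra mirrors are taken from online_nodes in order.
    """
    if ha_mode == "all":
        return list(online_nodes)
    if ha_mode == "exactly":
        count = int(ha_params)
        if count >= len(online_nodes):
            return list(online_nodes)  # fewer nodes than count: mirror to all
        others = [n for n in online_nodes if n != declaring_node]
        return [declaring_node] + others[: count - 1]
    if ha_mode == "nodes":
        # Unknown node names are not an error; they are simply skipped.
        listed = [n for n in ha_params if n in online_nodes]
        return listed or [declaring_node]
    raise ValueError(f"unknown ha-mode: {ha_mode!r}")
```

For this article's cluster, mirror_nodes("all", None, [...], ...) returns all three nodes, which is what the ha-allqueue policy relies on.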

Step 3: Specify the HA argument when declaring the queue. If the x-ha-policy argument is not specified, the queue's data will not be replicated. (This applies to RabbitMQ versions before 3.0; from 3.0 onward, mirroring is controlled by policies such as the one set in Step 2, and the x-ha-policy argument is ignored.) The following C# snippet, which uses the EasyNetQ client (RabbitHutch), subscribes with this argument:

    using (var bus = RabbitHutch.CreateBus(ConfigurationManager.ConnectionStrings["rabbitmq"].ToString()))
    {
        // x-ha-policy = "all" asks a pre-3.0 broker to mirror the subscription queue to all nodes
        bus.Subscribe<TestMessage>("word_subscriber",
            message => RunTable(message),
            x => x.WithArgument("x-ha-policy", "all"));
        Console.WriteLine("Subscribe started. Hit any key to quit");
        Console.ReadKey();
    }

Step 4: Clients send messages through the load balancer at 172.16.3.110 (panyuntao3), and the queue is replicated to all nodes. You can also configure the policy to mirror to only some of the nodes. Either way, the failure or restart of any single node will not affect normal use of the queue. This completes the high availability configuration (though if all nodes go down, nothing can help). Use the RabbitMQ management console to view the queue status in cluster mirror mode.

References:

http://www.rabbitmq.com/clustering.html

http://www.rabbitmq.com/ha.html

http://www.rabbitmq.com/parameters.html#policies

http://www.nsbeta.info/archives/555

http://blog.csdn.net/linvo/article/details/7793706

This article is from http://www.cnblogs.com/flat_peach/archive/2013/04/07/3004008.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
