RabbitMQ Cluster High Availability
RabbitMQ is written in Erlang, which is convenient because Erlang is inherently a distributed language, but RabbitMQ does not natively support load balancing.
RabbitMQ deployments fall roughly into three modes: single mode, normal mode, and mirror mode.
Single mode: the simplest case, a single non-clustered node; there is little to say about it.
Normal mode: The default cluster mode.
For a given queue, the message bodies exist on only one node; nodes A and B share only the same metadata, that is, the queue structure.
When a message enters the queue on node A but the consumer pulls from node B, RabbitMQ transfers the message between A and B on the fly: it fetches the message body from A and delivers it to the consumer through B.
Consumers should therefore connect to every node and fetch messages from each of them; that is, for the same logical queue, physical queues should be created on multiple nodes. Otherwise, whether a consumer connects to A or B, the traffic always flows out through A, which creates a bottleneck.
One problem with this pattern is that when node A fails, node B cannot fetch the message bodies that have not yet been consumed on A.
If the messages are persisted, they cannot be consumed until node A recovers; if they are not persisted, they are simply gone.
Mirror mode: the desired queues are turned into mirrored queues that exist on multiple nodes; this is RabbitMQ's HA scheme.
This mode solves the problem above. It differs from normal mode in that message bodies are actively synchronized between the mirror nodes rather than pulled on demand when a consumer fetches data.
The side effects of this pattern are also obvious: besides reducing overall performance, if there are many mirrored queues and a large volume of messages, the network bandwidth inside the cluster is largely consumed by this synchronization traffic.
It is therefore suited to cases with high reliability requirements (this mode is described in detail later; it is the mode our environment uses).
Basic concepts in a cluster:
RabbitMQ cluster nodes come in two kinds: RAM (memory) nodes and disk nodes. As the names imply, a RAM node keeps all of its data in memory, while a disk node keeps data on disk. As mentioned earlier, though, if a message is published as persistent, the data is written safely to disk even on a RAM node.
A RabbitMQ cluster shares users, vhosts, queues, exchanges, and so on; all of this data and state is replicated on every node, with one exception: message queues currently belong only to the node that created them, although they are visible to and reachable from all nodes. RabbitMQ nodes can be added to or removed from a cluster dynamically, which provides a basic form of load balancing.
There are two types of nodes in a cluster:
1. RAM node: keeps state in memory only (one exception: the persistent contents of a durable queue are saved to disk).
2. Disk node: keeps state both in memory and on disk.
Although a RAM node does not write its state to disk, it performs better than a disk node. In a cluster, a single disk node is enough to hold the state.
If the cluster contains only RAM nodes, you must not stop all of them, or all state, messages, and so on will be lost.
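For reference, a node's type can also be changed after it has joined the cluster. A minimal sketch, assuming your rabbitmqctl version provides change_cluster_node_type (run on the node being converted):
# rabbitmqctl stop_app
# rabbitmqctl change_cluster_node_type ram   (use "disc" instead of "ram" to convert back to a disk node)
# rabbitmqctl start_app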
Approach:
How, then, do we achieve RabbitMQ high availability? We first set up an ordinary cluster, then configure mirror mode on top of it to achieve HA, and finally put a reverse proxy in front of the cluster so that producers and consumers access RabbitMQ through the proxy.
The architecture diagram is as follows (image from http://www.nsbeta.info).
In the diagram above, three RabbitMQ instances run on the same host, each on a different service port. In production, of course, the RabbitMQ instances must run on different physical servers, otherwise high availability loses its meaning.
- Cluster mode configuration
The architecture can be designed as follows: the cluster contains 4 machines, 1 of which uses disk mode and 2 of which use RAM mode. The 2 RAM-mode nodes are undoubtedly faster, so clients (consumers and producers) connect to them; the disk-mode node, being relatively slow because of disk I/O, is used only for data backup. The remaining machine acts as a reverse proxy.
The four servers' hostnames are queue, panyuntao1, panyuntao2, and panyuntao3 (IP: 172.16.3.110).
Configuring a RabbitMQ cluster is simple and requires only a few commands. The configuration steps are as follows:
Step 1: queue, panyuntao1, and panyuntao2 are the RabbitMQ cluster nodes. Install rabbitmq-server on each of them, then start rabbitmq-server on each.
Start command: # rabbitmq-server start. For the installation process and startup commands see: http://www.cnblogs.com/flat_peach/archive/2013/03/04/2943574.html
Step 2: On each of the three node servers, modify /etc/hosts to map the hostnames queue, panyuntao1, and panyuntao2, for example:
172.16.3.32 queue
172.16.3.107 Panyuntao1
172.16.3.108 Panyuntao2
The hostname of each machine must also be correct: queue, panyuntao1, and panyuntao2 respectively. If a hostname needs to be changed, it is recommended to do so before installing RabbitMQ (a sketch follows).
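A minimal hostname-change sketch for a CentOS 6-style system (the /etc/sysconfig/network path is an assumption for that distribution; adjust for your OS):
# hostname panyuntao1
# sed -i 's/^HOSTNAME=.*/HOSTNAME=panyuntao1/' /etc/sysconfig/network
Log in again (or reboot) so the new hostname takes effect, and only then install RabbitMQ.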
Note that RabbitMQ cluster nodes must be on the same network segment; clustering across a WAN performs poorly.
Step 3: Set the cookie on each node
RabbitMQ clusters rely on Erlang clustering to work, so an Erlang cluster environment must be built first. Nodes in an Erlang cluster authenticate each other with a magic cookie, which is stored in /var/lib/rabbitmq/.erlang.cookie with permission mode 400. The cookie must therefore be identical on every node, otherwise the nodes cannot communicate.
-r-------- 1 rabbitmq rabbitmq 20 Mar 5 00:00 /var/lib/rabbitmq/.erlang.cookie
Copy the .erlang.cookie value from one of the nodes and save it to the others, or copy the file with scp, paying attention to its permissions and owner/group. Here we copy the cookie from queue to panyuntao1 and panyuntao2. First relax the permissions of .erlang.cookie on panyuntao1 and panyuntao2:
# chmod 777 /var/lib/rabbitmq/.erlang.cookie
Then copy /var/lib/rabbitmq/.erlang.cookie from queue to the same location on panyuntao1 and panyuntao2 (a sketch with scp follows). This file is the authentication key that cluster nodes use to communicate with each other and must be identical on all nodes. After copying, restart RabbitMQ, and do not forget to restore the permissions of .erlang.cookie, or you may run into errors.
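A minimal scp sketch, assuming root SSH access between the nodes and the default cookie path:
queue# scp /var/lib/rabbitmq/.erlang.cookie root@panyuntao1:/var/lib/rabbitmq/
queue# scp /var/lib/rabbitmq/.erlang.cookie root@panyuntao2:/var/lib/rabbitmq/
panyuntao1# chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie   (repeat on panyuntao2)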
# chmod 400 /var/lib/rabbitmq/.erlang.cookie
After setting the cookie, restart RabbitMQ on all three nodes:
# rabbitmqctl stop
# rabbitmq-server start
Step 4: Stop the RabbitMQ service on all nodes, then start each one independently with the -detached flag. This step is critical; in particular, if a node that was stopped after being added to the cluster later fails to start, follow this order.
queue# rabbitmqctl stop
panyuntao1# rabbitmqctl stop
panyuntao2# rabbitmqctl stop
queue# rabbitmq-server -detached
panyuntao1# rabbitmq-server -detached
panyuntao2# rabbitmq-server -detached
View the cluster status on each node separately:
queue# rabbitmqctl cluster_status
Cluster status of node rabbit@queue ...
[{nodes,[{disc,[rabbit@queue]}]},
 {running_nodes,[rabbit@queue]},
 {partitions,[]}]
...done.
panyuntao1# rabbitmqctl cluster_status
Cluster status of node rabbit@panyuntao1 ...
[{nodes,[{disc,[rabbit@panyuntao1]}]},
 {running_nodes,[rabbit@panyuntao1]},
 {partitions,[]}]
...done.
panyuntao2# rabbitmqctl cluster_status
Cluster status of node rabbit@panyuntao2 ...
[{nodes,[{disc,[rabbit@panyuntao2]}]},
 {running_nodes,[rabbit@panyuntao2]},
 {partitions,[]}]
...done.
Step 5: Join panyuntao1 and panyuntao2 to queue as RAM nodes. On panyuntao1 (and then panyuntao2), execute the following commands:
panyuntao1# rabbitmqctl stop_app
panyuntao1# rabbitmqctl join_cluster --ram rabbit@queue
panyuntao1# rabbitmqctl start_app
panyuntao2# rabbitmqctl stop_app
panyuntao2# rabbitmqctl join_cluster --ram rabbit@queue (panyuntao1 is already clustered with queue, so panyuntao2 could equally join via panyuntao1 and would end up in the same cluster)
panyuntao2# rabbitmqctl start_app
The commands above stop the RabbitMQ application, call join_cluster to connect the node to the cluster, and finally restart the RabbitMQ application. With these commands, panyuntao1 and panyuntao2 become RAM nodes and queue is a disk node (a RabbitMQ node is a disk node by default after startup). If you want panyuntao1 or panyuntao2 to also be a disk node in the cluster, simply omit the --ram parameter from the join_cluster command:
# rabbitmqctl join_cluster rabbit@queue
A node joined this way (without --ram) becomes a disk node. A RabbitMQ cluster must contain at least one disk node.
Step 6: On queue, panyuntao1, and panyuntao2, run the cluster_status command to check the cluster status:
[root@queue ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@queue ...
[{nodes,[{disc,[rabbit@queue]},{ram,[rabbit@panyuntao1,rabbit@panyuntao2]}]},
 {running_nodes,[rabbit@panyuntao1,rabbit@panyuntao2,rabbit@queue]},
 {partitions,[]}]
...done.
[root@panyuntao1 rabbitmq]# rabbitmqctl cluster_status
Cluster status of node rabbit@panyuntao1 ...
[{nodes,[{disc,[rabbit@queue]},{ram,[rabbit@panyuntao1,rabbit@panyuntao2]}]},
 {running_nodes,[rabbit@queue,rabbit@panyuntao2,rabbit@panyuntao1]},
 {partitions,[]}]
...done.
[root@panyuntao2 rabbitmq]# rabbitmqctl cluster_status
Cluster status of node rabbit@panyuntao2 ...
[{nodes,[{disc,[rabbit@queue]},{ram,[rabbit@panyuntao1,rabbit@panyuntao2]}]},
 {running_nodes,[rabbit@queue,rabbit@panyuntao1,rabbit@panyuntao2]},
 {partitions,[]}]
...done.
At this point each node reports the same cluster information: two RAM nodes and one disk node.
Step 7: Write messages to a queue on any cluster node; the queue is visible from every node, and listing queues on each node shows the same message count. (For how to send a message see: http://www.cnblogs.com/flat_peach/archive/2013/03/04/2943574.html; a quick command-line sketch also follows.)
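If the management plugin is enabled and the rabbitmqadmin tool is installed, one way to create the test queue and push messages into it is sketched below (the queue and vhost names match the listing that follows; this is only for generating test traffic):
# rabbitmqadmin --vhost=hrsystem declare queue name=test_queue durable=true
# rabbitmqadmin --vhost=hrsystem publish exchange=amq.default routing_key=test_queue payload="hello"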
[root@queue ~]# rabbitmqctl list_queues -p hrsystem
Listing queues ...
test_queue 10000
...done.
[root@panyuntao1 ~]# rabbitmqctl list_queues -p hrsystem
Listing queues ...
test_queue 10000
...done.
[root@panyuntao2 ~]# rabbitmqctl list_queues -p hrsystem
Listing queues ...
test_queue 10000
...done.
The -p parameter is the vhost name.
The RabbitMQ cluster is now working properly.
This mode is better suited to non-persistent queues: only if a queue is non-persistent can a client reconnect to another node in the cluster and recreate it. If the queue is persistent, the only way to recover is to bring the failed node back. Why doesn't RabbitMQ replicate queues to every node in the cluster? Doing so would conflict with the cluster's design goal, which is to scale performance (CPU, memory) and capacity (memory, disk) roughly linearly as nodes are added. The reasons are as follows:
1. Storage space: If every cluster node had a full copy of every queue, adding nodes wouldn't give you more storage capacity. For example, if one node could store 1GB of messages, adding two more nodes would simply give you two more copies of the same 1GB of messages.
2. Performance: Publishing messages would require replicating those messages to every cluster node. For durable messages that would require triggering disk activity on all nodes for every message. Your network and disk load would increase every time you added a node, keeping the performance of the cluster the same (or possibly worse).
Of course, newer versions of RabbitMQ also support queue replication within the cluster (there is an option to configure it). For example, in a cluster of five nodes you can specify that a queue's contents be stored on 2 nodes, striking a balance between performance and high availability.
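A hedged sketch of such a policy using ha-mode "exactly" (the policy name ha-two is arbitrary; ha-sync-mode is optional):
rabbitmqctl set_policy -p hrsystem ha-two "^" '{"ha-mode":"exactly","ha-params":2,"ha-sync-mode":"automatic"}'
This keeps each matching queue mirrored on 2 nodes of the cluster rather than on all of them.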
- Mirroring mode Configuration
The configuration above sets up RabbitMQ's default cluster mode, but it does not guarantee high availability of queues: although exchanges and bindings are replicated to every node in the cluster, queue contents are not. This mode relieves some node pressure, but if the node holding a queue goes down, the queue becomes unusable until that node is restarted. To keep queues usable when a node goes down or fails, the queue contents must be replicated to every node in the cluster, i.e. mirrored queues must be created. Let's look at how mirror mode solves this replication problem and thereby improves availability.
Step 1: Add a load balancer
As for load balancers, commercial products such as F5's BIG-IP and Radware's AppDirector are hardware solutions with very high processing capacity, but their price can be prohibitive, so we turn to software load balancing. Internet companies commonly use software LBs such as LVS, HAProxy, and Nginx. LVS works at the kernel level, mainly forwarding packets at layer 4, and is relatively complex to operate. HAProxy and Nginx are application-layer products, but Nginx is mainly used for HTTP, so here we choose HAProxy as the front-end LB for RabbitMQ.
HAProxy is easy to install and use: on CentOS simply yum install haproxy, then edit /etc/haproxy/haproxy.cfg. The file contents are as follows:
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option                  http-server-close
    option                  forwardfor except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

listen rabbitmq_cluster 0.0.0.0:5672
    mode tcp
    balance roundrobin
    server rqslave1 172.16.3.107:5672 check rise 2 fall 3
    server rqslave2 172.16.3.108:5672 check rise 2 fall 3
    # server rqmaster 172.16.3.32:5672 check rise 2 fall 3
#---------------------------------------------------------------------
The load balancer listens on port 5672 and round-robins across port 5672 of our two RAM nodes, 172.16.3.107 and 172.16.3.108. 172.16.3.32 is the disk node, used only for backup and not exposed to producers and consumers. Of course, with sufficient server resources we could configure several disk nodes, so that the disk nodes have no impact unless they all fail at the same time.
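After editing the file, the configuration can be validated and HAProxy started; a minimal sketch for a CentOS-style install (the service name is assumed to be haproxy):
# haproxy -c -f /etc/haproxy/haproxy.cfg
# service haproxy start
(Alternatively, start the daemon directly with: haproxy -f /etc/haproxy/haproxy.cfg -D)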
Step 2: Configure policies
RabbitMQ's mirroring feature is configured through RabbitMQ policies. A policy controls and modifies cluster-wide queue and exchange behaviour within a vhost.
Enable the policy on any node in the cluster and it is automatically synchronized to all cluster nodes.
rabbitmqctl set_policy -p hrsystem ha-allqueue "^" '{"ha-mode":"all"}'
This command creates a policy in the vhost named hrsystem. The policy name is ha-allqueue, and the policy mode is all, i.e. queues are mirrored to all nodes, including nodes added later.
The policy regular expression "^" matches all queue names.
For example: rabbitmqctl set_policy -p hrsystem ha-allqueue "^message" '{"ha-mode":"all"}'
Note: the pattern "^message" should be adjusted to your own needs; it mirrors only queues whose names start with "message". In our configuration the policy should apply to all queues the application uses, so the expression is "^".
For the official set_policy documentation see http://www.rabbitmq.com/man/rabbitmqctl.1.man.html:
set_policy [-p vhostpath] {name} {pattern} {definition} [priority]
ha-mode values:
ha-mode "all" (ha-params absent): The queue is mirrored across all nodes in the cluster. When a new node is added to the cluster, the queue will be mirrored to that node.
ha-mode "exactly" (ha-params: count): The queue is mirrored to count nodes in the cluster. If there are fewer than count nodes in the cluster, the queue is mirrored to all nodes. If there are more than count nodes in the cluster and a node containing a mirror goes down, a new mirror will not be created on another node. (This is to prevent queues migrating across a cluster as it is brought down.)
ha-mode "nodes" (ha-params: node names): The queue is mirrored to the nodes listed in node names. If any of those node names are not part of the cluster, this does not constitute an error. If none of the listed nodes are online when the queue is declared, the queue is created on the node the declaring client is connected to.
Step 3: The HA argument must be specified when the queue is created; a queue created without the x-ha-policy argument will not be mirrored.
The following C # code fragmentusing (var bus = Rabbithutch.createbus (ConfigurationManager). connectionstrings["RabbitMQ"]. ToString ())) {Bus. subscribe< testmessage> ("word_subscriber", message = runtable (message), X=>x.withargument ("X-ha-policy "," all ")); Console.WriteLine ("Subscription Started. Hit any key quit "); Console.readkey (); }
Step 4: Clients send messages through the load-balancer server 172.16.3.110 (panyuntao3). The queue is replicated to all nodes (the policy can of course be configured to use only some nodes), so the failure or restart of any single node does not affect normal use of the queue. At this point the high-availability configuration is complete (if every node goes down, of course, nothing can be done). You can use the RabbitMQ management UI to see the mirrored state of the queues in the cluster.
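Besides the management UI, the mirrored state can also be inspected from the command line; a sketch, assuming a RabbitMQ version whose list_queues exposes the mirror-related columns:
# rabbitmqctl list_queues -p hrsystem name policy pid slave_pids synchronised_slave_pids
Queues matched by the policy should show mirror (slave) pids on the other nodes.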
Reference:
http://www.rabbitmq.com/clustering.html
http://www.rabbitmq.com/ha.html
http://www.rabbitmq.com/parameters.html#policies
http://www.nsbeta.info/archives/555
http://blog.csdn.net/linvo/article/details/7793706
RabbitMQ Cluster High Availability Test