Transferred from: http://my.oschina.net/hncscwc/blog/186350?p=1
1. Configuring the mirrored queue
Mirrored queues are configured by adding a policy. The command for adding a policy is:

rabbitmqctl set_policy [-p vhost] name pattern definition [priority]

-p vhost: optional; applies the policy only to queues in the given vhost
name: the name of the policy
pattern: a regular expression matched against queue names
definition: the mirroring definition, made up of three parts: ha-mode, ha-params and ha-sync-mode
ha-mode: the mirroring mode; valid values are all/exactly/nodes
    all: mirror on every node in the cluster
    exactly: mirror on a fixed number of nodes; the count is given by ha-params
    nodes: mirror on the named nodes; the node names are given by ha-params
ha-params: the parameter required by the chosen ha-mode
ha-sync-mode: how messages in the mirrored queue are synchronized; valid values are automatic/manual
priority: optional; the precedence of the policy
For example, to mirror every queue whose name begins with "hello" on exactly two nodes of the cluster, with automatic synchronization, the policy is set with:

rabbitmqctl set_policy hello-ha "^hello" '{"ha-mode":"exactly","ha-params":2,"ha-sync-mode":"automatic"}'
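To make the shape of the definition argument concrete, here is a minimal Python sketch that builds such a policy definition. The helper name and its defaults are my own for illustration; only the ha-mode/ha-params/ha-sync-mode keys come from the text above.

```python
import json

def build_mirror_policy(ha_mode, ha_params=None, ha_sync_mode="manual"):
    """Build the JSON 'definition' argument for rabbitmqctl set_policy.

    ha_mode: "all", "exactly" or "nodes".  "exactly" takes an integer
    ha_params (the mirror count), "nodes" takes a list of node names,
    and "all" takes no ha-params at all.
    """
    definition = {"ha-mode": ha_mode, "ha-sync-mode": ha_sync_mode}
    if ha_mode == "exactly":
        definition["ha-params"] = int(ha_params)   # number of mirrors
    elif ha_mode == "nodes":
        definition["ha-params"] = list(ha_params)  # explicit node names
    return json.dumps(definition, sort_keys=True)

# The example policy from the text: mirror on exactly 2 nodes, auto-sync.
print(build_mirror_policy("exactly", 2, "automatic"))
```

The resulting JSON string is what the quoted definition argument in the rabbitmqctl command above contains.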
2. Overview of the mirrored-queue implementation
(1) General introduction
An ordinary queue consists of two parts: amqqueue_process, which handles the protocol-related message processing (receiving messages from producers, delivering messages to consumers, handling message confirms and acknowledgements, and so on), and backing_queue, which provides the storage interface that amqqueue_process calls to store messages and, where required, persist them.
A mirrored queue has the same two parts. amqqueue_process still performs the protocol-related processing, but backing_queue is a special implementation spread across the master node and the slave nodes. Master and slaves each run a pair of processes: a GM, responsible for broadcasting messages, and a callback handler for the broadcast messages the GM receives. On the master the callback handler is the coordinator (mirror_queue_coordinator); on a slave it is mirror_queue_slave. mirror_queue_slave contains an ordinary backing_queue for storing messages; on the master, the ordinary backing_queue is wrapped in mirror_queue_master and invoked by amqqueue_process.
Note: both publishing and consuming go through the master node. While the master processes a message it also broadcasts it via GM to all slave nodes; each slave's GM receives the message and hands it, via the callback, to mirror_queue_slave for the actual processing.
(2) GM (guaranteed multicast)
In a traditional master/slave replication scheme, the master sends every message that needs replicating to all slaves. If a slave fails during replication, the master must handle it; if the master itself fails, the slaves must communicate among themselves to decide whether the replication can continue. To cope with all these failure cases, logging throughout the process is unavoidable.
RabbitMQ takes a different approach: all nodes are arranged in a circular linked list, and each node monitors its two neighbours. When a node joins, its neighbours ensure that the message currently being broadcast is also replicated to the newcomer; when a node fails, its neighbours take over to ensure the current broadcast still reaches every node.
The GMs on the master and slave nodes form a group, and the group information is recorded in Mnesia. Each mirrored queue forms its own group.
A message is sent out by the master's GM and travels along the linked list to every node. Since the nodes form a ring, the master's GM eventually receives the very message it sent, at which point the master knows the message has been replicated to all slaves.
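The ring traversal described above can be sketched as a toy model (this is an illustration of the idea, not RabbitMQ's actual Erlang implementation): the master hands the message to its right neighbour, each node relays it onward, and the broadcast is complete when the message arrives back at the sender.

```python
def broadcast(ring, sender, msg):
    """Relay msg around a circular node list starting at sender.

    Returns the nodes that received the message, in relay order.
    The broadcast is complete when the message returns to the sender,
    which is how the master knows all slaves have a copy.
    """
    n = len(ring)
    start = ring.index(sender)
    received = []
    i = (start + 1) % n
    while ring[i] != sender:      # stop once we are back at the sender
        received.append(ring[i])
        i = (i + 1) % n
    return received

ring = ["a", "b", "c", "d"]             # "a" is the master
print(broadcast(ring, "a", "test"))     # ['b', 'c', 'd']
```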
(3) Important table structure
The rabbit_queue table records information about queues:
-record(amqqueue,
        {
          name,            %% queue name
          durable,         %% whether the queue is durable
          auto_delete,     %% whether the queue is auto-deleted
          exclusive_owner, %% whether the queue is exclusive
          arguments,       %% arguments given when the queue was declared
          pid,             %% PID of the amqqueue_process process
          slave_pids,      %% PIDs of the mirror_queue_slave processes
          sync_slave_pids, %% PIDs of the slaves that are already synchronized
          policy,          %% policy attached to the queue, set via set_policy;
                           %% undefined if none
          gm_pids,         %% PIDs of the {gm, mirror_queue_coordinator} and
                           %% {gm, mirror_queue_slave} process pairs
          decorator        %%
        }).
Note: slave_pids is stored sorted by the time each slave was added, so that when the master node fails the "oldest" slave can be promoted to the new master.
The gm_group table records information about the group formed by the GMs:
-record(gm_group,
        {
          name,    %% group name, identical to the queue name
          version, %% group version number, incremented when a node joins or fails
          members  %% group member list, ordered by position in the ring
        }).
3. Some details of the mirrored queue
(1) New node
A joining slave first obtains the full member list of the corresponding group from gm_group, then picks a member at random and sends it a join request. The contacted node updates the gm_group record, notifies its left and right neighbours to update their neighbour information (adjusting which nodes they monitor), forwards the message currently being broadcast, and then replies to the requester that it has successfully joined the group. On receiving the reply, the new node updates the relevant information in rabbit_queue and synchronizes messages as needed.
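The join steps can be sketched as a toy model. The dictionary fields mirror the gm_group record above (name omitted); the splice-after-the-contact behaviour is an illustrative simplification, not the exact ring position RabbitMQ chooses.

```python
import random

def join_group(group, new_node, rng=random):
    """Toy model of a node joining the GM ring.

    group: dict with 'version' and 'members' (list in ring order).
    The joiner contacts a random existing member; that member bumps
    the group version and splices the newcomer into the ring as its
    right neighbour, after which the neighbours adjust monitoring.
    """
    contact = rng.choice(group["members"])
    pos = group["members"].index(contact)
    group["members"].insert(pos + 1, new_node)  # splice into the ring
    group["version"] += 1                       # joins/failures bump version
    return contact

group = {"version": 1, "members": ["a", "b", "c"]}
contact = join_group(group, "d")
print(group["version"], group["members"])
```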
(2) Broadcast of messages
Messages originate at the master node and travel along the node list. During the traversal every slave node caches the message; when the master receives back the message it sent, it broadcasts an ack, and the ack likewise travels along the node list through all the slaves, telling each of them to clear its cached copy. When the ack returns to the master, the life cycle of that broadcast message ends.
As a simple example (the original post illustrates this with a diagram), suppose node A is the master and broadcasts a message whose content is "test". The "1" indicates that this is the first message of the broadcast, and "id=A" indicates that the sender is node A; alongside each slave node is the state it records.
Why do all nodes need to cache a published message?
The master publishes a message that passes through all slave nodes in turn, and at any moment a node may fail, in which case its neighbour may need to resend the message to a new neighbour. For example, suppose A->B->C->D->A form the ring and A is the master. A sends a broadcast message to node B, and B sends it on to C. If C fails after receiving the message from B but before sending it to D, then once B detects C's failure it must resend the message to D. Similarly, if a new node E is added between B and C after B has already sent the message to C, B must also send the message to the new node E.
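The role of the per-node cache can be sketched as a toy model: every node keeps the message in a pending_ack buffer until the ack pass clears it, and it is precisely this cached copy that makes a retransmit to a new neighbour possible when the next node dies. (The traversal here is simplified to a single pass over the surviving ring.)

```python
def relay_with_failure(ring, dead):
    """Toy model: a message from the master traverses the ring; each
    node caches it in pending_ack until the ack pass clears it.  When
    a node dies mid-relay, its left neighbour resends to the next live
    node, which is only possible because the message is still cached."""
    live = [n for n in ring if n not in dead]   # surviving ring order
    pending = {n: [] for n in live}
    delivered = []
    for node in live[1:]:                       # live[0] is the master
        pending[node].append("msg")             # cache until acked
        delivered.append(node)
    # ack pass: the master saw its own message return, clear the caches
    for node in delivered:
        pending[node].clear()
    return delivered, pending

delivered, pending = relay_with_failure(["a", "b", "c", "d"], dead={"c"})
print(delivered)   # ['b', 'd'] -- b resent to d after c failed
```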
Status records for GM:
-record(state,
        {
          self,             %% ID of this GM
          left,             %% node to the left of this node
          right,            %% node to the right of this node
          group_name,       %% group name, identical to the queue name
          module,           %% callback module: rabbit_mirror_queue_slave or
                            %% rabbit_mirror_queue_coordinator
          view,             %% view of the group member list; records each
                            %% member's ID and its left/right neighbours
          pub_count,        %% count of messages published so far
          members_state,    %% member state list recording broadcast state: [#member{}]
          callback_args,    %% arguments for the callback functions: the PID of the
                            %% rabbit_mirror_queue_slave/rabbit_mirror_queue_coordinator process
          confirms,         %% confirm list
          broadcast_buffer, %% buffer of messages waiting to be broadcast
          broadcast_timer,  %% broadcast timer
          txn_executor
        }).
-record(member,
        {
          pending_ack, %% unacknowledged messages, i.e. where published messages are cached
          last_pub,    %% count of the last published message
          last_ack     %% count of the last acknowledged message
        }).
(3) Node failure
When a slave node fails, only its neighbours notice; they adjust their neighbour information and update the rabbit_queue and gm_group records accordingly. When the master node fails, the "oldest" slave is promoted to master: that slave creates a new coordinator, tells its GM to use the coordinator as its callback handler, and the former mirror_queue_slave takes over the role of amqqueue_process, handling producers' messages, delivering messages to consumers, and so on.
It was said above that when a slave fails only its neighbours notice. Does that mean a master failure is likewise noticed only by the adjacent nodes? If so, and the adjacent node is not the "oldest" one, how does the "oldest" node learn that it should be promoted to the new master?
In fact, when a slave joins the group, its mirror_queue_slave process monitors the master's amqqueue_process process (and possibly its mirror_queue_slave process as well). If the master fails, the mirror_queue_slave processes notice and broadcast the fact via GM, so all nodes eventually learn that the master is gone. Only the "oldest" node, however, promotes itself to the new master.
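The promotion rule above can be sketched in a few lines: slave_pids is ordered by join time, so on master death every slave learns of the failure, but only the head of the list (the "oldest" slave) promotes itself. The function and return values are illustrative, not RabbitMQ API.

```python
def handle_master_death(slave_pids, self_pid):
    """Toy model of the promotion decision.

    slave_pids is ordered oldest-first (join order, as recorded in the
    rabbit_queue table).  Every slave learns of the master's death via
    the GM broadcast, but only the oldest one promotes itself; the
    others simply note who the new master is.
    """
    new_master = slave_pids[0]                 # the "oldest" slave
    if self_pid == new_master:
        return "promote_to_master"
    return "new master is " + new_master

slaves = ["slave1", "slave2", "slave3"]        # join order
print(handle_master_death(slaves, "slave1"))   # promote_to_master
print(handle_master_death(slaves, "slave3"))   # new master is slave1
```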
In addition, when a slave is promoted to master, a module switch takes place inside mirror_queue_slave: the messages that were previously handled by mirror_queue_slave's handle_call/handle_info/handle_cast callbacks are from then on handled by amqqueue_process's handle_call/handle_info/handle_cast. This is how, as described above, the mirror_queue_slave process takes over amqqueue_process's protocol-related message handling.
rabbit_mirror_queue_slave.erl
handle_call({gm_deaths, LiveGMPids}, From,
            State = #state{q = Q = #amqqueue{name = QName, pid = MPid}}) ->
    Self = self(),
    case rabbit_mirror_queue_misc:remove_from_queue(QName,
                                                    Self,
                                                    LiveGMPids) of
        {ok, Pid, DeadPids} ->
            case Pid of
                MPid ->
                    %% master hasn't changed
                    gen_server2:reply(From, ok),
                    noreply(State);
                Self ->
                    %% we've become master
                    QueueState = promote_me(From, State),
                    {become,
                     %% messages are handled by rabbit_amqqueue_process from now on
                     rabbit_amqqueue_process,
                     QueueState, hibernate};
                ...

gen_server2.erl

handle_common_reply(Reply, Msg, GS2State = #gs2_state{name  = Name,
                                                      debug = Debug}) ->
    case Reply of
        ...
        {become, Mod, NState, Time1} ->
            Debug1 = common_become(Name, Mod, NState, Debug),
            loop(find_prioritisers(
                   GS2State#gs2_state{mod   = Mod,
                                      state = NState,
                                      time  = Time1,
                                      debug = Debug1}));
        ...

handle_msg({'$gen_call', From, Msg},
           GS2State = #gs2_state{mod   = Mod,
                                 state = State,
                                 name  = Name,
                                 debug = Debug}) ->
    case catch Mod:handle_call(Msg, From, State) of
        ...

handle_msg(Msg, GS2State = #gs2_state{mod = Mod, state = State}) ->
    Reply = (catch dispatch(Msg, Mod, State)),
    handle_common_reply(Reply, Msg, GS2State).

dispatch({'$gen_cast', Msg}, Mod, State) ->
    Mod:handle_cast(Msg, State);
dispatch(Info, Mod, State) ->
    Mod:handle_info(Info, State).
(4) Synchronization of messages
What, then, is the ha-sync-mode attribute in the mirrored-queue configuration for?
When a new node joins the group it can at most obtain, from its left neighbour, the message currently being broadcast; it has no way to obtain the messages broadcast before it joined. If the master then fails and the new node happens to become the new master, all messages broadcast before it joined are lost.
Note: "messages" here means specifically those that were published and replicated to all slaves before the new node joined, and that have not yet been consumed or acknowledged by a consumer. If every broadcast message had already been consumed and acknowledged before the new node joined, the master would have deleted them and told the slaves to do likewise, which is equivalent to no messages having been published at all before the join.
The way to avoid this problem is to synchronize messages to the newly added slave. When ha-sync-mode is set to automatic, a new node synchronizes messages automatically when it joins the group; when it is set to manual, an explicit manual operation is required to complete the synchronization.
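The difference between the two modes can be sketched as a toy model: a new slave only ever sees messages broadcast after it joins, so the pre-existing backlog must be copied over explicitly, either at join time (automatic) or later on demand (manual). The function and dict fields are illustrative, not RabbitMQ's actual state.

```python
def on_slave_join(master_backlog, ha_sync_mode):
    """Toy model: a newly joined slave misses everything broadcast
    before it joined.  With ha-sync-mode=automatic the backlog is
    copied at join time; with manual it stays empty until an operator
    triggers a sync."""
    slave = {"synced": False, "messages": []}
    if ha_sync_mode == "automatic":
        slave["messages"] = list(master_backlog)  # copy the backlog now
        slave["synced"] = True
    return slave

print(on_slave_join(["m1", "m2"], "automatic"))
print(on_slave_join(["m1", "m2"], "manual"))
```

In the manual case, a master failure before the operator-triggered sync would lose the backlog, which is exactly the risk described above.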