Exception Handling in the Two-Phase Commit Protocol

Source: Internet
Author: User

Assuming familiarity with the two-phase commit protocol, this article explains exception handling in each phase. First, the state of the protocol must be persisted, so that if a server goes down, its log still shows which phase it was in before the crash. In addition, every change to the data is first recorded in a write-ahead log, so that no data is lost after a crash and restart. The log-writing sequence is assumed to be: write the write-ahead log, modify the buffer, then write the commit/abort log record.
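The logging order described above can be sketched in a few lines of Python. This is a minimal illustration rather than a real storage engine: the `Store` class, its in-memory `log` list, and the record names `UPDATE`, `COMMIT` and `ABORT` are assumptions made for this example.

```python
class Store:
    """Toy key-value store that logs every change before applying it (hypothetical)."""

    def __init__(self):
        self.log = []      # stands in for a durable, append-only log file
        self.buffer = {}   # stands in for in-memory data pages

    def apply(self, txid, key, value):
        # 1. Write the write-ahead log record first, so the change survives a crash.
        self.log.append(("UPDATE", txid, key, value))
        # 2. Only then modify the buffer.
        self.buffer[key] = value

    def finish(self, txid, commit):
        # 3. Finally record the outcome of the transaction.
        self.log.append(("COMMIT" if commit else "ABORT", txid))

    def recover(self):
        """Rebuild the buffer from the log, replaying committed transactions only."""
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        self.buffer = {}
        for rec in self.log:
            if rec[0] == "UPDATE" and rec[1] in committed:
                _, _, key, value = rec
                self.buffer[key] = value
        return self.buffer


store = Store()
store.apply("t1", "x", 1)
store.finish("t1", commit=True)
store.apply("t2", "y", 2)      # simulated crash: t2's outcome is never logged
print(store.recover())         # {'x': 1}
```

Because the outcome record is written last, a restart can tell from the log alone which transactions finished and which were still in flight.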

On this premise, we discuss the exception cases and how to handle them, based on the following sequence diagram.

Two-phase commit protocol sequence diagram

  1. Process A fails, that is, the coordinator does not receive vote responses from some participants. After the timeout, the coordinator sends an abort message to the participants to cancel the transaction. There are two cases for a participant:

    • Process 1 fails. Due to network problems the participant does not receive the vote request message, or the participant is down. After the participant restarts and recovers, no action is required.
    • Process 2 fails. The participant receives the vote request, but the coordinator does not receive the response because of network problems, or the participant goes down. When the participant recovers from the crash, or its wait times out, it broadcasts a DECISION_REQUEST message to ask the other participants whether they have received the commit/abort message.
  2. Process B fails, that is, the coordinator does not receive a response from some participants after sending the commit message. The coordinator must retry until each participant confirms that it has committed. If a participant still cannot be reached after multiple attempts, the matter is resolved once that participant comes back online. There are two cases for a participant:

    • Process 3 fails. Due to network problems the participant does not receive the commit message, or the participant is down at that time. On recovery, the participant finds in its local log that the transaction has not been committed: the log shows that it is ready to commit, but it does not know whether the coordinator decided to commit. It therefore asks the coordinator and commits or rolls back according to the coordinator's response. If it cannot contact the coordinator, it asks the other participants about the transaction status. If any node has already committed or aborted (which means the coordinator has sent the corresponding message), it performs the same operation.
    • Process 4 fails. The participant completes the commit/rollback, but the coordinator does not receive the response because of network problems, or the participant goes down. The participant's local log shows that the local commit is already complete, so the completion message probably failed to reach the coordinator because of a network failure, and nothing needs to be redone locally. The coordinator, however, may still be waiting for the participant's completion response, so the participant proactively contacts the coordinator to report the transaction status.
  3. Process C fails, that is, after a participant sends its vote response, the commit/rollback message from the coordinator does not arrive. Exception handling for the participants in this case has already been discussed above, so here we discuss exception handling for the coordinator. There are two cases:

    • Process 2 fails. Due to network problems the coordinator does not receive a response, or the coordinator is down at that time. After the coordinator restarts and recovers, it finds that no commit decision has been logged and takes the safe course: because it does not know whether it sent the prepare message or whether the other participants are ready to commit, it sends an abort message directly to all participants and terminates the transaction.
    • Process 3 fails. Due to network problems the participants do not receive the commit/rollback message, or the coordinator goes down. After the coordinator restarts and recovers, it cannot be sure that all participants received the commit message, so it sends the commit message to all participants again to make sure the transaction commits normally.
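The recovery rules in the list above can be condensed into two small decision functions. The following Python sketch is not part of the original pseudo-code: the function names, the action strings such as `NO_ACTION` and `BLOCK`, and the idea of passing in what the coordinator and the peers answered are all assumptions made for illustration. The log record names follow the pseudo-code used in this article, and resending a logged abort in the same way as a logged commit is likewise an assumption.

```python
DECISIONS = ("GLOBAL_COMMIT", "GLOBAL_ABORT")


def participant_recover(last_log, coordinator_answer=None, peer_states=()):
    """Return the action a participant takes after a restart or a timeout.

    last_log           -- last protocol record in the participant's local log
    coordinator_answer -- a decision, or None if the coordinator is unreachable
    peer_states        -- last log records reported by the other participants
    """
    if last_log == "INIT":
        # Process 1 failed: the vote request never arrived, nothing to do.
        return "NO_ACTION"
    if last_log in DECISIONS:
        # Process 4 failed: already applied locally, so tell the coordinator.
        return "REPORT_" + last_log
    if last_log == "VOTE_COMMIT":
        # Process 2 or 3 failed: ready to commit, but the decision is unknown.
        if coordinator_answer in DECISIONS:
            return coordinator_answer
        for state in peer_states:
            if state in DECISIONS:
                return state      # a peer already knows the decision
        return "BLOCK"            # nobody knows yet: keep waiting
    raise ValueError("unexpected log record: " + last_log)


def coordinator_recover(last_log):
    """Return the message a restarted coordinator multicasts."""
    if last_log in DECISIONS:
        # Process 3 failed: the decision may not have reached everyone, resend it.
        return last_log
    # Process 2 failed: no decision was logged, so abort to be safe.
    return "GLOBAL_ABORT"


print(participant_recover("VOTE_COMMIT", None, ["VOTE_COMMIT", "GLOBAL_COMMIT"]))
print(participant_recover("VOTE_COMMIT"))
print(coordinator_recover("START_2PC"))
```

The `BLOCK` result is the uncomfortable case of two-phase commit: a participant that has voted to commit cannot decide on its own and must wait until the coordinator or a better-informed peer becomes reachable.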

The pseudo-code of the algorithm is given below. It is taken from Distributed Systems: Principles and Paradigms.

Actions of Coordinator

    write("START_2PC to local log");
    multicast("VOTE_REQUEST to all participants");
    while (not all votes have been collected)
    {
        waitfor("any incoming vote");
        if (timeout)
        {
            write("GLOBAL_ABORT to local log");
            multicast("GLOBAL_ABORT to all participants");
            exit();
        }
        record(vote);
    }
    if (all participants send VOTE_COMMIT and coordinator votes COMMIT)
    {
        write("GLOBAL_COMMIT to local log");
        multicast("GLOBAL_COMMIT to all participants");
    }
    else
    {
        write("GLOBAL_ABORT to local log");
        multicast("GLOBAL_ABORT to all participants");
    }

Actions of Participants

    write("INIT to local log");
    waitfor("VOTE_REQUEST from coordinator");
    if (timeout)
    {
        write("VOTE_ABORT to local log");
        exit();
    }
    if ("participant votes COMMIT")
    {
        write("VOTE_COMMIT to local log");
        send("VOTE_COMMIT to coordinator");
        waitfor("DECISION from coordinator");
        if (timeout)
        {
            multicast("DECISION_REQUEST to other participants");
            waituntil("DECISION is received"); // remain blocked
            write("DECISION to local log");
        }
        if (DECISION == "GLOBAL_COMMIT")
        {
            write("GLOBAL_COMMIT to local log");
        }
        else if (DECISION == "GLOBAL_ABORT")
        {
            write("GLOBAL_ABORT to local log");
        }
    }
    else
    {
        // A participant can only vote; the global decision belongs to the coordinator.
        write("VOTE_ABORT to local log");
        send("VOTE_ABORT to coordinator");
    }
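To see the pseudo-code above in action, here is a small single-process simulation in Python. It is only a sketch under simplifying assumptions: messages are ordinary function calls, a missing vote (`None`) stands in for a timeout, and the names `Participant`, `run_2pc` and `votes` are invented for this example rather than taken from the book.

```python
class Participant:
    def __init__(self, name, vote):
        self.name = name
        self.vote = vote            # "VOTE_COMMIT", "VOTE_ABORT", or None (no reply)
        self.log = ["INIT"]

    def on_vote_request(self):
        if self.vote is None:
            return None             # simulates a lost message or a crashed node
        self.log.append(self.vote)  # log the vote before sending it
        return self.vote

    def on_decision(self, decision):
        if self.vote is not None:   # unreachable nodes never see the decision
            self.log.append(decision)


def run_2pc(participants, coordinator_votes_commit=True):
    """Run one round of the protocol and return (decision, coordinator log)."""
    log = ["START_2PC"]
    votes = [p.on_vote_request() for p in participants]   # multicast VOTE_REQUEST
    if None in votes:
        decision = "GLOBAL_ABORT"   # timeout while collecting votes
    elif coordinator_votes_commit and all(v == "VOTE_COMMIT" for v in votes):
        decision = "GLOBAL_COMMIT"
    else:
        decision = "GLOBAL_ABORT"
    log.append(decision)            # log the decision before multicasting it
    for p in participants:
        p.on_decision(decision)
    return decision, log


ok = [Participant("p1", "VOTE_COMMIT"), Participant("p2", "VOTE_COMMIT")]
print(run_2pc(ok)[0])               # GLOBAL_COMMIT

bad = [Participant("p1", "VOTE_COMMIT"), Participant("p2", None)]
print(run_2pc(bad)[0])              # GLOBAL_ABORT
print(bad[0].log)                   # ['INIT', 'VOTE_COMMIT', 'GLOBAL_ABORT']
```

In the second run the silent participant `p2` keeps only `INIT` in its log, which matches the case where no action is required after it recovers.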

Link: http://cxh.me/2014/07/07/two-process-commit-exceptions/
