ZooKeeper Series II: Distributed architecture, distributed technology, and distributed transactions in detail

Source: Internet
Author: User
I. Distributed architecture in detail

1. History of distributed development

1.1 Single-node centralized deployment

Features: the app, DB, and file server are all deployed on a single machine, and the number of access requests is low.

1.2 Splitting application services and data services

Features: the app, DB, and file server are each deployed on a separate server, and the number of access requests is still low.

1.3 Using caching to improve performance

Features: frequently accessed data from the database is stored on a cache server, which reduces the number of database accesses and the pressure on the database.

1.4 Application server clusters

Features: multiple application servers provide service at the same time behind a load balancer, which removes the processing-capacity limit of a single server.

1.5 Database read/write separation

Features: the database is designed with read/write separation (master-slave) to relieve the processing pressure on the database.

1.6 Reverse proxy and CDN acceleration

Features: a reverse proxy and a CDN are used to speed up access to the system.

1.7 Distributed file system and distributed database

Features: the database becomes a distributed database and the file system becomes a distributed file system.

As the business grows, read/write separation eventually stops meeting requirements, and a distributed database and a distributed file system are needed to support the load.

A distributed database is the last resort of database splitting and is used only when a single table becomes very large. The more common way to split a database is by business: each business gets its own database, and the different business databases are deployed on different machines.

II. Distributed technology in detail

1. Concurrency

2. Distribution

Large tasks are split into multiple subtasks that are deployed on multiple machines and provide service externally.

3. Lack of a global clock

There is no single shared clock, so time has to be synchronized across machines.

4. Equivalence

A service is deployed on more than one machine, and the instances are identical.

5. Failures are certain to happen

Hard drives fail, CPUs burn out, and so on.

III. Distributed transactions

1. ACID

Atomicity: all operations in a transaction either complete or do not happen at all; a transaction never stops at some intermediate step. If an error occurs while the transaction is executing, it is rolled back to the state before the transaction began, as if it had never been executed.
Consistency: the integrity of the database is intact both before the transaction begins and after it ends. This means that the data written must fully conform to all preset rules, including the accuracy of the data, the linkage between related data, and the ability of the database to carry out its subsequent scheduled work on its own.

For example, A has 500 yuan and B has 300 yuan. When A transfers 100 yuan to B, the sum of A and B is always 800 yuan, no matter what happens.
Isolation: the database allows multiple concurrent transactions to read, write, and modify its data at the same time; isolation prevents the data inconsistencies that interleaved execution of concurrent transactions would otherwise cause. Transaction isolation has several levels: Read Uncommitted, Read Committed, Repeatable Read, and Serializable.
Durability: after the transaction finishes, its modifications to the data are permanent and are not lost even if the system fails.
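
The transfer example can be tried out with a small sketch. It uses Python's built-in sqlite3 module on a single in-memory database, so it only illustrates atomicity and the "sum stays 800" invariant on one node, not a distributed transaction. The table name account and the helper names transfer and total are made up for this illustration.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                # Violates the preset rule "no negative balance": abort.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except ValueError:
        return False

def total(conn):
    """Sum of all balances; the invariant from the example above."""
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 300)])
conn.commit()

ok_first = transfer(conn, "A", "B", 100)    # succeeds: A=400, B=400
ok_second = transfer(conn, "A", "B", 1000)  # fails and is rolled back
```

After both calls the balances are still 400 and 400: the failed transfer leaves no trace, which is the rollback behaviour described under atomicity.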

2. 2PC/3PC

2PC = Two-Phase Commit (often the mechanism an RDBMS (relational database management system) uses to ensure strong consistency)

3PC = Three-Phase Commit

Note: 2PC and 3PC exist to ensure the ACID properties (atomicity, consistency, isolation, durability) of a transaction.

2.1 The two phases of 2PC

Phase 1: submit the transaction request (voting phase). The coordinator asks each participant whether the transaction can be committed.

Phase 2: execute the transaction (commit or rollback). This is the actual commit: the transaction is committed if every participant agreed, and rolled back otherwise.
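
The two phases can be sketched in a few lines. This is a minimal in-memory illustration, not a real transaction manager: it has no timeouts, no persistence, and no failure recovery, and the names Participant and two_phase_commit are hypothetical.

```python
class Participant:
    """A resource that can vote on, then commit or roll back, a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # how this participant will vote
        self.state = "init"

    def prepare(self):
        # Phase 1: answer the question "can you commit?"
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:                   # phase 2: commit everywhere
            p.commit()
        return "commit"
    for p in participants:                       # phase 2: roll back everywhere
        p.rollback()
    return "rollback"


happy = [Participant("db1"), Participant("db2")]
result_ok = two_phase_commit(happy)

mixed = [Participant("db1"), Participant("db2", can_commit=False)]
result_fail = two_phase_commit(mixed)
```

A single "no" vote in phase 1 is enough to roll back every participant, which is what gives the transaction its all-or-nothing behaviour across machines.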

2.2 The three phases of 3PC

Phase 1: CanCommit. Ask whether the transaction can be committed.
Phase 2: PreCommit. Pre-commit the transaction.
Phase 3: DoCommit. Execute the transaction (commit or rollback); this is the actual commit.

Note: 3PC splits phase 1 of 2PC into the first two phases listed here.

3. CAP theory

Consistency: the data in a distributed database is the same on every node.

Availability: if any node goes down, the other nodes can continue to provide service externally.

Partition tolerance: the system keeps working when a network partition or a machine failure cuts some nodes off. For example, if a machine breaks (say its hard drive fails and its data is lost), you can add a new machine and synchronize the backup data from the other healthy machines.

The key point of CAP theory: a system can satisfy at most two of the three properties at the same time.

CA (give up P): put all the data on one node. This satisfies consistency and availability.
AP (give up C): give up strong consistency and guarantee eventual consistency instead.
CP (give up A): once the system encounters a failure, the affected servers wait for a period of time and cannot provide service externally while they recover.

To illustrate CAP theory:

There are 3 machines with 3 databases, each with the same two tables holding the same data.
machine1 - db1 - tbl_person, tbl_order
machine2 - db2 - tbl_person, tbl_order
machine3 - db3 - tbl_person, tbl_order
1) When data is inserted into the tables tbl_person and tbl_order in db1 on machine1, the inserted data is synchronized to machine2 and machine3 at the same time. This is consistency.
2) When one of the machines goes down, the others continue to provide service, and the failed machine resumes service after it restarts. This is availability.
3) When machine1 breaks and all of its data is lost, there is no problem, because the data still exists on machine2 and machine3. Add a new machine, machine4, and synchronize the backup data from either machine2 or machine3. This is partition tolerance.

4. BASE theory

Basically Available, Soft state, Eventually consistent
Basically available: when a distributed system fails, it is allowed to lose part of its availability (service degradation, page degradation).
Soft state: the distributed system is allowed to be in an intermediate state, and that intermediate state does not affect the availability of the system.
The intermediate state here is the delay before replicas converge when data updates are propagated between them.
As in the CAP example, when data is inserted into tbl_person and tbl_order in db1 on machine1 and synchronized to machine2 and machine3, the synchronization to machine3 fails if machine3 has a network problem. After a while the network recovers and the synchronization succeeds. The period during which synchronization had failed is called the soft state, because the synchronization eventually succeeded.
Eventual consistency: the data replicas become consistent after a period of time.

5. Paxos algorithm

5.1 Introduction to the Paxos algorithm

Let's start with a little story.

The Byzantine Generals Problem

The Byzantine Empire was the Eastern Roman Empire, which lasted from the 5th to the 15th century; Byzantium is present-day Istanbul, Turkey. Imagine that the Byzantine army has many divisions stationed outside an enemy city, each commanded by its own general. Suppose there are 11 generals, who can communicate only through messengers. After observing the enemy, the loyal generals must settle on a unified plan of action: attack or retreat. However, some of the generals are traitors who do not want the loyal generals to reach agreement, and they interfere with the formulation and dissemination of a unified plan.
The question is: the generals need a protocol that lets all loyal generals reach agreement, so that a few traitors cannot trick the loyal generals into a bad plan in which some generals attack while others retreat.
Suppose there are 9 loyal generals, of whom 5 vote to attack and 4 vote to retreat, and the 2 traitors maliciously vote to retreat. The result is a retreat that most of the loyal generals did not want, but this outcome is entirely permissible, because the 11 generals still end up in a consistent state.

Summary:
1) 11 generals are attacking the castle.
2) They must either all attack (a proposal, then a resolution) or all retreat (a proposal, then a resolution).
3) Whether they retreat or attack, more than half of the generals must agree before the decision can be executed.
4) There are traitors among the generals who interfere with reaching a resolution.

5.2 Introducing the Paxos algorithm

Mike Burrows, the author of Google Chubby, said that there is only one consensus algorithm in the world, Paxos, and that all other algorithms are defective.

Paxos: decision by majority (ultimately solving the consistency problem)

The Paxos algorithm has three roles: Proposer, Acceptor, and Learner.

Proposer: the submitter (author of a proposal)

Submits a proposal (and checks whether more than half of the Acceptors respond), then submits the proposal for approval (and checks whether more than half approve).

Acceptor: the recipient (receives proposals)

Accepts or rejects a proposal and gives the Proposer a response (a promise).

Learner: the learner (a bystander that does not vote)

Once a proposal has been approved, the Learner learns it.

Rule 1: If an Acceptor has not yet accepted any proposal, it must accept the first proposal it receives.

Rule 2: Every proposal must have a number. Numbers only grow and are never duplicated; the later the proposal, the larger the number.

Rule 3: When an Acceptor receives a proposal whose number is smaller than that of a proposal it has already accepted, it does not accept it.

Rule 4: There are two kinds of proposals: submitted proposals and approved proposals.

1) Prepare phase (submitting the proposal)

a) A Proposer wants proposal V to be adopted. It first sends a prepare request to a majority of Acceptors. The prepare request contains the proposal number K.

b) When an Acceptor receives the prepare request numbered K, it checks whether it has already handled a prepare request.

c) If the Acceptor has not received any prepare request before, it replies OK to the Proposer, which means the Acceptor must accept the first proposal it receives (Rule 1).

d) Otherwise, if the Acceptor has received a prepare request before (call the largest number it has seen MaxN), it compares the proposal numbers. If K < MaxN, it replies to the Proposer with a rejection or an error.

e) If K >= MaxN and the Acceptor has no approved proposal, it replies OK to the Proposer and records K.

f) If K >= MaxN and the Acceptor already has an approved proposal, it records K and replies with the number and content of that proposal, for example <AcceptN, AcceptV>, where AcceptN is the number of the approved proposal and AcceptV is its content.
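
Steps b) to f) describe what a single Acceptor does with a prepare request, and can be sketched as one function. This is an illustrative sketch only: the state dictionary, its keys max_n and accepted, and the reply tuples are assumptions made for this example, not part of any library.

```python
def handle_prepare(state, k):
    """Acceptor-side handling of a prepare request numbered k.

    state["max_n"]    -- largest prepare number seen so far, or None
    state["accepted"] -- (number, value) of the approved proposal, or None
    """
    max_n = state["max_n"]
    if max_n is not None and k < max_n:
        # Step d: the number is too small, so reject.
        return ("reject", max_n)
    # Steps c, e, f: record K as the largest number seen so far.
    state["max_n"] = k
    if state["accepted"] is None:
        # Steps c and e: nothing approved yet, plain OK.
        return ("ok", None)
    # Step f: report the already approved proposal back to the Proposer.
    return ("ok", state["accepted"])


fresh = {"max_n": None, "accepted": None}
first_reply = handle_prepare(fresh, 1)       # step c
stale_reply = handle_prepare(fresh, 0)       # step d, since 0 < 1

busy = {"max_n": 1, "accepted": (1, "attack time 1")}
later_reply = handle_prepare(busy, 2)        # step f
```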

2) Accept phase (approval phase)

a) The Proposer receives replies from more than half of the Acceptors, all of them OK, and none carries an approved proposal number and content. The Proposer then goes on to submit the approval request, this time sending the proposal number K together with the proposal content V, in the form <K, V>.

b) The Proposer receives replies from more than half of the Acceptors, all of them OK, and some carry an approved proposal number and content (<pok, proposal number, proposal content>). The Proposer then picks, among all the replies, the one with the largest approved proposal number (say <pok, AcceptNx, AcceptVx>) and sends its content to the Acceptors as the approval request, in the form <K, AcceptVx>.

c) If the Proposer does not receive replies from more than half of the Acceptors, it changes the proposal number from K to K+1 and sends the new number to the Acceptors again (repeating the prepare phase).

d) When an Acceptor receives an accept request from the Proposer and the number K < MaxN, it does not respond, or it replies with a rejection.

e) When an Acceptor receives an accept request from the Proposer and the number K >= MaxN, it approves the proposal, records <K, V> (the number and content of the accepted proposal) as its approved proposal, and replies to the Proposer.

f) After a period of time, the Proposer counts the replies it has received. If more than half of the Acceptors approved, the process ends (the proposal is approved), and the Proposer informs the Learners that they can learn the proposal.

g) After a period of time, the Proposer counts the replies it has received. If not more than half approved, it changes the proposal number and re-enters the prepare phase.
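
The accept phase has two sides: the Proposer decides which value to send, and each Acceptor decides whether to approve it. The sketch below covers both. It assumes the Proposer adopts the value with the largest approved number among the replies, which is the rule Staff Officer 1 applies in step 12 of Example 2 later in this article. All function names and data shapes are hypothetical.

```python
def choose_value(promises, own_value):
    """Proposer side (steps a and b).

    promises -- one entry per OK reply: the (number, value) that Acceptor
                had already approved, or None if it had approved nothing.
    """
    approved = [p for p in promises if p is not None]
    if not approved:
        return own_value                           # step a: free to propose V
    return max(approved, key=lambda nv: nv[0])[1]  # step b: adopt AcceptVx


def handle_accept(state, k, v):
    """Acceptor side (steps d and e) for an accept request <k, v>."""
    max_n = state["max_n"]
    if max_n is not None and k < max_n:
        return ("reject", max_n)                   # step d
    state["max_n"] = k
    state["accepted"] = (k, v)                     # step e: record <K, V>
    return ("accepted", k)


def is_approved(accepted_count, acceptor_count):
    """Steps f and g: approved only with more than half of the Acceptors."""
    return accepted_count > acceptor_count // 2


free_choice = choose_value([None, None], "attack time 1")
forced_choice = choose_value(
    [(1, "attack time 1"), (2, "attack time 2")], "attack time 3")

acceptor = {"max_n": 2, "accepted": None}
too_old = handle_accept(acceptor, 1, "attack time 1")
in_time = handle_accept(acceptor, 2, "attack time 2")
```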

5.3 Paxos examples

Example 1: a proposal has already been accepted

Roles:

Proposers: Staff Officer 1, Staff Officer 2

Acceptors: General 1, General 2, General 3 (the decision makers)

1) Staff Officer 1 initiates a proposal and sends messengers to the 3 generals with a letter whose content is (number 1);
2) The 3 generals receive Staff Officer 1's proposal. Since none of them has saved a number before, each saves (number 1) so as not to forget it, and sends the messenger back with a letter whose content is (OK);
3) Staff Officer 1 receives replies from at least 2 generals and again sends messengers to the 3 generals, with a letter whose content is (number 1, attack time 1);
4) The 3 generals receive Staff Officer 1's attack time, save (number 1, attack time 1) so as not to forget it, and send the messenger back with a letter whose content is (Accepted);
5) Staff Officer 1 receives (Accepted) from at least 2 generals, which confirms that the attack time has been accepted;
6) Staff Officer 2 initiates a proposal and sends messengers to the 3 generals with a letter whose content is (number 2);
7) The 3 generals receive Staff Officer 2's proposal. Since (number 2) is larger than (number 1), each saves (number 2) so as not to forget it; and since each has already accepted Staff Officer 1's proposal, each sends the messenger back with a letter whose content is (number 1, attack time 1);
8) Staff Officer 2 receives replies from at least 2 generals. Since the replies carry the content of Staff Officer 1's accepted proposal, Staff Officer 2 does not propose a new attack time and adopts the time proposed by Staff Officer 1.

Example 2: interleaved proposals

Roles:

Proposers: Staff Officer 1, Staff Officer 2

Acceptors: General 1, General 2, General 3 (the decision makers)

1) Staff Officer 1 initiates a proposal and sends messengers to the 3 generals with a letter whose content is (number 1);

2) The situation of the 3 generals is as follows:
a) General 1 and General 2 receive Staff Officer 1's proposal. They record (number 1), so that any later proposal with a smaller number will be rejected, and send the messengers back with a letter whose content is (OK);
b) The messenger sent to General 3 is captured, so General 3 never receives Staff Officer 1's proposal;

3) At the same time, Staff Officer 2 also initiates a proposal and sends messengers to the 3 generals with a letter whose content is (number 2);
4) The situation of the 3 generals is as follows:
a) General 2 and General 3 receive Staff Officer 2's proposal. They record (number 2), so that any later proposal with a smaller number will be rejected, and send the messengers back with a letter whose content is (OK);
b) The messenger sent to General 1 is captured, so General 1 never receives Staff Officer 2's proposal;
5) Staff Officer 1 receives replies from at least 2 generals and sends messengers to the 2 generals who replied, with a letter whose content is (number 1, attack time 1);
6) The situation of the 2 generals is as follows:
a) General 1 receives (number 1, attack time 1). The number matches the one it saved, so it saves (number 1, attack time 1) and sends the messenger back with a letter whose content is (Accepted);
b) General 2 receives (number 1, attack time 1). Since (number 1) is smaller than the (number 2) it has already saved, it sends the messenger back with a letter whose content is (Rejected, number 2);
7) Staff Officer 2 receives replies from at least 2 generals and sends messengers to the 2 generals who replied, with a letter whose content is (number 2, attack time 2);
8) General 2 and General 3 receive (number 2, attack time 2). The number matches the one they saved, so they save (number 2, attack time 2) and send the messengers back with a letter whose content is (Accepted);
9) Staff Officer 2 receives (Accepted) from at least 2 generals, which confirms that the attack time has been accepted by the majority;

10) Staff Officer 1 receives (Accepted) from only 1 general and (Rejected, number 2) from another, so it initiates a new proposal and sends messengers to the 3 generals with a letter whose content is (number 3);

11) The situation of the 3 generals is as follows:
a) General 1 receives Staff Officer 1's proposal. Since (number 3) is larger than the previously saved (number 1), it saves (number 3); and since General 1 has already accepted Staff Officer 1's earlier proposal, it sends the messenger back with a letter whose content is (number 1, attack time 1);
b) General 2 receives Staff Officer 1's proposal. Since (number 3) is larger than the previously saved (number 2), it saves (number 3); and since General 2 has already accepted Staff Officer 2's proposal, it sends the messenger back with a letter whose content is (number 2, attack time 2);
c) The messenger sent to General 3 is captured, so General 3 never receives Staff Officer 1's proposal;

12) Staff Officer 1 receives replies from at least 2 generals, compares the numbers in the two replies, and selects the attack time attached to the larger number as its latest proposal. Staff Officer 1 then sends messengers to the 2 generals who replied, with a letter whose content is (number 3, attack time 2);
13) General 1 and General 2 receive (number 3, attack time 2). The number matches the one they saved, so they save (number 3, attack time 2) and send the messengers back with a letter whose content is (Accepted);
14) Staff Officer 1 receives (Accepted) from at least 2 generals, which confirms that the attack time has been accepted by the majority.
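
Example 2 can be replayed step by step in code to check that both staff officers end up with the same attack time. The General class below is a hypothetical, minimal Acceptor written for this walkthrough; captured messengers are modelled simply by not calling the general who never receives the letter.

```python
class General:
    """A minimal Acceptor: remembers the largest number and the accepted proposal."""

    def __init__(self):
        self.promised = 0     # largest number saved so far (0 = none yet)
        self.accepted = None  # (number, attack time) once something is accepted

    def prepare(self, n):
        if n < self.promised:
            return ("rejected", self.promised)
        self.promised = n
        return ("ok", self.accepted)

    def accept(self, n, value):
        if n < self.promised:
            return ("rejected", self.promised)
        self.promised = n
        self.accepted = (n, value)
        return ("accepted", n)


g1, g2, g3 = General(), General(), General()

# Steps 1-2: Staff Officer 1 sends (number 1); the letter to General 3 is lost.
step2 = [g.prepare(1) for g in (g1, g2)]
# Steps 3-4: Staff Officer 2 sends (number 2); the letter to General 1 is lost.
step4 = [g.prepare(2) for g in (g2, g3)]
# Steps 5-6: Staff Officer 1 sends (number 1, attack time 1) to Generals 1 and 2.
step6 = [g.accept(1, "attack time 1") for g in (g1, g2)]
# Steps 7-8: Staff Officer 2 sends (number 2, attack time 2) to Generals 2 and 3.
step8 = [g.accept(2, "attack time 2") for g in (g2, g3)]
# Steps 10-11: Staff Officer 1 retries with (number 3); General 3 is unreachable.
step11 = [g.prepare(3) for g in (g1, g2)]
# Step 12: adopt the attack time attached to the larger accepted number.
chosen = max((acc for _, acc in step11 if acc), key=lambda nv: nv[0])[1]
# Step 13: Generals 1 and 2 accept (number 3, chosen attack time).
step13 = [g.accept(3, chosen) for g in (g1, g2)]
```

In this replay Staff Officer 1 ends up proposing "attack time 2", the value the majority had already accepted from Staff Officer 2, so the two proposers do not produce conflicting decisions.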

IV. ZooKeeper ZAB protocol

ZooKeeper Atomic Broadcast (ZAB) is ZooKeeper's atomic broadcast protocol and is commonly described as a classic Paxos-style implementation.

Terms:

Quorum: a set of more than half of the machines in the cluster
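
As a quick worked illustration of "more than half", the helper below computes the smallest quorum for a cluster of n machines. The function names are made up for this example.

```python
def quorum_size(n):
    """Smallest number of machines that is more than half of n."""
    return n // 2 + 1


def has_quorum(acks, n):
    """True when `acks` machines form a quorum in a cluster of n."""
    return acks >= quorum_size(n)


sizes = {n: quorum_size(n) for n in (3, 4, 5, 11)}
```

A cluster of 4 needs 3 machines for a quorum, the same as a cluster of 5, so the even-sized cluster tolerates no more failures than the next smaller odd-sized one.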

1. Nodes in ZAB (ZooKeeper) are in one of four states

LOOKING: electing a leader (crash recovery state)

FOLLOWING: the node is a follower and obeys the leader's commands

LEADING: the node is the leader and is responsible for coordinating the work

OBSERVING: the node is an observer, a read-only node that does not take part in elections

2. Two modes in ZAB (how ZK elects a leader)

Crash recovery and message broadcast

1) Crash recovery

The leader has gone down, and a new leader must be elected.

A. Each server holds a ballot <myid, zxid>, for example (3, 9), and first votes for itself.
B. After voting for itself, each server sends its ballot to the other available servers. For example, Server3 sends (3, 9) to Server4 and Server5, and so on for the others.
C. The ballots are compared. The comparison looks at zxid first and falls back to myid when the zxids are equal: the server with the larger zxid becomes leader, and among equal zxids the server with the larger myid becomes leader.
D. The server state changes (crash recovery -> data synchronization, or crash recovery -> message broadcast).
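
The ballot comparison in step C can be sketched as a sort key. This sketch assumes that a tie on zxid is broken in favour of the larger myid, which is how ZooKeeper's fast leader election orders votes; the real implementation also compares the election epoch before the zxid, which is left out here. The function name elect is hypothetical.

```python
def elect(ballots):
    """Pick the winning myid from a list of (myid, zxid) ballots.

    Larger zxid wins; on equal zxid the larger myid wins (assumption, see above).
    """
    winner = max(ballots, key=lambda ballot: (ballot[1], ballot[0]))
    return winner[0]


# Server3 and Server5 share the largest zxid, so myid breaks the tie.
tie_winner = elect([(3, 9), (4, 8), (5, 9)])
# Server1 has the most recent data, so it wins despite the smallest myid.
zxid_winner = elect([(1, 10), (5, 9)])
```

Putting zxid first means the server with the most up-to-date transaction log is preferred, so the new leader is less likely to be missing committed data.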

Supplementary notes on related concepts:

epoch: the term (period) number

acceptedEpoch (analogy: an era name): the epoch of the last NEWEPOCH proposal from a leader that this follower has accepted

currentEpoch (analogy: the current era name): the epoch currently in effect

lastZxid: the most recently received ZXID in history (the maximum value)

history: the log of transaction proposals that the current node has accepted

ZXID data structure:

cZxid = 0x10000001b

A 64-bit value

High 32 bits: 0x00000001

The leader epoch number

Low 32 bits: 0x0000001b

The auto-increment sequence number of the transaction (a monotonically increasing sequence); it increases by 1 for every client transaction request.

When a new leader is elected, it takes the largest transaction ZXID from its local log, reads the epoch from it, adds 1 to obtain the new epoch, and resets the low 32 bits to 0 (which guarantees that the ID is strictly increasing).
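
The 64-bit layout is easy to verify with a few bit operations. The sketch below splits the sample value 0x10000001b into its two halves and derives the first ZXID of the next epoch; the function names are made up for this illustration.

```python
def split_zxid(zxid):
    """Return (epoch, counter): the high and low 32 bits of a 64-bit ZXID."""
    epoch = zxid >> 32
    counter = zxid & 0xFFFFFFFF
    return epoch, counter


def first_zxid_of_next_epoch(last_zxid):
    """Epoch + 1 in the high 32 bits, low 32 bits reset to 0."""
    epoch, _ = split_zxid(last_zxid)
    return (epoch + 1) << 32


epoch, counter = split_zxid(0x10000001B)
next_zxid = first_zxid_of_next_epoch(0x10000001B)
```

For the sample value this gives epoch 1 and counter 27 (0x1b), and the next epoch starts at 0x200000000, which is larger than any ZXID issued in epoch 1.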

2) Message broadcast (similar to a 2PC commit)

A. The leader accepts a request and assigns it a globally unique, auto-incrementing 64-bit ID (the ZXID).
B. The leader sends the request, tagged with its ZXID, as a proposal to all followers.
C. Each follower that receives the proposal writes it to disk and then immediately replies to the leader with an ACK (OK).
D. When the leader has received ACKs from a quorum (more than half), it sends a commit command to all followers.
E. Each follower executes the commit command.
Note: only at this stage does the ZK cluster officially begin serving, and the leader can broadcast messages. If a new node joins, it still needs to be synchronized.
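
Steps A to E can be sketched as a small in-memory simulation. It is illustrative only: there is no networking, disk I/O, or ordering guarantee, the leader is assumed to count its own vote toward the quorum, and the class names Leader and Follower are hypothetical.

```python
class Follower:
    def __init__(self, reachable=True):
        self.reachable = reachable
        self.log = []        # proposals "written to disk"
        self.committed = []  # ZXIDs that have been committed

    def propose(self, zxid, data):
        if not self.reachable:
            return False     # no ACK from an unreachable follower
        self.log.append((zxid, data))
        return True          # step C: ACK after writing the proposal

    def commit(self, zxid):
        if self.reachable:
            self.committed.append(zxid)  # step E


class Leader:
    def __init__(self, followers, epoch=1):
        self.followers = followers
        self.epoch = epoch
        self.counter = 0

    def broadcast(self, data):
        self.counter += 1
        zxid = (self.epoch << 32) | self.counter                # step A
        acks = 1 + sum(f.propose(zxid, data) for f in self.followers)  # B, C
        cluster_size = len(self.followers) + 1
        if acks > cluster_size // 2:                            # step D
            for f in self.followers:
                f.commit(zxid)
            return zxid
        return None          # no quorum, nothing is committed


healthy = [Follower(), Follower(), Follower(False), Follower(False)]
committed_zxid = Leader(healthy).broadcast("create /node")

degraded = [Follower(), Follower(False), Follower(False), Follower(False)]
failed_zxid = Leader(degraded).broadcast("create /node")
```

Unlike 2PC, the leader does not wait for every follower: a quorum of ACKs is enough to commit, so the cluster keeps working while a minority of nodes is down.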

3) Data synchronization

A. Take the leader's largest lastZxid (from its local log).
B. Find the data corresponding to that ZXID and synchronize it (the synchronization process ensures that all followers are consistent).
C. Only after a quorum has finished synchronizing can the prospective leader become the real leader.

Reference article:

Paxos Protocol Super Detailed explanation + Simple example
