Paxos in Keyspace

1. Keyspace

Keyspace is an open-source key-value database based on Paxos, with BerkeleyDB as the underlying storage. Its core function is to add a consistency layer on top of BerkeleyDB so that the data on every node is completely consistent. Keyspace uses a master-slave model: all writes are handled by the master and replicated to the slaves through Paxos, while reads can be routed to either the master or a slave according to the routing rules. A master election mechanism is therefore needed for the case where the master is down or unreachable; in Keyspace this is the PaxosLease algorithm. By default the master holds a lease of 7 seconds, but as long as the master does not crash it can keep holding the lease.
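
The lease idea can be illustrated with a minimal sketch. This is not Keyspace's PaxosLease code: the `Lease` class and its method names are hypothetical, and the real algorithm negotiates the lease among a majority of nodes, which is not modeled here. The sketch only shows that a master keeps its role by renewing before the 7-second lease expires, and that another node can take over once the lease has lapsed.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_SECONDS = 7.0  # default lease length stated in the text


@dataclass
class Lease:
    """Hypothetical single lease record (assumption, not Keyspace's structure)."""
    holder: Optional[str] = None
    expires_at: float = 0.0

    def is_valid(self, now: float) -> bool:
        return self.holder is not None and now < self.expires_at

    def acquire_or_renew(self, node: str, now: float) -> bool:
        """Grant the lease if it is free, expired, or already held by `node`."""
        if self.is_valid(now) and self.holder != node:
            return False  # another node is still master
        self.holder = node
        self.expires_at = now + LEASE_SECONDS
        return True

    def master(self, now: float) -> Optional[str]:
        """Return the current master, or None if the lease has expired."""
        return self.holder if self.is_valid(now) else None


lease = Lease()
lease.acquire_or_renew("A", now=0.0)   # A becomes master until t=7
lease.acquire_or_renew("B", now=3.0)   # refused: A's lease is still valid
lease.acquire_or_renew("A", now=5.0)   # A renews, lease now runs until t=12
```

Timestamps are passed in explicitly so the behavior is easy to follow; a real node would read a clock and renew on a timer.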

It should be pointed out that, because there is a master, Keyspace modifies the Paxos implementation: the two-phase Paxos is reduced to a single phase.

2. Keyspace Configuration

Keyspace exposes several ports and services, as follows:

Item                         Port
HTTP service                 8080
Non-HTTP client              7080
Internal data replication    10000
Master lease                 10001
Catch up                     10002

Catch up handles the case where a node rejoins the cluster after being away for a long time.

3. Keyspace Architecture

The basic architecture of Keyspace adds a Paxos layer on top of BerkeleyDB, as shown in the figure below:

[Figure: Keyspace architecture]

Paxos and PaxosLease both belong to the Paxos layer. PaxosLease solves the leader election problem, while Paxos is mainly used to keep the data consistent. We focus on Paxos here.

In its Paxos implementation, Keyspace has several finer-grained components:

  • Keyspacedb: a consistent operation interface for external protocols that hides the calls to Paxos and BerkeleyDB.
  • Replicatedlog: the logical database log, which encapsulates Paxos consistency.
  • Paxosproposer: the proposer in the Paxos algorithm.
  • Paxosacceptor: the acceptor in the Paxos algorithm.
  • Paxoslearn: the learner in the Paxos algorithm.

Each node in Keyspace plays three roles (proposer, acceptor, and learner), coordinated by Replicatedlog.

In addition, Keyspace communicates over TCP. Other non-core modules and functions are not described here.

4. Paxos implementation

Keyspace uses a master-based Paxos algorithm. As long as the master node does not crash, all write requests are directed to the proposer on the master node, which writes them to all slaves in a consistent manner. Because a master exists, Keyspace simplifies the prepare and accept phases of the traditional Paxos algorithm so that only one phase remains in the normal case. Keyspace conceptually redefines the two steps of this phase:

  • Propose: the master submits a proposal.
  • Propose response: the acceptor's response to the proposer.

If the proposal submitted by the master is rejected because of network delay or for other reasons, the master must increase the proposal number and submit again:

  • Prepare: the master resubmits with a higher proposal number.
  • Prepare response: the acceptor's response to the prepare.
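
The two paths described above (propose directly, and fall back to prepare with a higher number after a rejection) can be sketched for a single Paxos instance as follows. This is an illustration under assumptions, not Keyspace's code: `Acceptor`, `on_prepare`, `on_propose`, and `master_write` are hypothetical names, messages are modeled as direct method calls, and the fast path assumes the master is already established as leader.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = 0                           # highest proposal number seen
    accepted: Optional[Tuple[int, str]] = None  # (number, value) last accepted

    def on_prepare(self, n: int) -> Tuple[bool, Optional[Tuple[int, str]]]:
        """Promise to ignore lower numbers; report any value already accepted."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def on_propose(self, n: int, value: str) -> bool:
        """Accept the value unless a higher number has been promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def master_write(acceptors: List[Acceptor], n: int, value: str) -> Tuple[int, str]:
    """Return the (number, value) accepted by a majority."""
    quorum = len(acceptors) // 2 + 1
    # Normal case: the master skips prepare and proposes directly.
    if sum(a.on_propose(n, value) for a in acceptors) >= quorum:
        return n, value
    # Rejected: raise the proposal number and run prepare first.
    while True:
        n += 1
        replies = [a.on_prepare(n) for a in acceptors]
        promises = [acc for ok, acc in replies if ok]
        if len(promises) < quorum:
            continue
        prior = [acc for acc in promises if acc is not None]
        if prior:  # standard Paxos rule: adopt the highest-numbered accepted value
            value = max(prior, key=lambda p: p[0])[1]
        if sum(a.on_propose(n, value) for a in acceptors) >= quorum:
            return n, value


# Two of three acceptors have already promised number 5 to someone else.
nodes = [Acceptor(), Acceptor(promised=5), Acceptor(promised=5)]
result = master_write(nodes, n=1, value="k=v")  # rejected at 1, succeeds at 6
```

With fresh acceptors the same call returns after the first propose, which is the single-phase behavior the text describes.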

In the Paxos algorithm a proposal is an abstract concept, but in Keyspace it is concrete: a key-value pair. So how does Paxos ensure consistency? Each time a user submits a key-value pair to Keyspace, the following process is performed:

  1. Replicatedlog passes the received key-value pair to the proposer on the master.
  2. The master assigns a paxosid (instance ID) to the current (K, V) and sends it to the other nodes as a propose message.
  3. Once a majority of the other nodes have responded with an accept message, the master sends the (K, V) together with its paxosid to all learners (the other nodes).
  4. After receiving the message from the master, a learner checks whether every paxosid before the one in the message has already been learned. That is, (K, V) messages must be executed in order and none may be skipped. If all earlier paxosids have been learned, the (K, V) for the current paxosid is persisted to BerkeleyDB; otherwise, the learner asks the master for the messages corresponding to the earlier paxosids.
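
Steps 1 to 3 can be sketched as follows. The class and method names (`Node`, `Master`, `accept`, `learn`, `write`) are hypothetical, network messages are modeled as method calls, and the master is assumed to count its own vote toward the majority, which the text does not specify. The ordering check of step 4 is deliberately left out of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    reachable: bool = True
    store: Dict[str, str] = field(default_factory=dict)

    def accept(self, paxosid: int, key: str, value: str) -> bool:
        """Reply to a propose message; an unreachable node never replies."""
        return self.reachable

    def learn(self, paxosid: int, key: str, value: str) -> None:
        """Apply a chosen (K, V); ordering checks are omitted here."""
        if self.reachable:
            self.store[key] = value


class Master:
    def __init__(self, slaves: List[Node]):
        self.slaves = slaves
        self.store: Dict[str, str] = {}
        self.next_paxosid = 1

    def write(self, key: str, value: str) -> int:
        paxosid = self.next_paxosid  # step 2: assign the instance ID
        # Assumption: the master counts itself as one vote.
        acks = 1 + sum(s.accept(paxosid, key, value) for s in self.slaves)
        cluster_size = len(self.slaves) + 1
        if acks <= cluster_size // 2:
            raise RuntimeError("no majority, write not committed")
        self.next_paxosid += 1
        self.store[key] = value
        for s in self.slaves:        # step 3: tell the learners
            s.learn(paxosid, key, value)
        return paxosid


a, b = Node("a"), Node("b", reachable=False)
master = Master([a, b])
pid = master.write("x", "1")  # commits with 2 of 3 votes; b misses the update
```

Node `b` now lags behind, which is exactly the situation that the in-order learning rule and catch up are meant to repair.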

Learning messages in paxosid order is the core of how Keyspace ensures consistent replication. After a learner receives a message (msg), it performs the following operations:

  • If msg.paxosid = local.paxosid + 1, it executes the (K, V) directly.
  • If local.paxosid + 1 < msg.paxosid, it must first learn the (K, V) pairs for [local.paxosid + 1, msg.paxosid - 1] and only then execute the (K, V) for msg.paxosid, which preserves consistency. This situation is called a skipped number.
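
The learner rule above can be sketched as follows. `Learner`, `on_learn`, and the `fetch` callback (standing in for a request to the master) are hypothetical names, and an in-memory dict stands in for BerkeleyDB. Ignoring messages whose paxosid has already been applied is an added assumption; the text does not say how duplicates are handled.

```python
from typing import Callable, Dict, List, Tuple

KV = Tuple[str, str]


class Learner:
    def __init__(self, fetch: Callable[[int], KV]):
        self.paxosid = 0               # last paxosid applied locally
        self.store: Dict[str, str] = {}
        self.fetch = fetch             # hypothetical: ask the master for one paxosid

    def _apply(self, paxosid: int, kv: KV) -> None:
        key, value = kv
        self.store[key] = value
        self.paxosid = paxosid

    def on_learn(self, paxosid: int, kv: KV) -> None:
        if paxosid <= self.paxosid:
            return                     # assumption: already applied, ignore
        # Fill the gap [local.paxosid + 1, msg.paxosid - 1] first, in order.
        for missing in range(self.paxosid + 1, paxosid):
            self._apply(missing, self.fetch(missing))
        self._apply(paxosid, kv)


log = {1: ("a", "1"), 2: ("a", "2"), 3: ("b", "3")}
fetched: List[int] = []


def fetch_from_master(paxosid: int) -> KV:
    fetched.append(paxosid)
    return log[paxosid]


learner = Learner(fetch_from_master)
learner.on_learn(1, log[1])  # msg.paxosid == local.paxosid + 1: apply directly
learner.on_learn(3, log[3])  # gap: paxosid 2 is fetched and applied first
```

After these two calls the learner has applied paxosids 1 to 3 in order, and only paxosid 2 had to be requested from the master.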

To save space, the master caches only part of the paxosids it has sent, namely the most recent ones, for example the range [minpaxosid, maxpaxosid]:

  • When the master receives a skipped-number request, it looks up the requested paxosid in its local cache of sent messages:

    • If minpaxosid <= msg.paxosid <= maxpaxosid, the master replies with the cached (K, V) for that paxosid.
    • Otherwise, the master tells the learner that it should catch up. Catch up means that a large amount of data, possibly all of it, must be transferred from the master so that the learner becomes consistent with the master.
  • After the learner receives the catch up message, it immediately starts the catch up process and copies all the data from the master to ensure consistency.
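
The master-side decision can be sketched like this. `SentCache`, `record`, and `handle_request` are hypothetical names, the capacity is an arbitrary illustration, and the sketch assumes paxosids are recorded contiguously in increasing order so that the oldest and newest cached entries give minpaxosid and maxpaxosid.

```python
from collections import OrderedDict
from typing import Optional, Tuple

KV = Tuple[str, str]


class SentCache:
    """Bounded cache of the most recently sent (paxosid -> (K, V)) entries."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: "OrderedDict[int, KV]" = OrderedDict()

    def record(self, paxosid: int, kv: KV) -> None:
        self.entries[paxosid] = kv
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest paxosid

    def handle_request(self, paxosid: int) -> Tuple[str, Optional[KV]]:
        """Answer a learner's request for a missing paxosid."""
        if self.entries:
            minpaxosid = next(iter(self.entries))
            maxpaxosid = next(reversed(self.entries))
            if minpaxosid <= paxosid <= maxpaxosid:
                return "learn", self.entries[paxosid]
        return "catchup", None                # too old: full copy needed


cache = SentCache(capacity=3)
for i in range(1, 6):
    cache.record(i, (f"k{i}", "v"))           # cache now holds paxosids 3..5

in_window = cache.handle_request(4)           # served from the cache
too_old = cache.handle_request(2)             # learner is told to catch up
```

A larger cache lets lagging nodes recover cheaply at the cost of memory or disk on the master; a smaller one pushes them into the expensive catch up path sooner.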

Persistence

Both the cached paxosids and the final data of Keyspace are persisted to BerkeleyDB, which keeps the algorithm correct after a node restarts.

Fault Tolerance

Keyspace persists its state carefully and relies on BerkeleyDB to survive many failures, such as network faults and node downtime, but it cannot cope when BerkeleyDB itself fails. In practice, its catch up implementation can serve as a reference for handling that case.

Problem

Keyspace is characterized by the fact that all writes go through the master and the data on every node is consistent with the master. In practice, however, what is usually needed is dynamic partitioning and replication rather than completely identical data on all nodes. As Keyspace itself states, its goal is to be a distributed underlying layer rather than an end product.

5. Master election

Obviously, another problem remains to be solved: how to choose a new master after the master goes down. Please refer to: http://blog.csdn.net/chen77716/archive/2011/03/21/6265394.aspx

6. Questions

After Keyspace makes such drastic changes to the Paxos algorithm, we cannot help asking: is this still the Paxos algorithm? Leaving master election aside, consistency here is obviously guaranteed by the master. That is true, but because of network latency, node restarts, disk failures, and many other situations, it is difficult to ensure consistency with the master alone and without the majority voting mechanism, especially when errors occur. Keyspace keeps the majority vote, so the modified Paxos algorithm is still correct. It has been said in connection with Google's Chubby that any distributed consistency algorithm is a special case of Paxos, and this is one such special case.

In fact, the original Paxos algorithm is very difficult to use in a real project without modification because of its performance problems, as Microsoft has also said. So there is no need to worry about whether an implementation is pure Paxos. Grasp the essentials and apply them flexibly. Paxos is not a tangible sword but an intangible energy: what matters is that you control it rather than being controlled by it.

Author: chen77716 (posted at 17:11:00)
