A discussion of distributed replication implementation, based on LedisDB

Tags: failover, uuid

For anyone who uses SQL or NoSQL databases, replication is a topic that cannot be avoided, and it is a great way to keep your data safe. After all, as everyone knows, you should not put all your eggs in one basket; in the same vein, do not keep your data on a single machine, or you will be in trouble when that machine goes down.

In a distributed environment, it is very difficult for any data storage system to implement a good replication mechanism. After all, the constraints of the CAP theorem are always there: we cannot achieve a perfect replication mechanism, and can only design one by making the appropriate CAP trade-offs for the actual situation of our system.

For a more detailed introduction to and explanation of replication, I recommend Distributed Systems for Fun and Profit. Below, I will describe in detail how replication is implemented inside LedisDB, based on its actual situation.

BinLog

At the very beginning, LedisDB used a replication mechanism similar to MySQL's classic binlog: the data that needs to be synchronized is identified by binlog filename + position. This approach is very simple to implement, but it still has some deficiencies, mainly that in a hierarchical replication setup, if the master goes down, it is difficult to choose the appropriate slave to promote to master. For the simplest example, suppose A is the master and B and C are slaves. If A goes down, we will select whichever of B and C has synchronized the most data, but which one is that? MySQL's replication runs into this problem too.

MySQL GTID

After MySQL 5.6, the concept of GTID (Global Transaction ID) was introduced to solve the above problem. A GTID of the form source_id:transaction_id uniquely identifies a transaction in the binlog. source_id is the current server's UUID, which is globally unique, and transaction_id is the transaction ID within that server (incrementing, so it is guaranteed to be unique). Applied to the above problem: with GTIDs, if A goes down, we only need to look in the binlogs of B and C and compare the largest transaction ID each has recorded for A's UUID. For example, if B has uuid:10 and C has uuid:30, we will choose C as the new master.
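The comparison above can be sketched in Go as follows. This is a deliberate simplification: assume each slave reports the single largest transaction ID it has seen for the failed master's UUID, rather than a full GTID set as real MySQL maintains.

```go
package main

import "fmt"

// pickNewMaster returns the slave that has replicated the most
// transactions from the failed master. seen maps slave name to the
// largest transaction ID that slave holds for the master's UUID.
// This is a simplified sketch, not MySQL's actual GTID-set logic.
func pickNewMaster(seen map[string]uint64) string {
	best, bestID := "", uint64(0)
	for slave, id := range seen {
		if id > bestID {
			best, bestID = slave, id
		}
	}
	return best
}

func main() {
	// B has executed up to transaction 10, C up to 30,
	// so C is promoted to the new master.
	fmt.Println(pickNewMaster(map[string]uint64{"B": 10, "C": 30})) // C
}
```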

Of course, using GTIDs also has restrictions, such as that slaves must also write a binlog, but it is still powerful enough to solve many of the thorny problems of early MySQL replication. LedisDB is not going to use it, though, mainly because its application scenario is not that complicated; what I need is a simpler solution.

Google Global Transaction ID

Before MySQL's GTID, Google's MySQL fork had already used a global transaction ID: in the binlog, a group ID uniquely marks each transaction. The group ID is a globally incrementing ID, and the master is responsible for generating and maintaining it. When the master goes down, we just need to see which slave has the largest group ID in its binlog; that one can be chosen as the new master.

As you can see, this scheme is very simple, but also more restrictive: for example, the binlog on the slave side can only be written by the replication thread, multi-master is not supported, circular replication is not supported, and so on. But I think it is simple and efficient enough, so LedisDB uses it as a reference for its implementation.

Raft

Anyone working on distributed systems has probably been exposed to Paxos at least a little (I do not fully understand it myself), and Raft is a distributed consensus algorithm that is much simpler than Paxos.

Raft achieves consensus through a replicated log. Suppose there are three machines A, B, and C, with A as leader and B and C as followers (in effect, the master and slave concepts). Any update to A must first be written to the log (each log entry has a log ID that uniquely identifies it and increments globally), then the log is synchronized to the followers, and finally the update is committed on A. If A goes down, B and C hold a new election, and whichever machine currently has the largest log ID becomes the new leader. Reading this, does it feel very familiar?

LedisDB's support for consistent replication refers to the relevant practices of Raft.

Terminology

Before detailing the implementation of LedisDB replication, it is necessary to explain some key fields.

    • LogID: the unique identifier of a log entry; the master is responsible for generating and maintaining it, and it increments globally.
    • LastLogID: the newest log ID in the current program, i.e. the ID of the log that records the most recent update.
    • FirstLogID: the oldest log ID in the current program; earlier logs have already been cleared.
    • CommitID: the last log whose update the current program has executed. For example, if the current LastLogID is 10 and CommitID is 5, then logs 6, 7, 8, 9, and 10 still need to be processed. If CommitID equals LastLogID, the program is up to date and no more logs need to be applied.
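A minimal Go sketch of how these fields fit together; the type and field names here are illustrative and do not mirror LedisDB's actual source:

```go
package main

import "fmt"

// replState groups the log-position fields described above.
// Names are illustrative, not LedisDB's real types.
type replState struct {
	LastLogID  uint64 // newest log ID the program has written
	FirstLogID uint64 // oldest log ID still kept; earlier logs were purged
	CommitID   uint64 // last log ID whose update has been executed
}

// pending returns the log IDs that still need to be processed,
// i.e. everything after CommitID up to and including LastLogID.
func (s replState) pending() []uint64 {
	ids := []uint64{}
	for id := s.CommitID + 1; id <= s.LastLogID; id++ {
		ids = append(ids, id)
	}
	return ids
}

func main() {
	s := replState{LastLogID: 10, FirstLogID: 1, CommitID: 5}
	fmt.Println(s.pending()) // [6 7 8 9 10]
}
```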
LedisDB Replication

LedisDB's replication implementation is simple. Sticking with the example above, there are three machines A, B, and C, with A as the master and B and C as slaves.

When the master receives any update, it does the following:

    1. Write the update to the log: logid = lastlogid + 1, then lastlogid = logid
    2. Synchronize the log to the slaves and wait for their acknowledgements, or time out
    3. Apply the update
    4. Update commitid = logid
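The four steps above can be sketched roughly like this. This is a simplified, single-threaded sketch under assumed names (propose, replicate, apply are all illustrative); LedisDB's real code is structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

type master struct {
	lastLogID uint64
	commitID  uint64
	readOnly  bool
	logs      map[uint64][]byte
}

// propose runs the master-side steps described above: write the log,
// replicate it, apply the update, then advance commitID. replicate
// and apply are stand-ins for the real network and storage calls.
func (m *master) propose(update []byte,
	replicate func(id uint64, data []byte) error,
	apply func(data []byte) error) error {

	if m.readOnly {
		return errors.New("master is read-only: logs pending execution")
	}
	// 1. append to the log: logid = lastlogid + 1
	id := m.lastLogID + 1
	m.logs[id] = update
	m.lastLogID = id

	// 2. ship the log to the slaves (wait for ack or time out)
	if err := replicate(id, update); err != nil {
		return err
	}

	// 3. apply the update locally
	if err := apply(update); err != nil {
		// commitID now lags lastLogID: enter read-only mode and let
		// the replication thread retry the log later.
		m.readOnly = true
		return err
	}

	// 4. advance commitid
	m.commitID = id
	return nil
}

func main() {
	m := &master{logs: map[uint64][]byte{}}
	replicate := func(id uint64, data []byte) error { return nil }
	apply := func(data []byte) error { return nil }
	if err := m.propose([]byte("set k v"), replicate, apply); err != nil {
		fmt.Println("propose failed:", err)
	}
	fmt.Println(m.lastLogID, m.commitID) // 1 1
}
```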

The above must also take error handling into account.

    • If step 1 fails, an error is logged, the update operation is considered failed, and we return directly.
    • If step 3 fails, commitid is not updated before returning. Because commitid is now less than lastlogid, the master enters read-only mode; the replication thread attempts to execute the log, and if that succeeds, it updates commitid and the master becomes writable again.
    • If step 4 fails, the same handling applies. Because LedisDB uses a row-based log format, an update operation can be executed idempotently.

On the slave side

When synchronizing for the first time, the slave enters full sync mode:

    1. The master generates a snapshot and sends it to the slave along with the current lastlogid.
    2. After the slave receives the dump file, it loads it and updates its commitid to the lastlogid contained in the dump file.

The slave then enters incremental sync mode. If the slave already has the relevant logs, it goes directly into incremental sync mode.

In incremental mode, the slave sends a sync command to the master, and the sync parameter is the ID of the next log that needs to be synchronized: logid = commitid + 1 if the slave currently has no binlog (such as after the full synchronization mentioned above), otherwise logid = lastlogid + 1.
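The slave's choice of which log ID to request might look like this (an illustrative sketch; the function name is an assumption, not LedisDB's API):

```go
package main

import "fmt"

// nextSyncID computes the log ID a slave asks for in its sync request.
// If the slave has no local binlog (e.g. right after a full sync), it
// continues from commitID; otherwise from the last log it has stored.
func nextSyncID(hasBinlog bool, commitID, lastLogID uint64) uint64 {
	if !hasBinlog {
		return commitID + 1
	}
	return lastLogID + 1
}

func main() {
	fmt.Println(nextSyncID(false, 100, 0))  // fresh after full sync: 101
	fmt.Println(nextSyncID(true, 100, 120)) // has logs up to 120: 121
}
```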

After the master receives the sync request, it handles the following cases:

    • The requested logid is less than firstlogid: the master no longer has that log, so the slave receives an error and falls back into full sync mode.
    • The master has the requested log: it sends the log to the slave, which saves it and sends sync again to fetch the next log. That request also serves as an ACK telling the master the log was synchronized successfully.
    • The requested logid is already greater than lastlogid: the master and slave have reached a consistent state and there is no log to synchronize. The slave waits for a new log until a timeout, then sends sync again.
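The three cases the master distinguishes can be sketched as one classification function (a simplification; real LedisDB also has to actually wait for new logs and ship data):

```go
package main

import "fmt"

type syncReply int

const (
	needFullSync syncReply = iota // requested log already purged
	sendLog                       // master still has the log
	waitForLog                    // slave is up to date; wait or time out
)

// handleSync classifies a slave's sync request against the master's
// current log window [firstLogID, lastLogID]. Illustrative sketch only.
func handleSync(reqID, firstLogID, lastLogID uint64) syncReply {
	switch {
	case reqID < firstLogID:
		return needFullSync
	case reqID > lastLogID:
		return waitForLog
	default:
		return sendLog
	}
}

func main() {
	fmt.Println(handleSync(3, 10, 100) == needFullSync) // true: log purged
	fmt.Println(handleSync(50, 10, 100) == sendLog)     // true: in window
	fmt.Println(handleSync(101, 10, 100) == waitForLog) // true: caught up
}
```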

On the slave side, the replication thread is responsible for executing the received logs and updating commitid.

If the master goes down, we just need to select the slave with the largest lastlogid as the new master.

Limitations

Overall, this replication mechanism is simple and easy to implement, but it still has many limitations.

    • Multi-master is not supported, because there can only be one place generating the global logid at any given time. But I have really rarely seen a multi-master architecture pattern, even with MySQL.
    • Circular replication is not supported: logs written on the slave side are not allowed to have IDs smaller than the current lastlogid, so only newer logs are synchronized.
    • There is no automatic master election mechanism, but I think it is better to handle that externally.
Async/sync Replication

LedisDB supports synchronous replication with strong consistency. If this mode is configured, the master waits for the slave to finish receiving the log before committing the update, so we can guarantee that when the master goes down, there is at least one slave with the same data as the master. In practice, however, because of the network environment or the slave itself, the master may not be able to wait indefinitely for the slave to finish synchronizing the log, so there is usually a timeout mechanism. From this point of view, we still cannot guarantee strong consistency, but we can still achieve eventual consistency.

Using synchronous replication greatly reduces the master's write performance, so if the business is not sensitive to data consistency, asynchronous replication is usually used in practice.

Failover

LedisDB currently has no automatic failover mechanism. After the master goes down, we still need manual intervention to select the appropriate slave (the one with the largest lastlogid), promote it to master, and point the other slaves at the new master. A follow-up plan is to handle this with an external keeper program; for the keeper's own single point of failure, Raft or ZooKeeper could be used.

Postscript

Although LedisDB now supports replication, it still needs to be tested thoroughly in a production environment.

LedisDB is a high-performance NoSQL database written in Go with a Redis-like interface. It is already used in production environments, and you are welcome to try it.

