In Redis, a user can make one server replicate another by executing the SLAVEOF command or setting the slaveof option. The server being replicated is called the master server (master), and the server that replicates it is called the slave server (slave).
Suppose we have two Redis servers at 127.0.0.1:6379 and 127.0.0.1:12345, and we send the following command to the server 127.0.0.1:12345:
127.0.0.1:12345> slaveof 127.0.0.1 6379
OK
Then the server 127.0.0.1:12345 becomes the slave of 127.0.0.1:6379, and the server 127.0.0.1:6379 becomes the master of 127.0.0.1:12345.
(Note to self: add some hands-on operations from http://redisdoc.com/topic/replication.html)
This article is organized from the book "Redis Design and Implementation". I think the original book is very good, so the rest of these notes follow the book's logic:
We first describe how inefficient the legacy replication feature is when a slave reconnects to the master after a disconnection. We then describe how the new replication feature solves this inefficiency with partial resynchronization, and explain how partial resynchronization is implemented.
Implementation of the legacy replication feature
The Redis replication feature consists of two operations, synchronization (sync) and command propagation (command propagate):
- The synchronization operation updates the slave's database state to the master's current database state;
- The command propagation operation is used when the master's database state is modified and the master and slave databases become inconsistent; it brings the two databases back to a consistent state.
Synchronization
When a client sends the SLAVEOF command to a slave, asking it to replicate a master, the slave first needs to perform a synchronization operation, that is, update its own database state to the master's current database state.
The slave synchronizes with the master by sending the SYNC command to the master. The SYNC command proceeds as follows:
- The slave sends the SYNC command to the master;
- On receiving SYNC, the master executes the BGSAVE command to generate an RDB file in the background, and uses a buffer to record every write command it executes from that point on;
- When BGSAVE finishes, the master sends the resulting RDB file to the slave. The slave receives and loads the RDB file, updating its database to the state the master's database was in when BGSAVE was executed;
- The master sends all the write commands recorded in the buffer to the slave. The slave executes these write commands, updating its database to the master's current state.
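The four steps above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not how Redis itself is written: ToyMaster, ToySlave, and their methods are hypothetical names, a plain dict copy stands in for the RDB file, and the background save is simulated by taking the snapshot first and letting a write arrive afterwards.

```python
# Toy model of the legacy SYNC flow. All names are hypothetical.

class ToyMaster:
    def __init__(self):
        self.db = {}
        self.buffer = None  # write commands recorded while a snapshot is in progress

    def set(self, key, value):
        self.db[key] = value
        if self.buffer is not None:
            self.buffer.append(("SET", key, value))

    def start_bgsave(self):
        # Step 2: take a snapshot (stand-in for the RDB file) and start buffering writes.
        self.buffer = []
        return dict(self.db)

    def finish_sync(self):
        # Step 4: hand over the buffered write commands and stop buffering.
        buffered, self.buffer = self.buffer, None
        return buffered


class ToySlave:
    def __init__(self):
        self.db = {}

    def load_snapshot(self, snapshot):
        # Step 3: replace the local database with the master's snapshot.
        self.db = dict(snapshot)

    def apply(self, commands):
        for _name, key, value in commands:
            self.db[key] = value


master, slave = ToyMaster(), ToySlave()
master.set("k1", "v1")

snapshot = master.start_bgsave()    # steps 1-2: the slave sent SYNC, the master snapshots
master.set("k2", "v2")              # a write that arrives while the snapshot is being made
slave.load_snapshot(snapshot)       # step 3: the slave now holds k1 only
slave.apply(master.finish_sync())   # step 4: the slave replays the buffered write (k2)

print(slave.db == master.db)        # True
```

The buffer is what closes the gap between "state at snapshot time" and "current state"; without it, the write to k2 would be lost on the slave.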
Command propagation
After synchronization, the master and slave databases are in the same state. However, this state is not permanent: whenever the master executes a write command, its database state changes and the master and slave are no longer consistent.
To bring the master and slave back to a consistent state, the master performs command propagation: it sends the write command it has just executed, that is, the command that caused the inconsistency, to the slave. Once the slave executes the same write command, the master and slave are consistent again.
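A minimal sketch of command propagation, assuming a toy command set of SET and DEL on a dict. The names apply_command, ToyMaster, and ToySlave are hypothetical and only serve the illustration.

```python
# Toy model of command propagation. All names are hypothetical.

def apply_command(db, command):
    name, key, *args = command
    if name == "SET":
        db[key] = args[0]
    elif name == "DEL":
        db.pop(key, None)


class ToySlave:
    def __init__(self, db):
        self.db = dict(db)

    def execute(self, command):
        apply_command(self.db, command)


class ToyMaster:
    def __init__(self, db):
        self.db = dict(db)
        self.slaves = []

    def execute(self, command):
        apply_command(self.db, command)
        # Propagate the same write command to every slave.
        for slave in self.slaves:
            slave.execute(command)


master = ToyMaster({"k1": "v1", "k2": "v2"})
slave = ToySlave(master.db)          # state right after synchronization
master.slaves.append(slave)

master.execute(("DEL", "k2"))        # the write that would make the two diverge
print(master.db == slave.db)         # True
```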
Defects of the legacy replication feature
In Redis, a slave's replication of a master falls into one of two scenarios:
- Initial replication: the slave has never replicated any master before, or the master it is about to replicate is different from the one it replicated last;
- Re-replication after disconnection: a master and slave in the command propagation phase lose their connection for network reasons, but the slave reconnects to the master through automatic reconnection and continues to replicate it.
For initial replication, the legacy replication feature does its job well. For re-replication after disconnection, it still brings the master and slave back to a consistent state, but does so very inefficiently.
An example illustrates this:
When the slave finally reconnects to the master, the states of the two servers are no longer consistent, so the slave sends the SYNC command to the master, and the master sends an RDB file containing keys k1 through k10089 to the slave. The slave receives and loads the RDB file, updating its database to the master's current state.
The example above may be a bit idealized, because during a disconnection the master may well execute more than two or three write commands. In general, however, the shorter the disconnection, the fewer write commands the master executes during it, and the data produced by those few write commands is usually far smaller than the entire database. In that case, having the master and slave run SYNC again just to make up the small fraction of data the slave is missing is very inefficient.
SYNC is a very resource-intensive operation
The SYNC command is very resource intensive, because every time it is executed the master and slave need to do the following:
- The master needs to execute the BGSAVE command to generate an RDB file, which consumes a large amount of the master's CPU, memory, and disk I/O resources;
- The master needs to send the generated RDB file to the slave, which consumes a large amount of network resources (bandwidth and traffic) on both servers and affects how quickly the master responds to command requests;
- The slave needs to load the RDB file it receives from the master, and while loading, it is blocked and cannot handle command requests.
Because SYNC is so resource intensive, Redis should execute it only when it is really needed.
Implementation of the new replication feature
To solve the inefficiency of legacy replication when handling re-replication after disconnection, Redis, starting with version 2.8, uses the PSYNC command instead of the SYNC command to perform synchronization during replication.
The PSYNC command has two modes, full resynchronization and partial resynchronization:
- Full resynchronization handles initial replication. Its steps are basically the same as those of the SYNC command: the master creates and sends an RDB file, then sends the write commands stored in the buffer to the slave;
- Partial resynchronization handles re-replication after disconnection. When the slave reconnects to the master after a disconnection, the master can, if conditions permit, send the slave only the write commands it executed while the connection was down. Once the slave receives and executes these write commands, its database is updated to the master's current state.
Let's look at an example of how PSYNC handles re-replication after disconnection:
[Figure: the communication between the master and slave servers during a partial resynchronization.]
Actually, reading this far I still had a question: what if, in the example above, the slave dropped at T3 and did not reconnect until T10093, or even later?!! In that case it would be better to just run a SYNC than to transfer the missing commands one at a time. So in my opinion, when using PSYNC, deciding when to do a partial resynchronization and when to do a full resynchronization is a policy question. Of course, Redis does solve this problem, so let's keep reading 0_0.
Implementation of partial resynchronization
The partial resynchronization feature consists of the following three parts:
- The replication offset of the master and the replication offset of the slave;
- The replication backlog buffer (replication backlog) of the master;
- The run ID of the server.
Replication offset
Both parties to the replication, the master and the slave, each maintain a replication offset:
- Each time the master propagates N bytes of data to the slave, it adds N to its own replication offset;
- Each time the slave receives N bytes of data propagated by the master, it adds N to its own replication offset.
(Hold on!! Doesn't the slave send any feedback? What if packets get lost? Does it rely on TCP? Carry on, everyone, I just want to intersperse some of my own thoughts.)
By comparing the replication offsets of the master and the slave, the program can easily tell whether the two servers are in a consistent state:
- If the master and slave are in a consistent state, their offsets are always the same;
- Conversely, if their offsets differ, the master and slave are not in a consistent state.
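The bookkeeping can be sketched as follows. This is a toy model with hypothetical class names; the starting offset of 10086 and the 33-byte payload are illustrative numbers, and the disconnection is simulated with a simple flag.

```python
# Toy model of replication offsets. All names and numbers are illustrative.

class ToySlave:
    def __init__(self, offset=0):
        self.offset = offset
        self.connected = True

    def receive(self, data: bytes):
        # A disconnected slave never sees the bytes, so its offset falls behind.
        if self.connected:
            self.offset += len(data)


class ToyMaster:
    def __init__(self, offset=0):
        self.offset = offset
        self.slaves = []

    def propagate(self, data: bytes):
        # The master adds N to its offset for every N bytes it propagates.
        self.offset += len(data)
        for slave in self.slaves:
            slave.receive(data)


master = ToyMaster(offset=10086)
a, b = ToySlave(offset=10086), ToySlave(offset=10086)
master.slaves = [a, b]

a.connected = False                  # slave A drops off the network
master.propagate(b"x" * 33)          # 33 bytes of propagated write commands

print(master.offset, a.offset, b.offset)                     # 10119 10086 10119
print(a.offset == master.offset, b.offset == master.offset)  # False True
```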
Consider the following scenario:
Suppose slave A reconnects to the master immediately after being disconnected, and the reconnection succeeds. The slave then sends the PSYNC command to the master, reporting that slave A's current replication offset is 10086. Should the master perform a full resynchronization or a partial resynchronization for the slave? If it performs a partial resynchronization, how does the master supply the data that slave A missed during the disconnection? The answers to these questions all involve the replication backlog buffer.
Replication backlog buffer
The replication backlog buffer is a fixed-size FIFO (first in, first out) queue maintained by the master, with a default size of 1 MB.
Unlike a normal FIFO queue, whose length grows and shrinks as elements are added and removed, a fixed-size FIFO queue has a fixed length: when the number of queued elements exceeds the queue length, the element that was queued first is popped and the new element is placed in the queue.
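The behavior of a fixed-size FIFO queue can be seen with Python's collections.deque and its maxlen argument. This only illustrates the queue discipline; it says nothing about how Redis lays out the backlog in memory.

```python
from collections import deque

queue = deque(maxlen=3)        # fixed length of 3 elements
for element in "abcde":
    queue.append(element)      # once the queue is full, the oldest element is dropped

print(list(queue))             # ['c', 'd', 'e']
```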
When the master propagates a command, it not only sends the write command to all slaves but also queues the write command into the replication backlog buffer.
As a result, the master's replication backlog buffer holds a portion of the most recently propagated write commands, and it records the corresponding replication offset for each byte in the queue, as shown in the following table.
When a slave reconnects to the master, it sends its own replication offset, offset, to the master via the PSYNC command, and the master uses this offset to decide which synchronization operation to perform for the slave:
- If the data after offset (that is, the data starting at offset+1) still exists in the replication backlog buffer, the master performs a partial resynchronization for the slave;
- Conversely, if the data after offset no longer exists in the replication backlog buffer, the master performs a full resynchronization for the slave.
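The decision above can be sketched with a small byte buffer that remembers the offset of its last byte. ToyBacklog and its methods are hypothetical names, the 10-byte size is chosen only to keep the example readable, and the exact boundary conventions in Redis itself may differ from this sketch.

```python
# Toy replication backlog. Class and method names are hypothetical.

class ToyBacklog:
    def __init__(self, size):
        self.size = size
        self.data = bytearray()   # the most recent `size` bytes that were propagated
        self.end_offset = 0       # replication offset of the last byte in `data`

    def feed(self, chunk: bytes):
        self.data += chunk
        self.end_offset += len(chunk)
        # Keep only the newest `size` bytes; older bytes fall out of the queue.
        if len(self.data) > self.size:
            del self.data[: len(self.data) - self.size]

    def missing_after(self, slave_offset):
        """Return the bytes after slave_offset, or None if they are no longer buffered."""
        first_offset = self.end_offset - len(self.data) + 1  # offset of data[0]
        if slave_offset + 1 < first_offset or slave_offset > self.end_offset:
            return None
        return bytes(self.data[slave_offset + 1 - first_offset:])


backlog = ToyBacklog(size=10)
backlog.feed(b"0123456789ABCDEF")   # 16 bytes; only offsets 7..16 remain buffered

print(backlog.missing_after(12))    # b'CDEF' -> partial resynchronization is possible
print(backlog.missing_after(3))     # None    -> full resynchronization is needed
```

A slave at offset 12 only needs offsets 13 to 16, which are still buffered, while a slave at offset 3 needs offsets 4 to 6, which have already been pushed out.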
Adjust the size of the replication backlog buffer as needed
Redis sets the default size of the replication backlog buffer to 1 MB. This may not be appropriate if the master executes a large number of write commands, or if the slave takes a long time to reconnect after a disconnection. If the buffer size is set inappropriately, the partial resynchronization mode of the PSYNC command cannot do its job, so it is important to estimate and set the size of the replication backlog buffer correctly.
The minimum size of the replication backlog buffer can be estimated with the formula second * write_size_per_second:
- second is the average time (in seconds) the slave needs to reconnect to the master after a disconnection;
- write_size_per_second is the average amount of write command data the master produces per second (the sum of the lengths of the write commands in protocol format).
For example, if the master produces an average of 1 MB of write data per second, and the slave needs an average of 5 seconds to reconnect to the master after a disconnection, the replication backlog buffer must not be smaller than 5 MB.
To be safe, you can set the replication backlog buffer size to 2 * second * write_size_per_second, which ensures that the vast majority of disconnections can be handled with partial resynchronization.
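The arithmetic is simple enough to check in a few lines. estimate_backlog_size is a hypothetical helper written for these notes, not a Redis function; safety_factor=1 gives the minimum and the default of 2 gives the safer value.

```python
def estimate_backlog_size(second, write_size_per_second, safety_factor=2):
    """Suggested backlog size in bytes: safety_factor * second * write_size_per_second."""
    return safety_factor * second * write_size_per_second


MB = 1024 * 1024

minimum = estimate_backlog_size(5, 1 * MB, safety_factor=1)
recommended = estimate_backlog_size(5, 1 * MB)

print(minimum // MB, recommended // MB)   # 5 10
```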
As for how to change the size of the replication backlog buffer, refer to the description of the repl-backlog-size option in the configuration file.
Server run ID
Besides the replication offset and the replication backlog buffer, partial resynchronization also needs the server run ID:
- Every Redis server, whether master or slave, has its own run ID;
- The run ID is generated automatically when the server starts and consists of 40 random hexadecimal characters, for example 53b9b28df8042fdc9ab5e3fcbbbabff1d5dce2b3.
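An identifier of the same shape can be produced with Python's secrets module. This only mimics the format (40 random hexadecimal characters); it is not the generator Redis uses, and make_run_id is a hypothetical name.

```python
import secrets

def make_run_id():
    # 20 random bytes rendered as hex give 40 hexadecimal characters.
    return secrets.token_hex(20)


run_id = make_run_id()
print(run_id, len(run_id))
```

Because the ID is regenerated on every start, a restarted server gets a different run ID from the one it had before.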
When a slave replicates a master for the first time, the master sends its own run ID to the slave, and the slave saves this run ID.
When the slave is disconnected and reconnects to a master, it sends the previously saved run ID to the master it is now connected to:
- If the run ID saved by the slave is the same as the run ID of the currently connected master, then the master it is now connected to is the one it was replicating before the disconnection, and the master can go on to try a partial resynchronization;
- Conversely, if the run ID saved by the slave differs from the run ID of the currently connected master, then the master the slave was replicating before the disconnection is not the one it is connected to now, and the master performs a full resynchronization for the slave.
Implementation of the PSYNC command
There are two ways to invoke the PSYNC command:
- If the slave has never replicated any master before, or has previously executed the SLAVEOF no one command, it sends the PSYNC ? -1 command to the master when it starts a new replication, actively requesting a full resynchronization (because a partial resynchronization is not possible in this case);
- Conversely, if the slave has replicated a master before, it sends the PSYNC <runid> <offset> command to the master when it starts a new replication, where runid is the run ID of the master it replicated last and offset is the slave's current replication offset. The master that receives the command uses these two arguments to decide which synchronization operation to perform for the slave.
Depending on the situation, the master that receives the PSYNC command returns one of the following three replies to the slave:
- If the master returns a +FULLRESYNC <runid> <offset> reply, it will perform a full resynchronization with the slave. Here runid is the master's run ID, which the slave saves and uses the next time it sends the PSYNC command, and offset is the master's current replication offset, which the slave uses as its own initial offset;
- If the master returns a +CONTINUE reply, it will perform a partial resynchronization with the slave, and the slave only needs to wait for the master to send the data it is missing;
- If the master returns an -ERR reply, the master's version is lower than Redis 2.8 and it does not recognize the PSYNC command. The slave then sends the SYNC command to the master and performs a full synchronization with it.
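Putting the run ID, the offset, and the backlog together, the exchange can be sketched as three small functions. The function names and the backlog_first_offset parameter (the offset of the oldest byte still in the backlog) are assumptions made for this sketch, the offsets are illustrative, and the code only models the decision, not the network protocol.

```python
# Toy PSYNC decision logic. Function and parameter names are hypothetical.

def psync_request(saved_run_id=None, offset=-1):
    """Build the command a slave sends when it starts a new replication."""
    if saved_run_id is None:
        return "PSYNC ? -1"
    return f"PSYNC {saved_run_id} {offset}"


def handle_psync(master_run_id, master_offset, backlog_first_offset, runid, offset):
    """Decide the master's reply to PSYNC <runid> <offset>."""
    full = f"+FULLRESYNC {master_run_id} {master_offset}"
    if runid != master_run_id:          # also covers the "?" of a first-time slave
        return full
    # A partial resync needs every byte after `offset` to still be in the backlog.
    if offset + 1 < backlog_first_offset or offset > master_offset:
        return full
    return "+CONTINUE"


def slave_next_step(reply):
    """Map the master's reply to what the slave does next."""
    if reply.startswith("+FULLRESYNC"):
        _, runid, offset = reply.split()
        return ("full resync", runid, int(offset))
    if reply.startswith("+CONTINUE"):
        return ("partial resync",)
    if reply.startswith("-ERR"):
        return ("fall back to SYNC",)
    raise ValueError(f"unexpected reply: {reply!r}")


RUN_ID = "53b9b28df8042fdc9ab5e3fcbbbabff1d5dce2b3"

print(psync_request())                                      # PSYNC ? -1
print(handle_psync(RUN_ID, 10119, 9000, "?", -1))           # +FULLRESYNC <run id> 10119
print(handle_psync(RUN_ID, 10119, 9000, RUN_ID, 10086))     # +CONTINUE
print(handle_psync(RUN_ID, 10119, 9000, RUN_ID, 500))       # +FULLRESYNC <run id> 10119
print(slave_next_step("-ERR unknown command"))              # ('fall back to SYNC',)
```

This also answers the earlier question about a slave that stays away too long: once its offset has fallen out of the backlog, the master simply answers with +FULLRESYNC.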
Redis learning notes: master-slave synchronization (replication)