I. Overview
Redis master-slave replication is built on the persisted RDB file: the master dumps an RDB file, transfers it to the slave, and then streams the write operations performed after the dump to the slave in real time. The result is that the slave server becomes an exact replica of the master server.
Here are a few important aspects of the Redis replication feature:
(1) Redis uses asynchronous replication. Starting with Redis 2.8, each slave reports its progress in processing the replication stream to the master once per second.
(2) A master server can have multiple slave servers.
(3) Not only masters can have slaves: a slave can also have its own slaves, so multiple slaves can form a graph-like structure.
(4) Replication does not block the master: even while one or more slaves are performing their initial synchronization, the master can continue to process command requests.
(5) Replication does not block the slave either: as long as the corresponding option is set in the redis.conf file, a slave can answer queries from the old version of its dataset even while the initial synchronization is in progress. However, connection requests are blocked during the window in which the slave deletes the old dataset and loads the new one. You can also configure the slave to return an error to clients while its connection to the master is down.
(6) Replication can spare the master from persistence work: simply turn off persistence on the master and let a slave perform the persistence instead.
II. Principle
Redis's master-slave replication is divided into two phases:
(1) Synchronization: brings the slave's database state up to the master's current database state.
(2) Command propagation: when the master's database state is modified and the master and slave become inconsistent, brings the two back to a consistent state.
Whether it is a first-time connection or a reconnection, the slave sends a SYNC command to the master once the connection is established.
1. Synchronization
The slave synchronizes with the master by sending it the SYNC command. The SYNC command proceeds as follows:
(1) The slave sends the SYNC command to the master.
(2) On receiving SYNC, the master executes the BGSAVE command to generate an RDB file in the background, and uses a buffer to record every write command it executes from that point on.
(3) When BGSAVE finishes, the master sends the generated RDB file to the slave. The slave receives and loads the RDB file, bringing its database to the state the master was in when it executed BGSAVE.
(4) The master sends all the write commands recorded in the buffer to the slave. The slave executes them, bringing its database to the master's current state.
[Figure: communication between the master and slave servers during execution of the SYNC command]
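The four SYNC steps above can be sketched as a small in-memory simulation. This is only an illustration, not Redis code: the `Master` and `Slave` classes and their method names are hypothetical, a copied dictionary stands in for the RDB file, and writes are reduced to key/value assignments.

```python
import copy


class Master:
    """Toy master: a dict as the database plus an optional sync buffer."""

    def __init__(self):
        self.data = {}
        self.sync_buffer = None  # a list only while a snapshot is in progress

    def write(self, key, value):
        self.data[key] = value
        # Step (2): record writes that happen after the snapshot started.
        if self.sync_buffer is not None:
            self.sync_buffer.append((key, value))

    def begin_sync(self):
        # Step (2): start buffering and take a snapshot (stand-in for BGSAVE).
        self.sync_buffer = []
        return copy.deepcopy(self.data)

    def finish_sync(self):
        # Step (4): hand over the buffered writes and stop buffering.
        buffered, self.sync_buffer = self.sync_buffer, None
        return buffered


class Slave:
    def __init__(self):
        self.data = {}

    def load_snapshot(self, snapshot):
        # Step (3): replace the old dataset with the snapshot.
        self.data = dict(snapshot)

    def apply(self, commands):
        # Step (4): replay the buffered writes in order.
        for key, value in commands:
            self.data[key] = value


def demo_sync():
    master, slave = Master(), Slave()
    master.write("a", 1)
    snapshot = master.begin_sync()   # slave sent SYNC, master snapshots
    master.write("b", 2)             # write arriving during the snapshot
    slave.load_snapshot(snapshot)
    slave.apply(master.finish_sync())
    return master.data, slave.data
```

After `demo_sync()` both dictionaries hold the same keys, including the write that arrived while the snapshot was being taken.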
2. Command propagation
After synchronization completes, the master and slave databases are in a consistent state, but this consistency does not last forever. Whenever the master executes a write command sent by a client, its database may be modified, so the master and slave states diverge. To bring the two back into agreement, the master performs command propagation: it sends the write command it just executed, that is, the command that caused the inconsistency, to the slave. Once the slave has executed the same write command, master and slave are consistent again.
However, this replication scheme has a flaw. When a slave reconnects after a disconnection, the master again generates a complete RDB file and sends it to the slave to load, even though the two databases were essentially consistent before the disconnection and differ only by the write commands the master executed while the slave was away. Running SYNC in this situation is wasteful: generating the RDB file consumes CPU, memory, and I/O resources; sending it to the slave consumes a lot of network bandwidth; and while the slave loads the RDB file it is blocked and cannot respond to any command. In most reconnection cases a full SYNC is therefore unnecessary and unreasonable.
To solve this performance problem of the SYNC command in versions before 2.8, Redis 2.8 introduced a new command, PSYNC, which has two modes: full resynchronization and partial resynchronization. Full resynchronization handles a slave's initial replication and works essentially the same way as SYNC. Partial resynchronization handles re-replication after a disconnection: if conditions allow, the master does not generate an RDB file but replies +CONTINUE to the slave and sends it only the write commands executed during the disconnection. The slave executes these commands to bring its database up to date.
The partial resynchronization feature consists of the following three parts:
- The replication offsets of the master and the slave: as the master propagates commands to the slave, both sides record a replication offset. When the master and slave databases are consistent, the two offsets are equal. If the two offsets differ, the master and slave are currently out of sync.
- The master's replication backlog buffer: the replication backlog is a fixed-size FIFO queue that discards the oldest data when it is full. When the master propagates a command, it also appends the command to this buffer, which therefore holds two kinds of information: the bytes of the replication stream and the offset of each byte. On reconnection, the slave reports its offset to the master, and the master checks whether the data after that offset is still in the buffer. If it is, a partial resynchronization is performed; if not, a full resynchronization. Because the backlog has a fixed size, a slave that has been disconnected for a long time will probably find that its offset is no longer in the buffer, in which case only a full resynchronization is possible.
- The server's run ID: during the initial synchronization the master sends its run ID to the slave, and the slave saves it. When the slave reconnects after a disconnection, it sends the saved run ID to the master, and the master checks whether it matches its own run ID. If the IDs match, a partial resynchronization can be attempted. If they differ, the master that the slave was replicating before the disconnection is not the current master, and a full resynchronization is required.
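To make the offset and backlog bookkeeping concrete, here is a minimal sketch of a fixed-size backlog. It is a simplification under stated assumptions: the class `ReplicationBacklog` and its methods are hypothetical names, the slave offset is taken to mean "number of stream bytes already processed", and the buffer is a plain `bytearray` rather than the circular buffer a real server would use.

```python
class ReplicationBacklog:
    """Keeps only the most recent `size` bytes of the replication stream."""

    def __init__(self, size):
        self.size = size
        self.buf = bytearray()
        self.master_offset = 0  # total bytes propagated so far

    def feed(self, data: bytes):
        # Append propagated bytes and drop the oldest ones if over capacity.
        self.buf += data
        self.master_offset += len(data)
        overflow = len(self.buf) - self.size
        if overflow > 0:
            del self.buf[:overflow]

    def first_offset(self):
        # Offset of the oldest byte still held in the backlog.
        return self.master_offset - len(self.buf)

    def read_from(self, slave_offset):
        """Return the bytes the slave is missing, or None when the data
        is no longer in the backlog and a full resync is needed."""
        if slave_offset < self.first_offset() or slave_offset > self.master_offset:
            return None
        start = slave_offset - self.first_offset()
        return bytes(self.buf[start:])
```

With a 10-byte backlog that has been fed 15 bytes, a slave at offset 12 can still be served the last 3 bytes, while a slave at offset 3 gets `None` and has to fall back to a full resynchronization.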
PSYNC command implementation
If the slave has never replicated from any master, or has previously executed SLAVEOF NO ONE, it sends PSYNC ? -1 to the master to request a full resynchronization.
If the slave has already replicated from a master, it sends PSYNC <runid> <offset> when it starts a new replication, where runid is the run ID of the master it last replicated from and offset is the slave's current replication offset. The master uses these two parameters to decide which kind of synchronization to perform, checking whether the run ID matches its own and whether the offset is still within the backlog buffer. The master gives one of three replies:
- +FULLRESYNC <runid> <offset>: the master will perform a full resynchronization. runid is the master's run ID, and the slave uses offset as its initial replication offset.
- +CONTINUE: the master will perform a partial resynchronization. The slave waits for the master to send the data it is missing.
- -ERR: the master's version is earlier than 2.8 and does not support the PSYNC command.
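The master's choice between the first two replies can be sketched as a single function. This is an illustrative sketch only: `handle_psync` and its parameter names are hypothetical, `backlog_start` stands for the offset of the oldest byte still in the backlog, and the offset convention (bytes already processed by the slave) is a simplifying assumption.

```python
def handle_psync(master_run_id, master_offset, backlog_start, run_id, offset):
    """Return the reply line a master would send for PSYNC <run_id> <offset>."""
    full = f"+FULLRESYNC {master_run_id} {master_offset}"

    # "PSYNC ? -1", or a run ID belonging to some other master:
    # the slave has nothing usable, so do a full resynchronization.
    if run_id == "?" or run_id != master_run_id:
        return full

    # Same master, but the requested offset has already fallen out of the
    # backlog (or lies beyond what the master has produced).
    if offset < backlog_start or offset > master_offset:
        return full

    # Same master and the missing data is still buffered.
    return "+CONTINUE"
```

For example, with a master whose run ID is `abc`, offset 100 and a backlog starting at offset 40, a slave sending run ID `abc` and offset 60 gets `+CONTINUE`, while offset 10 or a different run ID yields `+FULLRESYNC abc 100`.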
Replication process in the new version:
- Set the master's address and port by calling the SLAVEOF <master_ip> <master_port> command.
- Establish a socket connection.
- Send a PING command to check that the master can process commands normally.
- Authentication: if masterauth is set on the slave, authentication is performed, and the master must have requirepass set. The two options must either both be set or both be unset. If only one of them is set, the master returns an error for the commands the slave sends.
- Send port information: the slave executes REPLCONF listening-port <port-number> to tell the master which port it is listening on.
- Synchronization: the slave sends the PSYNC command to the master.
- Command propagation: once synchronization is complete, the master sends each write command it executes to the slave so that the two stay in the same state.
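The commands a slave sends during the steps above can be written out in the Redis wire protocol (an array of bulk strings). The sketch below only builds the byte strings and does not open a socket. The function names `encode_command` and `handshake_commands` are hypothetical, and sending the masterauth password with an AUTH command is stated here as an assumption about how the authentication step is carried out.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a protocol array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def handshake_commands(listening_port, password=None, run_id="?", offset=-1):
    """Return the encoded commands in the order the slave would send them."""
    commands = [("PING",)]
    if password is not None:
        # Only sent when masterauth is configured on the slave.
        commands.append(("AUTH", password))
    commands.append(("REPLCONF", "listening-port", str(listening_port)))
    # Defaults to "PSYNC ? -1", the request for a full resynchronization.
    commands.append(("PSYNC", run_id, str(offset)))
    return [encode_command(*command) for command in commands]
```

Calling `handshake_commands(6380)` yields three encoded commands (PING, REPLCONF, PSYNC), and passing a password adds the AUTH command in second position.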
Heartbeat detection
During the command propagation phase, the slave by default sends the following command to the master once per second: REPLCONF ACK <replication_offset>, where replication_offset is the slave's current replication offset. This command serves three purposes:
- Checking the network connection between master and slave: if the master has not received a REPLCONF ACK command from a slave for more than a certain amount of time, there may be a problem with the connection between them.
- Supporting the min-slaves options: the min-slaves-to-write and min-slaves-max-lag options prevent the master from executing write commands under unsafe conditions. For example, min-slaves-to-write 3 together with min-slaves-max-lag 10 means that the master refuses write commands when it has fewer than 3 slaves, or when the lag of the 3 slaves is greater than 10 seconds.
- Detecting lost commands: when the master receives a REPLCONF ACK command from a slave, it checks whether the slave's offset matches its own. If the slave's offset is behind, the master finds the missing data in the replication backlog buffer, starting from the slave's offset, and sends it to the slave.
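The bookkeeping behind the heartbeat can be sketched as follows. This is a minimal illustration with hypothetical names (`HeartbeatTracker`, `on_replconf_ack`, `lag`, `bytes_behind`): the master remembers, per slave, the last acknowledged offset and the time the acknowledgement arrived, and derives the lag and the amount of missing data from those two values. Timestamps are passed in explicitly to keep the sketch deterministic.

```python
class HeartbeatTracker:
    """Per-slave record of the latest REPLCONF ACK seen by the master."""

    def __init__(self):
        self.slaves = {}  # slave id -> (acknowledged offset, time of ACK)

    def on_replconf_ack(self, slave_id, offset, now):
        # Called whenever "REPLCONF ACK <offset>" arrives from a slave.
        self.slaves[slave_id] = (offset, now)

    def lag(self, slave_id, now):
        # Seconds since the last ACK; a large value hints at a broken link.
        return now - self.slaves[slave_id][1]

    def bytes_behind(self, slave_id, master_offset):
        # Non-zero when the slave has not yet processed everything the
        # master has propagated, for example after a lost command.
        return master_offset - self.slaves[slave_id][0]
```

If a slave acknowledged offset 100 at time 50.0, then at time 53.0 its lag is 3.0 seconds, and against a master offset of 130 it is 30 bytes behind.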
III. Others
1. Data safety of replication when persistence is turned off on the master
When configuring Redis replication, it is highly recommended to turn on persistence on the master. If that is not possible, for example because of latency concerns, the deployed instance should be configured so that it is not restarted automatically.
To help understand the risk of automatically restarting a master that has persistence turned off, consider the following example, in which both the master and its slaves lose all their data:
1. Assume node A is the master with persistence turned off, and nodes B and C replicate data from node A.
2. Node A crashes, and the automatic restart service restarts it. Because persistence is turned off on node A, it has no data after the restart.
3. Nodes B and C replicate from node A, but A's dataset is empty, so they delete the copies of the data they hold.
Even when Sentinel is used for high availability, it is dangerous to turn off persistence on the master while enabling automatic restart. The master may restart so quickly that Sentinel does not detect the failure within the configured heartbeat interval, and the data loss described above then takes place.
Data safety is extremely important at all times, so a master with persistence turned off should never be configured to restart automatically.
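The data loss scenario above can be replayed in a few lines. This is a toy model, not Redis behavior in detail: the helper names `restart_master` and `resync_slaves` are hypothetical, dictionaries stand in for datasets, and a resynchronization is modeled as each slave replacing its data with a copy of the master's.

```python
def restart_master(data, persistence_enabled):
    """Dataset a master comes back with after a crash and restart."""
    # Without persistence there is nothing on disk to reload.
    return dict(data) if persistence_enabled else {}


def resync_slaves(master_data, slave_count):
    """Each slave discards its dataset and copies the master's."""
    return [dict(master_data) for _ in range(slave_count)]


def simulate(persistence_enabled):
    node_a = {"user:1": "alice", "user:2": "bob"}
    node_a = restart_master(node_a, persistence_enabled)  # crash + auto restart
    node_b, node_c = resync_slaves(node_a, 2)             # slaves resynchronize
    return node_a, node_b, node_c
```

`simulate(False)` leaves all three nodes empty, while `simulate(True)` leaves all three with the original two keys.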
2. Read-only slaves
Starting with Redis 2.6, slaves support a read-only mode, and this mode is the default for slaves.
Read-only mode is controlled by the slave-read-only option in the redis.conf file, and it can also be turned on or off with the CONFIG SET command.
A read-only slave refuses to execute any write command, so data cannot be written to the slave by accident because of an operational mistake.
Even when a slave is read-only, administrative commands such as DEBUG and CONFIG are still available, so the server should not be exposed to the Internet or to any untrusted network. However, by using the command renaming option in redis.conf to disable certain commands, we can improve the security of a read-only slave.
You might wonder why anyone would make a slave writable, given that data written to a slave is overwritten by resynchronization and may be lost when the slave restarts.
The reason is that some unimportant, temporary data can still usefully be stored on a slave. For example, a client can store reachability information about the master on the slave in order to implement a failover strategy.
3. Slave-related configuration
If the master has a password set through the requirepass option, the slave must be given the matching authentication setting for synchronization to proceed smoothly.
For a running server, you can enter the following command in the client:
config set masterauth <password>
To set the password permanently, add it to the configuration file:
masterauth <password>
4. The master performs write operations only when at least N slaves are connected
Starting with Redis 2.8, to improve data safety, you can configure the master to execute write commands only when at least N slaves are currently connected.
However, because Redis uses asynchronous replication, there is no guarantee that a slave actually receives the writes sent by the master, so data loss is still possible.
Here's how this feature works:
- Each slave pings the master once per second and reports how much of the replication stream it has processed.
- The master records the last time it received a ping from each slave.
- The user can configure the maximum network lag, min-slaves-max-lag, and the minimum number of slaves required to perform a write operation, min-slaves-to-write.
If there are at least min-slaves-to-write slaves, and the lag of each of these slaves is less than min-slaves-max-lag seconds, the master executes the write operation requested by the client.
You can think of this feature as a relaxed, conditional version of the C in the CAP theorem: the durability of a write is not guaranteed, but the window in which data can be lost is at least strictly limited to the specified number of seconds.
On the other hand, if the conditions specified by min-slaves-to-write and min-slaves-max-lag are not met, the write operation is not executed, and the master returns an error to the client that requested it.
Here are the two options for this feature and the parameters they take:
min-slaves-to-write <number of slaves>
min-slaves-max-lag <number of seconds>
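The check the master applies before accepting a write can be sketched as a small predicate. The function name `can_accept_write` and its parameters are hypothetical. Two points are assumptions drawn from the comments in the stock redis.conf rather than from the text above: a slave counts as healthy when its lag is less than or equal to the configured maximum, and setting either option to 0 disables the feature.

```python
def can_accept_write(slave_lags, min_slaves_to_write, min_slaves_max_lag):
    """Decide whether the master should execute a write command.

    slave_lags: seconds since the last ping from each connected slave.
    """
    # Assumption: a value of 0 for either option turns the check off.
    if min_slaves_to_write == 0 or min_slaves_max_lag == 0:
        return True

    # Count the slaves whose lag is within the allowed maximum.
    good_slaves = sum(1 for lag in slave_lags if lag <= min_slaves_max_lag)
    return good_slaves >= min_slaves_to_write
```

With min-slaves-to-write 3 and min-slaves-max-lag 10, lags of 1, 2 and 3 seconds allow the write, while lags of 1, 2 and 11 seconds, or only two connected slaves, cause it to be refused.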
5. Configuration
Reference:
Http://redisdoc.com/topic/replication.html
Redis Basic Learning (v) master-slave replication of-redis