Highly Available Redis (Part 8): Redis Master-Slave Replication

Source: Internet
Author: User


1. Redis Replication Principles and Optimization
1.1 Problems with a Standalone Redis
1.1.1 Machine Failure


Suppose a single Redis node is deployed on one server. If that machine suffers a hardware failure, such as a damaged motherboard or hard disk, that cannot be repaired quickly, Redis cannot process any operations. This is one risk of a standalone deployment.



Alternatively, the server may be healthy while the Redis process itself goes down; in that case, only a restart of Redis is required. If the performance loss during a Redis restart is acceptable, a single-machine deployment of Redis can still be considered.



When a standalone Redis deployment fails and Redis has to be migrated to another server, synchronizing the data from the failed Redis to the newly deployed node is expensive.


1.1.2 Capacity Bottleneck


Suppose a server has 16 GB of memory, of which 12 GB is allocated to Redis.



If a new requirement arises in which Redis needs 32 GB or 64 GB of memory, this server can no longer meet it. You can either replace it with a server that has more memory, or combine multiple servers into a Redis cluster.


1.1.3 QPS Bottleneck


According to the Redis project, a single Redis instance can handle about 100,000 QPS. If the business requires 1 million QPS, consider a distributed Redis deployment.
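The sizing in 1.1.2 and 1.1.3 comes down to simple division. The sketch below computes a rough lower bound on the number of instances, assuming load and data scale perfectly linearly and using the figures from the text (about 12 GB usable per 16 GB server, about 100,000 QPS per instance). The function name is hypothetical, and real deployments need headroom (for example for copy-on-write memory during background saves), so treat the result as a floor.

```python
import math


def instances_needed(required: float, per_instance: float) -> int:
    """Hypothetical helper: minimum instances if load scaled perfectly linearly."""
    if per_instance <= 0:
        raise ValueError("per_instance must be positive")
    return math.ceil(required / per_instance)


# Memory: 64 GB of data, roughly 12 GB usable per 16 GB server (figures from the text)
print(instances_needed(64, 12))                # 6
# Throughput: 1,000,000 QPS required, roughly 100,000 QPS per instance
print(instances_needed(1_000_000, 100_000))    # 10
```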


2. What Is Master-Slave Replication
2.1 One-Master, One-Slave Model


One Redis node acts as the master and serves client requests.



The other node acts as the slave. It synchronizes the master's data to serve as a backup, and it can continue to provide service when the master fails.



As shown in the figure:





2.2 One-Master, Multi-Slave Model


One Redis node acts as the master and serves client requests.



Several nodes act as slaves. Each slave keeps a copy of the master's data, which gives higher availability: even if the master and one slave fail at the same time, the remaining slave can still serve reads, and the data is not lost.



When the master receives so many reads and writes that it reaches the limit of a single Redis instance, multiple slaves can share the read operations, which spreads the traffic and balances the load. A one-master, multi-slave setup can therefore also be used for read-write separation.





2.3 Read-Write separation model


The master node handles writes, while clients read data from the slave nodes.
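On the client side, read-write separation is a routing decision. The hypothetical `ReadWriteRouter` below is a minimal sketch, not a real client library: write commands go to the master and reads rotate round-robin across the slaves. The set of write commands is a small illustrative subset, not the full Redis command table, and the address 192.168.81.102 is a hypothetical second slave.

```python
from itertools import cycle


class ReadWriteRouter:
    """Hypothetical router: writes go to the master, reads rotate across slaves."""

    # Illustrative subset of write commands; not the full Redis command table.
    WRITE_COMMANDS = {"SET", "MSET", "DEL", "INCR", "EXPIRE", "LPUSH", "HSET"}

    def __init__(self, master: str, slaves: list[str]):
        self.master = master
        # With no slaves configured, every command falls back to the master.
        self._readers = cycle(slaves) if slaves else None

    def route(self, command: str) -> str:
        if command.upper() in self.WRITE_COMMANDS or self._readers is None:
            return self.master
        return next(self._readers)


router = ReadWriteRouter("192.168.81.100:6379",
                         ["192.168.81.101:6379", "192.168.81.102:6379"])
print(router.route("set"))  # 192.168.81.100:6379
print(router.route("get"))  # 192.168.81.101:6379
print(router.route("get"))  # 192.168.81.102:6379
```

A production client would also need connection pooling, health checks, and a policy for stale slaves (see 7.1.1).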





3. What Master-Slave Replication Provides


Replication keeps multiple copies of the data, can significantly improve Redis read performance, and is the foundation of Redis high availability and distribution.


4. Master-Slave Replication Configuration
4.1 The slaveof Command
Run slaveof <masterip> <masterport> on the node that should become the slave.





Cancel replication by running slaveof no one on the slave. It stops replicating but keeps the data it already has.





4.2 Configuring via the Configuration File


Modify the Redis configuration file /etc/redis.conf on the slave:


# masterip is the master's IP address, masterport is the master's port
slaveof <masterip> <masterport>
# The slave only serves reads, so its data stays identical to the master's
slave-read-only yes
4.3 Comparison of the Two Configuration Methods
The slaveof command takes effect immediately without restarting Redis, but it is not written to the configuration file, so the setting is lost when Redis restarts. The configuration-file method requires a restart to take effect, but the setting persists and is easier to manage.
4.4 Examples


Two virtual machines running CentOS 7.5 are used:


192.168.81.100 acts as the master
192.168.81.101 acts as the slave
Step one: On the 192.168.81.101 virtual machine
[root@localhost ~]# vi /etc/redis.conf # Modify the Redis configuration file
     # Allow connections to Redis from other hosts
     bind 0.0.0.0
     # IP address and port of the master
     slaveof 192.168.81.100 6379


Save the file, then start Redis:


[root@localhost ~]# systemctl stop firewalld # Stop the firewalld firewall
[root@localhost ~]# systemctl start redis # Start the Redis server on the slave
[root@localhost ~]# ps aux | grep redis-server # Check the redis-server process
redis 2319 0.3 0.8 155204 18104 ? Ssl 09:55 0:00 /usr/bin/redis-server 0.0.0.0:6379
root 2335 0.0 0.0 112664 968 pts/2 R+ 09:56 0:00 grep --color=auto redis
[root@localhost ~]# redis-cli # Start the Redis client
127.0.0.1:6379> info replication # View replication info on 192.168.81.101
# Replication
role:slave # The role is slave
master_host:192.168.81.100 # The master's IP is 192.168.81.100
master_port:6379 # The master's port is 6379
master_link_status:up
master_last_io_seconds_ago:5
master_sync_in_progress:0
slave_repl_offset:155
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
Step two: On the 192.168.81.100 virtual machine
[root@localhost ~]# systemctl stop firewalld # Stop the firewalld firewall
[root@localhost ~]# vi /etc/redis.conf # Modify the Redis configuration file

    bind 0.0.0.0
Save the file, then start Redis:
[root@localhost ~]# systemctl start redis # Start Redis on the master
[root@localhost ~]# ps aux | grep redis-server # Check the redis-server process
redis 2529 0.2 1.8 155192 18192 ? Ssl 17:55 0:00 /usr/bin/redis-server 0.0.0.0:6379
root 2536 0.0 0.0 112648 960 pts/2 R+ 17:56 0:00 grep --color=auto redis

[root@localhost ~]# redis-cli # Start redis-cli on the master
127.0.0.1:6379> info replication # View replication info on 192.168.81.100
# Replication
role:master # The role is master
connected_slaves:1 # One slave is connected
slave0:ip=192.168.81.101,port=6379,state=online,offset=141,lag=2 # Slave details
master_repl_offset:141
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:140
127.0.0.1:6379> set hello world # Write data to the master node
OK
127.0.0.1:6379> info server
# Server
redis_version:3.2.10
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:c8b45a0ec7dc67c6
redis_mode:standalone
os:Linux 3.10.0-514.el7.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.5
process_id:2529
run_id:7091f874c7c3eeadae873d3e6704e67637d8772b # Note this run_id
tcp_port:6379
uptime_in_seconds:488
uptime_in_days:0
hz:10
lru_clock:12784741
executable:/usr/bin/redis-server
config_file:/etc/redis.conf
Step three: Back on 192.168.81.101 (the slave)
127.0.0.1:6379> get hello # The value written on the master can be read here
"world"
127.0.0.1:6379> set a b # Writing to the slave on 192.168.81.101 fails
(error) READONLY You can't write against a read only slave.
127.0.0.1:6379> slaveof no one # Stop replicating
OK
127.0.0.1:6379> info replication # 192.168.81.101 is no longer a slave; it is now a master
# Replication
role:master # Now a master
connected_slaves:0
master_repl_offset:787
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
127.0.0.1:6379> dbsize # Number of keys in Redis on 192.168.81.101
(integer) 2
Step four: Back on the 192.168.81.100 virtual machine
127.0.0.1:6379> mset a b c d e f # Write three keys (a, c, e) to Redis on 192.168.81.100
OK
127.0.0.1:6379> dbsize # Redis now holds 5 keys
(integer) 5
Step five: View the Redis log on the 192.168.81.100 virtual machine
[root@localhost ~]# tail /var/log/redis/redis.log # Show the last 10 lines of the Redis log
2529:M 14 Oct 17:55:09.448 * DB loaded from disk: 0.026 seconds
2529:M 14 Oct 17:55:09.448 * The server is now ready to accept connections on port 6379
2529:M 14 Oct 17:55:10.118 * Slave 192.168.81.101:6379 asks for synchronization
2529:M 14 Oct 17:55:10.118 * Partial resynchronization not accepted: Runid mismatch (Client asked for runid '9f93f85bce758b9c48e72d96a182a2966940cf52', my runid is '7091f874c7c3eeadae873d3e6704e67637d8772b') # "my runid" matches the run_id shown by info server on 192.168.81.100
2529:M 14 Oct 17:55:10.118 * Starting BGSAVE for SYNC with target: disk # The master runs BGSAVE to generate an RDB file
2529:M 14 Oct 17:55:10.119 * Background saving started by pid 2532
2532:C 14 Oct 17:55:10.158 * DB saved on disk
2532:C 14 Oct 17:55:10.159 * RDB: 12 MB of memory used by copy-on-write
2529:M 14 Oct 17:55:10.254 * Background saving terminated with success
2529:M 14 Oct 17:55:10.256 * Synchronization with slave 192.168.81.101:6379 succeeded # Data synchronized to 192.168.81.101
Step six: Back on the 192.168.81.101 virtual machine
127.0.0.1:6379> slaveof 192.168.81.100 6379 # Make 192.168.81.101 a slave of 192.168.81.100 again
OK
127.0.0.1:6379> dbsize
(integer) 5
127.0.0.1:6379> mget a
1) "b"
Step seven: View the Redis log on the 192.168.81.101 virtual machine
[root@localhost ~]# tail /var/log/redis/redis.log # Show the last 10 lines of the Redis log
2319:S 14 Oct 09:55:17.625 * MASTER <-> SLAVE sync started
2319:S 14 Oct 09:55:17.625 * Non blocking connect for SYNC fired the event.
2319:S 14 Oct 09:55:17.626 * Master replied to PING, replication can continue...
2319:S 14 Oct 09:55:17.626 * Trying a partial resynchronization (request 9f93f85bce758b9c48e72d96a182a2966940cf52:16).
2319:S 14 Oct 09:55:17.628 * Full resync from master: 7091f874c7c3eeadae873d3e6704e67637d8772b:1 # Full replication from the master
2319:S 14 Oct 09:55:17.629 * Discarding previously cached master state.
2319:S 14 Oct 09:55:17.763 * MASTER <-> SLAVE sync: receiving 366035 bytes from master # Size of the data received from the master
2319:S 14 Oct 09:55:17.765 * MASTER <-> SLAVE sync: Flushing old data # The slave clears its old data
2319:S 14 Oct 09:55:17.779 * MASTER <-> SLAVE sync: Loading DB in memory # Load the received RDB file
2319:S 14 Oct 09:55:17.804 * MASTER <-> SLAVE sync: Finished with success
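To check replication state from a script rather than by eye, the INFO replication text can be parsed into a dictionary. The sketch below assumes the raw format Redis returns (lowercase `key:value` lines, `#` section headers, and `slaveN` entries made of comma-separated `key=value` pairs); the sample string is the master output from step two in that raw format, and the function name `parse_info` is hypothetical.

```python
def parse_info(text: str) -> dict:
    """Hypothetical parser: INFO text -> dict; slaveN lines become nested dicts."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Replication"
        key, _, value = line.partition(":")
        if key.startswith("slave") and key[5:].isdigit():
            # e.g. "ip=192.168.81.101,port=6379,state=online,offset=141,lag=2"
            info[key] = dict(pair.split("=", 1) for pair in value.split(","))
        else:
            info[key] = value
    return info


sample = """# Replication
role:master
connected_slaves:1
slave0:ip=192.168.81.101,port=6379,state=online,offset=141,lag=2
master_repl_offset:141
"""
info = parse_info(sample)
print(info["role"])               # master
print(info["slave0"]["offset"])   # 141
```

All values stay strings, so convert offsets with `int()` before comparing them.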
5. Full Replication and Partial Replication
5.1 Full Replication
5.1.1 The run_id Concept


Each time Redis starts, it generates a random ID that identifies that Redis instance. This is the run_id shown by the info command above.



View the run_id and offset on the 192.168.81.101 virtual machine:


[root@localhost ~]# redis-cli info server | grep run_id
run_id:7e366f6029d3525177392e98604ceb5195980518
[root@localhost ~]# redis-cli info | grep master_repl_offset
master_repl_offset:0
master_repl_offset:0


View the run_id and offset on the 192.168.81.100 virtual machine:


[root@localhost ~]# redis-cli info server | grep run_id
run_id:7091f874c7c3eeadae873d3e6704e67637d8772b
[root@localhost ~]# redis-cli info | grep master_repl_offset
master_repl_offset:4483
master_repl_offset:4483


The run_id is a very important identifier.



In the example above, when 192.168.81.101 acts as a slave and replicates data from the master 192.168.81.100, it obtains the run_id of Redis on 192.168.81.100 and records it on 192.168.81.101.



When the run_id of Redis on 192.168.81.100 changes, it means that Redis on that machine has been restarted or has undergone some other significant change. 192.168.81.101 then synchronizes all of the data on 192.168.81.100 again. This is full replication.


5.1.2 The Offset Concept


The offset is the number of bytes of write data in the replication stream.



When data is written to Redis on 192.168.81.100, the master records how many bytes were written in its offset.



The writes on 192.168.81.100 are synchronized to 192.168.81.101, and Redis on 192.168.81.101 records its own offset as well.



When the offsets on both machines are equal, the data is fully synchronized.



The offset is the key basis for partial replication.



View the offset of Redis on the 192.168.81.100 machine:


127.0.0.1:6379> info replication # View replication information
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.81.101,port=6379,state=online,offset=8602,lag=0
master_repl_offset:8602 # The offset on 192.168.81.100 is now 8602
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:8601
127.0.0.1:6379> set k1 v1 # Write data to 192.168.81.100
OK
127.0.0.1:6379> set k2 v2 # Write data to 192.168.81.100
OK
127.0.0.1:6379> set k3 v3 # Write data to 192.168.81.100
OK
127.0.0.1:6379> info replication # View replication information
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.81.101,port=6379,state=online,offset=8759,lag=1
master_repl_offset:8759 # After the writes, the offset on 192.168.81.100 is 8759
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:8758


View the offset of Redis on the 192.168.81.101 machine:


127.0.0.1:6379> info replication # View replication information
# Replication
role:slave
master_host:192.168.81.100
master_port:6379
master_link_status:up
master_last_io_seconds_ago:8
master_sync_in_progress:0
slave_repl_offset:8602 # The offset on 192.168.81.101 is now 8602
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
127.0.0.1:6379> get k1
"v1"
127.0.0.1:6379> get k2
"v2"
127.0.0.1:6379> get k3
"v3"
127.0.0.1:6379> info replication # View replication information
# Replication
role:slave
master_host:192.168.81.100
master_port:6379
master_link_status:up
master_last_io_seconds_ago:7
master_sync_in_progress:0
slave_repl_offset:8759 # After synchronizing, the offset on 192.168.81.101 is 8759
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

If the offset gap between master and slave is too large, the slave has not caught up with the master, which indicates a problem on the master-slave link, such as the network, blocking, or the buffer.
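The offset counts bytes of the replication stream, which carries write commands encoded in the Redis protocol (RESP). As a worked example, the sketch below encodes one `set k1 v1` as a RESP array of bulk strings and measures it: 29 bytes, so the three SET commands in the session above account for 87 bytes. The observed increase from 8602 to 8759 is 157 bytes; the remainder presumably comes from other traffic in the replication stream, such as the periodic PING the master sends to its slaves, so treat this as an illustration rather than an exact reconciliation. `resp_encode` is a hypothetical helper name.

```python
def resp_encode(args: list[str]) -> bytes:
    """Hypothetical helper: encode a command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(out)


cmd = resp_encode(["set", "k1", "v1"])
print(cmd)            # b'*3\r\n$3\r\nset\r\n$2\r\nk1\r\n$2\r\nv1\r\n'
print(len(cmd))       # 29
print(3 * len(cmd))   # 87 bytes for the three SET commands
print(8759 - 8602)    # 157 bytes observed in the offsets
```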

5.1.3 The Concept of Full Replication


When the master already holds a large amount of data, the slave must synchronize not only the existing data but also any data written to the master while the synchronization is in progress, so that it ends up with a complete copy. This is Redis full replication.



The Redis master sends its current RDB file to the slave. Data written to the master in the meantime is recorded in the replication buffer (repl_back_buffer). After the RDB file has been synchronized to the slave, the master compares offsets and sends the data in the replication buffer to the slave.



Redis uses the psync command for both full and partial replication.



The psync command takes two arguments: run_id and offset.



Steps of a full replication with psync:


1. When the slave synchronizes with the master for the first time, it knows neither the master's run_id nor its offset, so it sends `psync ? -1` to request synchronization.
2. The master accepts the request, recognizes that the slave needs a full replication, and replies with its run_id and offset (the reply has the form +FULLRESYNC <run_id> <offset>).
3. The slave saves the run_id and offset sent by the master.
4. After replying, the master executes BGSAVE to generate an RDB file of all current data and sends the RDB file to the slave.
5. Data written to the master from the moment the RDB file is generated until the transfer completes is recorded in the replication buffer (repl_back_buffer) and then sent to the slave.
6. The slave flushes its old data, loads all the data from the RDB file, and applies the buffered writes, so that its data matches the master's.
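The master's side of this handshake can be sketched as one decision function. In the hypothetical `handle_psync` below, the slave sends its saved run_id and `req_offset`, the first byte it still needs (its own offset plus one); the master replies `+FULLRESYNC <run_id> <offset>` when a full replication is required, or `+CONTINUE` when the missing bytes are still in its buffer. The backlog bounds are plain integers here. This illustrates the rule described in this section and in 5.2; it is not Redis source code.

```python
def handle_psync(req_run_id: str, req_offset: int, run_id: str,
                 master_offset: int, backlog_first_byte: int) -> str:
    """Hypothetical master-side reply to 'psync <req_run_id> <req_offset>'.

    The backlog is assumed to hold bytes backlog_first_byte .. master_offset.
    """
    full = f"+FULLRESYNC {run_id} {master_offset}"
    if req_run_id == "?" or req_run_id != run_id:
        # First synchronization (psync ? -1) or the master restarted: full copy.
        return full
    if not backlog_first_byte <= req_offset <= master_offset + 1:
        # The bytes the slave needs are no longer in the backlog.
        return full
    return "+CONTINUE"


MASTER = "7091f874c7c3eeadae873d3e6704e67637d8772b"
# Backlog state from section 5.1.2: first byte 2, master offset 8759
print(handle_psync("?", -1, MASTER, 8759, 2))       # first sync: full resync
print(handle_psync("9f93f85bce758b9c48e72d96a182a2966940cf52", 16,
                   MASTER, 8759, 2))                # run_id mismatch, as in step five
print(handle_psync(MASTER, 8603, MASTER, 8759, 2))  # +CONTINUE
```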


As shown in the following figure:





5.1.4 Cost of Full Replication
    • Full replication is very expensive:
    • The master needs time to execute BGSAVE.
    • BGSAVE forks a child process, which consumes CPU, memory, and disk resources.
    • Transferring the RDB file from the master to the slave takes time and consumes network bandwidth.
    • The slave needs time to clear its old data; the more data it holds, the longer this takes.
    • The slave needs time to load the RDB file.
    • Possible AOF rewrite time: after the RDB file is loaded, if AOF is enabled on the slave, an AOF rewrite is performed so that the AOF file contains the latest data.
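A back-of-the-envelope estimate helps judge whether a full replication is acceptable. The sketch below adds only the two size-dependent costs from the list, transferring the RDB file and loading it on the slave, and leaves out BGSAVE, flushing, and AOF rewrite time. The RDB size, link bandwidth, and load rate are hypothetical figures that you would need to measure in your own environment.

```python
def full_sync_seconds(rdb_bytes: int, bandwidth_bits_per_s: float,
                      load_bytes_per_s: float) -> float:
    """Hypothetical lower bound: network transfer time plus RDB load time."""
    transfer = rdb_bytes * 8 / bandwidth_bits_per_s
    load = rdb_bytes / load_bytes_per_s
    return transfer + load


GB = 1024 ** 3
# Hypothetical figures: 4 GB RDB file, 1 Gbit/s link, slave loads 200 MB/s
print(round(full_sync_seconds(4 * GB, 1e9, 200 * 1024 ** 2), 1))  # 54.8
```

The estimate scales linearly with the RDB size, which is why 7.3.1 recommends not setting maxmemory too large.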

5.1.5 Problems with Full Replication


Besides the overhead above, if the network between master and slave fails for a while, the data written to the master during that time never reaches the slave.



Without partial replication, the only remedy is to perform another full replication of all the master's data.



Redis 2.8 added partial replication: when the network between master and slave fails, partial replication resends only the missing data instead of copying everything again, which minimizes the possibility of data loss.


5.2 Partial Replication


While the connection between master and slave is broken, the master keeps accepting writes and saves them in the replication buffer (repl_back_buffer).



When the network between master and slave recovers, the slave sends the psync {run_id} {offset} command, where run_id is the master run_id it saved earlier and offset is the slave's current replication offset.



The master compares the offset received from the slave with the offsets held in the replication buffer.
If the slave's offset is behind the master's and the missing data is still in the buffer, the master sends the data between the two offsets to the slave. Once the slave applies it, its data is consistent with the master's.
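The replication buffer behaves like a fixed-size circular buffer indexed by offset. The hypothetical `Backlog` class below is a minimal sketch of that idea, not Redis's implementation: the master appends every replicated byte, and a slave that reconnects at offset N can be served the bytes after N only if they have not yet been overwritten. Otherwise the master must fall back to full replication, the situation described in 7.3.3. The tiny 16-byte size only makes the overwrite visible.

```python
class Backlog:
    """Hypothetical fixed-size replication backlog indexed by replication offset."""

    def __init__(self, size: int):
        self.size = size
        self.buf = bytearray(size)
        self.master_offset = 0  # offset of the last byte written

    def append(self, data: bytes) -> None:
        # Each replicated byte gets the next offset; old bytes are overwritten.
        for byte in data:
            self.master_offset += 1
            self.buf[self.master_offset % self.size] = byte

    def first_offset(self) -> int:
        """Oldest offset still held in the buffer."""
        return max(1, self.master_offset - self.size + 1)

    def since(self, slave_offset: int):
        """Bytes after slave_offset, or None if a full resync is required."""
        if slave_offset + 1 < self.first_offset() or slave_offset > self.master_offset:
            return None
        return bytes(self.buf[o % self.size]
                     for o in range(slave_offset + 1, self.master_offset + 1))


backlog = Backlog(16)               # tiny size so the overwrite is visible
backlog.append(b"0123456789")       # offsets 1..10
print(backlog.since(6))             # b'6789'
backlog.append(b"ABCDEFGHIJ")       # offsets 11..20; offsets 1..4 are overwritten
print(backlog.since(6))             # b'6789ABCDEFGHIJ'
print(backlog.since(2))             # None: offsets 3 and 4 are gone, full resync needed
```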



As shown in the figure:





6. Master-Slave Replication Failures
6.1 Slave Failure





In a read-write separation architecture, a slave that goes down can no longer synchronize data from the master.


6.2 Master Outage





When the master goes down, it can no longer serve requests; only the slaves can still serve reads.



Workaround: promote one slave to master to handle writes, and make the other slave a slave of the new master to serve reads. This still has to be done manually.






Master-slave replication does not provide automatic failover on its own; that is the role of Redis Sentinel.


7. Common Development and Operations Problems
7.1 Read-Write Separation

Read-write separation: the master handles writes and replicates the data to the slaves, which serve reads.





Read-write separation reduces the load on the master and, at the same time, scales out read capacity.



Problems you may encounter with read-write separation:


7.1.1 Replication Data Latency


In most cases, the master replicates data to the slaves asynchronously, so there is a delay between a write on the master and its arrival on a slave.



When a slave is blocked, it receives data late, so clients reading from that slave during this window may get stale data.



You can monitor the offsets of the master and the slaves, and when the gap becomes too large, route read traffic to the master. This approach has some cost.
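A minimal sketch of this monitoring idea: compute each slave's lag in bytes as the master's master_repl_offset minus the slave's offset (both reported by INFO replication), and keep only slaves within a threshold as read targets, falling back to the master when none qualify. The function name `readable_nodes` and the 100-byte threshold are hypothetical; the offsets are taken from section 5.1.2, and 192.168.81.102 is a hypothetical second slave.

```python
def readable_nodes(master: str, master_offset: int,
                   slave_offsets: dict[str, int], max_lag_bytes: int) -> list[str]:
    """Hypothetical helper: slaves close enough to the master, else the master."""
    fresh = [node for node, offset in slave_offsets.items()
             if master_offset - offset <= max_lag_bytes]
    return fresh or [master]


# Master at 8759; one slave caught up, one still at 8602 (157 bytes behind)
slaves = {"192.168.81.101:6379": 8759, "192.168.81.102:6379": 8602}
print(readable_nodes("192.168.81.100:6379", 8759, slaves, 100))
# ['192.168.81.101:6379']
print(readable_nodes("192.168.81.100:6379", 8759, {"192.168.81.102:6379": 8602}, 100))
# ['192.168.81.100:6379']
```

Sending reads back to the master is exactly the cost mentioned above: the master takes on load that read-write separation was meant to remove.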


7.1.2 Reading Expired Data


How Redis deletes expired data:


Mode one: Lazy deletion
When a key is accessed, Redis checks whether it has expired. If it has, Redis deletes it and the client gets an empty (nil) reply; TTL on such a key returns -2, meaning the key does not exist.
Mode two: Periodic deletion
At regular intervals, Redis samples a subset of keys that have an expiration time and deletes those that have expired.
If there are very many expired keys, or sampling is slower than the rate at which keys expire, many expired keys remain undeleted.
The slave replicates all of the master's data, including these expired keys.
A slave does not delete expired keys by itself; it waits for the master to send a DEL. So with read-write separation, clients may read expired data, that is, dirty data, from the slave. (Newer Redis versions make a slave report a logically expired key as missing on reads, but the slave still does not delete it.)
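The interaction can be illustrated with a toy model, a hypothetical simulation rather than Redis internals: the master expires keys lazily on access and through periodic random sampling, and only a deletion on the master is propagated to the slave as a DEL. A slave that does not check expiration itself (as described above for older versions) keeps serving the stale value until that DEL arrives. Time is passed in explicitly so the example is deterministic.

```python
import random


class Node:
    """Toy Redis node: values plus absolute expiry times."""

    def __init__(self):
        self.data = {}     # key -> value
        self.expires = {}  # key -> expiry time (same units as `now`)


class Master(Node):
    """Toy master that expires keys and propagates deletions to one slave."""

    def __init__(self, slave: Node):
        super().__init__()
        self.slave = slave

    def set(self, key, value, expire_at=None):
        # Every write, including its expiry time, is replicated to the slave.
        for node in (self, self.slave):
            node.data[key] = value
            if expire_at is None:
                node.expires.pop(key, None)  # like SET, a plain write clears the TTL
            else:
                node.expires[key] = expire_at

    def _delete(self, key):
        # Only the master decides a key is gone; the slave receives a DEL.
        for node in (self, self.slave):
            node.data.pop(key, None)
            node.expires.pop(key, None)

    def get(self, key, now):
        # Lazy deletion: expiry is checked only when the key is accessed.
        if self.expires.get(key, float("inf")) <= now:
            self._delete(key)
            return None
        return self.data.get(key)

    def active_expire_cycle(self, now, sample_size, rng):
        # Periodic deletion: test a random sample of keys that have an expiry.
        keys = list(self.expires)
        for key in rng.sample(keys, min(sample_size, len(keys))):
            if self.expires[key] <= now:
                self._delete(key)


slave = Node()
master = Master(slave)
master.set("session", "abc", expire_at=10)
# At time 20 the key is logically expired, but nothing has accessed or sampled it yet.
print(slave.data.get("session"))      # abc  (stale read from the slave)
print(master.get("session", now=20))  # None (lazy deletion on the master)
print(slave.data.get("session"))      # None (the DEL reached the slave)

master.set("token", "xyz", expire_at=15)
master.active_expire_cycle(now=30, sample_size=20, rng=random.Random(42))
print("token" in slave.data)          # False (removed by the periodic cycle)
```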
7.1.3 Slave Failure


In Figure 9, a slave goes down, and moving its clients from the slave to the master is costly.



Before adopting read-write separation, first consider optimizing the master.



Redis performs well enough for most scenarios. You can tune memory-related configuration parameters or the AOF policy, or consider distributed Redis.


7.2 Inconsistent master-slave configuration


The first case is inconsistent maxmemory settings, which can lose data.



Suppose the master is allocated 4 GB of memory while the slave is allocated only 2 GB. Master-slave replication works normally at first.



However, once the data replicated from the master exceeds 2 GB, the slave does not report an error. Instead, the slave's maxmemory-policy is triggered and evicts part of the replicated data, so the slave's data becomes incomplete and data is lost.
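A toy model of this failure mode, assuming the slave's limit is counted in keys rather than bytes and that its eviction behaves like an LRU policy: the hypothetical `LimitedSlave` accepts every replicated write without error but silently evicts the least recently written keys once it is full, so it ends up holding less than the master.

```python
from collections import OrderedDict


class LimitedSlave:
    """Hypothetical slave with a key-count limit standing in for maxmemory."""

    def __init__(self, max_keys: int):
        self.max_keys = max_keys
        self.data = OrderedDict()

    def apply(self, key, value):
        # Replicated writes are accepted without any error...
        self.data[key] = value
        self.data.move_to_end(key)
        # ...but once the limit is exceeded, the oldest keys are evicted.
        while len(self.data) > self.max_keys:
            self.data.popitem(last=False)


master = {}
slave = LimitedSlave(max_keys=2)   # the slave is configured "smaller" than the master
for i in range(4):
    master[f"k{i}"] = i
    slave.apply(f"k{i}", i)

print(len(master), len(slave.data))  # 4 2
print(list(slave.data))              # ['k2', 'k3']
```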



Another case of inconsistent configuration is when data-structure optimizations are configured on the master but not on the slave, which makes the master and slave use different amounts of memory.


7.3 Avoiding the Overhead of Full Replication
7.3.1 The First Full Replication


When a slave is attached to a master for the first time, it holds no data, so a full replication is unavoidable.



Workaround: do not set maxmemory on the master and slave too large, so that transferring and loading the RDB file is fast and the cost stays relatively small. The full replication can also be performed at off-peak times, when user traffic is low.


7.3.2 Master run_id Mismatch


When the master restarts, its run_id changes. When the slave synchronizes, it finds that the master run_id it saved earlier no longer matches the current one, so partial replication is not possible.



Workaround:


Perform a full replication. Alternatively, when the master fails, promote a slave to master to handle writes, or use Redis Sentinel or Redis Cluster.

Redis 4.0 adds a new mechanism (PSYNC2) that can avoid full replication after a failover, even though the master has changed.

7.3.3 Replication Buffer Too Small


The replication buffer stores newly written commands.



The replication buffer is a fixed-size queue with a default size of 1 MB, so it can hold only 1 MB of data.



If the slave loses its network connection to the master, the master keeps saving newly written data to the replication buffer.


If less than 1 MB is written during the disconnection, partial replication can be used and full replication is avoided. If more than 1 MB is written, the oldest data has been overwritten and only a full replication is possible.


Increase the replication buffer size with the repl-backlog-size option in the configuration file to reduce how often full replication occurs.
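To choose a value, one rule of thumb is to size the buffer so it holds everything the master writes during the longest disconnection you want to survive without full replication, plus a safety margin. The sketch below computes that; the write rate, outage length, and safety factor of 2 are assumptions to replace with your own measurements, and `backlog_size_bytes` is a hypothetical name.

```python
def backlog_size_bytes(write_bytes_per_s: float, max_disconnect_s: float,
                       safety_factor: float = 2.0) -> int:
    """Hypothetical helper: backlog size that keeps partial replication possible."""
    return int(write_bytes_per_s * max_disconnect_s * safety_factor)


MB = 1024 * 1024
# Hypothetical: the master writes 1 MB/s and we want to survive a 60-second outage
size = backlog_size_bytes(1 * MB, 60)
print(size // MB)  # 120
```

The result would correspond to a setting such as `repl-backlog-size 120mb`. At the same write rate, the default 1 MB covers only about one second of disconnection.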


7.4 Avoiding replication storms


In a master-slave architecture, when the master restarts, its run_id changes and all of its slaves perform a full replication.



The master generates an RDB file and every slave then synchronizes it, which places heavy load on the master's CPU, memory, disk, and network. This is a replication storm.



Solution for a replication storm on a single master:


Change the replication topology.
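One way to see why changing the topology helps is to count the RDB copies the master itself must send after a restart. In a star topology every slave syncs directly from the master; in a tree, the master serves one slave and that slave relays to the rest. The sketch assumes a hypothetical 3 GB RDB and 6 slaves, and counts only the master's outbound transfer, not the relay's.

```python
def master_outbound_bytes(rdb_bytes: int, direct_slaves: int) -> int:
    """Hypothetical helper: one RDB copy is transferred per direct slave."""
    return rdb_bytes * direct_slaves


GB = 1024 ** 3
rdb = 3 * GB                            # hypothetical RDB size
star = master_outbound_bytes(rdb, 6)    # all 6 slaves attached to the master
tree = master_outbound_bytes(rdb, 1)    # master -> 1 slave -> 5 slaves below it
print(star // GB, tree // GB)           # 18 3
```

The trade-off is that slaves lower in the tree receive updates later, and the relaying slave becomes a point the lower tier depends on.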





Replication storm with multiple instances on one machine





If all the Redis nodes on one server are masters and this server restarts, all of their slaves perform a full replication from this server at the same time, putting heavy pressure on it. Solution: distribute the masters across multiple machines.
Summary of Redis Master-Slave Replication
A master can have multiple slaves, and a slave can itself have slaves. A slave can have only one master. Data flows in one direction only, from master to slave.

