1. Master writing memory snapshots: the SAVE command calls the rdbSave function, which blocks the main thread. When the snapshot is large, the performance impact is severe and the service pauses intermittently, so the master is best off not writing memory snapshots.
2. Master AOF persistence: if the AOF file is never rewritten, this persistence method has the least impact on performance, but the AOF file keeps growing, and an overly large AOF file slows down recovery when the master restarts.
3. Master calling BGREWRITEAOF to rewrite the AOF file: the rewrite consumes a lot of CPU and memory, which drives the service load up and can cause brief service pauses.
Here is a case from one of my real projects. The setup was roughly this: one master and 4 slaves, no sharding, only read/write splitting. The master handled writes and the AOF log backup (the AOF file was about 5 GB), and the slaves handled reads. When the master called BGREWRITEAOF, the load on both the master and the slaves suddenly surged. The master barely responded to write requests for about 5 minutes, and about half of the read requests on the slaves could not be answered in time. The server load charts for the master and slaves are as follows:
[Figure: master server load]
[Figure: slave server load]
This should never have happened. The machine that is now the master used to be a slave, and it had a scheduled shell task that called BGREWRITEAOF at 10 a.m. every day to rewrite the AOF file. Later the original master went down and this backup slave was promoted to master, but we forgot to delete the scheduled task, which led to the tragedy above. It took several days to track down the cause.
Setting the no-appendfsync-on-rewrite option to yes can alleviate this problem. With yes, new write operations are not fsynced during the rewrite; they are held in memory temporarily and written out after the rewrite completes, so a crash during the rewrite can lose those writes. Better still, do not enable AOF backup on the master at all.
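The forgotten scheduled task could have been made harmless by checking the node's role before triggering the rewrite. The following is a minimal sketch, assuming the text of `INFO replication` has already been fetched by some other means. The helper names `parse_info` and `should_rewrite_aof` and the sample values are made up for illustration; only the `role:` field comes from real INFO output.

```python
def parse_info(info_text):
    """Parse the key:value lines of Redis INFO output into a dict."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def should_rewrite_aof(info_text):
    """Allow the scheduled rewrite only on a node that reports role:slave."""
    return parse_info(info_text).get("role") == "slave"


# Sample text in the shape of INFO replication output (values are made up).
SAMPLE_MASTER = "# Replication\r\nrole:master\r\nconnected_slaves:4\r\n"
SAMPLE_SLAVE = "# Replication\r\nrole:slave\r\nmaster_host:10.0.0.1\r\n"

if __name__ == "__main__":
    for name, text in (("master", SAMPLE_MASTER), ("slave", SAMPLE_SLAVE)):
        print(name, "-> run BGREWRITEAOF:", should_rewrite_aof(text))
```

With a guard like this, a slave that is promoted to master stops rewriting its AOF on schedule even if nobody remembers to remove the task.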
4. Redis master-slave replication performance: the first synchronization works like this. The slave sends a synchronization request to the master; the master first dumps an RDB file, then transfers the whole RDB file to the slave, and finally sends the slave the commands it cached in the meantime, which completes the first synchronization. From then on, the master sends each change directly to every slave in turn, in real time. Whenever a slave and the master disconnect, for whatever reason, the whole process above is repeated. Redis master-slave replication is built on memory-snapshot persistence, so as long as there is a slave, memory snapshots will be taken. Redis claims that master-slave replication is non-blocking, but because Redis serves requests on a single thread, a large master snapshot makes the first full transfer take a long time, and the master may be unable to serve requests while the file is being transferred. In other words, the service is interrupted, and for a critical service that is a terrible outcome.
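The first-synchronization steps described above can be pictured with a toy in-memory model. This is only a sketch of the sequence (dump, transfer, replay of cached commands), not how Redis implements it; the classes `ToyMaster` and `ToySlave` and the function `full_sync` are hypothetical names.

```python
class ToyMaster:
    def __init__(self):
        self.data = {}
        self.backlog = None  # buffered writes while a dump is in flight

    def set(self, key, value):
        self.data[key] = value
        # While a sync is in progress, remember the write for later replay.
        if self.backlog is not None:
            self.backlog.append((key, value))

    def begin_sync(self):
        """Take a point-in-time dump and start caching new writes."""
        self.backlog = []
        return dict(self.data)

    def finish_sync(self):
        """Hand over the writes cached while the dump was transferred."""
        buffered, self.backlog = self.backlog, None
        return buffered


class ToySlave:
    def __init__(self):
        self.data = {}

    def load_dump(self, dump):
        self.data = dict(dump)

    def apply(self, commands):
        for key, value in commands:
            self.data[key] = value


def full_sync(master, slave, writes_during_transfer):
    dump = master.begin_sync()           # master dumps its dataset
    for key, value in writes_during_transfer:
        master.set(key, value)           # clients keep writing meanwhile
    slave.load_dump(dump)                # full transfer of the dump
    slave.apply(master.finish_sync())    # replay of the cached commands


if __name__ == "__main__":
    m, s = ToyMaster(), ToySlave()
    m.set("a", 1)
    full_sync(m, s, [("b", 2), ("a", 3)])
    print(s.data == m.data)
```

The model makes the cost visible: the larger the dataset, the longer the dump and transfer take, and the more writes pile up in the cache before the slave catches up.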
The root cause of problems 1 to 4 above comes down to the system I/O bottleneck: the disk cannot read and write fast enough, so the fsync()/write() calls in the main process block.
5. Single point of failure: Redis master-slave replication is not yet mature enough, so there is an obvious single point of failure, and for now you have to solve it yourself, for example with active replication or a proxy that handles the master/slave switchover. This is also the Redis author's current top-priority task. The author's solution is simple and elegant; for details see the Redis Sentinel design draft: http://redis.io/topics/sentinel-spec.
Summary:
1. The master is best off doing no persistence work at all, neither memory snapshots nor AOF log files; above all, do not enable memory-snapshot persistence on it.
2. If the data is critical, have one slave enable AOF to back up the data, with the policy set to sync once per second.
3. For the speed of master-slave replication and the stability of the connection, the slaves and the master are best placed on the same LAN.
4. Try to avoid adding slaves to a master that is already under heavy load.
5. For the stability of the master, do not use a graph structure for master-slave replication; a singly linked chain is more stable, that is, a master-slave relationship of master<--slave1<--slave2<--slave3 ... Such a structure also makes it easy to handle the single point of failure by swapping master and slave: if the master goes down, you can immediately promote slave1 to master and leave everything else unchanged.
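The chain layout from point 5 and its failover step can be sketched as follows. This is an illustrative model only: the node names and the helpers `chain_links` and `failover_commands` are hypothetical, and the only real Redis command involved is `SLAVEOF NO ONE`, which turns a slave into a master.

```python
def chain_links(nodes):
    """Return (slave, upstream) pairs for a chain; nodes[0] is the master."""
    return [(nodes[i], nodes[i - 1]) for i in range(1, len(nodes))]


def failover_commands(nodes):
    """Plan the failover after nodes[0] (the master) has gone down.

    Returns the commands to issue as (node, command) pairs, plus the
    chain that remains afterwards.
    """
    if len(nodes) < 2:
        raise ValueError("need at least one slave to promote")
    # Only the first slave changes role; every other slave keeps
    # replicating from the same upstream as before.
    return [(nodes[1], "SLAVEOF NO ONE")], nodes[1:]


if __name__ == "__main__":
    chain = ["master", "slave1", "slave2", "slave3"]
    print(chain_links(chain))
    commands, new_chain = failover_commands(chain)
    print(commands)
    print(chain_links(new_chain))
```

The point of the chain is visible in the result: a failover needs a single command on a single node, whereas in a layout where every slave hangs off the master, each slave would have to be repointed.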