Reference site: http://blog.nosqlfan.com/html/4077.html
This article is based on a post shared by an author known as "gentle one knife". He describes some Redis problems he ran into in real work, together with the corresponding workarounds and solutions. If you also use Redis, his experience may be a useful reference.
Original link: http://zhupan.iteye.com/blog/1576108

1. Master writing memory snapshots
The SAVE command calls the rdbSave function, which blocks the main thread. When the snapshot is large, this has a severe performance impact and can pause the service, so the master should preferably not write memory snapshots.

2. Master AOF persistence
If the AOF file is never rewritten, this form of persistence has the smallest impact on performance. However, the AOF file keeps growing, and a large AOF file slows down recovery when the master restarts.

3. Master calling BGREWRITEAOF
When the master calls BGREWRITEAOF to rewrite the AOF file, the rewrite consumes a large amount of CPU and memory. This pushes the service load too high and causes brief service pauses.
Here is a case from one of my real projects. The setup was roughly as follows: one master and 4 slaves, no sharding, only read/write splitting. The master handled write operations and the AOF log backup, with an AOF file of about 5 GB; the slaves handled read operations. When the master called BGREWRITEAOF, the load on both the master and the slaves surged suddenly. The master's write requests basically got no response for about 5 minutes, and about half of the slaves' read requests could not be answered in time. The server load graphs for the master and a slave were as follows:
[Figure: master server load]
[Figure: slave server load]
The situation above would not, and should not, have happened. The cause was that this master machine used to be a slave, and it had a scheduled shell task that called BGREWRITEAOF at 10 o'clock every morning to rewrite the AOF file. Later the original master machine went down and this slave was switched over to become the master, but we forgot to delete the scheduled task, which led to the painful situation above. It took several days to track down the cause.
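One way to guard against this kind of leftover scheduled task is to have the task check the node's current role before asking for a rewrite. The following is only a minimal sketch, not what the original project ran: the function name `rewrite_aof_if_slave` is hypothetical, and `client` is assumed to behave like a redis-py client, where `info("replication")` returns a dict containing a `role` entry and `bgrewriteaof()` starts the rewrite.

```python
def rewrite_aof_if_slave(client):
    """Request BGREWRITEAOF only when the node currently reports role 'slave'.

    `client` is assumed to be redis-py compatible: info("replication")
    returns a dict with a "role" key, and bgrewriteaof() starts the rewrite.
    Returns True if a rewrite was requested, False if it was skipped.
    """
    role = client.info("replication").get("role")
    if role != "slave":
        # The node is a master (or its role is unknown): leave it alone.
        return False
    client.bgrewriteaof()
    return True
```

With redis-py installed, the scheduled task could call `rewrite_aof_if_slave(redis.Redis(host=..., port=...))` for the local node, so that a slave promoted to master simply skips the rewrite.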
Setting the no-appendfsync-on-rewrite option to yes can alleviate this problem. With yes, new write operations are not fsynced during the rewrite; they are held in memory for the time being and written out after the rewrite completes. Even so, it is best not to enable AOF backup on the master at all.

4. Redis master-slave replication performance issues
The first synchronization from a slave to the master works like this: the slave sends a SYNC request to the master; the master first dumps an RDB file, transfers the whole RDB file to the slave, and then sends the commands it buffered in the meantime to the slave, which completes the first synchronization. Second and later synchronizations work like this: the master sends each change directly to every slave in turn, in real time. Whenever a slave and the master disconnect and reconnect, for whatever reason, the whole process above is repeated.

Redis master-slave replication is built on top of memory-snapshot persistence: as long as there is a slave, memory snapshots will occur. Although Redis claims that master-slave replication is non-blocking, Redis serves requests in a single thread, so if the master's snapshot file is large, the first full transfer takes a long time, and the master may be unable to serve requests while the file is being transferred. In other words, service is interrupted, which is a very serious consequence for critical services.
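The synchronization steps described above can be sketched as a small in-memory simulation. This is a toy model under simplifying assumptions, not Redis code: `ToyMaster` and `ToySlave` are hypothetical classes, a plain dict copy stands in for the RDB file, and only one slave is synchronized at a time.

```python
class ToyMaster:
    """In-memory stand-in for a master; it only illustrates the ordering."""

    def __init__(self):
        self.data = {}
        self.slaves = []     # slaves that have completed their first sync
        self.backlog = None  # writes buffered while a snapshot is in flight

    def set(self, key, value):
        self.data[key] = value
        if self.backlog is not None:
            # A full sync is in progress: buffer the write for the new slave.
            self.backlog.append((key, value))
        # Already-synced slaves receive every write as it happens.
        for slave in self.slaves:
            slave.apply(key, value)

    def begin_sync(self):
        """On a sync request, dump a snapshot and start buffering writes."""
        self.backlog = []
        return dict(self.data)  # stands in for the RDB file

    def finish_sync(self, slave, snapshot):
        """Ship the whole snapshot first, then the buffered writes."""
        slave.load(snapshot)
        for key, value in self.backlog:
            slave.apply(key, value)
        self.backlog = None
        self.slaves.append(slave)


class ToySlave:
    def __init__(self):
        self.data = {}

    def load(self, snapshot):
        self.data = dict(snapshot)

    def apply(self, key, value):
        self.data[key] = value


master = ToyMaster()
master.set("a", 1)

slave = ToySlave()
snapshot = master.begin_sync()       # master dumps its data
master.set("b", 2)                   # a write arrives during the transfer
master.finish_sync(slave, snapshot)  # snapshot first, then buffered writes
master.set("c", 3)                   # later writes are streamed directly

print(slave.data == master.data)     # prints True
```

The sketch leaves out the cost that the article is concerned with: in a real deployment the dump and the transfer of a large snapshot take time, and that is where the master's service can stall.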
The root cause of problems 1, 2, 3 and 4 above is inseparable from the system's IO bottleneck: the disk's read and write speed is not fast enough, so the main process's fsync()/write() calls block.

5. Single point of failure
Because Redis master-slave replication is not yet mature enough, there is an obvious single point of failure. For now you have to solve this yourself, for example with active replication, or with a proxy that switches master and slave. This is also one of the Redis author's current priority tasks. The author's solution is simple and elegant; for details, see the Redis Sentinel design draft: http://redis.io/topics/sentinel-spec.

Summary

The master should preferably not do any persistence work, including memory snapshots and AOF log files. In particular, do not enable memory snapshots for persistence.
If the data is critical, have one slave enable AOF to back up the data, with the policy set to sync once per second.
For the sake of replication speed and connection stability, the slaves and the master should preferably be on the same LAN.
Avoid, as far as possible, adding slaves to a master that is under heavy load.
For the master's stability, do not use a graph structure for master-slave replication. A one-way linked list structure is more stable, that is, the master-slave relationship is: master <- slave1 <- slave2 <- slave3 ... Such a structure also makes it easy to deal with the single point of failure by switching master and slave: if the master goes down, you can immediately promote slave1 to master and leave everything else unchanged.
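The one-way chain can be made concrete by listing the replication command each node would receive. This is a minimal sketch under assumptions: `chain_commands` and `failover_commands` are hypothetical helpers, the addresses are placeholders, and the code only builds command strings (using the Redis `SLAVEOF` command) without connecting to any server.

```python
def chain_commands(nodes):
    """Return (node, command) pairs that arrange `nodes` as a one-way chain.

    nodes[0] is the master; every later node replicates from the node
    immediately before it. Each node is a (host, port) tuple.
    """
    commands = [(nodes[0], "SLAVEOF NO ONE")]
    for upstream, node in zip(nodes, nodes[1:]):
        host, port = upstream
        commands.append((node, f"SLAVEOF {host} {port}"))
    return commands


def failover_commands(nodes):
    """Return the command needed when nodes[0] (the master) goes down.

    Only the first slave changes: it stops replicating and becomes the
    master. The rest of the chain already points at it and stays as it is.
    """
    if len(nodes) < 2:
        raise ValueError("no slave available to promote")
    return [(nodes[1], "SLAVEOF NO ONE")]


# Hypothetical addresses, used only for illustration.
nodes = [
    ("10.0.0.1", 6379),  # master
    ("10.0.0.2", 6379),  # slave1
    ("10.0.0.3", 6379),  # slave2
    ("10.0.0.4", 6379),  # slave3
]

for node, command in chain_commands(nodes):
    print(node, "->", command)
```

In this layout a failover touches a single node, which matches the point above that everything except slave1 stays unchanged. Redirecting clients to the new master is a separate step that the sketch does not cover.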