Single-threaded, so don't block: Redis latency problem analysis and response



The Redis event loop runs in a single thread, so it is essential that each event is handled quickly; otherwise, subsequent tasks in the event loop are blocked.
Once the data held by Redis reaches a certain volume (for example, 20 GB), the impact of a blocking operation on performance becomes especially severe.
Below we summarize the scenarios in which Redis spends a long time on one operation, and how to cope with each.

Blocking caused by long-running commands: KEYS, SORT, and so on

The KEYS command finds all keys matching a given pattern; its time complexity is O(N), with N being the number of keys in the database. When the database holds tens of millions of keys, this command blocks the read-write thread for several seconds.
Commands such as SUNION and SORT have similar costs.
What if the business genuinely requires KEYS, SORT, and the like?

Solution:

Architecture design has a classic move called "diversion": separate the handling of fast requests from slow ones, because otherwise the slow drag down the fast and even the fast can no longer be fast. The idea is evident in the design of Redis itself: pure in-memory operations and epoll-based non-blocking I/O event handling are fast, so they all run in one thread, while time-consuming work such as persistence, AOF rewrite, and master-slave synchronization is handed to a separate process, so the slow never holds up the fast.
In the same spirit, if we must run time-consuming commands such as KEYS, we strip them out: for example, dedicate a single Redis slave node to KEYS, SORT, and other slow operations. Such queries are usually not real-time online traffic; it is acceptable for them to be slow as long as they complete, and they no longer affect the latency-sensitive tasks serving the line.
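
As a minimal sketch of this diversion, assume a primary at redis-primary and a dedicated replica at redis-slow-replica (both hostnames invented here), using the redis-py client:

```python
import redis

# Latency-sensitive traffic goes to the primary; slow O(N) commands go to a
# dedicated replica, so they can never block the online event loop.
# Hostnames are placeholders for this sketch.
online = redis.Redis(host="redis-primary", port=6379)
offline = redis.Redis(host="redis-slow-replica", port=6379)

def get_user(uid):
    # Online path: O(1) lookups stay on the primary.
    return online.get(f"user:{uid}")

def find_keys_for_report(pattern):
    # Offline path: the O(N) scan blocks only the replica reserved for it.
    return offline.keys(pattern)
```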

The SMEMBERS command

The SMEMBERS command returns the complete set; its time complexity is O(N), with N being the number of members in the set.
If a set holds tens of thousands of members, a single fetch can likewise block the event-processing thread for a long time.

Solution:
Unlike SORT, KEYS, and similar commands, SMEMBERS may be a very high-frequency command in a real-time online application, so diversion is not appropriate; we need to address it at the design level instead.
At design time, we can control the size of each set, generally keeping a set within 500 members.
For example, instead of storing a whole year of records under one key, which makes the set large, we can use 12 keys to store the 12 months' records, or 365 keys to store each day's records, keeping the size of every set within an acceptable range.

If the set cannot easily be divided into sub-sets and must be stored as one large set, consider SRANDMEMBER key [count] when reading it: this returns the specified number of random members from the set. Of course, if you need to traverse all of the set's members, this command is not suitable.
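
A small sketch of both ideas, again with redis-py and an invented events:2015 key layout:

```python
import redis

r = redis.Redis()

# Hypothetical layout: one set per month instead of one huge set per year,
# so each SMEMBERS call touches a bounded number of members.
def add_event(event_id, month):
    r.sadd(f"events:2015:{month:02d}", event_id)

def events_for_month(month):
    # Still O(N), but N is capped by the sharding scheme above.
    return r.smembers(f"events:2015:{month:02d}")

# If the set cannot be split, sample instead of fetching everything:
sample = r.srandmember("events:2015", 100)  # 100 random members, not the full set
```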

The SAVE command

The SAVE command persists the data in the event-handling thread itself. When the data volume is large, it blocks that thread for a long time (in our production environment, 1 GB of Redis memory takes about 12 s) and the whole Redis instance stalls.
While SAVE is blocking the event-handling thread, we cannot even use redis-cli to inspect the current state, so questions such as "when will the save finish, how much has been saved" cannot be answered.

Solution:
We have never come across a scenario that truly needs the SAVE command; whenever persistence is required, BGSAVE is the reasonable choice (of course, that command brings problems of its own, discussed later);
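
For illustration, a short redis-py sketch of the BGSAVE alternative (the field names come from the INFO persistence section):

```python
import redis

r = redis.Redis()

# BGSAVE forks a child to write the RDB snapshot, so the event loop keeps
# serving requests; SAVE would dump in the main thread and block everything.
r.bgsave()

# Unlike with SAVE, progress stays observable while the dump runs:
info = r.info("persistence")
print(info["rdb_bgsave_in_progress"], info["rdb_last_bgsave_status"])
```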

Blocking caused by fork

When Redis needs to perform a time-consuming operation, it creates a new process; data persistence via BGSAVE is one example:
With RDB persistence enabled, once the persistence threshold is reached, Redis forks a new process to do the dump, relying on the operating system's copy-on-write policy: the child shares memory pages with the parent, and only when the parent modifies a page (4 KB per page) does it create a private copy of that page, without affecting the child.
Although fork does not copy the shareable data itself, it does copy the parent process's page table. If the process's memory space is 40 GB (with each page-table entry taking 8 bytes), the page table is about 80 MB, and copying it takes time; on virtual machines, especially Xen virtual servers, it takes even longer.
In tests on our own server nodes, a BGSAVE over 35 GB of data blocked the instance for 200 ms or more at the moment of the fork;
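
The 80 MB figure follows from simple arithmetic; a quick check, with the page size and 8-byte entry size assumed above:

```python
# Page-table size = (memory / page size) * bytes per entry.
GB = 1024 ** 3
mem = 40 * GB

entries_4k = mem // (4 * 1024)         # ~10.5M entries for 4 KB pages
entries_4m = mem // (4 * 1024 * 1024)  # ~10K entries for 4 MB huge pages

print(entries_4k * 8 / (1024 ** 2), "MB")  # ~80 MB copied at fork time
print(entries_4m * 8 / 1024, "KB")         # ~80 KB with huge pages (see below)
```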

Similarly, the following operations also fork a new process:

    • The first time the master synchronizes data to a slave: when the master node receives a SYNC request from the slave, it forks a new process that dumps the in-memory data to a file, which is then sent to the slave;
    • AOF log rewrite: with AOF persistence enabled, an AOF rewrite forks a new process to do the rewriting (the rewrite does not read the existing file; it writes the log directly from the in-memory data);

Solution:
To cope with the cost of copying a large page table at fork time, several measures are available:

  1. Control the maximum memory of each Redis instance;
    Capping memory keeps the delay introduced by fork from growing without bound.
    The general recommendation is no more than 20 GB, adjusted to the performance of your own servers (the larger the memory, the longer persistence takes, the longer the page-table copy takes, and the longer the event loop is blocked).
    Sina Weibo's recommendation is likewise no more than 20 GB; in our tests on virtual machines, keeping the application's latency spikes unnoticeable may require staying under 10 GB;

  2. Use huge memory pages. The default page size is 4 KB, so with 40 GB of memory the page table is about 80 MB; if each page is enlarged to 4 MB, the page table shrinks to about 80 KB, making the page-table copy nearly non-blocking, and the hit rate of the TLB (translation lookaside buffer) improves as well. But huge pages have a downside: under copy-on-write, modifying any single element within a page forces a copy of the entire page (COW granularity is the page), so during the write-heavy period of a dump, much more memory may be consumed;

  3. Use physical machines;
    Given the choice, a physical machine is certainly the most straightforward solution.
    Moreover, among the many virtualization implementations, most modern hardware other than Xen can copy page tables quickly;
    But a company's virtualization setup is generally fixed before going online and will not be changed for one team's servers; if Xen is all you face, you can only think about how to use it well;

  4. Eliminate new-process creation entirely: do not use persistence, and serve no queries on the master node. Two possible schemes (a sketch of the first follows this list):
    1) Use a single machine, with no persistence and no slave nodes. This is the simplest: no new process is ever created. But such a scheme only suits a pure cache;
    How do we make this scheme highly available?
    Put a message queue in front of Redis, use the queue's pub-sub to fan each write out, and ensure that every write lands on at least two nodes; since all nodes then hold the same data, only one node needs to persist, and that node serves no queries;



    2) Master-slave: persist on the master node only, and serve no queries from it; queries go to the slave nodes, which do not persist. All the fork-heavy operations then happen on the master, while query requests are served by the slaves;
    The problem with this scheme is what to do when the master node dies.
    A simple implementation has no standby master: while the master is down, the Redis cluster can only serve reads, not updates; once the master is back up, updates resume. Writes issued during the outage can be buffered in an MQ and replayed against the master after it comes back, digesting the write requests accumulated during the failure;



    If the official Sentinel is used to promote a slave to master, the overall implementation is more complex: the advertised IP configuration must be changed, the promoted node must be removed from the set of queryable nodes so that front-end query load no longer falls on the new master, and only then can Sentinel's switchover be issued; this ordering must be guaranteed;
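
As promised above, a minimal sketch of the pub-sub fan-out idea from scheme 1), using redis-py. The channel name, payload format, and hostname are all invented for illustration, and a real deployment would still need delivery guarantees ("at least two nodes") that plain pub-sub does not provide on its own:

```python
import json
import redis

broker = redis.Redis(host="mq-host")  # the queue sitting in front of Redis

def submit_write(key, member):
    # Producer side: publish the write once instead of touching Redis directly.
    broker.publish("writes", json.dumps({"op": "sadd", "key": key, "val": member}))

def apply_writes(local_node):
    # Consumer side: one of these runs next to each Redis node, so every
    # node ends up holding the same data.
    sub = broker.pubsub()
    sub.subscribe("writes")
    for msg in sub.listen():
        if msg["type"] != "message":
            continue
        op = json.loads(msg["data"])
        if op["op"] == "sadd":
            local_node.sadd(op["key"], op["val"])
```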

Blocking caused by persistence

Performing persistence (AOF / RDB snapshot) has a significant impact on system performance, especially when other processes on the same server node also read and write the disk (for example, when an application service is deployed on the same node as the Redis service and writes its logs in real time). Avoid doing Redis persistence on I/O-heavy nodes as much as possible;

Blocking when the child process's writes conflict with the main process's fsync during persistence

On nodes using AOF persistence, while a child process performs an AOF rewrite or an RDB dump, Redis queries can lag or even block for a long time, during which Redis cannot serve any read or write operation;

Cause analysis:
The Redis service is configured with appendfsync everysec, so the main process calls fsync() every second, asking the kernel to "really" write the data to the storage hardware. But because the server is performing a large volume of other I/O, the main process's fsync() call blocks, which ultimately blocks the Redis main process.

This is what redis.conf says about it:
When the AOF fsync policy is set to always or everysec, and a background
saving process (a background save or AOF log background rewriting) is
performing a lot of I/O against the disk, in some Linux configurations
Redis may block too long on the fsync() call. Note that there is no fix for
this currently, as even performing fsync in a different thread will block
our synchronous write(2) call.
That is: an AOF rewrite generates a lot of I/O, and in some Linux configurations this causes the main process to block on fsync;

Solution:
Set no-appendfsync-on-rewrite yes: while a child process performs an AOF rewrite, the main process makes no fsync() calls. Note that even without fsync() from the process, the kernel still writes the data to disk on its own schedule (on Linux, by default no later than 30 seconds).
The trade-off of this setting: if a failure occurs, up to 30 seconds of data may be lost, rather than at most 1 second;
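
Applied at runtime with redis-py (CONFIG SET works for this parameter; it can equally be set in redis.conf):

```python
import redis

r = redis.Redis()

# Trade durability for latency: while a child is rewriting the AOF (or
# dumping an RDB), the main process skips fsync and lets the kernel flush
# on its own schedule.
r.config_set("no-appendfsync-on-rewrite", "yes")
print(r.config_get("no-appendfsync-on-rewrite"))
```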

When a child process rewrites the AOF, the system's sync blocks the main process's write

Let's lay out the causal chain:
1) Cause: a large volume of I/O is issued via write(2), but synchronization is never actively requested;
2) which leaves a large amount of dirty data in the kernel buffer;
3) so when the system finally syncs, the sync takes a long time;
4) which blocks Redis's write(2) call for the AOF log;
5) so single-threaded Redis cannot process the next event, and the whole instance blocks (Redis event handling runs in one thread, where the write(2) for the AOF log is a synchronous, blocking call, unlike the non-blocking write(2) used for the network).

Cause 1) was a problem before Redis 2.6.12: the AOF rewrite kept its head down calling write(2) and left it to the system to trigger the sync.
Another possible cause: the system's I/O is busy, for example another application is writing to the disk;

Solution:
Control when the system calls sync and how much data each sync has to move: the less data per sync, the shorter it takes. Set the sync threshold either as a ratio (vm.dirty_background_ratio) or as a byte count (vm.dirty_bytes); a typical setting syncs once every 32 MB.
Since 2.6.12, the AOF rewrite itself actively calls fdatasync after every 32 MB written;
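
For instance, the 32 MB threshold can be set by writing the standard Linux sysctl file (requires root; a sketch of the mechanism, not a tuned recommendation):

```python
# vm.dirty_bytes caps the dirty page cache: kernel writeback starts once this
# many bytes are dirty, so each sync moves a small, quick batch.
def set_sysctl(name, value):
    with open("/proc/sys/" + name.replace(".", "/"), "w") as f:
        f.write(str(value))

set_sysctl("vm.dirty_bytes", 32 * 1024 * 1024)  # start writeback at 32 MB
```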

In addition, when Redis finds that the file it is writing to is currently running fdatasync(2), it skips the write(2) and keeps the data only in its buffer, lest it be blocked. But if this state lasts more than two seconds, it forces the write(2) through, even if Redis will be blocked by it.

Blocking when merging buffered data after the AOF rewrite completes

During BGREWRITEAOF, all new write requests are still written to the old AOF file and are also queued in the AOF rewrite buffer; when the rewrite completes, the main thread merges that buffered content into the temporary file, which is then renamed into the new AOF file. Throughout the rewrite you can watch this buffer grow in log lines such as "Background AOF buffer size: 80 MB" and "Background AOF buffer size: 180 MB". The merge itself blocks: if 280 MB of buffer has accumulated, Redis will block for 2.8 seconds on a traditional 100 MB/s hard drive;

Solution:
Make the disk large enough and raise the AOF rewrite threshold to ensure no rewrite is triggered during peak hours, and trigger the AOF rewrite from crontab during idle time;
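
Such a cron job can be as small as this redis-py script (scheduled, say, for the small hours; aof_rewrite_in_progress is a field of INFO persistence):

```python
import redis

r = redis.Redis()

# Run from crontab at an off-peak hour so the rewrite, and the buffer merge
# at its end, never coincide with peak traffic.
if r.info("persistence")["aof_rewrite_in_progress"] == 0:
    r.bgrewriteaof()
```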


Posted by: Big CC | 10 Dec 2015
Blog: blog.me115.com
GitHub: Big CC

