Redis (21): Redis Performance Troubleshooting Manual

Performance-related data indicators

Connect to the redis server through the redis-cli interface, then run the info command to obtain all the information related to the redis service. This output is the basis for the performance metrics analyzed later in this article.

The data output by the info command can be divided into 10 categories:

  • Server
  • Clients
  • Memory
  • Persistence
  • Stats
  • Replication
  • CPU
  • Commandstats
  • Cluster
  • Keyspace

This article focuses on two important categories of metrics: memory and stats.

Note that the output of the info command does not include command response latency data; how to obtain latency-related metrics is covered in detail later.

If info outputs too much information at once, you can pass a section name as a parameter to get the data for a single category. For example, entering info memory returns only memory-related data.
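
The same can be done programmatically; a minimal sketch using the third-party redis-py client (connection details are assumptions):

import redis

# Connect to a local redis instance; adjust host/port for your setup.
r = redis.Redis(host='127.0.0.1', port=6379, decode_responses=True)

# info('memory') returns just the memory section, parsed into a dict.
mem = r.info('memory')
print(mem['used_memory'], mem['used_memory_human'])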


To quickly locate and resolve performance problems, five key metrics are selected here; they cover the performance problems most people encounter when using redis.

Memory usage used_memory

The used_memory field in info indicates the total memory allocated by the redis memory allocator, in bytes. used_memory_human reports the same value as used_memory, formatted in human-readable units (such as MB) for ease of reading.

used_memory is the total memory used by redis. It includes the memory occupied by the actual cached data and the memory used by redis's own operations (such as metadata and Lua scripts). Because it is the amount allocated through redis's memory allocator, this figure does not include memory wasted by fragmentation.

The other related fields are:

  • used_memory_rss: the total memory allocated to redis as seen by the operating system, in bytes.
  • mem_fragmentation_ratio: the memory fragmentation ratio.
  • used_memory_lua: the memory used by the Lua script engine, in bytes.
  • mem_allocator: the memory allocator redis was compiled against; it can be libc, jemalloc, or tcmalloc.
Performance problems caused by memory swapping

Memory usage is the most critical aspect of running redis. If a redis instance's memory usage exceeds the maximum available memory (used_memory > maximum available memory), the operating system starts swapping: old or unused pages are written out to disk (the area of the disk reserved for this is called the swap partition) to free physical memory for new or active pages.
Reading from and writing to disk is nearly five orders of magnitude slower than doing so in memory: about 0.1 μs for memory versus about 10 ms for disk. If the redis process starts swapping, both redis and any applications that depend on redis data suffer severe performance degradation. Watch the used_memory metric to track redis's memory usage: if used_memory > maximum available memory, the instance is swapping or has already swapped, and the administrator should take emergency measures accordingly.
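
A minimal sketch of such a check, assuming the redis-py client and that a maxmemory limit has been configured (a value of 0 means no limit):

import redis

r = redis.Redis(host='127.0.0.1', port=6379, decode_responses=True)

used = int(r.info('memory')['used_memory'])
maxmem = int(r.config_get('maxmemory')['maxmemory'])

# maxmemory == 0 means "unlimited"; only compare when a limit is set.
if maxmem and used > maxmem:
    print('WARNING: used_memory exceeds maxmemory; the OS may be swapping')
else:
    print('used_memory=%d, maxmemory=%d' % (used, maxmem))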

Tracking memory usage

If neither RDB snapshots nor AOF persistence is enabled, cached data is lost when redis crashes. And once redis's memory usage exceeds 95% of available memory, data starts moving back and forth between memory and swap space, which may also cause data loss.
When the snapshot feature is enabled and triggered, redis forks a child process to copy the data currently in memory and write it all to disk. Therefore, if a snapshot fires when memory usage already exceeds 45% of available memory, memory swapping becomes very likely and very dangerous (data may be lost). The problem is even worse if the instance is receiving a large volume of frequent updates at the time.

You can avoid this problem by reducing redis's memory usage, or by using the following techniques to avoid memory swapping (a combined sketch of techniques 2-4 follows the list):

  1. If the cached data is smaller than 4 GB, use a 32-bit redis instance. Because pointers in a 32-bit build are half the size of 64-bit ones, the same data occupies less memory. The disadvantage is that even on a machine with more than 4 GB of physical memory, a 32-bit instance is still limited to 4 GB or less. If the instance shares the machine with other applications, a 64-bit redis instance may be required; in that case, switching to 32-bit is not advisable. Either way, redis dump files are compatible between 32-bit and 64-bit builds, so to reduce memory usage you can try 32-bit first and switch to 64-bit later if necessary.

  2. Use the hash data structure wherever possible. redis stores hashes with fewer than 100 fields very efficiently, so when you do not need set operations or list push/pop operations, use a hash. For example, a web application that stores an object representing user information should use a single key per user and store each attribute in a hash field; this is more efficient than creating a separate key-value pair per attribute. In general, data spread across many string keys should be converted into a single hash key with multiple fields, such as all the attributes of one object or all the profile data of one user. The hash commands hset(key, field, value) and hget(key, field) store and retrieve individual fields.

  3. Set key expiration times. A simple way to reduce memory usage is to set an expiration time whenever a key is stored. If a key is only needed for a limited period, or an old key is unlikely to be used again, use the redis expiration commands (expire, expireat, pexpire, pexpireat) so that redis deletes the key automatically when it expires. If you know how many new keys are created per second, you can tune key lifetimes against that threshold to cap the total memory redis uses.

  4. Evict keys. In the redis configuration file (usually redis.conf), you can set the maxmemory attribute to limit the maximum memory redis may use; the change takes effect after the instance restarts. You can also run config set maxmemory from a client, which takes effect immediately but is lost after a restart; run config rewrite to flush it to the configuration file. If snapshots are enabled, set maxmemory to 45% of the system's available memory, because a snapshot needs up to twice that memory to copy the entire dataset; that is, 45% in use becomes 95% during the snapshot (45% + 45% + 5% reserved for other overhead). If snapshots are disabled, maxmemory can be set to 95% of available memory.
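
A minimal sketch of techniques 2-4 using the redis-py client (version 3.5+ for the hset mapping argument; key names, values, and the 100mb limit are made up for illustration):

import redis

r = redis.Redis(host='127.0.0.1', port=6379, decode_responses=True)

# Technique 2: one hash key with multiple fields instead of many string keys.
r.hset('user:1000', mapping={'name': 'alice', 'email': 'alice@example.com'})
print(r.hget('user:1000', 'name'))

# Technique 3: let redis reclaim the key automatically after one hour.
r.expire('user:1000', 3600)

# Technique 4: cap memory usage at runtime (lost on restart unless persisted).
r.config_set('maxmemory', '100mb')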

When memory usage reaches the configured maximum, you need to choose a key eviction policy by setting the maxmemory-policy attribute in the redis.conf configuration file. If the keys in the dataset have expiration times set, the volatile-ttl policy is a good choice. If keys do not expire quickly enough when the limit is reached, or have no expiration times at all, allkeys-lru is appropriate: redis then deletes the least recently used keys from the entire dataset (LRU eviction). redis also provides these other eviction policies:

  • volatile-lru: evicts keys with an expiration time set, using the LRU algorithm.
  • volatile-ttl: evicts keys with an expiration time set, preferring those closest to expiring.
  • volatile-random: evicts random keys among those with an expiration time set.
  • allkeys-lru: evicts keys from the entire dataset using the LRU algorithm.
  • allkeys-random: evicts random keys from the entire dataset.
  • noeviction: disables eviction entirely.

Setting maxmemory to 45% or 95% of available system memory (depending on the persistence policy), and setting maxmemory-policy to volatile-ttl or allkeys-lru (depending on whether expiration times are used), bounds redis's maximum memory usage fairly precisely; together these two settings keep redis from swapping in most scenarios. If you would rather have writes fail than lose data when memory runs out, set the policy to noeviction. A minimal sketch of applying these settings follows.
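
A sketch of applying the settings at runtime with redis-py; the 100mb value is a placeholder, and in practice you would compute 45% or 95% of your system's memory:

import redis

r = redis.Redis(host='127.0.0.1', port=6379, decode_responses=True)

# Placeholder limit; compute 45% (snapshots on) or 95% (snapshots off) of RAM.
r.config_set('maxmemory', '100mb')
r.config_set('maxmemory-policy', 'allkeys-lru')  # or 'volatile-ttl'

# Persist the runtime changes into redis.conf so they survive a restart.
r.config_rewrite()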

Total commands processed total_commands_processed

The total_commands_processed field in info shows the total number of commands the redis server has processed. These commands come from one or more redis clients; redis handles command requests continuously, and each request can be any of the 140+ commands redis provides. The field's value is cumulative: for example, if the redis server processes two commands from client_x and three commands from client_y, the total command count (total_commands_processed) increases by 5.

Analyzing the total command count to diagnose response latency

In a redis instance, tracking the total number of commands processed is central to diagnosing response latency, because redis is single-threaded and executes client commands in order. The most common latency baseline is the network: a round trip through a gigabit NIC takes about 200 μs. If command responses are noticeably slower than that, a large number of commands may be queued up waiting to be processed. As mentioned above, slow responses can be caused by one or more slow commands, in which case you will see the number of commands processed per second drop significantly, and later commands may even be blocked outright, degrading redis performance. To analyze this kind of problem, track the command count alongside the latency.
For example, you can write a script that records the value of total_commands_processed at regular intervals. When clients report noticeably slow responses, you can use the recorded history of total_commands_processed to determine whether the total is rising or falling, and troubleshoot from there. A minimal sketch follows.
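
A sketch of such a recording script with redis-py; it samples total_commands_processed every few seconds and prints per-interval throughput (the 5-second interval is arbitrary):

import time
import redis

r = redis.Redis(host='127.0.0.1', port=6379, decode_responses=True)

INTERVAL = 5  # seconds between samples
prev = r.info('stats')['total_commands_processed']
while True:
    time.sleep(INTERVAL)
    cur = r.info('stats')['total_commands_processed']
    # A falling rate while clients report slowness points to blocked commands.
    print('%s commands/s: %.1f' % (time.strftime('%H:%M:%S'),
                                   (cur - prev) / float(INTERVAL)))
    prev = cur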

Using the command count to address increased latency

If comparing against the recorded history shows that the total command count is indeed rising or falling, two causes are likely:

  • Too many commands are queued, and later commands keep waiting.
  • A few slow commands are blocking redis.

There are three ways to address the response latency caused by these two problems (a sketch of the first two follows the list).

  1. Use multi-argument commands: if a client sends a large number of commands in a short period, response times become noticeably slow because later commands wait behind the long queue of earlier ones. One way to improve this is to replace many single-argument commands with one command that takes multiple arguments. For example, looping over lset to add 1000 elements to a list one at a time performs poorly; it is much better to build the list of 1000 elements on the client and send them to redis in a single multi-argument lpush or rpush call. Several redis commands, such as mset, rpush, and sadd, accept multiple arguments in exactly this way; using them minimizes the number of separate commands you send.
  2. Pipeline commands: another way to cut down on round trips is to pipeline several commands so they are sent and executed together, reducing the latency caused by network overhead. Sending 10 commands to the server individually incurs 10 network round trips; a pipeline sends them together and returns all the results at once, incurring only one round trip. redis supports pipelining natively, and most client libraries support it too. If the instance's latency is noticeable, pipelining is very effective at reducing it.

  3. Avoid slow commands over large collections: if command throughput drops, a common cause is the use of operations with high time complexity, which take longer the more data the collection contains. Cutting down on such expensive commands can significantly improve redis performance. The time complexity of every command is documented in the redis command reference; consult it so that, when you must use high-complexity commands, you can use them efficiently and sparingly.
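
A sketch of techniques 1 and 2 with redis-py (the list name and element count are made up):

import redis

r = redis.Redis(host='127.0.0.1', port=6379, decode_responses=True)

# Technique 1: one multi-argument rpush instead of 1000 single-element pushes.
elements = ['item:%d' % i for i in range(1000)]
r.rpush('mylist', *elements)

# Technique 2: pipeline several commands into a single network round trip.
pipe = r.pipeline()
pipe.set('counter', 1)
pipe.incr('counter')
pipe.get('counter')
print(pipe.execute())  # all three replies come back together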

Latency

Latency data is not included in the info output. To measure latency, run the redis-cli tool with the --latency option, for example:

redis-cli --latency -h 127.0.0.1 -p 6379

Here the host and port are the IP address and port of the redis instance. The measured latency varies with the server's current load; as a baseline, the latency through a gigabit NIC is about 200 μs.

redis-cli reports the measured response latency in milliseconds; in a local test, the latency was about 300 μs.

Tracking redis latency performance

One of the main reasons redis is so popular is the high performance that comes with its low latency, so fixing latency problems is the most direct way to improve redis performance. With gigabit bandwidth, latency far above 200 μs is clearly a performance problem. Although some slow I/O operations happen on the server, redis accepts all client requests on a single core and executes them strictly in order; so if one client sends a slow operation, every other request must wait for it to finish before execution can continue.

Use latency commands to improve performance

Once latency has been identified as the performance problem, there are several ways to analyze and resolve it.

1. Use slowlog to find the slow commands causing latency: the slowlog in redis lets us quickly locate commands whose execution time exceeds a specified threshold. By default, commands that take longer than 10 ms are logged. slowlog records only a command's execution time; it does not include I/O round trips, nor slow responses caused by network latency. Network latency on gigabit bandwidth is typically around 200 μs, so a command that takes over 10 ms to execute is nearly 50 times slower than the network. To view all logged slow commands, run slowlog get in redis-cli; the third field of each entry is the execution time in microseconds. To see only the last 10 slow commands, enter slowlog get 10 (an example follows the field list below). For how to trace latency problems back to slow commands, see the total_commands_processed section above.

The fields of each slowlog entry mean:

  • 1 = a unique log identifier
  • 2 = the time the command was executed, as a UNIX timestamp
  • 3 = the command's execution time, in microseconds (in this example, about 54 ms)
  • 4 = the executed command and its arguments, as an array; the complete command here is config get *
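
The same entries can be read programmatically; a sketch with redis-py, which parses each entry into the fields above:

import redis

r = redis.Redis(host='127.0.0.1', port=6379, decode_responses=True)

# Fetch the last 10 slow log entries (equivalent to: slowlog get 10).
for entry in r.slowlog_get(10):
    # redis-py exposes the fields as: id, start_time, duration (μs), command.
    print(entry['id'], entry['start_time'], entry['duration'], entry['command'])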

To customize what counts as a slow command, adjust the threshold that triggers slowlog recording. If few or no commands exceed 10 ms, lower the threshold, for example to 5 ms, by entering the following command in redis-cli:

config set slowlog-log-slower-than 5000

You can also set this in the redis.conf configuration file; the value is in microseconds.

2. Monitor client connections: because redis is single-threaded (only one core processes requests from all clients), as the number of client connections grows, the time available for handling each individual connection shrinks, and every client spends more time waiting for its share of the redis service. Monitoring the number of connections is therefore important, because clients may create more connections than expected, or fail to release connections properly. Enter info clients in redis-cli to view all client connection information for the current instance; the first field (connected_clients) shows the total number of client connections.

By default, redis allows a maximum of 10000 client connections, but performance may already suffer once the count exceeds 5000; and if some or most clients are sending large numbers of commands, that threshold will be much lower.

3. Limit the number of client connections: since redis 2.6, the maximum number of client connections can be set via the maxclients attribute in the configuration file (redis.conf), or with config set maxclients in redis-cli. Based on the expected connection load, set it to between 110% and 150% of the expected peak number of connections; any connection beyond that limit is rejected and closed immediately. Setting a maximum is important for capping unexpected growth in connections; moreover, a rejected connection attempt returns an error message, letting the client know that redis has hit its connection limit so it can take corrective action. These two measures are very important for controlling the connection count and keeping redis performing at its best (a combined sketch follows).
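
A sketch of both steps with redis-py (the 10000 limit is a placeholder; size it from your own expected peak):

import redis

r = redis.Redis(host='127.0.0.1', port=6379, decode_responses=True)

# Step 2: watch the current connection count.
print('connected_clients =', r.info('clients')['connected_clients'])

# Step 3: cap connections at roughly 110-150% of the expected peak.
r.config_set('maxclients', '10000')  # placeholder value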

4. Improve memory management: insufficient memory increases redis latency. If redis uses more memory than the system has available, the operating system swaps parts of the redis process out of physical memory to disk, and swapping significantly increases latency. For how to monitor and reduce memory usage, see the used_memory section.

5. Correlate performance metrics:

Solving a redis performance problem usually requires correlating changes in one metric with the others. A drop in the total command count may mean slow commands are blocking the whole system; but if the total command count rises while memory usage also rises, the problem may be memory swapping. For this kind of correlation analysis, look for significant changes across the historical data, and compare each metric against all the others associated with it. The data can be collected on the redis side with a script that periodically calls info, parses the output, and records it in a log file. When latency changes, the log file, taken together with the other metrics, lets you connect the data points and locate the problem. A minimal collection sketch follows.
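
A sketch of such a collection script with redis-py; it appends a few correlated metrics to a log file (the file name and one-minute interval are arbitrary):

import time
import redis

r = redis.Redis(host='127.0.0.1', port=6379, decode_responses=True)

FIELDS = ['used_memory', 'mem_fragmentation_ratio',
          'total_commands_processed', 'evicted_keys', 'connected_clients']

with open('redis-metrics.log', 'a') as log:
    while True:
        info = r.info()  # the full info output, parsed into one dict
        row = [time.strftime('%Y-%m-%dT%H:%M:%S')]
        row += [str(info.get(f)) for f in FIELDS]
        log.write(','.join(row) + '\n')
        log.flush()
        time.sleep(60)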

Memory fragmentation rate mem_fragmentation_ratio

The mem_fragmentation_ratio field in info gives the memory fragmentation ratio, obtained by dividing the memory the operating system has allocated to redis (used_memory_rss) by the memory redis itself has allocated (used_memory): mem_fragmentation_ratio = used_memory_rss / used_memory.

Both used_memory and used_memory_rss include:

  • User data: the memory used to store key-value data.
  • Internal overhead: redis's internal bookkeeping for representing the different data types.

The rss in used_memory_rss stands for resident set size: the physical memory the process occupies, i.e., the memory the operating system has allocated to the redis instance. Besides user data and internal overhead, used_memory_rss also includes the overhead of memory fragmentation, which comes from the operating system's inefficient allocation and reclamation of physical memory.
The operating system allocates physical memory to every application process; the mapping between redis's memory and physical memory is handled by the operating system's virtual memory manager.
For example, suppose redis needs contiguous memory blocks to store a 1 GB dataset, which would be ideal, but no contiguous 1 GB block of physical memory is available. The operating system then has to store the 1 GB of data in multiple non-contiguous smaller blocks, producing memory fragmentation.
Another complication is that the memory allocator often pre-allocates memory blocks for future use, which speeds up the application.

Understanding resource performance

Tracking the memory fragmentation ratio is very important for understanding a redis instance's resource performance. A ratio slightly greater than 1 is reasonable: it indicates low fragmentation and no swapping. A ratio above 1.5 means redis occupies 150% of the physical memory it actually needs, the extra 50% being fragmentation. A ratio below 1 means redis has allocated more memory than is physically available and the operating system is swapping; swapping causes very noticeable response delays (see the used_memory section).

For example, a reported value of 0.99 corresponds to 99%. A sketch for checking the ratio programmatically follows.
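
A sketch with redis-py that reads the reported ratio and recomputes it from its two inputs:

import redis

r = redis.Redis(host='127.0.0.1', port=6379, decode_responses=True)

mem = r.info('memory')
ratio = mem['used_memory_rss'] / float(mem['used_memory'])
print('reported:', mem['mem_fragmentation_ratio'], 'recomputed: %.2f' % ratio)
if ratio < 1:
    print('ratio < 1: redis memory is likely being swapped')
elif ratio > 1.5:
    print('ratio > 1.5: heavy fragmentation')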

Predicting performance problems from the memory fragmentation ratio

A memory fragmentation ratio above 1.5 indicates deteriorating memory management in the operating system or the redis instance. There are three ways to improve poor memory management and, with it, redis performance:

1. Restart the redis server: if the memory fragmentation ratio exceeds 1.5, restarting the redis server releases the excess fragmented memory back for reuse and lets the operating system return to efficient memory management. The extra fragments arise because redis frees memory blocks, but the memory allocator (fixed at compile time: libc, jemalloc, or tcmalloc) does not necessarily return that memory to the operating system. To gauge how much memory is tied up in fragments, compare the values of used_memory_peak, used_memory_rss, and used_memory. As the name suggests, used_memory_peak is redis's historical peak memory usage, not its current usage. If used_memory_peak and used_memory_rss are roughly equal and both clearly exceed used_memory, extra memory fragments are accumulating. Enter info memory in redis-cli to view all three metrics; a sketch of the check follows.
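
A sketch of that comparison with redis-py; the 10% similarity margin and the 1.5 factor are arbitrary thresholds for illustration:

import redis

r = redis.Redis(host='127.0.0.1', port=6379, decode_responses=True)

mem = r.info('memory')
peak, rss, used = (mem['used_memory_peak'], mem['used_memory_rss'],
                   mem['used_memory'])
print('peak=%d rss=%d used=%d' % (peak, rss, used))

# peak roughly equal to rss, with both well above used_memory, suggests
# accumulated fragments that a restart would release.
if abs(peak - rss) < 0.1 * rss and rss > 1.5 * used:
    print('memory fragments appear to be accumulating; consider a restart')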

Before restarting the server, enter the shutdown save command in redis-cli. This forces redis to save the database and then shut down the service, ensuring no data is lost on shutdown. After the restart, redis loads the persisted file from disk, keeping the dataset continuously available.

2. Limit memory swapping: if the memory fragmentation ratio is below 1, the redis instance may be swapping data to disk. Swapping severely hurts redis performance, so you should either increase the available physical memory or reduce redis's actual memory usage; see the optimization suggestions under used_memory.

3. Change the memory allocator:
Redis supports several memory allocators, including glibc's malloc, jemalloc, and tcmalloc, each with different allocation and fragmentation behavior. Changing redis's default allocator is not recommended for the typical administrator, because it requires a thorough understanding of the differences between the allocators and recompiling redis. Still, for those who know redis's memory allocators well, it is one more way to address fragmentation.

Key eviction evicted_keys

The evicted_keys field in info shows the number of keys deleted because of the maxmemory limit (maxmemory was covered in the earlier chapter); key eviction occurs only when a maxmemory value is set. When redis has to evict a key under memory pressure, it does not simply discard the oldest data; depending on the policy, it picks a key from among the least recently used keys, or among keys due to expire, and deletes it from the dataset.

Setting maxmemory-policy to volatile-lru or volatile-ttl in the configuration file determines whether redis evicts by the LRU policy or by expiration time. If every key has a specific expiration time, evicting by expiration time is appropriate. If keys have no expiration times, or there are not enough expiring keys, an LRU policy makes sense, since it can evict keys regardless of their expiration state.

Locating performance problems from key eviction

Tracking key eviction is important because eviction is what keeps redis's memory allocation within budget. If the evicted_keys value is frequently above 0, clients' command response latency rises, because redis must not only process command requests but also frequently evict qualifying keys.
Note, however, that eviction's impact on performance is far milder than memory swapping's. Given the choice between forcing swaps and setting an eviction policy, choosing eviction is the sensible option, since swapping memory out to disk has a much more significant performance impact (see the earlier section).

Reducing key eviction to improve performance

Reducing the number of evicted keys is a direct way to improve redis performance. There are two ways to do so:

1. Increase the memory limit: if snapshots are enabled, maxmemory should be set to 45% of physical memory, which leaves almost no risk of swapping; if snapshots are disabled, 95% of the system's available memory is reasonable (see the earlier sections on snapshots and maxmemory). If maxmemory is currently set below the 45% or 95% threshold (depending on the persistence policy), raising it lets redis keep more keys in memory and can significantly reduce evictions. But if maxmemory is already at the recommended threshold, raising it further will not improve performance; it will instead cause swapping, increasing latency and degrading performance. You can set the maxmemory value with the config set maxmemory command in redis-cli.
Note that this setting takes effect immediately but is lost after a restart. To keep it permanently, enter the config rewrite command to flush the in-memory configuration to the configuration file.

2. Shard the instance: sharding splits the data into suitably sized pieces stored across different redis instances, each holding part of the whole dataset. Through sharding, the memory of many servers can be combined, which effectively increases the total physical memory, so more keys can be stored without swapping or key eviction. If you have a very large dataset, maxmemory is already set, and actual memory usage exceeds the recommended threshold, sharding can significantly reduce key eviction and thereby improve redis performance. There are many ways to shard; the following are common approaches in redis (a simple sketch of approach A follows the list):

  • A. Hash sharding: a simple approach; a hash function maps each key to a number, and ranges of these values are assigned to specific redis instances.
  • B. Proxy sharding: the client sends requests to a proxy, which selects the appropriate redis instance from a shard configuration table; examples include Twitter's twemproxy and Codis.
  • C. Consistent hash sharding: see the earlier blog post "consistent hash sharding details".
  • D. Virtual bucket sharding: see the earlier blog post "virtual bucket sharding details".
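
A sketch of approach A with redis-py; the two instance addresses and the crc32-based hash are assumptions for illustration:

import zlib
import redis

# Pool of shard instances (addresses are made up for the example).
shards = [
    redis.Redis(host='127.0.0.1', port=6379, decode_responses=True),
    redis.Redis(host='127.0.0.1', port=6380, decode_responses=True),
]

def shard_for(key):
    # Map the key's hash value onto one of the instances.
    return shards[zlib.crc32(key.encode()) % len(shards)]

shard_for('user:1000').set('user:1000', 'alice')
print(shard_for('user:1000').get('user:1000'))
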
Summary

For developers, redis is a very fast key-value in-memory database with a convenient API. To make the best use of redis, you need to understand which factors affect its performance and which metrics can help you avoid performance traps. This article has covered the important redis performance metrics, how to view them, and, more importantly, how to use them to solve redis performance problems.

This post is mainly a translation of about 15 pages from an ebook.

