PHP + Redis: How to troubleshoot HTTP 500 Internal Server Error in real-world projects

Last Update:2018-05-26 Source: Internet

Author: User




Problem description
The user base grew rapidly and the number of visits doubled within a short period. Thanks to good capacity planning earlier, the hardware resources could cope, but the software system had a serious problem:
40% of requests returned HTTP 500: Internal Server Error
The logs showed that the errors occurred in the connection handling between PHP and Redis.
Debugging process

1st time
At first the root cause was not found, so the team simply tried a range of fixes related to the error, such as:
Increase the number of PHP connections and raise the timeout from 500 ms to 2.5 s
Disable default_socket_timeout in the PHP settings
Disable SYN cookies on the host system
Check the number of file descriptors for Redis and the web servers
Increase the mbuffer of the host system
Adjust the TCP backlog size
......

Many things were tried, but none of them worked.

2nd time
They tried to reproduce the problem in the pre-release environment, but unfortunately without success, probably because the traffic there was too low to trigger it.

3rd time
Could it be that the Redis connections were not being closed in the code?
Normally PHP closes resource connections automatically when a script finishes, but older versions had memory leaks, so to be on the safe side the code was changed to close the connections explicitly.
It still made no difference.
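
As an illustration of closing a connection explicitly instead of relying on end-of-script cleanup, here is a minimal sketch in Python rather than PHP. FakeRedisConnection and handle_request are hypothetical names invented for this example; they stand in for a real client and a real request handler.

```python
from contextlib import closing


class FakeRedisConnection:
    """Hypothetical stand-in for a Redis client connection (not a real library class)."""

    def __init__(self):
        self.closed = False

    def get(self, key):
        if self.closed:
            raise RuntimeError("connection already closed")
        return None  # a real client would return the stored value

    def close(self):
        self.closed = True


def handle_request(conn):
    # closing() calls conn.close() when the block exits, even if the body raises.
    with closing(conn):
        return conn.get("user:42")
```

The point of the pattern is that the close happens at a known place in the request, not whenever the runtime gets around to it.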

4th time
Suspect: the phpredis client library.
They ran an A/B test: the library was replaced with Predis and the change was deployed to 20% of users in the data center.
Thanks to a well-structured codebase, the replacement was done quickly.
The result was still negative, but there was an upside: it showed that phpredis was not the problem.
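
The article does not say how the 20% of users were selected. One common approach, sketched here under that assumption, is to hash a stable user identifier into a bucket from 0 to 99 so that the same user always gets the same client library. The function name use_predis and the string user IDs are hypothetical.

```python
import hashlib


def use_predis(user_id: str, rollout_percent: int = 20) -> bool:
    """Return True if this user falls into the experimental bucket."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable value in 0..99
    return bucket < rollout_percent
```

Because the bucket depends only on the user ID, the assignment stays stable across requests and servers, which keeps the comparison between the two libraries clean.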

5th time
The Redis version in use was v2.6, while the latest version at the time was v2.8.9.
They tried upgrading Redis.
It still did not help. Looking on the bright side, at least Redis was now up to date.

6th time
After going through a lot of documentation, they found a debugging aid in the official Redis documentation, the Software Watchdog. They enabled it and ran:

$ redis-cli --latency -p 6380 -h 1.2.3.4
min: 0, max: 463, avg: 2.03 (19443 samples)
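
To make the min/max/avg figures concrete, here is a rough sketch of how such a summary can be computed from repeated round trips. It is not the redis-cli implementation; sample_latency is a hypothetical helper, and ping is any callable that performs one round trip to the server.

```python
import time


def sample_latency(ping, samples):
    """Call ping() `samples` times (at least once) and summarize timings in ms."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        ping()
        timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "min": min(timings),
        "max": max(timings),
        "avg": sum(timings) / len(timings),
        "samples": len(timings),
    }
```

A low average combined with a very high maximum, as in the output above, suggests occasional long stalls and not a uniformly slow server.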

The Redis logs showed:

...
[20398] 09:20:55.351 * 10000 changes in seconds. Saving...
[20398] 09:20:55.759 * Background saving started by pid 41941
[41941] 09:22:48.197 * DB saved on disk
[20398] 09:22:49.321 * Background saving terminated with success
[20398] 09:25:23.299 * 10000 changes in seconds. Saving...
[20398] 09:25:23.644 * Background saving started by pid 42027
...

The problem was found:
Every few minutes Redis saves its data to disk, and forking the background save process takes about 400 ms (compare the timestamps of the 1st and 2nd log lines above).
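
The 400 ms figure can be checked directly from the log timestamps. The sketch below subtracts the time of the "Saving" line from the time of the following "Background saving started" line; elapsed_ms is a hypothetical helper name, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%H:%M:%S.%f"


def elapsed_ms(start: str, end: str) -> int:
    """Whole milliseconds between two same-day log timestamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta // timedelta(milliseconds=1)


# Timestamps taken from the first two log lines in the article.
fork_ms = elapsed_ms("09:20:55.351", "09:20:55.759")  # 408
```

The second save in the log (09:25:23.299 to 09:25:23.644) gives 345 ms by the same calculation, so the delay is consistent across saves.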

This was finally the source of the problem. The Redis instance held a large amount of data, so the fork for each persistence operation was slow, and because the business logic modified keys frequently, persistence was triggered often, which repeatedly blocked Redis.

Workaround: use a separate slave for persistence.

This slave does not serve real traffic. Its only job is persistence: the persistence work previously done on the main Redis instance is moved to this slave.

The effect was obvious and the problem was largely solved, but errors still occurred occasionally.

7th time
They looked for slow queries that might block Redis and found one place that used KEYS *.

As the amount of data in Redis kept growing, this command naturally caused serious blocking.

SCAN can be used as a replacement.
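
SCAN differs from KEYS in that it returns a small batch of keys plus a cursor, and the caller keeps calling until the cursor comes back as 0, so the server is never blocked by one long operation. The sketch below shows that loop. fake_scan is a hypothetical in-memory stand-in used only so the example runs without a server; a real Redis cursor is an opaque value, not a list index, and a real SCAN may return a key more than once.

```python
import fnmatch


def fake_scan(store, cursor, match="*", count=10):
    """Hypothetical in-memory stand-in for SCAN: returns (next_cursor, keys)."""
    keys = sorted(store)
    batch = keys[cursor:cursor + count]
    next_cursor = cursor + count
    if next_cursor >= len(keys):
        next_cursor = 0  # cursor 0 signals that the iteration is complete
    return next_cursor, [k for k in batch if fnmatch.fnmatchcase(k, match)]


def iter_keys(scan, match="*", count=10):
    """Yield matching keys by following the cursor until it returns to 0."""
    cursor = 0
    while True:
        cursor, keys = scan(cursor, match, count)
        yield from keys
        if cursor == 0:
            break
```

For example, with store = {"user:1": 1, "session:1": 2}, calling list(iter_keys(lambda c, m, n: fake_scan(store, c, m, n), match="user:*")) walks the keyspace in batches and keeps only the matching keys.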

8th time
After the previous adjustments the problem was resolved, and over the following months the system held up even as traffic kept growing.

But the team then noticed a new problem:

Each request created a Redis connection, executed a few commands, and then disconnected. At high request volumes this wasted a great deal of performance: more than half of the commands were spent on connection handling, outweighing the business logic and slowing Redis down.

Workaround: introduce a proxy. They chose Twitter's twemproxy, which only requires installing an agent on each web server. Twemproxy maintains persistent connections to the Redis instances, which greatly reduces the connection overhead.
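
The difference between connecting per request and reusing a persistent connection can be sketched by counting how many connections get opened. This is only an illustration of the idea, not of how twemproxy is implemented. ConnectionCounter, per_request, and PersistentClient are hypothetical names.

```python
class ConnectionCounter:
    """Hypothetical connector that records how many connections get opened."""

    def __init__(self):
        self.opened = 0

    def connect(self):
        self.opened += 1
        return object()  # placeholder for a real socket or client


def per_request(connector, requests):
    # The original pattern: every request opens its own connection.
    for _ in range(requests):
        connector.connect()


class PersistentClient:
    """Opens one connection lazily and reuses it for later requests."""

    def __init__(self, connector):
        self._connector = connector
        self._conn = None

    def connection(self):
        if self._conn is None:
            self._conn = self._connector.connect()
        return self._conn
```

With 1000 requests, per_request opens 1000 connections, while a single PersistentClient opens one and hands the same connection back each time.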

Twemproxy has two other conveniences:

Supports memcached
Can block very time-consuming or dangerous commands, such as KEYS and FLUSHALL

The result was excellent: the earlier connection errors were no longer a concern.

9th time
They continued optimizing with data sharding:

Data for different contexts is split and isolated
Data within the same context is sharded with consistent hashing
Effect:

Fewer requests and less load on each machine
Better cache reliability, with less worry about node failures
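
Consistent hashing, mentioned above, can be sketched as a ring of hash points where each key belongs to the first node point at or after its own hash. The article does not describe the actual implementation, so this is only a minimal illustration. HashRing, the replicas parameter, and the node names in the usage note are hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, replicas=100):
        # Each node gets `replicas` virtual points to spread load more evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First point clockwise from the key's hash, wrapping around at the end.
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]
```

For example, HashRing(["redis-a", "redis-b", "redis-c"]).node_for("key:1") picks one of the three nodes. If redis-c is removed from the ring, only the keys that lived on redis-c move to another node, which is why a node failure disturbs only part of the cache.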

That is all for this article. I hope it is helpful for your study.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
