Nginx Proxy timeout troubleshooting


I. Environment

The environment is nginx as the front-end reverse proxy, with two Tomcat servers as the upstream.
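The described topology can be sketched in nginx configuration roughly like this (all names, addresses, and ports below are placeholders, not taken from the original setup):

```nginx
# Hypothetical sketch of the topology: nginx in front, two Tomcats behind.
upstream tomcat_backend {
    # nginx's default scheduling across these servers is round-robin.
    server 10.0.0.11:8082;
    server 10.0.0.12:8082;
}

server {
    listen 80;
    server_name www.example.com;

    location / {
        proxy_pass http://tomcat_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```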

II. Cause

The project was in its early stages, so I had been working overtime every day. For once I finished on time, packed up my equipment, and took the subway home early.

I was sitting happily on the subway listening to music when the music suddenly stopped and the ringtone sounded. I had a bad feeling; it seemed something had gone wrong. Sure enough, when I pulled out my phone, my boss's name was flashing on the screen. I took a deep breath and answered. He said the client had reported some kind of error. The subway was very noisy and the signal was poor, so I couldn't catch the details. I said that if it was a server-side problem I would look into it, but I hadn't reached home yet and would have to check later, so he hung up. I kept turning over what he had said, but even after a long while I still wasn't sure what the problem was.

III. Troubleshooting

When I got home, I turned on my computer, logged in to QQ, and saw the problem reported by the client in the group: requests for some resources on the server were timing out. So I logged in to the server. First I checked whether each process was running normally; after confirming that, I tested page access, which was also normal. Because the client was calling an interface file that cannot be tested directly in a browser, the only way to investigate was through the nginx logs.

The client was testing over the HTTPS protocol, so I started with the HTTPS error logs:

19:15:29 [error] 17709#0: *1380648 upstream timed out (110: Connection timed out) while sending request to upstream, client: xxx, server: www.xxxx.com, request: "POST /xxx/pub/xxx.do HTTP/1.1", upstream: "http://xxxx:8082/xxx.do", host: "www.xxxx.com:443"

19:16:11 [error] 17709#0: *1380648 upstream timed out (110: Connection timed out) while sending request to upstream, client: xxx, server: www.xxxx.com, request: "POST /xxx/pub/xxx.do HTTP/1.1", upstream: "http://xxxx:8082/xxx.do", host: "www.xxxx.com:443"

19:17:29 [error] 17709#0: *1380648 upstream timed out (110: Connection timed out) while sending request to upstream, client: xxx, server: www.xxxx.com, request: "POST /xxx/pub/xxx.do HTTP/1.1", upstream: "http://xxxx:8082/xxx.do", host: "www.xxxx.com:443"

19:29:29 [error] 17709#0: *1380648 upstream timed out (110: Connection timed out) while sending request to upstream, client: xxx, server: www.xxxx.com, request: "POST /xxx/pub/xxx.do HTTP/1.1", upstream: "http://xxxx:8082/xxx.do", host: "www.xxxx.com:443"

The log entries were all of this form. Seeing this, I suspected the nginx proxy timeout settings, so I increased the proxy timeout values.
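The relevant directives are nginx's proxy timeouts; a sketch with illustrative values (the post does not say which values were actually used, and `tomcat_backend` is a placeholder name):

```nginx
location / {
    proxy_pass http://tomcat_backend;

    proxy_connect_timeout 75s;   # establishing the connection to upstream
    proxy_send_timeout    300s;  # between two successive writes to upstream
    proxy_read_timeout    300s;  # between two successive reads from upstream
}
```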

After the change, the problem persisted on retest. Thinking it over carefully, I suddenly realized that the client was testing over HTTPS, while I had only changed the timeouts for HTTP. So I made the same change for HTTPS and, feeling excited, assumed that would be the end of it. Unexpectedly, the client hit the same problem after the modification. That was a bit depressing.
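For concreteness, setting the proxy timeouts for HTTPS just means repeating them in the 443 `server` block; a sketch with placeholder names, values, and paths:

```nginx
server {
    listen 443 ssl;
    server_name www.example.com;                        # placeholder
    ssl_certificate     /etc/nginx/certs/example.crt;   # placeholder path
    ssl_certificate_key /etc/nginx/certs/example.key;   # placeholder path

    location / {
        proxy_pass http://tomcat_backend;                # placeholder upstream
        proxy_connect_timeout 75s;
        proxy_send_timeout    300s;
        proxy_read_timeout    300s;
    }
}
```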

As I monitored the backend logs in real time, I noticed that although errors were still being reported, they were now different:

19:47:31 [error] 17708#0: *1381368 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: xxx.xxx.xxx.xxx, server: www.xxx.com, request: "POST /xxx/xxxx.do HTTP/1.1", upstream: "http://xxx.xxx.xxx:8082/xxx/home/xxx.do", host: "

19:50:11 [error] 12300#0: *1381368 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: xxx.xxx.xxx.xxx, server: www.xxx.com, request: "POST /xxx/xxxx.do HTTP/1.1", upstream: "http://xxx.xxx.xxx:8082/xxx/home/xxx.do", host: "www.xxx.com:443"

19:55:04 [error] 132648#0: *1381368 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: xxx.xxx.xxx.xxx, server: www.xxx.com, request: "POST /xxx/xxxx.do HTTP/1.1", upstream: "http://xxx.xxx.xxx:8082/xxx/home/xxx.do", host: "www.xxx.com:443"

The error code and error message in the log had changed: according to the message, upstream was now resetting the connection on nginx. Why?

Searching online, most results were about timeouts. A few said the client's GET header was too large, but the requests here were clearly POSTs. Following the timeout lead, I asked the client to check their timeout setting. It was 10 s, which seemed reasonable: 10 s should be plenty for the server to process the request and respond.

At this point I still couldn't pin down the problem, so the only option was to gather more data: I monitored the HTTP access and error logs and the HTTPS access and error logs in real time with tail. In this monitoring, some strange phenomena turned up:
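Monitoring everything at once is a single command; the log paths below are hypothetical and should be adjusted to the actual nginx setup:

```shell
# Follow all four logs in one terminal (paths are placeholders).
tail -f /var/log/nginx/access.log /var/log/nginx/error.log \
        /var/log/nginx/ssl_access.log /var/log/nginx/ssl_error.log
```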

3.1 Connection timeouts also occur in the HTTP error log

19:29:44 [error] 17708#0: *1380926 upstream timed out (110: Connection timed out) while reading response header from upstream, client: xxx.xxx.xxx.xxx, server: www.xxxx.com, request: "GET /xxx/xxx.png HTTP/1.1", upstream: "http://xxx.xxx.xxx:8082/xxx/image/xxx.png", host: "www.xxx.com", referrer: "http://www.xxx.com/xxx/xxx/xxx.do?user_id=57&from=singlemessage&isappinstalled=1"

19:29:44 [error] 17708#0: *1380930 upstream timed out (110: Connection timed out) while reading response header from upstream, client: xxx.xxx.xxx.xxx, server: www.xxx.com, request: "GET /xxx/xxx.png HTTP/1.1", upstream: "http://xx.xxx.xxx.xxx:8082/xxx/xxx.png", host: "www.xxx.com", referrer: "http://www.xxx.com/xxx/xxx/xxx.do?user_id=57&from=singlemessage&isappinstalled=1"

3.2 The HTTPS access log and error log contain two requests at the same time

# Error Log

21:58:59 [error] 22498#0: *527 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: xxx.xxx.xxx.xxx, server: www.xxx.com, request: "POST /xxx.do HTTP/1.1", upstream: "http://xxx.xxx.xxx.xxx:8082/xxx.do", host: "www.xxx.com:443"

# Access log

xxx.xxx.xxx.xxx - - [06/Aug/2015:21:58:59 +0800] "POST /xxx.do HTTP/1.1" 200 1100 "-" "xxx/1.1.0 (iPhone Simulator; iOS 8.1; Scale/2.00)" "-"

The above shows two requests at the same time point, one successful and one failed, and the access log also contained many 499 response codes. In the nginx source, 499 is annotated /* 499, client has closed connection */. In other words, either the client closed the connection actively, or nginx rejected a POST that was resubmitted too quickly.

1. The client closes the connection when its own timeout expires. This ties back to the 10 s client timeout and the frequent upstream timeouts.

2. When a POST request is resubmitted too quickly, nginx deems it an unsafe request and refuses the connection. This may be caused by the client's continuous test traffic. In that case, the following directive can be added to the nginx configuration so that nginx does not abort proxying when the client disconnects:

proxy_ignore_client_abort on;

This setting is not without risk, but to rule out the theory I configured it and, full of hope, tested again.

Sadly, even with this parameter configured, the problem remained. Another letdown. Still, no matter: the logs kept surfacing useful clues.
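Mining the logs like this is easy to script. For example, counting 499 responses in the access log; a self-contained demo on two synthetic lines, assuming nginx's default combined log format, where the status code is the 9th whitespace-separated field:

```shell
# Two synthetic access-log lines in combined format; field 9 is the status code.
printf '%s\n' \
  '1.2.3.4 - - [06/Aug/2015:21:58:59 +0800] "POST /xxx.do HTTP/1.1" 200 1100 "-" "demo-ua"' \
  '1.2.3.4 - - [06/Aug/2015:21:59:01 +0800] "POST /xxx.do HTTP/1.1" 499 0 "-" "demo-ua"' |
  awk '$9 == 499 { n++ } END { print n + 0 }'
```

Against a real log file, the same filter would be `awk '$9 == 499' /var/log/nginx/access.log | wc -l` (path is a placeholder).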

Having ruled out the second hypothesis, I came back to the first oddity (the one in 3.1): the client was testing over HTTPS, so why did the same problem also appear under plain HTTP? That was suspicious, so I started from there.

The interface file the client was testing sits in one application's directory. Although the interface file itself cannot be accessed directly, the application's web directory can.

Accessing that web directory while cross-checking the logs revealed the problem:

1. The nginx proxy uses the default round-robin scheduling, so each request is dispatched to a different backend server. While refreshing the page, every other refresh got stuck, and the backend logs showed an error for each stuck refresh.

2. Whenever an error was reported, a successful request also appeared in the access log as the page was refreshed again. This explains the two requests at the same time point.

3. With that in mind, a second look at the error log made the problem easy to see: every error came from the same Tomcat in the upstream. That Tomcat was the culprit.

I took that Tomcat out of the nginx upstream, the client retested, and everything worked. As for the root cause, that Tomcat's code is developed directly during the day, so the code-level problem could only be fixed during working hours.
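Taking a faulty backend out of rotation only requires marking it `down` in the upstream block; a sketch with placeholder names and addresses:

```nginx
upstream tomcat_backend {            # hypothetical upstream name
    server 10.0.0.11:8082;
    server 10.0.0.12:8082 down;      # the faulty Tomcat, removed from rotation
}
```

After editing the configuration, apply it with `nginx -s reload`.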

IV. Summary

After this round of troubleshooting, a few takeaways:

1. Handle problems calmly and start troubleshooting from the most basic level.

2. Get good at tracking errors through the logs of every application involved.

3. Understand the applications in your stack well enough to know what a given symptom means and where to look.

After writing up the problem, I realized it was already quite late. Time to rest so as to be in better fighting shape tomorrow!
