Troubleshooting a TcpListenOverflows alarm
Problem description
An alarm was received, as follows:
At that point, logging on to the server and checking with curl showed the service returning 500 errors; it could no longer serve requests normally.
Troubleshooting
Tail the logs and check GC with jstat. Neither quickly located the problem, so dump the heap and the thread stack, then restart the application to restore service:
jps -v                                    # find the Java process ID (PID)
jstack -l PID > 22-31.log                 # dump the thread stack
jmap -dump:format=b,file=22-29.bin PID    # dump the heap
TcpListenOverflows
The application's ability to process network requests is determined by two factors:
1. The application's own processing capacity (in this case, the throughput of our Jetty application: its controllers and Thrift handlers)
2. The length of the socket accept (listen) queue. This is OS-level: view it with cat /proc/sys/net/core/somaxconn. The default is 128; it can be raised, for example to 4192, and some companies set it to 32768. (A minimal sketch of how this interacts with the application's own backlog follows this list.)
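As a concrete illustration of factor 2, here is a minimal sketch, not taken from our application, showing that the accept-queue length is requested by the application when it binds its listening socket and is then capped by the kernel at somaxconn. The class name, port, and backlog value are illustrative only.

```java
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Minimal sketch: the application asks for a backlog when binding its
// listening socket, and the kernel caps it at net.core.somaxconn.
public class BacklogSketch {
    public static void main(String[] args) throws Exception {
        int requestedBacklog = 1024;          // what the application asks for
        ServerSocket serverSocket = new ServerSocket();
        // Effective queue length is min(requestedBacklog, somaxconn), so
        // raising somaxconn alone does not help if the server binds with a
        // small backlog, and vice versa.
        serverSocket.bind(new InetSocketAddress(8080), requestedBacklog);
        System.out.println("Listening with requested backlog " + requestedBacklog);
        serverSocket.close();
    }
}
```

The effective queue length is the smaller of the two values, which is why both the OS setting and the application's backlog matter.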
When both capacities are exhausted, the application can no longer accept new connections. The TcpListenOverflows counter starts to increase, and our Zabbix monitor is configured to alert when it exceeds 5, which is why the alarm SMS messages and emails were received.
In this scenario, running watch "netstat -s | grep listen" on the server shows a line such as "xxx times the listen queue of a socket overflowed", and the xxx keeps increasing. That xxx is the number of connection attempts we failed to handle normally.
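For reference, the counter that netstat -s and Zabbix report can also be read directly from /proc/net/netstat. Below is a minimal Java sketch; it assumes a Linux host where the "TcpExt:" header line and its value line appear as a consecutive pair, and the class name is ours, not part of any monitoring tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Minimal sketch: read the ListenOverflows counter from /proc/net/netstat.
public class ListenOverflowReader {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/proc/net/netstat"));
        for (int i = 0; i + 1 < lines.size(); i += 2) {
            String header = lines.get(i);
            String values = lines.get(i + 1);
            if (!header.startsWith("TcpExt:")) {
                continue;
            }
            List<String> names = Arrays.asList(header.split("\\s+"));
            String[] counts = values.split("\\s+");
            int idx = names.indexOf("ListenOverflows");
            if (idx > 0 && idx < counts.length) {
                // The same number "netstat -s" prints as
                // "xxx times the listen queue of a socket overflowed".
                System.out.println("ListenOverflows = " + counts[idx]);
            }
        }
    }
}
```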
References:
Something about the TCP listen queue
How to determine whether user requests are being lost
A detailed explanation of the backlog in Linux
The backlog parameter of the socket listen() call in Linux
TCP SNMP counters
Solving the problem of too many TIME_WAIT connections shown by netstat on Linux
With the above in mind, we can tentatively conclude that the root cause is insufficient processing capacity in the application. The analysis steps below support this.
Problem Analysis
Thread Stack
First, look at the thread stack. There are more than 12,000 threads, and a large number of them are in the TIMED_WAITING/WAITING state, parked at many different addresses; sometimes several threads wait on the same address, yet none of them can be traced back to our own code at that address. On its own, the thread stack does not reveal much; a small sketch for summarizing it by thread state follows below.
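Rather than paging through 12,000+ stacks by hand, one way to get an overview is to count threads by state. The sketch below is ours, not a JDK tool, and assumes the HotSpot jstack format where each thread entry contains a "java.lang.Thread.State:" line, as in 22-31.log.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch: summarize a jstack dump (e.g. 22-31.log) by thread state.
public class ThreadStateSummary {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("java.lang.Thread.State:")) {
                String state = trimmed.substring("java.lang.Thread.State:".length()).trim();
                counts.merge(state, 1, Integer::sum);
            }
        }
        counts.forEach((state, n) -> System.out.println(state + " -> " + n));
    }
}
```

Running it as "java ThreadStateSummary 22-31.log" prints one count per state, which makes a pile-up of WAITING/TIMED_WAITING threads obvious at a glance.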
So we move on to the heap dump to see whether the problem can be located directly from that file.
Eclipse Memory Analyzer
Use the MAT analysis tool to analyze the JVM heap dump; it can be downloaded from http://www.eclipse.org/mat/downloads.php.
The analysis shows that the classes occupying the most memory are socket-related, as shown below:
(Screenshot: the Shallow heap and Retained heap columns in MAT)
(Screenshot: the Zabbix monitoring graph)
Problem Solving
1. Request two new VMs and add them behind the load balancer.
2. Tune Jetty by increasing the number of worker threads: set maxThreads to 500 (see the sketch after this list).
3. Set a uniform 3-second timeout on calls to external interfaces. After 3 seconds the front end has already timed out and the user has moved on, so it is pointless for our backend to keep processing (also covered in the sketch after this list).
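For items 2 and 3, here is a minimal sketch assuming an embedded Jetty 9 server and a plain HttpURLConnection client; the class name, port, and URL are illustrative, and an application configured through jetty.xml or a different HTTP client would set the equivalent options there.

```java
import java.net.HttpURLConnection;
import java.net.URL;

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class TuningSketch {

    // Item 2: embedded Jetty with a larger worker pool (maxThreads = 500).
    static Server buildServer(int port) {
        QueuedThreadPool threadPool = new QueuedThreadPool();
        threadPool.setMaxThreads(500);
        Server server = new Server(threadPool);
        ServerConnector connector = new ServerConnector(server);
        connector.setPort(port);
        server.addConnector(connector);
        return server;
    }

    // Item 3: cap outbound calls at 3 seconds so a slow downstream service
    // cannot hold a Jetty worker thread longer than the front end waits.
    static HttpURLConnection openWithTimeout(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(3000); // milliseconds
        conn.setReadTimeout(3000);
        return conn;
    }
}
```

Capping outbound calls at the same 3 seconds the front end waits keeps slow downstream services from pinning Jetty worker threads, which is what let the accept queue fill up in the first place.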