Problem description
On the evening of 2015-06-25, at 21:33, we received an alarm, as follows:
Logging into the server and checking with curl, we found the service was returning 500 errors and could not serve requests normally.
Problem handling
We tailed various logs and checked GC activity with jstat, but could not quickly locate the problem, so we dumped the heap and thread stacks and then restarted the application.
jps -v (to find the process ID)
jstack -l PID > 22-31.log
jmap -dump:format=b,file=22-29.bin PID
TcpListenOverflows
An application's ability to handle network requests is determined by two factors:
1. The application's own processing capacity (in this case, our Jetty application: Controller and Thrift processing power)
2. The length of the socket accept queue (this is OS-level; it can be viewed with cat /proc/sys/net/core/somaxconn; the default is 128, it can be tuned to 4192, and some companies set it to 32768)
When both are full, the application can no longer provide service and TcpListenOverflows starts counting; our Zabbix monitoring is configured to alarm when it exceeds 5, which is why we received the alarm SMS messages and emails.
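The accept-queue limit described above can be demonstrated with a small Python sketch (an illustration written for this article, not code from the incident; on Linux, roughly backlog + 1 connections complete the handshake before further SYNs are silently dropped, which is exactly when the ListenOverflows counter ticks):

```python
import socket

# Listen with a deliberately tiny backlog and never call accept(),
# simulating an application too slow to drain its accept queue.
BACKLOG = 2
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(BACKLOG)
port = srv.getsockname()[1]

completed, dropped = 0, 0
clients = []
for _ in range(8):
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.settimeout(0.5)  # give up quickly instead of retransmitting SYNs
    try:
        c.connect(("127.0.0.1", port))
        completed += 1
    except OSError:  # timed out or refused: the listen queue overflowed
        dropped += 1
    clients.append(c)

print(completed, dropped)
```

Only the first few connects succeed; the rest hang in SYN retransmission and time out, just as real user requests did during the outage.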
In this scenario, if we go onto the server and check the listen statistics with "netstat -s | grep -i listen", we will see "xxx times the listen queue of a socket overflowed", with xxx continuously increasing; that xxx is the number of network requests we failed to handle normally.
Reference articles:
Some notes on the TCP listen queue
How to tell whether a user request was dropped
The backlog parameter in Linux, explained in detail
The backlog parameter of the socket listen() function on Linux
TCP SNMP Counters
Fixing the problem of too many TIME_WAIT states seen in netstat on Linux
Having understood the above, we can already surmise that the root of the problem is insufficient application processing capacity. The analysis steps below further substantiate this.
Problem analysis
Thread Stacks
First, look at the thread stacks: there were about 12,000 threads, large numbers of them in TIMED_WAITING/WAITING on various addresses. I found cases where many threads were waiting on the same address, but could not map those addresses back to the code being run, so the thread stacks were of limited use on their own.
On this point, I would also like to ask an expert for further help analyzing whether the problem can be located directly from this file.
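As a quick first pass over a dump like 22-31.log, it can help to count thread states before reading individual stacks. Here is a minimal sketch (my own helper, assuming the standard HotSpot jstack output format with "java.lang.Thread.State:" lines; the sample text is a tiny hypothetical excerpt):

```python
import re
from collections import Counter

def thread_state_histogram(stack_text):
    """Count occurrences of each java.lang.Thread.State in a jstack dump."""
    return Counter(re.findall(r"java\.lang\.Thread\.State: (\w+)", stack_text))

# Hypothetical excerpt in the standard jstack format:
sample = '''"qtp-1" #12 prio=5 tid=0x1 nid=0x2 waiting on condition
   java.lang.Thread.State: WAITING (parking)

"qtp-2" #13 prio=5 tid=0x3 nid=0x4 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (parking)

"main" #1 prio=5 tid=0x5 nid=0x6 runnable
   java.lang.Thread.State: RUNNABLE
'''
print(thread_state_histogram(sample))
```

With 12,000 threads, a histogram like this makes the skew toward WAITING/TIMED_WAITING obvious at a glance.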
Eclipse Memory Analyzer
MAT is an analysis tool for JVM memory dump files; it can be downloaded at http://www.eclipse.org/mat/downloads.php.
The analysis shows that the classes occupying the most memory are socket-related, as follows:
Shallow Heap & retained heap
Zabbix Monitoring
Problem solving
1. Requisition two new virtual machines and put them behind the load balancer.
2. Tune Jetty: increase the thread pool size, setting maxThreads to 500.
3. Set a uniform 3-second timeout on calls to external interfaces: after 3 seconds the front end times out anyway and routes the user elsewhere, so it is meaningless for our back end to keep processing the request.
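The idea behind the third change can be sketched with a small Python example (a scaled-down illustration with hypothetical timings, not our production code): once the caller enforces its own deadline, a slow back end can no longer tie the client up indefinitely.

```python
import socket
import threading
import time

def slow_backend(srv):
    # Accept a connection but take far too long to answer,
    # simulating an overloaded external interface.
    conn, _ = srv.accept()
    time.sleep(5)
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
threading.Thread(target=slow_backend, args=(srv,), daemon=True).start()

client = socket.create_connection(("127.0.0.1", port))
client.settimeout(0.5)  # stand-in for the 3-second budget
start = time.monotonic()
try:
    client.recv(1)
    timed_out = False
except socket.timeout:
    timed_out = True
elapsed = time.monotonic() - start

print(timed_out)  # the client gives up quickly instead of hanging
```

Without the settimeout call, the client would block for the full 5 seconds, holding a thread and a socket for work the user has already abandoned.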
Notes from resolving a TcpListenOverflows alarm