Notes from resolving a TcpListenOverflows alarm



Problem description


On the evening of 2015-06-25, at 21:33, we received an alarm, as follows:


Logging onto the server and checking with curl, we found the service was returning 500 errors and could not serve requests normally.


Problem handling


Tailing various logs and checking GC with jstat did not quickly locate the problem, so we dumped the heap and the thread stacks and then restarted the application.

jps -v, to find the process ID

jstack -l PID > 22-31.log

jmap -dump:format=b,file=22-29.bin PID
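
The jstat check mentioned above was roughly along these lines (a sketch; the interval is illustrative, not the exact command used that night):

jstat -gcutil PID 1000    # print GC and heap-occupancy statistics every second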


TcpListenOverflows


The ability of the application to handle network requests is determined by two factors:

1. The application's own request-processing (OPS) capacity (in this case, our Jetty application: the Controller and Thrift handlers)

2. The length of the socket listen (accept) queue. This is OS-level: it can be viewed with cat /proc/sys/net/core/somaxconn, defaults to 128, can be tuned up to 4192, and some companies set it to 32768; see the example below.

When both are full, the application can no longer serve new requests, TcpListenOverflows starts counting up, and because Zabbix monitoring is configured to alarm when the counter exceeds 5, we received the alarm SMS and email.
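
As a rough sketch (the values are illustrative, not the exact ones applied on our servers), the queue limit can be inspected and raised like this:

cat /proc/sys/net/core/somaxconn                        # current accept-queue cap, 128 by default
sysctl -w net.core.somaxconn=4192                       # raise it for the running kernel
echo "net.core.somaxconn = 4192" >> /etc/sysctl.conf    # persist across reboots

Note that the effective queue length is the smaller of somaxconn and the backlog the application passes to listen(), so the Jetty/Thrift side may need adjusting as well.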

In this scenario, if we go to the server and watch the listen statistics with netstat -s | grep -i listen, we will see a line such as "xxx times the listen queue of a socket overflowed", with xxx constantly increasing; that xxx is the number of network requests we failed to handle normally.
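
For example, the output will look roughly like this (the counts here are made up for illustration):

netstat -s | grep -i listen
    8866 times the listen queue of a socket overflowed
    8866 SYNs to LISTEN sockets dropped

Running it again a few seconds later and comparing the numbers tells you whether connections are still being dropped at the accept queue.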




Having understood the above, we could already suspect that the root cause was insufficient application processing capacity. The analysis steps below further substantiate this.


Problem analysis


Thread Stacks


First, the thread stacks: there were about 12,000 threads, a large number of them in TIMED_WAITING/WAITING on different addresses, and in some cases multiple threads were waiting on the same address. However, I could not map those addresses back to the running code, so the thread stack alone was of limited use.
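
A quick way to get an overview of such a large dump (assuming the 22-31.log file captured above) is to count threads by state:

grep "java.lang.Thread.State" 22-31.log | sort | uniq -c | sort -rn

This only summarizes the states (RUNNABLE, WAITING, TIMED_WAITING, ...); it does not by itself point to the code that is holding the threads.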

On this point, I would also like to ask an expert to help analyze further whether the problem can be located directly from this file.


Eclipse Memory Analyzer


MAT (Eclipse Memory Analyzer) is a tool for analyzing JVM memory dump files: http://www.eclipse.org/mat/downloads.php.
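
Besides the GUI, MAT also ships a headless parser; roughly (the script name and report id may differ slightly between MAT versions):

./ParseHeapDump.sh 22-29.bin org.eclipse.mat.api:suspects    # generate a Leak Suspects report from the dump captured earlier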

The analysis shows that the classes taking up the most memory are socket-related, as follows:


Shallow Heap & retained heap


Zabbix Monitoring




Problem solving


1. Requested two new virtual machines and added them to share the load.

2. Tuned Jetty by increasing the number of threads: maxThreads was set to 500.

3. Unified the timeout for calls to external interfaces at 3 seconds. After 3 seconds the front end times out anyway and routes the user elsewhere, so there is no point in our back end continuing to process the request.
