Resolving a TcpListenOverflows alarm: a troubleshooting walkthrough

Source: Internet
Author: User


Problem description


An alarm was reported, as follows:


Logging on to the server at this point and checking with curl showed that the service returned a 500 error and could not serve requests normally.


Troubleshooting


Tailing the logs and checking GC with jstat did not quickly locate the problem, so we dumped the memory and the thread stacks and then restarted the application.

jps -v (to find the process ID)

jstack -l PID > 22-31.log

jmap -dump:format=b,file=22-29.bin PID


TcpListenOverflows


An application's ability to handle network requests is determined by two factors:

1. Processing capacity of the application (operations per second; in this example, that of our Jetty application: controllers and Thrift)

2. Length of the socket wait queue. This is an OS-level setting; you can view it with cat /proc/sys/net/core/somaxconn. The default value is 128, which can be raised to 4192; some companies set it to 32768.

When both capacities are exhausted, the application can no longer serve requests normally and the TcpListenOverflows counter starts to increase. Our Zabbix monitoring is set to alert when the value is > 5, so we received alert text messages and emails.

In this scenario, checking the listen statistics on the server with watch "netstat -s | grep listen" shows "xxx times the listen queue of a socket overflowed", and xxx keeps increasing. This xxx is the number of times a network request could not be handled normally.
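To make the second factor concrete, here is a minimal Python sketch of how the kernel caps a listener's queue. It assumes a Linux host; the function names and the requested backlog of 1024 are illustrative and do not come from the original setup.

```python
from pathlib import Path

# Linux-only path to the system-wide cap on the listen backlog.
SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")


def read_somaxconn(path: Path = SOMAXCONN_PATH) -> int:
    """Return the current value of net.core.somaxconn."""
    return int(path.read_text().strip())


def effective_backlog(requested: int, somaxconn: int) -> int:
    """Linux silently truncates the backlog passed to listen() to somaxconn."""
    return min(requested, somaxconn)


if __name__ == "__main__":
    cap = read_somaxconn()
    # 1024 is a hypothetical backlog that an application might request.
    print(f"somaxconn={cap}, effective backlog for 1024: {effective_backlog(1024, cap)}")
```

In other words, asking for a larger backlog in the application has no effect until somaxconn is raised as well.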
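The same counter can be read programmatically. The sketch below parses the TcpExt lines of /proc/net/netstat (Linux only), which is where the ListenOverflows value behind that netstat message comes from. The helper names are hypothetical, and treating the threshold of 5 as an increase between two samples is an assumption; the original does not say how the Zabbix trigger is defined.

```python
from pathlib import Path
from typing import Dict

NETSTAT_PATH = Path("/proc/net/netstat")  # Linux only
ALERT_THRESHOLD = 5  # assumed to mean "more than 5 new overflows between samples"


def parse_tcpext(text: str) -> Dict[str, int]:
    """Map TcpExt counter names to values from /proc/net/netstat content."""
    rows = [line.split() for line in text.splitlines() if line.startswith("TcpExt:")]
    if len(rows) < 2:
        raise ValueError("no TcpExt header/value pair found")
    header, values = rows[0], rows[1]
    # The first line holds the names, the second the matching values.
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


def should_alert(previous: int, current: int, threshold: int = ALERT_THRESHOLD) -> bool:
    """True when the counter grew by more than the threshold between samples."""
    return (current - previous) > threshold


if __name__ == "__main__":
    counters = parse_tcpext(NETSTAT_PATH.read_text())
    print("ListenOverflows:", counters.get("ListenOverflows"))
```

Sampling this value at a fixed interval and comparing consecutive readings gives the growth rate that the watch command shows by eye.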


References:

Something about tcp listen queue

How to determine whether user requests are lost

Detailed explanation of the backlog in linux

The listen parameter backlog of the socket function in linux

Tcp snmp counters

Solve the Problem of too many netstat views of TIME_WAIT status in LINUX.


With the above in mind, we can roughly conclude that the root cause is insufficient processing capacity in the application. The analysis steps below support this further.


Problem Analysis


Thread Stack


First, let's look at the thread stack. There are more than 12000 threads, and a large number of them are in the TIMED_WAITING/WAITING state, waiting on different addresses. Sometimes several threads wait on the same address, but no running code holding that address can be found. The thread stack therefore appears to be of limited use.

It may still be worth analyzing further whether the problem can be located directly from this file.
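One way to start such an analysis is to summarize the dump rather than read it thread by thread. The following Python sketch counts thread states and the most common wait addresses in jstack output. The regular expressions assume the usual HotSpot jstack text format, and summarize_dump is a hypothetical helper name.

```python
import re
import sys
from collections import Counter
from typing import Tuple

# Matches lines such as "   java.lang.Thread.State: TIMED_WAITING (parking)".
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")
# Matches lines such as "- parking to wait for  <0x00000000e0a1b2c8> (a ...)".
WAIT_RE = re.compile(r"- (?:parking to wait for|waiting on)\s+<(0x[0-9a-f]+)>")


def summarize_dump(text: str) -> Tuple[Counter, Counter]:
    """Return (thread-state counts, wait-address counts) for a jstack dump."""
    states = Counter(STATE_RE.findall(text))
    addresses = Counter(WAIT_RE.findall(text))
    return states, addresses


if __name__ == "__main__":
    # Usage: python summarize.py 22-31.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        states, addresses = summarize_dump(fh.read())
    print("States:", states.most_common())
    print("Top wait addresses:", addresses.most_common(10))
```

A dump dominated by WAITING and TIMED_WAITING threads parked on a handful of addresses usually points at idle pool workers or threads blocked on a slow dependency, which narrows down where to look next.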


Eclipse Memory Analyzer


MAT (Eclipse Memory Analyzer) is a tool for analyzing JVM memory dump files. Download: http://www.eclipse.org/mat/downloads.php

The analysis shows that the classes dominating memory are socket-related, as shown below:


Shallow heap & Retained heap


Zabbix monitoring




Problem Solving


1. Request two new VMs and add them behind the load balancer.

2. Tune Jetty and increase the number of threads: set maxThreads to 500.

3. Set the timeout for all calls to external interfaces to 3 seconds. After 3 seconds the front end times out and the user moves on to another step, so it is pointless for our backend to keep processing the request.
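The third measure can be illustrated with a short sketch. The original application is a Jetty (Java) service and its actual client code is not shown, so the Python below is only an assumption-laden illustration of the idea: bound every external call with a timeout so a slow dependency cannot hold a worker thread indefinitely. call_external and the URL in the usage line are hypothetical.

```python
import urllib.request
from typing import Optional

EXTERNAL_TIMEOUT_S = 3.0  # mirrors the unified 3-second limit described above


def call_external(url: str, timeout_s: float = EXTERNAL_TIMEOUT_S) -> Optional[bytes]:
    """Return the response body, or None if the call fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.read()
    except OSError:
        # URLError, socket timeouts and connection errors are OSError subclasses.
        return None


if __name__ == "__main__":
    # Hypothetical endpoint, for illustration only.
    body = call_external("http://example.com/")
    print("no response in time" if body is None else f"{len(body)} bytes")
```

Note that in this sketch the timeout applies to each blocking socket operation (connect, each read), not to the call as a whole, so a strict end-to-end deadline would need extra handling.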
