Too many ports in CLOSE_WAIT on Linux

Source: Internet
Author: User
Scenario: the system reports a large number of "Too many open files" errors.
Cause analysis: while the server and the client communicate, the server fails to close its sockets, which leaves them in CLOSE_WAIT. The number of handles open on the listening port reaches 1024, all of them in CLOSE_WAIT; once the configured limit is used up, "Too many open files" appears and no further communication is possible.
CLOSE_WAIT arises on the passively closing side: the peer has sent its FIN, but the local application has not yet closed the socket (see the attached diagram).
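To confirm which port is accumulating CLOSE_WAIT sockets, the connections can be grouped by local port. The following is a minimal sketch, assuming `netstat -tn`-style input on stdin; `close_wait_by_port` is a hypothetical helper name and the sample addresses are made up.

```shell
# Count CLOSE_WAIT sockets per local port from `netstat -tn`-style lines on stdin.
# close_wait_by_port is a hypothetical helper name, not an existing tool.
close_wait_by_port() {
    awk '$1 ~ /^tcp/ && $NF == "CLOSE_WAIT" {
             n = split($4, parts, ":")   # field 4 is the local address; the port is its last ":" piece
             count[parts[n]]++
         }
         END { for (port in count) print port, count[port] }' | sort -n
}

# Live use on a server:  netstat -tn | close_wait_by_port
# Self-contained run on made-up sample lines:
close_wait_by_port <<'EOF'
tcp        1      0 10.0.0.5:8080    10.0.0.9:51234   CLOSE_WAIT
tcp        0      0 10.0.0.5:8080    10.0.0.9:51235   ESTABLISHED
tcp        1      0 10.0.0.5:8080    10.0.0.9:51236   CLOSE_WAIT
EOF
```

A port whose count keeps growing toward the open-files limit is the likely source of the leak.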

Solution: two kinds of measures are available.
First, fix the cause:
The accept() method of the ServerSocket class and the read() method of the socket input stream block the calling thread, so a timeout should be set with setSoTimeout() (the default is 0, meaning the call never times out). The timeout applies to each blocking call separately and stays in force on the socket until it is set again. When a call blocks longer than the timeout, a SocketTimeoutException is thrown and the socket is still valid, so the handler must close it explicitly. The socket must also be closed when read() returns -1 (the client has closed its side); otherwise the connection stays in CLOSE_WAIT.
For example, if a service calls read() three times with the timeout set to 1 minute, each of the three calls may block for up to 1 minute before the exception is thrown. To use a different timeout for the next service on the same socket, set it again before that service starts.
Second, work around it:
Adjust the system parameters, including the file-handle limits and the TCP/IP parameters.

Note:
/proc/sys/fs/file-max is the system-wide limit on the number of open files, and is set through sysctl.conf;
ulimit changes the limit on the number of files that the current shell and its child processes can open, and is set persistently through limits.conf;
lsof lists the resources held on the system, but not all of them count toward the number of open files: shared memory, semaphores, message queues, memory mappings and so on are held without consuming open-file slots;
Therefore, what needs adjusting is the number of files that the current user's processes can open, that is, the limits.conf configuration;
If cat /proc/sys/fs/file-max reports 65536 or more, that value does not need to be changed;
Check the open files value with ulimit -a. If it is below 4096 (the default is 1024), raise it to 8192 using the steps below.
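The two checks described above can be scripted. This is only a sketch: `nofile_too_low` is a hypothetical helper, and the 4096 threshold is the one stated in the text.

```shell
# Succeeds (exit status 0) when an open-files limit is below the 4096 threshold from the text.
# nofile_too_low is a hypothetical helper name.
nofile_too_low() {
    limit=$1
    [ "$limit" != "unlimited" ] && [ "$limit" -lt 4096 ]
}

current=$(ulimit -n)   # soft limit on open files for this shell
if nofile_too_low "$current"; then
    echo "open files limit is $current: raise it"
else
    echo "open files limit is $current: no change needed"
fi

# System-wide ceiling; the file exists only on Linux.
if [ -r /proc/sys/fs/file-max ]; then
    echo "file-max: $(cat /proc/sys/fs/file-max)"
fi
```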
1. Log in as root and edit /etc/security/limits.conf
vi /etc/security/limits.conf and add
xxx - nofile 8192
xxx is a user name; use * to apply the setting to all users. Choose the value according to the hardware configuration and do not set it too large.
#<domain> <type> <item> <value>

* soft nofile 8192
* hard nofile 8192

# Every process of every user can use 8192 file descriptors.
2. Make the limits take effect
Make sure that /etc/pam.d/login and /etc/pam.d/sshd contain the following line:
session required pam_limits.so
The limits take effect the next time the user logs in.
3. Under bash, ulimit -a shows whether the value has changed.
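As a cross-check after logging back in, the configured entries and the limits actually in force can be compared. The sketch below assumes the four-column limits.conf layout shown above; `nofile_entries` is a hypothetical helper name.

```shell
# Print the non-comment nofile entries of a limits.conf-style file as "domain type value".
# $1: path to the file (defaults to /etc/security/limits.conf)
# nofile_entries is a hypothetical helper name.
nofile_entries() {
    awk '$1 !~ /^#/ && $3 == "nofile" { print $1, $2, $4 }' "${1:-/etc/security/limits.conf}"
}

# Limits in force for the current shell.
echo "soft: $(ulimit -Sn)  hard: $(ulimit -Hn)"

# On a configured server:  nofile_entries
```

If the soft and hard values printed by ulimit do not match the entries in the file, the pam_limits.so line is the first thing to check.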

How to change the TCP/IP parameters (temporary; the default values return after the server restarts):
sysctl -w net.ipv4.tcp_keepalive_time=600
sysctl -w net.ipv4.tcp_keepalive_probes=2
sysctl -w net.ipv4.tcp_keepalive_intvl=2
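Before overwriting the values it can help to record the current ones, so they can be restored if the change misbehaves. A minimal sketch, assuming the standard /proc/sys/net/ipv4 location; `show_keepalive` is a hypothetical helper name.

```shell
# Print the three keepalive settings found under a directory.
# $1: directory holding the tcp_keepalive_* files (defaults to /proc/sys/net/ipv4)
# show_keepalive is a hypothetical helper name.
show_keepalive() {
    dir=${1:-/proc/sys/net/ipv4}
    for name in tcp_keepalive_time tcp_keepalive_probes tcp_keepalive_intvl; do
        if [ -r "$dir/$name" ]; then
            echo "$name = $(cat "$dir/$name")"
        fi
    done
}

# Prints nothing on systems without these files.
show_keepalive
```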

Note: whether a kernel parameter adjustment is reasonable has to be confirmed by observation; watch how the system behaves during peak business hours.

If the temporary change works, make it permanent with the following change.
vi /etc/sysctl.conf

If the following entries are not already in the configuration file, add them:
net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
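The "add only if missing" rule can be sketched as a small helper. `ensure_sysctl` is a hypothetical name, not part of the sysctl tool, and the run below works on a scratch file rather than on /etc/sysctl.conf.

```shell
# Append "key = value" to a sysctl.conf-style file unless the key is already set.
# $1: key, $2: value, $3: file. ensure_sysctl is a hypothetical helper name.
ensure_sysctl() {
    key=$1
    value=$2
    file=$3
    # Exact match on the key (text before "="), ignoring blanks; commented lines do not match.
    if awk -F= -v k="$key" '{ name = $1; gsub(/[ \t]/, "", name); if (name == k) found = 1 }
                            END { exit !found }' "$file"; then
        echo "kept: $key"
    else
        echo "$key = $value" >> "$file"
        echo "added: $key"
    fi
}

conf=$(mktemp)   # scratch file; use /etc/sysctl.conf only as root and after a backup
echo "net.ipv4.tcp_keepalive_time = 600" > "$conf"
ensure_sysctl net.ipv4.tcp_keepalive_time 1800 "$conf"
ensure_sysctl net.ipv4.tcp_keepalive_probes 3 "$conf"
ensure_sysctl net.ipv4.tcp_keepalive_intvl 15 "$conf"
cat "$conf"
rm -f "$conf"
```

An existing entry is left untouched even when its value differs, which matches the wording above; changing an existing value is a separate, deliberate edit.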

Editing /etc/sysctl.conf does not apply the new values by itself. Restart the network:
/etc/rc.d/init.d/network restart
Then run the sysctl command (sysctl -p reloads /etc/sysctl.conf) so that the change takes effect. That essentially completes the change.

------------------------------------------------------------
Reason for modification:

When the client, for whatever reason, sends a FIN to the server first, the server is closed passively. If the server does not then actively close the socket and send its own FIN to the client, the server-side socket stays in CLOSE_WAIT (rather than moving to LAST_ACK). Typically a CLOSE_WAIT is held for at least 2 hours (the system default timeout is 7,200 seconds, which is 2 hours). If a server-side program ties up system resources in this way, the system usually crashes before they are released. The problem can therefore also be mitigated by shortening this time through the TCP/IP parameters, namely the tcp_keepalive_* series:
tcp_keepalive_time:
/proc/sys/net/ipv4/tcp_keepalive_time
INTEGER, the default value is 7200 (2 hours)
How long a connection must be idle before TCP starts sending keepalive messages, when keepalive is enabled. The recommended value is 1800 seconds.

tcp_keepalive_probes:
/proc/sys/net/ipv4/tcp_keepalive_probes
INTEGER, the default value is 9
The number of keepalive probes TCP sends before it decides that the connection is broken. (Note: keepalive probes are sent only if the SO_KEEPALIVE socket option is enabled on the connection. The default usually does not need to be changed, although it can be reduced where appropriate; 5 is a reasonable value.)

tcp_keepalive_intvl:
/proc/sys/net/ipv4/tcp_keepalive_intvl
INTEGER, the default value is 75
The interval at which a probe is sent again when the previous probe has not been acknowledged. Multiplied by tcp_keepalive_probes, it gives the time from the start of probing until an unresponsive connection is killed. With the default of 75 seconds, a connection that no longer responds is dropped after approximately 11 minutes (9 × 75 = 675 seconds). For general applications this value is rather large and can be reduced as needed; web servers in particular should lower it, and 15 is a more appropriate value.
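The three parameters combine as follows: assuming SO_KEEPALIVE is enabled, an idle connection whose peer no longer answers is dropped after roughly tcp_keepalive_time + tcp_keepalive_probes × tcp_keepalive_intvl seconds. A small sketch of that arithmetic, where `keepalive_deadline` is a hypothetical helper name:

```shell
# Seconds until an idle, unresponsive connection is dropped:
# idle time before the first probe + number of probes * interval between probes.
# $1: tcp_keepalive_time, $2: tcp_keepalive_probes, $3: tcp_keepalive_intvl
keepalive_deadline() {
    echo $(( $1 + $2 * $3 ))
}

keepalive_deadline 7200 9 75   # kernel defaults: prints 7875
keepalive_deadline 1800 3 15   # values suggested for sysctl.conf: prints 1845
keepalive_deadline 600 2 2     # temporary sysctl -w values: prints 604
```

With the defaults the idle period dominates (7200 of 7875 seconds), which is why lowering tcp_keepalive_time has the largest effect.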

"Detection Method"
1. The system no longer reports "Too many open files" errors.

2. The number of sockets in the TIME_WAIT state no longer surges.

Use the following command on Linux to count the server's TCP connections by state:

netstat -n | awk '/^tcp/ {++S[$NF]} END {for (a in S) print a, S[a]}'

A sample of the returned result:

ESTABLISHED 1423
FIN_WAIT1 1
FIN_WAIT2 262
SYN_SENT 1
TIME_WAIT 962
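For repeated checks, a single state can be pulled out of such a summary and compared over time. This is a sketch only: `state_count` is a hypothetical helper, and it assumes the two-column "STATE count" output with state names in the upper-case form netstat prints.

```shell
# Print the count for one TCP state from "STATE count" lines on stdin (0 if the state is absent).
# $1: state name as printed by netstat, e.g. CLOSE_WAIT. state_count is a hypothetical helper name.
state_count() {
    awk -v want="$1" '$1 == want { n = $2 } END { print n + 0 }'
}

# Live use:
#   netstat -n | awk '/^tcp/ {++S[$NF]} END {for (a in S) print a, S[a]}' | state_count CLOSE_WAIT
# Self-contained run on the sample figures:
state_count CLOSE_WAIT <<'EOF'
ESTABLISHED 1423
FIN_WAIT1 1
FIN_WAIT2 262
SYN_SENT 1
TIME_WAIT 962
EOF
```

On the sample above the helper prints 0, since no connection is in CLOSE_WAIT.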
