Revisiting the conntrack full problem of ip_conntrack

Source: Internet
Author: User
Increasing nf_conntrack_max, or shortening the time that conntrack table entries stay in kernel memory, can alleviate this problem. However, such remedies are only ever temporary.
Note: Avoid reducing the timeouts of the NEW and TCP ESTABLISHED conntrack states too aggressively.

Try not to reduce the lifetime of the NEW state, because on a poor network a packet can take a long time to make the round trip, and for TCP the RTT has not even been measured at that point. If a conntrack entry in the NEW state is retained for too short a time, many connections will be seen as NEW over and over again, which causes problems for every module that depends on ctstate. For example, the iptables filter table typically uses the ESTABLISHED state to admit the return packets of forwarded traffic. If the entry has already expired when the return packet arrives, ip_conntrack treats that packet as NEW rather than ESTABLISHED, and the return packet is dropped.

A simple experiment confirms this. Taking a simple UDP exchange as an example, we write a udp-echo program in which the server simply echoes back the string sent by the client, after a delay:
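for (;;)
{
    /* Wait for a datagram from the client. */
    n = recvfrom(sd, msg, MAXLINE, 0, pcliaddr, &len);
    /* Delay the reply so that the client's conntrack entry can expire first. */
    sleep(5);
    /* Echo the same bytes back to the sender. */
    sendto(sd, msg, n, 0, pcliaddr, len);
}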
 
Then, on the client, run echo $sec > /proc/sys/net/ipv4/netfilter/ip_conntrack_udp_timeout
The value of sec must be smaller than the sleep argument on the server.
In this way, the UDP client will not receive the string echoed by the server, because the client only allows inbound traffic in the ESTABLISHED state. If ip_conntrack_udp_timeout is set too short, the conntrack entry in the NEW state is released too early, and no traffic ever reaches the ESTABLISHED state. For UDP, which is connectionless, unacknowledged, and tolerant of packet loss, the impact is not great.
TCP has a similar problem. If you connect to a distant TCP server over a poor network and set ip_conntrack_tcp_timeout_synsent to a very small value, the three-way handshake will almost certainly fail. Further, suppose you set ip_conntrack_tcp_timeout_established to an excessively small value. Once the three-way handshake has established the connection, the client and the server may not exchange packets for a long time. When the ESTABLISHED timeout expires, the conntrack entry is released. If the server then sends a packet, what conntrack state will that packet have? This is why the five-day lifetime of the TCP ESTABLISHED state is understandable.
Note that for TCP, because the delay before the server sends its SYN-ACK cannot easily be controlled, a similar experiment has to target the ESTABLISHED state rather than the NEW state. (In fact, the ESTABLISHED state of ip_conntrack maps to several TCP states, including syn-ack, ack, and established.)
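The expiry effect described above can be modeled in a few lines of C. This is only a simplified sketch, not kernel code: the enum and the function names are hypothetical, and it assumes a single timer started by the first packet, whereas real conntrack refreshes the timer as packets are seen.

```c
/* Simplified conntrack states; the names are illustrative, not kernel symbols. */
enum ct_state { CT_NEW, CT_ESTABLISHED };

/*
 * Classify a reply that arrives reply_delay seconds after the original
 * packet created a conntrack entry with a lifetime of timeout seconds.
 * If the entry has already expired, the reply finds no entry and is
 * therefore seen as NEW.
 */
enum ct_state classify_reply(unsigned timeout, unsigned reply_delay)
{
    return (reply_delay < timeout) ? CT_ESTABLISHED : CT_NEW;
}

/* A stateful filter that admits only ESTABLISHED inbound traffic. */
int reply_is_accepted(unsigned timeout, unsigned reply_delay)
{
    return classify_reply(timeout, reply_delay) == CT_ESTABLISHED;
}
```

With the echo server sleeping for 5 seconds, reply_is_accepted(30, 5) returns 1, while reply_is_accepted(3, 5) returns 0, which corresponds to the client never seeing the echoed string.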
The discussion above has strayed rather far; our primary concern is the conntrack full problem. If you think it through, you will find that the table does not fill up because its capacity is too small or because entries are retained for too long. In reality, nothing is unlimited. With computer resources, what matters more is to use them sparingly and to keep unrelated traffic from wasting them. Besides, the kernel's default entry lifetimes are presumably values that have been validated by experience. The essential problem is therefore that a great deal of traffic that does not need conntrack is tracked anyway, squeezing out the traffic that really does need it.
So what traffic needs conntrack? There are two common cases: any iptables rule that uses the ctstate or state match, and every rule in the iptables nat table. If we know in advance which traffic must be controlled with the iptables [ct]state match and which traffic must be NATed, then the remaining traffic has nothing to do with conntrack and does not need to be tracked by ip_conntrack.
Fortunately, Linux Netfilter places a higher-priority table, raw, ahead of conntrack on the PREROUTING and OUTPUT hooks. It can be used to split off the traffic that does not need conntrack. If you are sure that you only need to NAT the traffic arriving on one network interface, apply the following rules:
 
iptables -t raw -A PREROUTING ! -i $NIC -j NOTRACK
iptables -t raw -A OUTPUT -j NOTRACK
In this way, resources are not wasted on unrelated traffic, and performance also improves, because NOTRACK traffic never queries the conntrack hash table: ip(nf)_conntrack_in begins with the following check:
 
if ((*pskb)->nfct)
    return NF_ACCEPT;
The implementation of the NOTRACK target is just as simple:
 
(*pskb)->nfct = &ip_conntrack_untracked.info[IP_CT_NEW];
In effect, a placeholder is stored in the nfct field of the skb, which keeps the rest of the code consistent.
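The interplay between the placeholder and the early return can be illustrated with a small user-space model. Everything prefixed with toy_ is hypothetical and stands in for the much larger kernel structures; only the value of NF_ACCEPT mirrors the kernel's definition. It is a sketch of the control flow, not of the real implementation.

```c
#include <stddef.h>

#define NF_ACCEPT 1  /* same numeric value as the kernel's verdict */

/* Toy stand-ins for the kernel structures; not the real definitions. */
struct toy_conntrack { int unused; };
struct toy_skb { struct toy_conntrack *nfct; };

static struct toy_conntrack toy_untracked;  /* shared placeholder entry */
static unsigned long toy_hash_lookups;      /* counts simulated table lookups */

/* Conceptually what the NOTRACK target does: attach the placeholder. */
void toy_notrack(struct toy_skb *skb)
{
    skb->nfct = &toy_untracked;
}

/* Conceptual entry of the conntrack hook: skip all work if nfct is set. */
int toy_conntrack_in(struct toy_skb *skb)
{
    static struct toy_conntrack tracked;    /* pretend lookup result */

    if (skb->nfct)
        return NF_ACCEPT;                   /* already marked: no lookup */

    toy_hash_lookups++;                     /* simulate the hash-table query */
    skb->nfct = &tracked;
    return NF_ACCEPT;
}
```

A packet that first passes through toy_notrack() leaves toy_hash_lookups unchanged when it reaches toy_conntrack_in(), whereas an unmarked packet increments the counter once.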
As we have seen, three methods are available when needed: 1. increase conntrack_max; 2. reduce the state retention times; 3. split off unrelated traffic. Apart from the third, however, these methods should only be used with good reason. For 1, you must understand how much kernel memory the table occupies; for 2, see the first half of this article.
 
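Note that packets marked NOTRACK match the UNTRACKED state, so a filter table that relies on state matches has to accept them explicitly:
iptables -A FORWARD -m state --state UNTRACKED -j ACCEPT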
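For the first method, the memory cost of a larger table can be estimated as the number of entries times the size of one entry, plus the number of hash buckets times the size of one bucket. The helper below is a minimal sketch of that arithmetic; the function name is hypothetical, and the entry and bucket sizes are passed in as parameters because they vary with kernel version and architecture.

```c
/*
 * Rough upper bound on conntrack memory, in bytes:
 *   max_entries * entry_size + buckets * bucket_size
 * entry_size and bucket_size are inputs because they depend on the
 * kernel version and the architecture.
 */
unsigned long long conntrack_mem_bytes(unsigned long long max_entries,
                                       unsigned long long entry_size,
                                       unsigned long long buckets,
                                       unsigned long long bucket_size)
{
    return max_entries * entry_size + buckets * bucket_size;
}
```

As an illustration with assumed sizes of 300 bytes per entry and 8 bytes per bucket, 65536 entries and 16384 buckets give 65536 * 300 + 16384 * 8 = 19791872 bytes, or about 18.9 MiB. The real per-entry size on a given machine should be checked rather than taken from this example.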
Finally, I have a question:
 
Consider a TCP connection without keepalive. Suppose that, after reaching the ESTABLISHED state, the server and client exchange nothing for five days. After those five days the server sends a packet to the client, but by then the conntrack entry on the firewall/NAT device has expired and been deleted. The packet is treated as NEW and dropped, so the client never receives it and sends no ACK. The server keeps retransmitting, each retransmission is dropped by the firewall, and after a certain number of retransmissions the server resets the connection. But how would the client ever know? The deadlock is broken only when the client itself sends a packet, and who can guarantee that it will? This is not a defect of Linux ip_conntrack. Is a five-day ESTABLISHED timeout simply an extreme measure? Even so, who can guarantee that the two ends will communicate within those five days?
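The scenario above depends on the connection having no keepalive. As a point of comparison, and not as something the article itself proposes, an application on Linux can turn on TCP keepalive so that an idle connection still produces periodic traffic. The sketch below uses the standard SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options; the function name is hypothetical, and it rests on the assumption that the probes refresh the conntrack entry on the path.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Enable TCP keepalive on fd so that an idle connection sends a probe
 * after idle_sec seconds of silence, then every interval_sec seconds,
 * giving up after `probes` unanswered probes (Linux-specific options).
 * Returns 0 on success, -1 on the first failing setsockopt().
 */
int enable_keepalive(int fd, int idle_sec, int interval_sec, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_sec, sizeof(idle_sec)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_sec, sizeof(interval_sec)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probes, sizeof(probes)) < 0)
        return -1;
    return 0;
}
```

A call such as enable_keepalive(fd, 600, 60, 5) uses illustrative values only. The idle time would need to be well below the ESTABLISHED timeout configured on the firewall for the probes to arrive before the entry expires.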