Linux ip_conntrack is confusing

Source: Internet
Author: User

A follow-up note to the previous article, "go deep into the conntrack full problem of ip_conntrack again"
ip_conntrack has an event mechanism that reports connection tracking events, such as an entry expiring or being deleted. Who gets notified? Every module that has registered an interest. One such consumer is a user-space process, which can then react, for example by installing rules on the firewall. This notification mechanism follows the observer design pattern.
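The observer idea can be sketched in a few lines of user-space C. This is only an illustration of the pattern, not the kernel's notifier code: the event names, ct_register_observer, ct_notify and count_destroy are all hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types; the real kernel defines its own set. */
enum ct_event { CT_EVENT_NEW, CT_EVENT_DESTROY };

typedef void (*ct_observer_fn)(enum ct_event ev, void *ctx);

#define MAX_OBSERVERS 8

struct ct_observer {
    ct_observer_fn fn;
    void *ctx;
};

static struct ct_observer observers[MAX_OBSERVERS];
static size_t observer_count;

/* Register an interested party; returns 0 on success, -1 if the table is full. */
int ct_register_observer(ct_observer_fn fn, void *ctx)
{
    assert(fn != NULL);
    if (observer_count >= MAX_OBSERVERS)
        return -1;
    observers[observer_count].fn = fn;
    observers[observer_count].ctx = ctx;
    observer_count++;
    return 0;
}

/* Deliver one event to every registered observer, in registration order. */
void ct_notify(enum ct_event ev)
{
    for (size_t i = 0; i < observer_count; i++)
        observers[i].fn(ev, observers[i].ctx);
}

/* Example observer: counts DESTROY events in the int that ctx points to. */
void count_destroy(enum ct_event ev, void *ctx)
{
    if (ev == CT_EVENT_DESTROY)
        (*(int *)ctx)++;
}
```

The tracker only knows the callback signature, so a firewall helper, a logger, or a user-space bridge can each subscribe without the tracker being changed.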
 
Linux ip_conntrack details
Linux ip_conntrack defines a large number of states, and each state has its own timeout. Some of these states map onto states of the network protocol being tracked; others do not. If the protocol itself is stateful, the mapping is easy to establish. If the protocol is stateless, no such mapping can be established. For stateless protocols, the state timeouts of ip_conntrack can sometimes cause frustrating problems.
 
In short, anyone who builds a firewall on top of Linux's ip_conntrack mechanism without understanding it in depth will run into behavior that is hard to explain. Here are a few examples.
 
Example
 
Example 1:
UDP is stateless: it needs no connection and no acknowledgment, and is purely a datagram protocol. ip_conntrack therefore uses empirical values for the timeout of each state. If neither side sends a packet for a while and the original receiver then sends the next packet, the ctstate-based filtering rules on the firewall are affected. For details, see "go deep into the conntrack full problem of ip_conntrack again".
 
Example 2:
With UDP, if NAT is used on the Linux firewall and a NAT rule is deleted or modified while a flow is in progress, that flow keeps using the old NAT rule rather than no rule or the new rule.
 
Example 3:
On early kernels, load the ip_conntrack module and ping a reachable address: no tracking entry for the connection shows up in /proc/net/ip_conntrack. Ping an unreachable address, on the other hand, and you can see a tracking entry marked [UNREPLIED]. This behavior still exists on the 2.6.9 kernel but is gone on at least the 2.6.32 kernel. I have not checked the kernel ChangeLog in detail to find out which version changed it.
 
Example 4:
With TCP, once a connection is closed, its tracking entry in /proc/net/ip_conntrack is deleted right away instead of lingering as with UDP.
 
Some explanations for the above problems
Example 1:
There is not much to say here. The root cause is that UDP itself is stateless, yet ip_conntrack imposes an ESTABLISHED state on a UDP "connection". In ip_conntrack, ESTABLISHED means the same thing for every protocol, UDP included: a flow exists. For TCP, ip_conntrack maps all traffic that is not in the SYN state to ESTABLISHED (note that this is not TCP's own ESTABLISHED state), which also fits that definition. At the end of the ip_conntrack processing entry point:
    if (set_reply)
        set_bit(IPS_SEEN_REPLY_BIT, &ct->status);
This shows that a status bit of ct is set whenever set_reply is true, and set_reply is set in the resolve_normal_ct call made by ip_conntrack_in:
    // A packet in the reply direction gets IP_CT_ESTABLISHED and sets set_reply to 1; IPS_SEEN_REPLY_BIT of ct->status is then set before ip_conntrack_in returns.
    if (DIRECTION(h) == IP_CT_DIR_REPLY) {
        *ctinfo = IP_CT_ESTABLISHED + IP_CT_IS_REPLY;
        /* Please set reply bit if this packet OK */
        *set_reply = 1;
    } else {
        /* Once we've had two way comms, always ESTABLISHED. */
        // If IPS_SEEN_REPLY_BIT is set, the packet is IP_CT_ESTABLISHED.
        if (test_bit(IPS_SEEN_REPLY_BIT, &h->ctrack->status)) {
            DEBUGP("ip_conntrack_in: normal packet for %p\n", h->ctrack);
            *ctinfo = IP_CT_ESTABLISHED;
    ...
So the IP_CT_ESTABLISHED state has nothing to do with any specific protocol. For TCP, every packet after the SYN is in the IP_CT_ESTABLISHED state. TCP itself, however, tracks the state of a connection (close-wait, for example), so ip_conntrack keeps sub-states for it and uses them to release the ip_conntrack data structure at the right moment. As soon as the time-wait sub-state of a TCP entry expires, its ip_conntrack data structure is released, because TCP's protocol states are mapped onto ip_conntrack sub-states and those sub-states know when a TCP flow has finished. UDP and ICMP are not so lucky. They have no sub-states and can only rely on the coarse ip_conntrack states.
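The protocol-independent part of that decision can be modeled in a few lines. The following is a simplified sketch, not kernel code: it ignores expected/related connections, sets the reply flag immediately instead of on return from ip_conntrack_in, and the names classify, struct flow and the CT_* values are invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum dir { DIR_ORIGINAL, DIR_REPLY };
enum ctstate { CT_NEW, CT_ESTABLISHED, CT_ESTABLISHED_REPLY };

/* Stand-in for the IPS_SEEN_REPLY_BIT of a tracked flow. */
struct flow {
    bool seen_reply;
};

/* Classify one packet of a flow and record whether a reply has been seen.
 * Nothing here looks at the protocol: UDP, ICMP and TCP all take this path. */
enum ctstate classify(struct flow *f, enum dir d)
{
    assert(f != NULL);
    if (d == DIR_REPLY) {
        f->seen_reply = true;
        return CT_ESTABLISHED_REPLY;
    }
    /* Original direction: ESTABLISHED only once two-way traffic was seen. */
    return f->seen_reply ? CT_ESTABLISHED : CT_NEW;
}
```

In this model a UDP flow stays NEW for as long as only one side talks and flips to ESTABLISHED on the first packet coming back, regardless of what that packet carries.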
 
Example 2:
The NAT implemented by iptables/Netfilter on Linux is stateful NAT. The NAT table is consulted only for the first packet of a flow, and the result is stored in the flow's ip_conntrack data structure; every later packet of the flow uses that stored result. Because UDP has no state, ip_conntrack cannot release the data structure until the ESTABLISHED state times out. (It has no way of knowing when a UDP flow ends, and even the ESTABLISHED timeout is an empirical value.) As long as the data structure is not released, the NAT result saved in it remains valid, which is why this problem occurs. For ICMP the behavior differs between kernel versions, which is the case in Example 3.
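A toy model makes the caching effect visible. This is a sketch under simplifying assumptions (a single source-NAT rule, addresses as plain 32-bit numbers); struct nat_rule, struct flow and translate are hypothetical names and do not correspond to kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-rule NAT table: rewrite the source address to nat_to. */
struct nat_rule {
    bool enabled;
    uint32_t nat_to;
};

/* Per-flow tracking entry that caches the NAT decision. */
struct flow {
    bool nat_done;     /* rule table already consulted for this flow */
    bool nat_applies;  /* cached decision */
    uint32_t nat_to;   /* cached mapping */
};

/* Return the source address the packet leaves with. */
uint32_t translate(struct flow *f, const struct nat_rule *rule, uint32_t src)
{
    assert(f != NULL && rule != NULL);
    if (!f->nat_done) {
        /* First packet of the flow: look up the rule and cache the result. */
        f->nat_applies = rule->enabled;
        f->nat_to = rule->nat_to;
        f->nat_done = true;
    }
    /* Later packets never look at the rule table again. */
    return f->nat_applies ? f->nat_to : src;
}
```

Disabling the rule after the first packet changes nothing for the existing flow; only a flow whose entry is created afterwards sees the new rule set.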
 
Example 3:
On kernel 2.6.9, icmp_packet is as follows:
    static int icmp_packet(struct ip_conntrack *ct,
                           const struct sk_buff *skb,
                           enum ip_conntrack_info ctinfo)
    {
        /* Try to delete connection immediately after all replies:
           won't actually vanish as we still have skb, and del_timer
           means this will only run once even if count hits zero twice
           (theoretically possible with SMP) */
        // On a reply packet, the ICMP counter of the flow is decremented. When it reaches 0, the timeout function is called to release the connection. In the non-SMP case, however, resolve_normal_ct always adds 1 to the reference count of the ip_conntrack, so even when count reaches 0 here and timeout.function is called, the ip_conntrack is not freed yet and ctstate or state matches in the filter table still work. ip_conntrack_put is only called when the associated skb is freed; the reference count of the connection tracking entry then drops to 0 and the entry is deleted.
        if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) {
            // icmp.count is incremented for each ICMP packet in the original direction and decremented here for each reply.
            if (atomic_dec_and_test(&ct->proto.icmp.count)
                && del_timer(&ct->timeout))
                ct->timeout.function((unsigned long)ct);
        } else {
            atomic_inc(&ct->proto.icmp.count);
            ip_ct_refresh_acct(ct, ctinfo, skb, ip_ct_icmp_timeout);
        }

        return NF_ACCEPT;
    }
This function is a callback invoked by ip_conntrack_in. Every protocol has a similar callback, named packet, that handles protocol-specific work. For TCP, for example, it handles the sub-states.
From the code above we can see that when you ping a reachable address, the reply arrives quickly, so the ip_conntrack data structure can be torn down almost at once. It is not freed inside icmp_packet, because the reference count is still nonzero after the decrement, so the --state match in the filter table is unaffected. Once the associated skb leaves the kernel and is freed, the ip_conntrack reference count drops by one more, reaches 0, and the connection tracking data structure is released immediately. This is how earlier kernels (including 2.6.9) behave. As a result, when the ping succeeds, the tracking entry is released so quickly that you will rarely catch it in /proc/net/ip_conntrack. When you ping an unreachable address, no reply comes back, so the tracking entry stays visible, marked [UNREPLIED].
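The interplay between the ICMP counter and the reference count can be replayed with a small model. It is a deliberately simplified sketch: one reference stands for the pending timer, one temporary reference stands for each skb in flight, and entry_init, entry_put, packet_in and packet_done are hypothetical names rather than kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum dir { DIR_ORIGINAL, DIR_REPLY };

struct entry {
    int refcount;        /* 1 for the pending timer, +1 per skb in flight */
    int icmp_count;      /* outstanding requests, like proto.icmp.count */
    bool timer_pending;
    bool freed;
};

void entry_init(struct entry *e)
{
    e->refcount = 1;
    e->icmp_count = 0;
    e->timer_pending = true;
    e->freed = false;
}

/* Drop one reference; the entry is freed only when the last one goes. */
void entry_put(struct entry *e)
{
    assert(e->refcount > 0);
    if (--e->refcount == 0)
        e->freed = true;
}

/* A packet enters the tracker: it takes a reference for its skb first. */
void packet_in(struct entry *e, enum dir d)
{
    e->refcount++;
    if (d == DIR_REPLY) {
        /* Last outstanding request answered: cancel the timer and run its
         * handler now, which drops the timer's reference. */
        if (--e->icmp_count == 0 && e->timer_pending) {
            e->timer_pending = false;
            entry_put(e);
        }
    } else {
        e->icmp_count++;
    }
}

/* The skb is freed after leaving the stack and drops its reference. */
void packet_done(struct entry *e)
{
    entry_put(e);
}
```

Replaying a successful ping (request in, request done, reply in, reply done) leaves the entry alive while the reply is still being filtered and frees it the moment the reply's skb is done. With no reply, the timer reference is never dropped in this model, so the entry stays visible.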
Now consider why the Linux kernel did not treat ICMP like UDP. After all, they are the same kind of protocol: both sit directly on top of IP, with no connection, no acknowledgment, and no state. Why the difference? The reason is that UDP still carries a two-way or one-way communication, whereas ICMP merely conveys a message. ICMP rarely occupies a flow for long; typically it is one request and one reply, or one request and no reply. That is the essential difference between them. For ICMP, deleting the tracking entry as soon as the request has been answered is therefore reasonable and seems harmless. But look at what happens with a continuous ping: ip_conntrack must delete the tracking entry, create a new one, delete it again, and so on, which consumes a lot of CPU. For ICMP, the old scheme optimizes for space by keeping memory usage to a minimum, but is that worth the extra CPU time now that memory is so cheap? In any case, later kernels reverse this choice. They optimize for CPU time instead: ICMP ends up being handled the same way as UDP, with an ICMP timeout value.
At least on kernel 2.6.32, ip_conntrack no longer handles ICMP the way kernel 2.6.9 does; it handles it like UDP.
 
Example 4:
Given the explanation of Example 1, this should not be hard to understand.
 
To sum up
Linux's ip_conntrack mechanism is the cornerstone of many state-based policies. Its states rest on empirical timeouts: when a timeout expires, the corresponding connection tracking entry is deleted. This is what "stateful" means on Linux. However, the states of a network protocol do not always match the states of ip_conntrack, so ip_conntrack has to define its own meanings.
In essence, ip_conntrack has to deal with two kinds of protocols: connection-oriented and connectionless. For a connection-oriented protocol, the protocol itself knows when a connection is torn down, so ip_conntrack knows when to delete the ip_conntrack data structure. For a connectionless protocol, the protocol itself does not know when a flow ends, so ip_conntrack can only estimate based on empirical values. Note that ip_conntrack does not monitor the endpoints continuously. Even for a connection-oriented protocol such as TCP, while the connection sits in a particular state, ip_conntrack cannot know when it will leave that state. Within one protocol state (one ip_conntrack sub-state), a stateful protocol such as TCP therefore behaves just like stateless UDP. For example, although the TCP ESTABLISHED timeout of 5 days is usually long enough, a connection that carries no data for more than 5 days and then sees data sent from the reverse direction runs into the same problem as in Example 1.
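The effect of an idle timeout can be shown with a minimal model. This is an illustrative sketch only: times are plain seconds, there is a single flow, and on_packet, struct tracked_flow and the V_* values are hypothetical names. The 5-day figure is the TCP ESTABLISHED timeout mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum verdict { V_NEW, V_KNOWN };

struct tracked_flow {
    bool present;    /* a tracking entry currently exists */
    long last_seen;  /* time of the last packet, in seconds */
    long timeout;    /* idle timeout of the current state, in seconds */
};

/* Process one packet at time `now`; `timeout` is the idle timeout that
 * applies after this packet. Returns whether an entry already existed. */
enum verdict on_packet(struct tracked_flow *f, long now, long timeout)
{
    assert(f != NULL && timeout > 0);
    /* The tracker cannot tell that the endpoints are still connected:
     * once the flow has been idle for the whole timeout, the entry is gone. */
    if (f->present && now - f->last_seen >= f->timeout)
        f->present = false;
    enum verdict v = f->present ? V_KNOWN : V_NEW;
    f->present = true;
    f->last_seen = now;
    f->timeout = timeout;
    return v;
}
```

With a timeout of 5 * 24 * 60 * 60 seconds, a packet arriving 100 seconds after the previous one is matched to the existing entry, while a packet arriving after more than five idle days is treated as the start of a new flow, whichever side sends it.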
