Linux Kernel netfilter: The Role of the ip_conntrack Module (Abstract Summary)

Source: Internet
Author: User

A NAT rule translates the address information of a data stream. After translation, the translated address information must be written into the ip_conntrack structure. A translated packet can be headed in only two directions: to the local machine (the REDIRECT target) or to another machine (the usual case for NAT on a gateway), so netfilter has to record translations for both.
For this purpose ip_conntrack registers the following two hooks: ip_conntrack_out_ops handles streams NATed to other machines, and ip_conntrack_local_in_ops handles streams NATed to the local machine:
static struct nf_hook_ops ip_conntrack_out_ops = {
    // The name looks odd but is reasonable: ip_conntrack may need to work on
    // protocol headers or payload above layer 4, so IP packets are
    // defragmented at the PREROUTING hook and have to be fragmented again
    // on the way out.
    .hook = ip_refrag,
    .owner = THIS_MODULE,
    .pf = PF_INET,
    .hooknum = NF_IP_POST_ROUTING,
    .priority = NF_IP_PRI_LAST,
};
static struct nf_hook_ops ip_conntrack_local_in_ops = {
    // A good name, although it does not cover everything the function does.
    .hook = ip_confirm,
    .owner = THIS_MODULE,
    .pf = PF_INET,
    .hooknum = NF_IP_LOCAL_IN,
    .priority = NF_IP_PRI_LAST-1,
};
Both ip_refrag and ip_confirm eventually call __ip_conntrack_confirm. The logic of this function shows the structure of ip_conntrack at a glance:
int __ip_conntrack_confirm(struct nf_ct_info *nfct)
{
    ...
    hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
    repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
    ...
    // Proceed only if neither the original tuple nor the post-NAT reply tuple
    // is in the hash yet (NAT is optional; without it the two tuples simply
    // mirror each other).
    if (!LIST_FIND(... IP_CT_DIR_ORIGINAL ...)
        && !LIST_FIND(... IP_CT_DIR_REPLY ...)) {
        list_prepend(&ip_conntrack_hash[hash],
                     &ct->tuplehash[IP_CT_DIR_ORIGINAL]);
        list_prepend(&ip_conntrack_hash[repl_hash],
                     &ct->tuplehash[IP_CT_DIR_REPLY]);
        ct->timeout.expires += jiffies;
        ...
        set_bit(IPS_CONFIRMED_BIT, &ct->status); // the stream has made it through this machine
        ...
    }
    ...
}
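The effect of confirming a connection can be sketched in a few lines of user-space C. This is a minimal illustration, not the kernel code: the types, the toy hash function and the names find_conn and confirm are assumptions made for this example, and the tuples carry only addresses. It shows that once both tuples are in the hash, a packet from either direction leads to the same connection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types: addresses only, no ports or protocol. */
enum { DIR_ORIGINAL = 0, DIR_REPLY = 1, DIR_MAX = 2 };
enum { HASH_SIZE = 16 };

struct tuple { uint32_t src, dst; };

struct conn;
struct tuple_hash {
    struct tuple tuple;
    struct tuple_hash *next;   /* chain within one hash bucket */
    struct conn *ct;           /* back-pointer to the owning connection */
};

struct conn {
    struct tuple_hash tuplehash[DIR_MAX];
    int confirmed;
};

static struct tuple_hash *table[HASH_SIZE];

/* Toy hash, chosen only for the example. */
static unsigned hash_tuple(const struct tuple *t)
{
    return (t->src * 31u + t->dst) % HASH_SIZE;
}

static int tuple_equal(const struct tuple *a, const struct tuple *b)
{
    return a->src == b->src && a->dst == b->dst;
}

/* Return the connection owning tuple t, or NULL if none is recorded. */
static struct conn *find_conn(const struct tuple *t)
{
    struct tuple_hash *h;

    for (h = table[hash_tuple(t)]; h != NULL; h = h->next)
        if (tuple_equal(&h->tuple, t))
            return h->ct;
    return NULL;
}

/* Record both directions. Returns 1 on success, 0 if a tuple is already taken. */
static int confirm(struct conn *ct)
{
    int dir;

    assert(ct != NULL);
    if (find_conn(&ct->tuplehash[DIR_ORIGINAL].tuple) ||
        find_conn(&ct->tuplehash[DIR_REPLY].tuple))
        return 0;
    for (dir = 0; dir < DIR_MAX; dir++) {
        struct tuple_hash *h = &ct->tuplehash[dir];
        unsigned bucket = hash_tuple(&h->tuple);

        h->ct = ct;
        h->next = table[bucket];   /* prepend to the bucket chain */
        table[bucket] = h;
    }
    ct->confirmed = 1;
    return 1;
}
```

After confirm() succeeds for a connection whose tuples are (1-2) and (2-9), find_conn() returns that connection for either tuple, and a second confirm() of the same connection is rejected.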
Finally, starting from struct ip_conntrack, here is a simple description of the role ip_conntrack plays in netfilter as a whole and of its implementation logic:
struct ip_conntrack
{
    ...
    struct ip_conntrack_tuple_hash tuplehash[IP_CT_DIR_MAX]; // two directions in total
    ...
    struct list_head sibling_list; // related connections
    ...
    struct ip_conntrack_expect *master; // counterpart of sibling_list
    ...
    struct {
        struct ip_nat_info info;
        union ip_conntrack_nat_help help;
    } nat;
};
The "repl" in these names can be read as "replace". Literally it stands for reply, but since the orig tuple is never modified, the reply tuple is the one that gets replaced: whether the rule is SNAT or DNAT, the modification is recorded in the reply tuple. Take SNAT: the stream enters the gateway and is then source-translated. The orig tuple does not change. What changes is the reply tuple, because for returning data the destination address is the translated source address and the source address is the destination address of the current stream. So repl means both reply and replace.
A tuple is used only to match a stream. When a packet comes in, its tuple associates it with a stream; ip_conntrack_tuple holds enough information to identify one, including the IP addresses and port numbers:
struct ip_conntrack_tuple
{
    struct ip_conntrack_manip src; // similar to dst below
    struct {
        u_int32_t ip; // IP address
        union {
            ... // per-protocol data such as the port
        } u;
        u_int16_t protonum; // transport-layer protocol
    } dst;
}; // this structure holds two IP-port pairs
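How two IP-port pairs are enough to match a packet to a stream can be sketched as follows. This is an illustration under stated assumptions: struct flow_tuple is a hypothetical flattened version of the structure above, and flow_tuple_invert and flow_direction are names invented for the example. It covers only the case without NAT, where the reply tuple is the original with both pairs swapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened tuple; the kernel nests these fields in src/dst. */
struct flow_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint16_t protonum;            /* transport protocol, e.g. 6 for TCP */
};

static int flow_tuple_equal(const struct flow_tuple *a,
                            const struct flow_tuple *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->protonum == b->protonum;
}

/* Without NAT, the reply tuple is the original with both pairs swapped. */
static struct flow_tuple flow_tuple_invert(const struct flow_tuple *orig)
{
    struct flow_tuple r;

    assert(orig != NULL);
    r.src_ip = orig->dst_ip;
    r.dst_ip = orig->src_ip;
    r.src_port = orig->dst_port;
    r.dst_port = orig->src_port;
    r.protonum = orig->protonum;
    return r;
}

/* Classify a packet against one stream:
 * 0 = original direction, 1 = reply direction, -1 = not this stream. */
static int flow_direction(const struct flow_tuple *orig,
                          const struct flow_tuple *reply,
                          const struct flow_tuple *pkt)
{
    if (flow_tuple_equal(orig, pkt))
        return 0;
    if (flow_tuple_equal(reply, pkt))
        return 1;
    return -1;
}
```

A packet is compared field by field against both tuples of a stream; the tuple it matches also tells which direction the packet is travelling in.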
The following simplifies the ip_conntrack and NAT process into an abstract model and looks, from the point of view of a packet's context, at how the related kernel modules behave as packets flow through a NAT gateway. A data stream is written as (x-y); port information and state information are ignored. State information is in fact very important: other netfilter modules can use the state set by ip_conntrack to make special decisions, and the state also identifies what a stream is currently doing and what should be done with it. It is ignored here to keep things simple and to arrive at a method for analyzing code; trying to cover every detail at once only costs extra effort and patience.
When a packet (A-B) enters the gateway R, SNAT takes place and the address information becomes (M-B). Although NAT has occurred, (A-B) and (M-B) belong to the same data stream CK, and ip_conntrack must record this to bind the two together. From R's point of view, data going from A to B and data going from M to B travel in the same direction, from source to target. Once SNAT is done the data can leave, and after it has left R we no longer care about it. What we care about is how the response from B is bound to the stream CK when it arrives at R. The returning data obviously carries the post-SNAT stream ID (B-M), so ip_conntrack must bind (B-M) to CK as well. CK can now be defined:
struct conntrack {
    two-direct[2];
} CK = {
    two-direct[0] = (A-B);
    two-direct[1] = (B-M);
};
#define IP_CT_DIR_ORIGINAL 0
#define IP_CT_DIR_REPLY 1
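The CK defined above can be written as compilable C to show how NAT touches only the reply entry. This is a sketch under assumptions: addresses are plain integers, the field is spelled two_direct because a hyphen is not valid in a C identifier, and ck_init, ck_apply_snat and ck_apply_dnat are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DIR_ORIGINAL = 0, DIR_REPLY = 1, DIR_MAX = 2 };

/* (x-y) in the text: x is the source, y is the destination. */
struct flow { uint32_t src, dst; };

struct ck { struct flow two_direct[DIR_MAX]; };

/* A new stream (a-b): without NAT the reply is simply (b-a). */
static void ck_init(struct ck *ck, uint32_t a, uint32_t b)
{
    assert(ck != NULL);
    ck->two_direct[DIR_ORIGINAL] = (struct flow){ a, b };
    ck->two_direct[DIR_REPLY] = (struct flow){ b, a };
}

/* SNAT to m: replies will be addressed to m, so only reply.dst changes. */
static void ck_apply_snat(struct ck *ck, uint32_t m)
{
    assert(ck != NULL);
    ck->two_direct[DIR_REPLY].dst = m;
}

/* DNAT to new_dst: replies will come from new_dst, so only reply.src changes. */
static void ck_apply_dnat(struct ck *ck, uint32_t new_dst)
{
    assert(ck != NULL);
    ck->two_direct[DIR_REPLY].src = new_dst;
}
```

With A = 1, B = 2 and M = 9, calling ck_init(&ck, 1, 2) followed by ck_apply_snat(&ck, 9) leaves the original entry at (1-2) and turns the reply entry into (2-9), which is the (B-M) of the text.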
Compared with the ip_conntrack shown earlier, what is missing? Since SNAT takes place at R in this example, the information that guides the NAT should also be stored in CK to make it complete. The NAT information is really an array covering both directions. Take the following rule:
-A POSTROUTING -d 172.16.0.0/255.255.0.0 -o eth0 -j MASQUERADE
This rule really stands for two translations: one rewrites the source address of data heading to the 172.16.0.0 network to the address of eth0, and the other rewrites the destination address of packets returning from 172.16.0.0 back again, since otherwise no data would ever come back. So for one source address (-s) or destination address (-d), a total of (two directions * the maximum number of hook points a stream passes) NAT entries are required. How many hook points can a stream pass at most? If the data merely passes through, the maximum is two, one for SNAT and one for DNAT, but there can be as many as three. One NAT rule therefore needs three pairs, that is six entries, represented as the array nat-info[6], and CK becomes:
CK = {
    two-direct[0] = (A-B);
    two-direct[1] = (B-M);
    nat-info[] = {,,,,,};
};
What should each element of nat-info hold? Address information, of course. The simplest approach is to make each element an IP-port pair and to use the array subscript to index the NAT type, with definitions such as:
#define SRC_NAT 0
#define OPPOSITE_SRC 1
#define DST_NAT 2
#define OPPOSITE_DST 3
...
With the above definition, the NAT information array is initialized:
nat-info[] = { {src_ip_to, port1}, {src_ip_from, port2},
               {dst_ip_to, port3}, {dst_ip_from, port4},
               ... }; // With the CK defined above, src_ip_to is M and src_ip_from is A.
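One possible reading of this table in C is sketched below. It is an illustration, not the kernel's ip_nat_info: struct addr, struct packet and apply_nat are hypothetical, and only the four entry types named above are handled, since the remaining entries are left open in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Array subscripts, following the definitions in the text. */
enum nat_type {
    SRC_NAT = 0,       /* outgoing: rewrite source to the new address      */
    OPPOSITE_SRC = 1,  /* returning: rewrite destination back to original  */
    DST_NAT = 2,       /* incoming: rewrite destination to the new address */
    OPPOSITE_DST = 3,  /* returning: rewrite source back to the original   */
    NAT_TYPE_MAX
};

struct addr { uint32_t ip; uint16_t port; };    /* one IP-port pair */
struct packet { struct addr src, dst; };

/* Rewrite one side of the packet using the entry selected by type. */
static void apply_nat(struct packet *p,
                      const struct addr nat_info[NAT_TYPE_MAX],
                      enum nat_type type)
{
    assert(p != NULL && nat_info != NULL);
    switch (type) {
    case SRC_NAT:
        p->src = nat_info[SRC_NAT];
        break;
    case OPPOSITE_SRC:
        p->dst = nat_info[OPPOSITE_SRC];
        break;
    case DST_NAT:
        p->dst = nat_info[DST_NAT];
        break;
    case OPPOSITE_DST:
        p->src = nat_info[OPPOSITE_DST];
        break;
    default:
        break;  /* unknown type: leave the packet untouched */
    }
}
```

With nat_info[SRC_NAT] set to M's pair and nat_info[OPPOSITE_SRC] set to A's pair, an outgoing packet (A-B) becomes (M-B) under SRC_NAT, and the returning packet (B-M) becomes (B-A) under OPPOSITE_SRC.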
src_ip_to is the new source IP address to translate to, and src_ip_from is the original source IP address to translate back to for response data. The entries with the dst prefix have the analogous meaning. Once the table is initialized, further packets of the same stream can be translated simply by looking it up. Because both directions of the stream map to the same CK structure, a packet arriving from either direction, (A-B) or (B-M), finds the same CK; Linux implements this with a hash. Once the CK is found, the nat-info can be retrieved from it to obtain the address translation information.
Obviously the NAT table cannot be searched for every arriving packet. Instead, the nat-info is determined when the first packet of a stream arrives, that is, when the stream (CK) is created. After the CK is created the NAT table is searched; if a rule matches, the nat-info is built from that rule and two-direct[1] is updated to the newly translated stream (reversed). When the first packet of the stream leaves the machine or is delivered to the user layer, the stream information is recorded in the hash, for both directions. Any later packet, request or response, can then be found by looking up its address information in the hash.
When such a packet reaches NAT, the nat-info is taken directly from the CK, and the current hook determines which entry of nat-info is used to perform the address translation. Throughout the process, CK is what tracks the connection. Its biggest contribution is that a CK is created when the first packet of a stream arrives; NAT then queries the NAT table once with this initial CK, and from then on the CK of a packet and the information attached to it, such as the NAT information, can be fetched and used directly. That is what the conntrack module is for: providing this information for later use.
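The whole flow described above, with the NAT table consulted only for the first packet of a stream, can be sketched as follows. Everything here is an assumption made for illustration: a small linear array stands in for the kernel's hash, the single SNAT rule and the addresses are invented, and handle_packet, find_ck and new_ck are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONNS 8
#define RULE_NET  0xAC100000u   /* 172.16.0.0, as in the rule in the text */
#define RULE_MASK 0xFFFF0000u   /* 255.255.0.0 */
#define GATEWAY_M 0x0A000001u   /* assumed address of eth0 (M) */

struct flow { uint32_t src, dst; };

struct ck {
    struct flow orig, reply;
    uint32_t snat_to;           /* cached NAT decision, 0 = no NAT */
    int in_use;
};

static struct ck conns[MAX_CONNS];
static int rule_lookups;        /* how often the NAT table was searched */

/* Stand-in for searching the NAT table: one SNAT rule on the destination. */
static uint32_t nat_table_lookup(const struct flow *f)
{
    rule_lookups++;
    return ((f->dst & RULE_MASK) == RULE_NET) ? GATEWAY_M : 0;
}

static int flow_equal(const struct flow *a, const struct flow *b)
{
    return a->src == b->src && a->dst == b->dst;
}

/* Find the stream a packet belongs to, in either direction. */
static struct ck *find_ck(const struct flow *f, int *is_reply)
{
    int i;

    for (i = 0; i < MAX_CONNS; i++) {
        if (!conns[i].in_use)
            continue;
        if (flow_equal(&conns[i].orig, f)) { *is_reply = 0; return &conns[i]; }
        if (flow_equal(&conns[i].reply, f)) { *is_reply = 1; return &conns[i]; }
    }
    return NULL;
}

/* First packet of a stream: create the CK and search the NAT table once. */
static struct ck *new_ck(const struct flow *f)
{
    int i;

    for (i = 0; i < MAX_CONNS; i++) {
        struct ck *ck = &conns[i];

        if (ck->in_use)
            continue;
        ck->in_use = 1;
        ck->orig = *f;
        ck->snat_to = nat_table_lookup(f);
        ck->reply.src = f->dst;
        ck->reply.dst = ck->snat_to ? ck->snat_to : f->src;
        return ck;
    }
    return NULL;                /* table full */
}

/* Track the packet and translate it using only information cached in CK. */
static struct ck *handle_packet(struct flow *pkt)
{
    int is_reply = 0;
    struct ck *ck;

    assert(pkt != NULL);
    ck = find_ck(pkt, &is_reply);
    if (ck == NULL)
        ck = new_ck(pkt);
    if (ck == NULL || ck->snat_to == 0)
        return ck;
    if (is_reply)
        pkt->dst = ck->orig.src;    /* M back to A */
    else
        pkt->src = ck->snat_to;     /* A to M */
    return ck;
}
```

Sending a request, its response and a second request through handle_packet() translates all three packets, while rule_lookups stays at 1: the rule search happens only when the CK is created.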
Don't be blinded by the complex details of kernel code. Behind every piece of code there is a very simple idea: the idea behind ip_conntrack and NAT and the design of their data structures is just what is described above. If you can abstract the data structures into their simplest form and work out the ideas while reading the code, reading code is rewarding. Otherwise you will one day get lost in a sea of characters, see the trees but not the forest, and lose the will to study it any further. Once you understand the general process, read the code again. The following functions are the important ones:
ip_nat_setup_info: initializes the ip_nat_info information;
find_best_ips_proto_fast: initializes the new tuple after NAT;
ip_conntrack_alter_reply: sets the reply tuple changed by NAT;
do_bindings: the function in the NAT module that performs the actual translation.
