How do I understand the connection tracking mechanism in Netfilter?
I'm going to start with a question, because on the road of exploring knowledge, only by asking questions can we fully mobilize the machinery of thought and push ourselves further. The definition of connection tracking is simple: it is used to record and track the state of a connection.
Q: Why do I need a connection tracking function?
A: Because it is the basis for implementing stateful firewalls and NAT.
OK, roughly understood. In order to implement a stateful firewall and NAT address translation based on inspecting the state of data connections, Netfilter introduced a connection tracking mechanism. In other words: if the connection tracking option is enabled when the kernel is compiled, the Linux system maintains a connection state for each packet it receives, recording the state of the data connection. Next, we will study the design ideas and implementation of Netfilter's connection tracking.
From the earlier diagram we could clearly see that the hook functions implementing the connection tracking entrance are registered at the two Netfilter hook points NF_IP_PRE_ROUTING and NF_IP_LOCAL_OUT, with a fairly high priority. The hook functions implementing the connection tracking exit are registered at the two hook points NF_IP_LOCAL_IN and NF_IP_POST_ROUTING, with a very low priority.
In fact, the PRE_ROUTING and LOCAL_OUT points can be regarded as the entrances of the whole Netfilter framework, and POST_ROUTING and LOCAL_IN as its exits. Considering only connection tracking, no packet can escape one of the following three paths:
1. Packets destined for the local machine
Path: PRE_ROUTING -> LOCAL_IN -> local process
2. Packets forwarded by the local machine
Path: PRE_ROUTING -> FORWARD -> POST_ROUTING -> out
3. Packets sent from the local machine
Path: LOCAL_OUT -> POST_ROUTING -> out
We all know that at the inet layer, the structure used to represent a packet is the famous sk_buff{} (hereafter referred to as skb); if you have unfortunately never heard of it, then I strongly suggest you first brush up on the basics of the network protocol stack before continuing with this article. skb has a member pointer nfct of type struct nf_conntrack{}, defined in the include/linux/skbuff.h file. This structure holds the reference count of the connection record, which also makes it easy to reference the connection tracking entry from elsewhere. In practice, connection tracking is generally used by force-casting the nfct pointer to type ip_conntrack{} (defined in include/linux/netfilter_ipv4/ip_conntrack.h) to obtain the state information of the connection tracking record that a packet belongs to. That is: the Netfilter framework uses ip_conntrack{} to record the relationship between a packet and the state of its connection.
A very useful interface is also provided in the include/linux/netfilter_ipv4/ip_conntrack.h file: struct ip_conntrack *ip_conntrack_get(skb, &ctinfo), which returns the nfct pointer of an skb, telling us which connection the packet belongs to, together with the connection state information ctinfo. From the connection tracking point of view, ctinfo represents one of several connection states for each packet:
- IP_CT_ESTABLISHED
The packet is part of an established connection, in its original direction.
- IP_CT_RELATED
The packet is related to an established connection, in its original direction.
- IP_CT_NEW
The packet is trying to establish a new connection.
- IP_CT_ESTABLISHED + IP_CT_IS_REPLY
The packet is part of an established connection, in its reply direction.
- IP_CT_RELATED + IP_CT_IS_REPLY
The packet is related to an established connection, in its reply direction.
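The five states above and the IP_CT_IS_REPLY offset can be modeled in a few lines of user-space C. This is a sketch of the 2.6-era enum ip_conntrack_info, not the kernel header itself; treat the exact numeric values as an assumption drawn from that era's sources:

```c
#include <assert.h>

/* Model of the 2.6-era enum ip_conntrack_info (values are an assumption
 * based on kernel headers of that period). */
enum ip_conntrack_info {
    IP_CT_ESTABLISHED,   /* 0: part of an established connection    */
    IP_CT_RELATED,       /* 1: related to an established connection */
    IP_CT_NEW,           /* 2: trying to establish a new connection */
    IP_CT_IS_REPLY       /* 3: offset added in the reply direction  */
};

/* CTINFO2DIR-style check: reply-direction states are exactly those at
 * or above IP_CT_IS_REPLY. */
static int ctinfo_is_reply(int ctinfo)
{
    return ctinfo >= IP_CT_IS_REPLY;
}
```

Because the reply states are formed by adding IP_CT_IS_REPLY, a single comparison recovers the direction of any packet.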
Inside connection tracking, each received skb is first converted into an ip_conntrack_tuple{} structure; in other words, the ip_conntrack_tuple{} structure is the form in which the connection tracking system "knows" a packet. So how are skb and ip_conntrack_tuple{} converted? There is no single answer to this question; it is closely tied to the specific protocol. For example, for the TCP/UDP protocols, "source and destination IP + source and destination port" uniquely identifies a flow; for the ICMP protocol, "source and destination IP + type + code + id" determines an ICMP exchange; and so on. The situation is more complicated for "active" application-layer protocols such as FTP. This article does not attempt to analyze the connection tracking implementation of any specific protocol, but explores the design principles and workflow of connection tracking, so that we grasp its essence. The Linux kernel is now updated very quickly, already at 3.4.x, and the changes are big; even between 2.6.21 and 2.6.22 there are some differences in the connection tracking code. But once we understand the design ideas behind connection tracking and grasp its spirit, then however the code changes, reading a specific implementation will no longer leave us confused. As the saying goes, "give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime" — what we teach here is the method. Combine the method with your own diligent study and practice and it becomes skill, so that in the end everyone can develop connection tracking for their own protocol with a clear mind. That is my intention and purpose in writing this series of posts. Let us encourage each other.
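To make the tuple idea concrete, here is a user-space C sketch of a simplified tuple plus pkt_to_tuple()- and invert_tuple()-style helpers for TCP. The struct layout and function names here are simplified assumptions for illustration, not the kernel's real ip_conntrack_tuple{}:

```c
#include <assert.h>
#include <stdint.h>

/* A toy tuple: just enough fields to show how a TCP/UDP packet is
 * reduced to a flow identity.  NOT the real kernel layout. */
struct tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protonum;               /* 6 = TCP, 17 = UDP, ... */
};

/* pkt_to_tuple()-style conversion for TCP: for TCP/UDP, "src/dst IP +
 * src/dst port" identifies the flow. */
static void tcp_pkt_to_tuple(uint32_t sip, uint32_t dip,
                             uint16_t sport, uint16_t dport,
                             struct tuple *t)
{
    t->src_ip = sip;     t->dst_ip = dip;
    t->src_port = sport; t->dst_port = dport;
    t->protonum = 6;
}

/* invert_tuple(): swap the endpoints to get the expected reply tuple. */
static void invert_tuple(const struct tuple *orig, struct tuple *reply)
{
    reply->src_ip = orig->dst_ip;     reply->dst_ip = orig->src_ip;
    reply->src_port = orig->dst_port; reply->dst_port = orig->src_port;
    reply->protonum = orig->protonum;
}
```

Each protocol module supplies its own pair of conversion callbacks; this is exactly why connection tracking can stay protocol-agnostic at its core.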
Before we begin analyzing connection tracking, let us take a commander's view of the overall layout of the whole connection tracking system. Here I first give a rather rough, simplified flowchart to make things easier to understand and get started. Of course, my understanding may not be accurate everywhere, and I invite the experts to correct me.
Let me reiterate: connection tracking is divided into two parts, an entrance and an exit.
Remember: connection tracking records are created at the entrance, and the records are added to the connection tracking table at the exit. Let's look at each in turn.
Entrance:
The flow of the whole entrance can be summarized as follows: for each incoming skb, connection tracking converts it into a tuple structure, then uses that tuple to look up the connection tracking table. If a flow of this kind is not yet being tracked, a connection record is created for it (packets already being tracked skip this step). Immediately afterwards, the packet() callback provided by the connection tracking module of the protocol the packet belongs to is called, and finally the state of the connection tracking record is updated accordingly.
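The entry-side decision can be modeled in a few lines of user-space C. This is a deliberately tiny model with illustrative names: the real lookup hashes into ip_conntrack_hash[], and scanning the unconfirmed list here is only to keep the sketch deterministic:

```c
#include <assert.h>

#define MAX 8

static unsigned long table[MAX];  static int n_table;   /* confirmed flows  */
static unsigned long unconf[MAX]; static int n_unconf;  /* awaiting confirm */

/* Returns 1 if the flow was already tracked, 0 if a new record was
 * created and parked on the unconfirmed list. */
static int track(unsigned long tuple)
{
    for (int i = 0; i < n_table; i++)
        if (table[i] == tuple) return 1;
    for (int i = 0; i < n_unconf; i++)
        if (unconf[i] == tuple) return 1;
    /* Unknown flow: create a record, but do NOT insert it into the
     * table yet -- the confirm step at the exit does that. */
    unconf[n_unconf++] = tuple;
    return 0;
}
```

Note how a brand-new record is not placed in the table at the entrance: it waits on the unconfirmed list, matching the "create at entrance, insert at exit" rule stated above.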
Export:
The whole exit flow can be summarized as follows: for each packet about to leave the Netfilter framework, if the connection tracking module handling this protocol type provides a helper function, the packet is first processed by that helper function. Then it is checked whether the packet has been tracked; if so, the state of the connection it belongs to determines whether the packet is discarded, handed back to the stack for transmission, or has its connection record added to the connection tracking table.
Protocol Management for connection tracking:
As we said earlier, the implementation of connection tracking differs from protocol to protocol. To develop a connection tracking module for a protocol, you must first instantiate a variable of struct type ip_conntrack_protocol{}, fill in the necessary fields, and then call the ip_conntrack_protocol_register() function, which places the structure at the position in the global array ip_ct_protos[] given by its protocol number.
The ip_ct_protos variable holds all the protocols that the connection tracking system can currently handle, with the protocol number as the array index, as shown in the figure.
Every member of struct ip_conntrack_protocol{} is commented in great detail in the kernel source, so I will not explain them one by one here; in the actual development process we will analyze the specific functions as we use them.
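To make the registration idea concrete, here is a user-space C sketch of an ip_ct_protos[]-style dispatch table. The struct fields and function names are minimal assumptions for illustration, not the real ip_conntrack_protocol{} layout:

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-protocol handler: one slot per IP protocol number. */
struct ct_protocol {
    unsigned char proto;               /* IP protocol number        */
    const char *name;
    int (*packet)(void);               /* per-protocol state machine */
};

static struct ct_protocol *ct_protos[256];  /* indexed by protocol number */
static struct ct_protocol  generic = { 0, "generic", NULL };

static int ct_protocol_register(struct ct_protocol *p)
{
    if (ct_protos[p->proto])
        return -1;                     /* slot already taken */
    ct_protos[p->proto] = p;
    return 0;
}

/* Lookup falls back to a generic handler when a protocol has not
 * registered its own, mirroring the kernel's approach. */
static struct ct_protocol *ct_find_proto(unsigned char proto)
{
    return ct_protos[proto] ? ct_protos[proto] : &generic;
}
```

The protocol number doubles as the array index, so dispatch at packet time is a single array access.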
Auxiliary modules for connection tracking:
Netfilter's connection tracking provides us with a very useful functional module: the helper. This module lets us extend the connection tracking functionality at very little cost. The scenario it addresses is doing some final processing on a packet just before it leaves the Netfilter framework. As we can see from the earlier diagram, the helper module is registered with a low priority at the two Netfilter hook points LOCAL_IN and POST_ROUTING.
Each auxiliary module is an object of struct type ip_conntrack_helper{}. That is, if the protocol you are developing needs a connection tracking helper to do some work, you must also instantiate an ip_conntrack_helper{} object, populate it, and finally call the ip_conntrack_helper_register() function to register your auxiliary module into the global variable helpers, a doubly linked list that holds the auxiliary modules currently registered for all protocols in the connection tracking system.
The definition and initialization of the global helpers variable is done in the net/netfilter/nf_conntrack_helper.c file.
Finally, the doubly linked list represented by our helpers variable looks roughly as shown:
From this we can basically tell what the ip_conntrack_help() callback registered at the Netfilter LOCAL_IN and POST_ROUTING hook points does: it traverses the helpers list in order, looking for a helper that matches the packet's connection, and invokes that ip_conntrack_helper{} object's help() function.
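A user-space sketch of that traversal follows. Matching on protocol number alone is a simplification of my own here — the real code matches the connection's tuple against each helper's tuple and mask:

```c
#include <assert.h>
#include <stddef.h>

/* Toy ip_conntrack_helper{}: a linked list node with a help() hook. */
struct helper {
    unsigned char proto;              /* which flows this helper serves */
    int (*help)(const char *pkt);     /* last-minute packet processing  */
    struct helper *next;
};

static struct helper *helpers;        /* global list head */

static void helper_register(struct helper *h)
{
    h->next = helpers;
    helpers = h;
}

/* ip_conntrack_help()-style walk: find a matching helper, run help(). */
static int run_helpers(unsigned char proto, const char *pkt)
{
    for (struct helper *h = helpers; h; h = h->next)
        if (h->proto == proto)
            return h->help(pkt);
    return 1;                         /* NF_ACCEPT: no helper, pass on */
}

/* A stand-in "ftp" helper that just counts its invocations. */
static int ftp_calls;
static int ftp_help(const char *pkt) { (void)pkt; ftp_calls++; return 1; }
static struct helper ftp = { 6 /* TCP */, ftp_help, NULL };
```

Packets whose protocol has no registered helper simply pass through untouched, which is why the helper mechanism is so cheap to extend.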
Expected connections:
Netfilter's connection tracking provides a mechanism called "expected connections" to support "active" protocols such as FTP. We all know that the FTP service uses port 21 as the command transmission channel; in active mode the server uses port 20 for the data transmission channel, while in passive mode the server opens a random port above 1024 and the client connects to that port to begin the data transfer. In other words, both active and passive modes require two connections: the command-channel connection and the data-channel connection. Connection tracking introduces the concept of an "expected connection" to handle this scenario, where one data connection is related to another, and offers its own solution for such "related" connections. As we said, this article does not intend to analyze the connection tracking implementation of any specific protocol, so next we will just talk about expected connections themselves.
Each expected connection is represented by an object of struct type ip_conntrack_expect{}, and all expected connections are stored in a doubly linked list headed by the global variable ip_conntrack_expect_list. The structure of the list looks roughly as follows:
The members of struct ip_conntrack_expect{} and their meanings are also fully commented in the kernel source, so I will not introduce each one here; we will discuss them in detail when needed.
Connection Tracking Table:
After all this talk, it is finally time for our connection tracking table to take the stage. The connection tracking table is a hash table used to record the connection information of all tracked packets. Concretely, it is an array, indexed by the hash value of a packet's tuple, of doubly linked circular lists, where each node in each list is an object of type ip_conntrack_tuple_hash{}. The connection tracking table is represented by the global pointer variable ip_conntrack_hash[]. To make this array of doubly linked lists easier to understand, we first describe several important structures not yet introduced: ip_conntrack_tuple{}, ip_conntrack{} and ip_conntrack_tuple_hash{}.
We can see that ip_conntrack_tuple_hash{} is just an encapsulation of ip_conntrack_tuple{} that organizes it into a doubly linked list; therefore, at the level of understanding, we can treat the two as the same thing.
When analyzing the ip_conntrack{} structure, we have listed all the previously mentioned and related data structures together, so that we can understand and remember them easily.
This diagram is the data core of the connection tracking part, so let's go through the meaning of the relevant members of the ip_conntrack{} structure.
- ct_general: This structure holds the reference count of the connection record, which also makes it easy to reference the connection tracking entry from elsewhere.
- status: The state of the connection this packet belongs to; it is a bitmap.
- timeout: Connections of each protocol have a default timeout. Each tracked packet belonging to the connection refreshes this timer; when a connection sees no packets before the timeout expires, the timeout function provided by that protocol type is called.
- counters: This member exists only when the kernel is compiled with CONFIG_IP_NF_CT_ACCT; it records the number of bytes and packets seen on a connection.
- master: This member points to another ip_conntrack{}. It is typically used in the expected-connection scenario: if the current connection is an expected connection of another connection, this member points to the master connection it belongs to.
- helper: If a protocol provides an extension module, the functions of that extension module are invoked through this member.
- proto: This member is of type ip_conntrack_proto{}; do not confuse it with the ip_conntrack_protocol{} structure described earlier for registering per-protocol connection tracking handlers. The former is a union, the latter a struct. The proto member holds the extra per-protocol state that different protocols need in order to implement their own connection tracking. Currently this union looks as follows:
If your protocol needs additional data to implement connection tracking in the future, you can extend this union.
- help: This member holds the extra state that different application-layer protocols need in order to implement their own connection tracking. It is of the union type ip_conntrack_help{}, which is easy to confuse with the struct type ip_conntrack_helper{} we just introduced. In short, ip_conntrack_proto{} serves the needs of the protocol layer, while ip_conntrack_help{} exists for the needs of the application layer.
- tuplehash: This member is an array of type ip_conntrack_tuple_hash{} with size 2. tuplehash[0] represents the connection in the "original" direction of a data flow, and tuplehash[1] represents the "reply" direction of the data flow, as shown in the figure.
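The two-direction tuplehash idea can be shown with a toy struct; the field names here are illustrative assumptions, not the kernel layout:

```c
#include <assert.h>

/* One record, two tuples: the original direction and its reply. */
enum { IP_CT_DIR_ORIGINAL = 0, IP_CT_DIR_REPLY = 1, IP_CT_DIR_MAX };

struct tuple      { unsigned sip, dip; };
struct tuple_hash { struct tuple t; };   /* list linkage omitted */

struct conntrack {
    struct tuple_hash tuplehash[IP_CT_DIR_MAX];
};

/* Fill both directions from the original endpoints, in the spirit of
 * what init_conntrack() does after computing the reply tuple. */
static void conntrack_init_dirs(struct conntrack *ct,
                                unsigned sip, unsigned dip)
{
    ct->tuplehash[IP_CT_DIR_ORIGINAL].t = (struct tuple){ sip, dip };
    ct->tuplehash[IP_CT_DIR_REPLY].t    = (struct tuple){ dip, sip };
}
```

Because both directions live inside one record, a hit on either tuple leads back to the same connection.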
So far, we have covered the design concept of connection tracking and its working mechanism: connection tracking is a basic framework provided by Netfilter; different protocols can develop their own connection tracking functionality according to their protocol's particularities, register it with the connection tracking mechanism, and finally hand it over to the connection tracking mechanism for unified management.
This article focuses on analyzing a packet's journey through the connection tracking system, in order to reach a deep understanding of how connection tracking operates.
The connection tracking mechanism registers five hook functions with the Netfilter framework in total: ip_conntrack_defrag(), ip_conntrack_in(), ip_conntrack_local(), ip_conntrack_help() and ip_confirm(). From the previous posts we know that ip_conntrack_local() ultimately calls ip_conntrack_in(). These five hook functions and their mount points should by now be familiar to everyone; if not, go back and look at the earlier post.
At the entrance of connection tracking, three functions are mainly at work: ip_conntrack_defrag(), ip_conntrack_in() and ip_conntrack_local(); at the exit there are two: ip_conntrack_help() and ip_confirm().
What comes next is fascinating: please imagine yourself as a packet that needs to be forwarded and belongs to a new connection, then follow me for a tour through connection tracking. Before we enter, one warning: connection tracking does not modify the packet itself, but it may discard packets.
Here is the route map for our trip:
ip_conntrack_defrag()
When we first arrive at the gate of connection tracking, this is the fellow who receives us. Its main job is to complete the reassembly of IP fragments, recombining the fragments belonging to one IP datagram into a complete datagram. For background on IP fragmentation, you can read "TCP/IP Illustrated, Volume 1"; how fragments are reassembled into a complete IP datagram is not our focus, so we will not expand on it here. This function also reveals a secret to us: connection tracking only tracks complete IP datagrams, not IP fragments — all fragments must first be restored into the original datagram before entering the connection tracking system.
ip_conntrack_in()
The core of this function is what resolve_normal_ct() does; its execution flow is as follows:
For the next part of the analysis, you will need a few data structures mentioned in the previous article: ip_conntrack{}, ip_conntrack_tuple{}, ip_conntrack_tuple_hash{} and ip_conntrack_protocol{}. Their relationships must be clear before you can thoroughly understand what the resolve_normal_ct() function does. It is best to have a copy of the 2.6.21 kernel source at hand and follow along in Source Insight for the best effect!
Step one: based on the protocol number in the packet's skb, the ip_conntrack_in() function first finds, in the global array ip_ct_protos[], the connection tracking handler ip_conntrack_protocol{} registered by that protocol (such as TCP, UDP or ICMP), as shown below.
In this struct, a specific protocol must provide callbacks such as pkt_to_tuple() and invert_tuple(), which convert its packets (skb) into an ip_conntrack_tuple{} structure, the new() function for handling newly established connections, and so on.
Step two: after locating the handler proto for the corresponding protocol, the error-checking function provided by the protocol (if it provides one) is called to verify the validity of the skb.
Step three: call the resolve_normal_ct() function. Its importance is self-evident: it carries out all the remaining work at the connection tracking entrance. The function generates an ip_conntrack_tuple{} object, tuple, from the relevant information in the skb by invoking the pkt_to_tuple() function provided by the protocol, then uses this tuple to search the connection tracking table to see which tuple_hash{} chain it belongs to. Please note that one connection tracking record consists of two ip_conntrack_tuple_hash{} nodes, one for the "go" direction and one for the "back" direction; see the diagram at the end of the previous post. To make the connection tracking table more intuitive for everyone, I have drawn it as an array of doubly linked lists.
If the tuple_hash list node that the tuple belongs to is found, its address is returned. If it is not found, it means this kind of packet is not yet being tracked, so we must first create an instance of the ip_conntrack{} structure, i.e. create a connection record.
Then, after computing the reply tuple repl_tuple of the tuple and performing the necessary initialization of the ip_conntrack{} object, the computed tuple and its inverted tuple are assigned to tuplehash[IP_CT_DIR_ORIGINAL] and tuplehash[IP_CT_DIR_REPLY] of the connection tracking record ip_conntrack.
Finally, the address of ip_conntrack->tuplehash[IP_CT_DIR_ORIGINAL] is returned — exactly the address of the original-direction list node of the connection tracking record. Netfilter keeps a linked list called unconfirmed, which holds all connection tracking records that have not yet been confirmed; our ip_conntrack->tuplehash[IP_CT_DIR_ORIGINAL] is added to this unconfirmed list.
Step four: call the packet() function provided by the protocol, which carries the final mission of returning a verdict to the Netfilter framework: it returns a negative value if the packet is not a valid part of the connection, otherwise NF_ACCEPT. That is, if you are developing a connection tracking feature for your own protocol, you must carefully design the packet() function when instantiating your ip_conntrack_protocol{} object.
Although I am not walking through the code line by line, only analyzing the principles, there is one line of code worth mentioning. The resolve_normal_ct() function contains the line ct = tuplehash_to_ctrack(h); see the source. Here h is an existing or newly created ip_conntrack_tuple_hash{} object, and ct is a pointer of type ip_conntrack{}. Do not mistakenly assume that this line creates a ct object — that work was already completed in the init_conntrack() function. What this line does is compute, from the address of the tuplehash member inside an ip_conntrack{} structure, the starting address of the enclosing ip_conntrack{} object. Please take note.
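Here is a user-space demonstration of that pointer arithmetic, using the classic container_of idiom. The struct is a toy, and for simplicity this sketch assumes h points at tuplehash[0] (the real helper also consults the stored direction):

```c
#include <assert.h>
#include <stddef.h>

struct tuple_hash { unsigned long tuple; };

struct conntrack {
    int status;                          /* some members before the array */
    struct tuple_hash tuplehash[2];
};

/* The classic container_of idiom: member pointer minus the member's
 * offset yields the address of the enclosing object. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static struct conntrack *tuplehash_to_ctrack(struct tuple_hash *h)
{
    /* No allocation happens here: we only recompute an address. */
    return container_of(h, struct conntrack, tuplehash[0]);
}
```

This is why storing only the tuplehash nodes in the hash table is enough: any node found there leads straight back to its ip_conntrack{}.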
You can also see that the ip_conntrack_in() function merely creates an ip_conntrack{} object to hold the connection tracking record, and completes the filling of its related attributes and the setting of its state. To put it simply, we have now obtained the "green card" ip_conntrack{} of the connection tracking system, but it has not yet been stamped into effect.
ip_conntrack_help()
All you need to do is recall the earlier diagram of the hook functions mounted at the five hook points and note where the registered ip_conntrack_help() function sits. The rest is simple. When the protocol this packet belongs to has provided an ip_conntrack_helper{} module along with its connection tracking module, or someone else has extended a helper module for packets of our protocol type, two conditions are checked:
First, whether the packet has received its "green card", i.e. whether connection tracking has created a connection tracking record ip_conntrack{} for this protocol's packets;
Second, whether the packet's connection state is not that of a related connection in its reply direction.
If both conditions hold, the help() function provided by the helper module processes our packet skb; the help() function must also return NF_ACCEPT, NF_DROP or a similar verdict to the Netfilter framework. If either condition fails, ip_conntrack_help() returns NF_ACCEPT directly and our packet continues on its way.
ip_confirm()
This function is the last fellow we meet before leaving Netfilter. If we have obtained the "green card" ip_conntrack{}, the connection this packet belongs to has not yet been confirmed, and the connection has not died, then what the ip_confirm() function does for us is:
Take the ip_conntrack{} object that connection tracking generated for the packet, compute the hash values of the connection's "go" and "back" tuples, and check whether those tuples already exist in the connection tracking table ip_conntrack_hash[]. If they already exist, the function returns NF_DROP; if not, it inserts the connection's "go" and "back" direction tuples into the connection tracking table and returns NF_ACCEPT to the Netfilter framework. The reason the connection tracking record is only added to the table here at the exit is that the packet might still have been filtered out along the way.
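A toy model of this confirm step in user-space C follows. Flat arrays stand in for the hash buckets; the NF_* values are the usual netfilter verdict numbers:

```c
#include <assert.h>

enum { NF_DROP = 0, NF_ACCEPT = 1 };
#define MAX 16

static unsigned long table[MAX];
static int n;

static int in_table(unsigned long t)
{
    for (int i = 0; i < n; i++)
        if (table[i] == t) return 1;
    return 0;
}

/* Confirm a record: refuse duplicates, otherwise insert BOTH the
 * original ("go") and reply ("back") tuples into the table. */
static int confirm(unsigned long orig, unsigned long reply)
{
    if (in_table(orig) || in_table(reply))
        return NF_DROP;               /* already present: drop */
    table[n++] = orig;
    table[n++] = reply;
    return NF_ACCEPT;
}
```

Inserting both directions at once is what later lets a reply packet find the same connection record through its own tuple.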
At this point, our trip has ended successfully. We have only analyzed the case of forwarded packets here. The flow for packets destined for this machine is consistent with it; for packets sent from this machine, the only difference in the flow is that ip_conntrack_in() is reached via ip_conntrack_local() instead. As I said earlier, ip_conntrack_local() ultimately calls ip_conntrack_in(); ip_conntrack_local() only adds one feature: locally generated packets that are too short for connection tracking are not tracked.
Analysis of the initialization process of the connection tracking system
With the previous knowledge in place, analyzing the initialization of the connection tracking system in the ip_conntrack_standalone_init() function becomes very easy. As usual, first the flowchart of the ip_conntrack_standalone_init() function:
The core of the function has been labeled "initialize the connection tracking system" and "register the connection tracking hook functions". For the other two pieces, I will just give a brief introduction here without going into detail — at least enough for everyone to understand why connection tracking needs these two file-system interfaces.
1. procfs (the /proc file system)
This is a virtual file system, typically mounted at /proc, that allows the kernel to expose internal information to user space in the form of files. The files in this directory do not physically exist on disk, but they can be read with cat or more and written through > shell redirection, and their read and write permissions can even be specified like those of normal files. The kernel components that create these files decide who may read or write any given file. However, users cannot add files or directories under the /proc directory.
2. sysctl (the /proc/sys directory)
This interface allows user space to read or modify the values of kernel variables. Not every kernel variable can be manipulated through this interface: the kernel must explicitly declare which variables are visible to user space here. From user space, the variables exported by sysctl can be accessed in two ways: the sysctl system call interface, and procfs. When the kernel supports the procfs file system, a special directory (/proc/sys) is added under /proc, containing one file for each kernel variable exported by sysctl; by reading and writing these files we can affect the variable's value in the kernel.
There is also the sysfs file system, which is not covered here; if you are interested, read the detailed explanation in the book "Linux Device Drivers".
Back to our connection tracking system: we now know that it exports some kernel variables to user space, so that users can flexibly control certain characteristics of connection tracking, such as changing the maximum number of tracked connections, or modifying the connection tracking timeouts of the TCP, UDP or ICMP protocols, and so on.
Note: every file under the /proc/sys directory corresponds to a kernel variable of the same name in the kernel. For example, on my system the directory looks like this:
The ip_conntrack_init() function
This function does most of the work of initializing the connection tracking system. We have drawn out its flow as well, so you can follow the source code and analyze it step by step.
Step one: the size of the connection tracking table is related to the system memory, and the maximum number of tracked connections relates to the table capacity as: maximum number of tracked connections = 8 x the connection tracking table capacity. In the code this is:
ip_conntrack_max = 8 * ip_conntrack_htable_size; and from the figure above we can see that the maximum number of tracked connections in the system can be changed dynamically by editing the like-named file ip_conntrack_max under the /proc/sys/net/ipv4/netfilter directory.
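The relationship above, as a trivial calculation; the 8x factor follows the text, while the hash-table size itself is derived from RAM by the kernel and is treated as a given here:

```c
#include <assert.h>

/* ip_conntrack_max = 8 * ip_conntrack_htable_size, as stated above. */
static unsigned long conntrack_max(unsigned long htable_size)
{
    return 8 * htable_size;
}
```

So a machine whose table ends up with 8192 buckets would default to tracking at most 65536 connections, unless the admin raises ip_conntrack_max via /proc.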
Step two: register the sockopt interfaces used by Netfilter. I will not discuss these now — just know that this is where they are registered.
Step three: allocate memory for the connection tracking hash table ip_conntrack_hash and initialize it, and create the caches for connection tracking records and expected connections.
Step four: register the connection tracking protocol bodies for the TCP, UDP and ICMP protocols into the global array ip_ct_protos[], indexed by their protocol numbers, as follows:
Finally, some wrap-up work is done, such as registering the functions required by the DROP target, and initializing the ip_conntrack_untracked parameter for other modules such as NAT, which we will discuss in detail in the NAT module.
With this, the initialization of our connection tracking system is completely done. With the basic knowledge of connection tracking from the previous posts, reading the code now gives a refreshing, enlightened feeling.
As for the registration of the five hook functions provided by the connection tracking system, I think you already know what it does without even looking.