Linux Netfilter Learning Notes 12: Code Analysis of the IP-Layer Netfilter NAT Module


This section analyzes the hook functions and target functions of the NAT module, with the aim of clarifying how the NAT module is implemented.

1. Analysis of the NAT-related hook functions

The NAT module performs NAT at four hook points: NF_IP_PRE_ROUTING, NF_IP_POST_ROUTING, NF_IP_LOCAL_OUT, and NF_IP_LOCAL_IN. As the previous section showed, the nat table contains only three chains, PREROUTING, POSTROUTING, and OUTPUT (the LOCAL_OUT hook), and has no chain for NF_IP_LOCAL_IN, so no SNAT rule can be created for the LOCAL_IN hook point.

Yet the NAT module does register a hook function at LOCAL_IN, and that hook function calls the general NAT processing function. Does this mean that packets at LOCAL_IN are NAT-translated as well?

In fact, the hook registered at LOCAL_IN is not primarily there to translate addresses. When the system applies SNAT to a forwarded packet, the translated source port may be chosen more or less arbitrarily. If another packet is then addressed to the gateway itself, its port may happen to coincide with the port that NAT has just allocated. To keep the tuples of all connection tracking entries unique, the LOCAL_IN hook calls the general NAT processing function, which changes the source port if necessary and obtains a new, unused tuple. This is presumably why LOCAL_IN also needs a hook callback.

1.1 nf_nat_in

This function is the callback that the NAT module registers at the PRE_ROUTING hook point; it implements DNAT. Its definition is shown below. It does two things:

1. Calls nf_nat_fn to perform the DNAT translation.

2. If the translation changed the packet's destination IP address, calls dst_release to drop the skb's reference on its dst_entry and then sets skb->dst to NULL.

static unsigned int
nf_nat_in(unsigned int hooknum,
          struct sk_buff **pskb,
          const struct net_device *in,
          const struct net_device *out,
          int (*okfn)(struct sk_buff *))
{
    unsigned int ret;
    /* Remember the destination address before NAT runs */
    __be32 daddr = (*pskb)->nh.iph->daddr;

    ret = nf_nat_fn(hooknum, pskb, in, out, okfn);
    /* Destination changed: drop the cached route so it is looked up again */
    if (ret != NF_DROP && ret != NF_STOLEN &&
        daddr != (*pskb)->nh.iph->daddr) {
        dst_release((*pskb)->dst);
        (*pskb)->dst = NULL;
    }
    return ret;
}
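To make the route-cache handling concrete, here is a minimal user-space sketch of the pattern used by nf_nat_in. The types mock_skb and mock_dst and the helper mock_nat are hypothetical stand-ins invented for illustration, not kernel APIs; only the control flow is meant to mirror the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's dst_entry and sk_buff. */
struct mock_dst { int refcnt; };
struct mock_skb { uint32_t daddr; struct mock_dst *dst; };

enum { MOCK_DROP = 0, MOCK_ACCEPT = 1 };

/* Stand-in for nf_nat_fn(): rewrites the destination when new_daddr != 0. */
static int mock_nat(struct mock_skb *skb, uint32_t new_daddr)
{
    if (new_daddr != 0)
        skb->daddr = new_daddr;
    return MOCK_ACCEPT;
}

/* Same shape as nf_nat_in(): remember daddr, run NAT, and drop the
 * cached route if the destination address changed. */
static int mock_nat_in(struct mock_skb *skb, uint32_t new_daddr)
{
    uint32_t old_daddr;
    int ret;

    assert(skb != NULL);
    old_daddr = skb->daddr;
    ret = mock_nat(skb, new_daddr);
    if (ret != MOCK_DROP && old_daddr != skb->daddr && skb->dst != NULL) {
        skb->dst->refcnt--;  /* plays the role of dst_release() */
        skb->dst = NULL;     /* forces a fresh route lookup later */
    }
    return ret;
}
```

If the destination is left unchanged, the cached route and its reference count stay as they were; only an actual DNAT rewrite invalidates the route.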


This function mainly calls nf_nat_fn, the general NAT translation function, which is analyzed in detail later.

1.2 nf_nat_out

This function is the NAT module's hook callback at the POST_ROUTING hook point. It does the following:

1. Calls nf_nat_fn to perform the SNAT translation.

static unsigned int
nf_nat_out(unsigned int hooknum,
           struct sk_buff **pskb,
           const struct net_device *in,
           const struct net_device *out,
           int (*okfn)(struct sk_buff *))
{
#ifdef CONFIG_XFRM
    struct nf_conn *ct;
    enum ip_conntrack_info ctinfo;
#endif
    unsigned int ret;

    /* root is playing with raw sockets. */
    if ((*pskb)->len < sizeof(struct iphdr) ||
        (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
        return NF_ACCEPT;

    ret = nf_nat_fn(hooknum, pskb, in, out, okfn);
#ifdef CONFIG_XFRM
    if (ret != NF_DROP && ret != NF_STOLEN &&
        (ct = nf_ct_get(*pskb, &ctinfo)) != NULL) {
        enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);

        /* Source address or port was rewritten: redo the xfrm lookup */
        if (ct->tuplehash[dir].tuple.src.u3.ip !=
            ct->tuplehash[!dir].tuple.dst.u3.ip
            || ct->tuplehash[dir].tuple.src.u.all !=
               ct->tuplehash[!dir].tuple.dst.u.all
            )
            return ip_xfrm_me_harder(pskb) == 0 ? ret : NF_DROP;
    }
#endif
    return ret;
}
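The guard at the top of nf_nat_out ("root is playing with raw sockets") can be tried out in isolation. The sketch below is a user-space approximation: skip_nat_for_malformed and MOCK_IPHDR_LEN are hypothetical names, and the packet is modelled as a plain byte buffer rather than an sk_buff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_IPHDR_LEN 20u  /* size of an IPv4 header without options */

/* Returns 1 when the packet is too short or its IHL field is implausible,
 * i.e. when the hook should accept the packet untouched instead of
 * running NAT on it. Returns 0 for a plausible header. */
static int skip_nat_for_malformed(size_t pkt_len, const uint8_t *pkt)
{
    unsigned int ihl;

    assert(pkt != NULL || pkt_len == 0);
    if (pkt_len < MOCK_IPHDR_LEN)
        return 1;
    ihl = pkt[0] & 0x0fu;              /* low nibble of byte 0 is IHL */
    return ihl * 4u < MOCK_IPHDR_LEN;  /* IHL counts 32-bit words */
}
```

A header starting with 0x45 (version 4, IHL 5) passes the check, while 0x44 (IHL 4, only 16 bytes) is rejected, as is any buffer shorter than 20 bytes.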


This function, too, calls nf_nat_fn to perform the SNAT translation.

1.3 nf_nat_local_fn

This function is the NAT module's hook callback at the OUTPUT (LOCAL_OUT) hook point. It implements DNAT for locally generated packets:

1. Calls nf_nat_fn to perform the DNAT translation.

2. Calls ip_route_me_harder to redo the route lookup. Unlike at PRE_ROUTING, the OUTPUT hook callback must call ip_route_me_harder itself when the destination address changes. On the PREROUTING path, skb->dst is set to NULL and the packet is routed afterwards as it continues through the stack. Packets arriving at the OUTPUT chain have already been routed, and no later function will look up the route again, so nf_nat_local_fn has to perform the route lookup itself.

static unsigned int
nf_nat_local_fn(unsigned int hooknum,
                struct sk_buff **pskb,
                const struct net_device *in,
                const struct net_device *out,
                int (*okfn)(struct sk_buff *))
{
    struct nf_conn *ct;
    enum ip_conntrack_info ctinfo;
    unsigned int ret;

    /* root is playing with raw sockets. */
    if ((*pskb)->len < sizeof(struct iphdr) ||
        (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
        return NF_ACCEPT;

    ret = nf_nat_fn(hooknum, pskb, in, out, okfn);
    if (ret != NF_DROP && ret != NF_STOLEN &&
        (ct = nf_ct_get(*pskb, &ctinfo)) != NULL) {
        enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);

        /* Destination address was rewritten: route the packet again */
        if (ct->tuplehash[dir].tuple.dst.u3.ip !=
            ct->tuplehash[!dir].tuple.src.u3.ip) {
            if (ip_route_me_harder(pskb, RTN_UNSPEC))
                ret = NF_DROP;
        }
#ifdef CONFIG_XFRM
        else if (ct->tuplehash[dir].tuple.dst.u.all !=
                 ct->tuplehash[!dir].tuple.src.u.all)
            if (ip_xfrm_me_harder(pskb))
                ret = NF_DROP;
#endif
    }
    return ret;
}
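The reroute decision in nf_nat_local_fn rests on one observation: without NAT, the reply tuple is the exact inverse of the original tuple, so any mismatch between the two reveals a rewrite. The following user-space sketch illustrates that comparison. The types mock_tuple and mock_conn and the function local_out_action are hypothetical simplifications (one address and one port per side), not kernel structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tuple: one address and one port per side. */
struct mock_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* tuple[0] is the original direction, tuple[1] the reply direction. */
struct mock_conn { struct mock_tuple tuple[2]; };

enum mock_action { MOCK_NONE, MOCK_REROUTE, MOCK_XFRM_LOOKUP };

/* Decide what the OUTPUT hook has to do after NAT for a packet
 * travelling in direction dir (0 = original, 1 = reply). */
static enum mock_action local_out_action(const struct mock_conn *ct, int dir)
{
    const struct mock_tuple *t, *inv;

    assert(ct != NULL && (dir == 0 || dir == 1));
    t = &ct->tuple[dir];
    inv = &ct->tuple[!dir];
    if (t->dst_ip != inv->src_ip)      /* destination address rewritten */
        return MOCK_REROUTE;
    if (t->dst_port != inv->src_port)  /* only the port was rewritten */
        return MOCK_XFRM_LOOKUP;
    return MOCK_NONE;
}
```

For a connection whose destination was rewritten by DNAT, the reply tuple's source holds the new address, so the comparison fails and a reroute is requested; an untouched connection yields MOCK_NONE.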


This function, too, calls nf_nat_fn to perform the NAT translation.

1.4 LOCAL_IN hook

The NAT module's hook callback at NF_IP_LOCAL_IN is a direct call to nf_nat_fn. The following points deserve attention:

Because the nat table has no INPUT chain, the packet's IP address is never modified at the NF_IP_LOCAL_IN point; the NAT setup is done by calling alloc_null_binding. The most that can happen is that the packet's source port is modified, so that the reply-direction nf_conntrack_tuple of the connection tracking entry is unique and not used by any other connection tracking entry.

This is why a hook callback has to be registered at the NF_IP_LOCAL_IN hook point even though the nat table has no INPUT chain.
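The effect of such a null binding can be illustrated with a small user-space sketch: both addresses stay fixed, and the source port is changed only if the tuple collides with one that is already in use. All names here (mock_tuple, tuple_in_use, null_binding) are hypothetical, the linear scan stands in for the kernel's hash lookups, and the simple "increment the port" search is an assumption made for brevity rather than the kernel's actual port selection strategy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mock_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

static int tuple_equal(const struct mock_tuple *a, const struct mock_tuple *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* Linear scan over the tuples that are already taken. */
static int tuple_in_use(const struct mock_tuple *t,
                        const struct mock_tuple *used, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tuple_equal(t, &used[i]))
            return 1;
    return 0;
}

/* Null binding sketch: keep both addresses and change the source port
 * only while the tuple collides. Returns 1 on success, 0 if every
 * source port is taken. */
static int null_binding(struct mock_tuple *t,
                        const struct mock_tuple *used, size_t n)
{
    uint32_t tries;

    assert(t != NULL);
    for (tries = 0; tries < 65536; tries++) {
        if (!tuple_in_use(t, used, n))
            return 1;
        t->src_port = (uint16_t)(t->src_port + 1);  /* wraps after 65535 */
    }
    return 0;
}
```

A tuple that is already unique passes through unchanged, which corresponds to the common case at LOCAL_IN; only a colliding tuple gets a different source port.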

1.5 General NAT conversion function

The most important general NAT translation function is nf_nat_fn. Its implementation involves quite a few other functions, which are analyzed one by one below.

1.5.1 nf_nat_fn

The main job of this function is to perform NAT (both SNAT and DNAT) on packets. Specifically, the connection tracking entry of a data flow goes through SNAT setup and DNAT setup at most once each. Once NAT setup for the entry is complete, subsequent packets are translated directly according to the reply-direction nf_conntrack_tuple of the entry and are then handed back to the protocol stack.

The function is analyzed below.

Function: implements NAT (both SNAT and DNAT).

1. First check that the packet is acceptable (it must not be a fragment) and that its connection tracking entry is eligible for translation.

2. For a RELATED (expected) connection, an ICMP packet is translated through the ICMP reply translation path.

3. Only for a connection tracking entry that is in the NEW state (or a non-ICMP RELATED one) and has not yet been through NAT setup, and only at hook points other than NF_IP_LOCAL_IN, call nf_nat_rule_find to set up NAT for the connection tracking entry.

4. After the three steps above, call nf_nat_packet to translate the packet itself.

NAT setup for a connection tracking entry normally happens only when the entry has just been created and has not yet been confirmed, and each NAT type is set up only once.

static unsigned int
nf_nat_fn(unsigned int hooknum,
          struct sk_buff **pskb,
          const struct net_device *in,
          const struct net_device *out,
          int (*okfn)(struct sk_buff *))
{
    struct nf_conn *ct;
    enum ip_conntrack_info ctinfo;
    struct nf_conn_nat *nat;
    struct nf_nat_info *info;
    /* maniptype == SRC for postrouting. */
    /* Determine the NAT type from the hook number */
    enum nf_nat_manip_type maniptype = HOOK2MANIP(hooknum);

    /* We never see fragments: conntrack defrags on pre-routing
       and local-out, and nf_nat_out protects post-routing. */
    NF_CT_ASSERT(!((*pskb)->nh.iph->frag_off
                   & htons(IP_MF|IP_OFFSET)));

    /* Get the connection tracking entry that belongs to the packet */
    ct = nf_ct_get(*pskb, &ctinfo);
    /* Can't track?  It's not due to stress, or conntrack would
       have dropped it.  Hence it's the user's responsibilty to
       packet filter it out, or implement conntrack/NAT for that
       protocol. 8) --RR */
    /* If the packet has no connection tracking entry: return NF_DROP
       for an ICMP redirect, NF_ACCEPT for anything else. */
    if (!ct) {
        /* Exception: ICMP redirect to new connection (not in
           hash table yet).  We must not let this through, in
           case we're doing NAT to the same network. */
        if ((*pskb)->nh.iph->protocol == IPPROTO_ICMP) {
            struct icmphdr _hdr, *hp;

            hp = skb_header_pointer(*pskb,
                                    (*pskb)->nh.iph->ihl*4,
                                    sizeof(_hdr), &_hdr);
            if (hp != NULL &&
                hp->type == ICMP_REDIRECT)
                return NF_DROP;
        }
        return NF_ACCEPT;
    }

    /* Don't try to NAT if this packet is not conntracked */
    if (ct == &nf_conntrack_untracked)
        return NF_ACCEPT;

    /* Return NF_ACCEPT if the connection tracking entry has no
       associated nf_conn_nat */
    nat = nfct_nat(ct);
    if (!nat)
        return NF_ACCEPT;

    /* For packets of an expected connection (original or reply
       direction), ICMP packets go through ICMP reply translation.
       For entries of an expected connection, and for entries in the
       NEW state, NAT is set up only if it has not been set up yet:
       a) at the LOCAL_IN hook, call alloc_null_binding; at most the
          layer-4 protocol fields are changed.
       b) for an entry that is already confirmed but has no NAT setup,
          call alloc_null_binding_confirmed; again at most the layer-4
          protocol fields are changed.
       c) otherwise call nf_nat_rule_find to look for a matching rule
          in the iptables nat table and, if one matches, run its
          target (SNAT target or DNAT target) according to the NAT type. */
    switch (ctinfo) {
    case IP_CT_RELATED:
    case IP_CT_RELATED+IP_CT_IS_REPLY:
        if ((*pskb)->nh.iph->protocol == IPPROTO_ICMP) {
            if (!nf_nat_icmp_reply_translation(ct, ctinfo,
                                               hooknum, pskb))
                return NF_DROP;
            else
                return NF_ACCEPT;
        }
        /* Fall thru... (Only ICMPs can be IP_CT_IS_REPLY) */
    case IP_CT_NEW:
        info = &nat->info;

        /* Seen it before?  This can happen for loopback, retrans,
           or local packets.. */
        if (!nf_nat_initialized(ct, maniptype)) {
            unsigned int ret;

            if (unlikely(nf_ct_is_confirmed(ct)))
                /* NAT module was loaded late */
                ret = alloc_null_binding_confirmed(ct, info,
                                                   hooknum);
            else if (hooknum == NF_IP_LOCAL_IN)
                /* LOCAL_IN hook doesn't have a chain!  */
                ret = alloc_null_binding(ct, info, hooknum);
            else
                ret = nf_nat_rule_find(pskb, hooknum, in, out,
                                       ct, info);

            if (ret != NF_ACCEPT) {
                return ret;
            }
        } else
            DEBUGP("Already setup manip %s for ct %p\n",
                   maniptype == IP_NAT_MANIP_SRC ? "SRC" : "DST",
                   ct);
        break;

    default:
        /* ESTABLISHED */
        NF_CT_ASSERT(ctinfo == IP_CT_ESTABLISHED ||
                     ctinfo == (IP_CT_ESTABLISHED+IP_CT_IS_REPLY));
        info = &nat->info;
    }

    NF_CT_ASSERT(info);
    /* Call nf_nat_packet to translate the packet according to the
       reply tuple of the connection tracking entry */
    return nf_nat_packet(ct, ctinfo, hooknum, pskb);
}
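The branch selection inside nf_nat_fn can be condensed into a small decision function. The sketch below is a user-space model under simplifying assumptions: mock_state flattens the relevant inputs into plain flags, the reply-direction variants of ctinfo are folded into their base states, and all names (mock_state, mock_path, choose_path) are hypothetical. It returns only which setup step is selected; in the real function every path except ICMP reply translation goes on to nf_nat_packet.

```c
#include <assert.h>
#include <stddef.h>

enum mock_ctinfo { CT_ESTABLISHED, CT_RELATED, CT_NEW };

enum mock_path {
    PATH_ICMP_REPLY_TRANSLATION,  /* nf_nat_icmp_reply_translation */
    PATH_NULL_BINDING_CONFIRMED,  /* alloc_null_binding_confirmed */
    PATH_NULL_BINDING,            /* alloc_null_binding */
    PATH_RULE_LOOKUP,             /* nf_nat_rule_find */
    PATH_PACKET_ONLY              /* no setup, straight to nf_nat_packet */
};

struct mock_state {
    enum mock_ctinfo ctinfo;
    int is_icmp;           /* packet carries ICMP */
    int nat_done;          /* NAT of this manip type already set up */
    int confirmed;         /* connection tracking entry is confirmed */
    int hook_is_local_in;  /* running at the LOCAL_IN hook */
};

static enum mock_path choose_path(const struct mock_state *s)
{
    assert(s != NULL);
    if (s->ctinfo == CT_ESTABLISHED)
        return PATH_PACKET_ONLY;
    if (s->ctinfo == CT_RELATED && s->is_icmp)
        return PATH_ICMP_REPLY_TRANSLATION;
    /* From here on: NEW, or RELATED traffic that is not ICMP. */
    if (s->nat_done)
        return PATH_PACKET_ONLY;
    if (s->confirmed)
        return PATH_NULL_BINDING_CONFIRMED;
    if (s->hook_is_local_in)
        return PATH_NULL_BINDING;
    return PATH_RULE_LOOKUP;
}
```

The order of the checks matters: the "confirmed" test comes before the LOCAL_IN test, matching the if / else if chain in the listing, so a late-loaded NAT module takes the confirmed path even at LOCAL_IN.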


This function relies mainly on nf_nat_initialized, alloc_null_binding_confirmed, alloc_null_binding, nf_nat_rule_find, and nf_nat_packet. These functions are analyzed next.

1.5.1.1 nf_nat_initialized

This function determines whether NAT of the given manip type has already been set up for the connection tracking entry that is passed in.

If manip is IP_NAT_MANIP_SRC, it tests whether the IPS_SRC_NAT_DONE_BIT bit in the entry's status is set. A value of 1 means that SNAT has already been set up for the entry and does not need to be set up again.

The check for DNAT is analogous. This prevents SNAT or DNAT from being set up more than once for the same connection tracking entry.

static inline int nf_nat_initialized(struct nf_conn *ct,
                                     enum nf_nat_manip_type manip)
{
    if (manip == IP_NAT_MANIP_SRC)
        return test_bit(IPS_SRC_NAT_DONE_BIT, &ct->status);
    else
        return test_bit(IPS_DST_NAT_DONE_BIT, &ct->status);
}
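The "done" bits behave like two independent one-shot latches. A user-space sketch of that idea follows; mock_conn, nat_initialized, and mark_nat_done are hypothetical names, and the bit positions 7 and 8 are assumed to correspond to IPS_SRC_NAT_DONE_BIT and IPS_DST_NAT_DONE_BIT in kernels of this era, so they should be checked against the actual header before being relied on.

```c
#include <assert.h>
#include <stddef.h>

enum mock_manip { MANIP_SRC, MANIP_DST };

/* Assumed bit positions, see the note above. */
#define MOCK_SRC_NAT_DONE_BIT 7
#define MOCK_DST_NAT_DONE_BIT 8

struct mock_conn { unsigned long status; };

static int manip_bit(enum mock_manip manip)
{
    return manip == MANIP_SRC ? MOCK_SRC_NAT_DONE_BIT
                              : MOCK_DST_NAT_DONE_BIT;
}

/* Non-atomic counterpart of the test_bit() call in nf_nat_initialized(). */
static int nat_initialized(const struct mock_conn *ct, enum mock_manip manip)
{
    assert(ct != NULL);
    return (int)((ct->status >> manip_bit(manip)) & 1UL);
}

/* Record that NAT setup of the given type has been done. */
static void mark_nat_done(struct mock_conn *ct, enum mock_manip manip)
{
    assert(ct != NULL);
    ct->status |= 1UL << manip_bit(manip);
}
```

Because each manip type has its own bit, a connection can have SNAT set up at one hook and DNAT at another without the two checks interfering.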


1.5.1.2 alloc_null_binding_confirmed

This function handles a connection tracking entry that has already been confirmed but for which NAT has not yet been set up.

By the logic described so far, NAT setup for a connection tracking entry happens after the entry is created and before it is confirmed.

So how can an entry be confirmed while NAT has not been set up for it?

This occurs when the connection tracking module was loaded and had been working for some time before the NAT module was loaded.

Since the NAT module is loaded later, the previously confirmed connection tracking entries also need to go through NAT setup.

Such an entry has been confirmed, but because the NAT module was not loaded at the time, it was never added to the list in the by_source[] array. When the NAT module translates a connection tracking entry, it compares the translated nf_conntrack_tuple against the nf_conntrack_tuple values of the existing connection tracking entries to guarantee that the translated tuple is unique. If previously confirmed entries were not put through NAT setup and added to the by_source[] lists, the tuple of a newly translated entry could conflict with that of a previously confirmed entry. The previously confirmed entry therefore has to go through NAT setup as well. This setup does not modify the IP address; at most it fine-tunes the source or destination port.

The function itself is fairly simple. From the hook it determines the NAT type, then sets the IP address of the range to the corresponding IP address of the reply-direction nf_conntrack_tuple (in other words, the IP address is not modified). It then calls nf_nat_setup_info to set up NAT for the connection tracking entry.

unsigned int
alloc_null_binding_confirmed(struct nf_conn *ct,
                             struct nf_nat_info *info,
                             unsigned int hooknum)
{
    /* Take the current address and port from the reply tuple */
    __be32 ip
        = (HOOK2MANIP(hooknum) == IP_NAT_MANIP_SRC
           ? ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.ip
           : ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.ip);
    u_int16_t all
        = (HOOK2MANIP(hooknum) == IP_NAT_MANIP_SRC
           ? ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u.all
           : ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u.all);
    struct nf_nat_range range
        = { IP_NAT_RANGE_MAP_IPS, ip, ip, { all }, { all } };

    DEBUGP("Allocating NULL binding for confirmed %p (%u.%u.%u.%u)\n",
           ct, NIPQUAD(ip));
    return nf_nat_setup_info(ct, &range, hooknum);
}


nf_nat_setup_info is the function that ultimately performs the NAT setup. alloc_null_binding (covered next), masquerade_target, ipt_snat_target, and ipt_dnat_target all end up calling it to set up NAT for the connection tracking entry, so it deserves a separate analysis.

1.5.1.3 alloc_null_binding

In nf_nat_fn this function is called only for the LOCAL_IN hook. Because the nat table has no chain for LOCAL_IN, no NAT rule can ever match there. But the kernel would not keep useless code around. Although a flow at the LOCAL_IN hook point is not itself NATed, another flow with the same layer-3 IP addresses may have been NATed and may already occupy the layer-4 port that this flow uses, which would make the connection tracking entries of the two flows conflict. To resolve this, nf_nat_setup_info is called to find a unique tuple for the flow that is not being NATed. There are two possible outcomes: the original tuple is already unique and is kept, or the layer-4 protocol-specific fields of the original tuple are modified to obtain a new unique tuple. The connection tracking entry is also added to the corresponding by_source[] list, so that when other entries are translated later, their translated nf_conntrack_tuple values are checked against the entries already registered and NAT proceeds without conflicts.

The execution flow of this function is similar to that of alloc_null_binding_confirmed; in the end it too calls nf_nat_setup_info to set up NAT for the connection tracking entry.

inline unsigned int
alloc_null_binding(struct nf_conn *ct,
                   struct nf_nat_info *info,
                   unsigned int hooknum)
{
    /* Force range to this IP; let proto decide mapping for
       per-proto parts (hence not IP_NAT_RANGE_PROTO_SPECIFIED).
       Use reply in case it's already been mangled (eg local packet).
    */
    __be32 ip
        = (HOOK2MANIP(hooknum) == IP_NAT_MANIP_SRC
           ? ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.ip
           : ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.ip);
    struct nf_nat_range range
        = { IP_NAT_RANGE_MAP_IPS, ip, ip, { 0 }, { 0 } };

    DEBUGP("Allocating NULL binding for %p (%u.%u.%u.%u)\n",
           ct, NIPQUAD(ip));
    return nf_nat_setup_info(ct, &range, hooknum);
}
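The two null-binding helpers differ only in how they fill the port fields of the range. The user-space sketch below models that difference. Everything in it is a hypothetical simplification: mock_tuple and mock_range flatten the kernel structures, hook_to_manip reflects the assumption that SRC manipulation happens at POST_ROUTING and LOCAL_IN (as HOOK2MANIP is understood to do in kernels of this era), and the range flags are not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mock_hook { HOOK_PRE_ROUTING, HOOK_LOCAL_IN, HOOK_FORWARD,
                 HOOK_LOCAL_OUT, HOOK_POST_ROUTING };
enum mock_manip { MANIP_SRC, MANIP_DST };

struct mock_tuple { uint32_t src_ip, dst_ip; uint16_t src_port, dst_port; };
struct mock_range { uint32_t min_ip, max_ip; uint16_t min_port, max_port; };

/* Assumed mapping: SRC manip at POST_ROUTING and LOCAL_IN, DST elsewhere. */
static enum mock_manip hook_to_manip(enum mock_hook hook)
{
    return (hook == HOOK_POST_ROUTING || hook == HOOK_LOCAL_IN)
        ? MANIP_SRC : MANIP_DST;
}

/* Build a "null" range from the reply tuple: min_ip == max_ip == the
 * address the connection already uses, so the address cannot change.
 * pin_port != 0 models alloc_null_binding_confirmed (ports filled in),
 * pin_port == 0 models alloc_null_binding (ports left as zero). */
static struct mock_range null_range(const struct mock_tuple *reply,
                                    enum mock_hook hook, int pin_port)
{
    struct mock_range r;
    int src;

    assert(reply != NULL);
    src = (hook_to_manip(hook) == MANIP_SRC);
    /* For SRC manip, the reply tuple's destination is the current source. */
    r.min_ip = r.max_ip = src ? reply->dst_ip : reply->src_ip;
    r.min_port = r.max_port =
        (uint16_t)(pin_port ? (src ? reply->dst_port : reply->src_port) : 0);
    return r;
}
```

Reading the address from the reply tuple rather than from the packet means the range reflects any rewrite that has already taken place, which is what the "Use reply in case it's already been mangled" comment in the listing refers to.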


1.5.1.4 nf_nat_rule_find

When the connection tracking entry falls into neither of the two cases above (1.5.1.2 and 1.5.1.3) and its state is NEW, RELATED, or RELATED+REPLY
