How can NAT be implemented smoothly in Linux?

Source: Internet
Author: User

1. Overview of the Linux NAT implementation and its problems

NAT in Linux is built on ip_conntrack. A NAT rule set with iptables takes effect only on the first packet of a flow. The exception, of course, is RAWNAT as implemented by xtables-addons, and even RAWNAT needs two rules. A really good design would offer an option for customizing NAT behavior instead of relying on the skill of whoever configures it; otherwise the result can be a disaster. I have run into this during product development and deployment: in the same user scenario, some people handle it with ease while others are completely lost. It is like language teaching: a poor Chinese teacher hands you a dictionary, while a good one teaches you how to build sentences, even though classic essays and crude talk are made of the same characters. In my view, a good design makes it hard for people to make mistakes; even if it cannot guarantee they get things right, at least they will not get them wrong!
I will not give an overview of how Linux implements NAT here (you need the overview, not the details). You can read the source code, and if you are not familiar with it, there is little point in reading further. Now imagine a scenario:
1) The Linux box initially has no NAT configuration;
2) A server is accessed from the Linux box itself or from a machine behind it. The source address is an intranet IP address that is not routable on the public network;
3) The data is sent normally, but do not expect anything to come back (except useless ICMP!);
4) On the Linux box, ip_conntrack records a conntrack entry in the SYN_SENT state, expects a reply, and gives the entry an expiration time X;
5) Because the client receives no response, it keeps reconnecting, and each attempt touches the fragile SYN_SENT conntrack entry on the Linux box, so the entry never expires;
6) Even if an iptables NAT rule is now added, the conntrack entry has not expired, so no packet is ever treated as the first packet of a new flow and the rule is never matched;
7) The process above is deadlocked, and the data access never succeeds!

2. Problem analysis and good design

Looking at the process above, the problem is actually simple. If either side behaved a little differently, it would go away. For example, the client could give up after n failed reconnection attempts, so that it stops touching the conntrack entry on the Linux box and the entry reaches its expiration time and is released. Or, conversely, the Linux box's conntrack could notice that an entry has stayed in its initial state longer than some threshold and release it immediately. Neither is a good design, however, because each depends on one side understanding the other's behavior and requires the two sides to cooperate! That idea has been very successful in Ethernet's CSMA/CD and WiFi's CSMA/CA, but it does not work for Layer 7 data communication, because Layer 7 is too messy!
So what should we do? I think a good design should be a smooth algorithm! Linux is actually being lazy in its NAT implementation: if the first packet of a flow matches no NAT rule, it automatically creates a mapping for the flow anyway, one that translates nothing. To keep the code uniform, it creates a null NAT binding for you! Even without a deep theoretical analysis, this forced behavior is unreasonable. It is not that I do not want NAT; the rule I want simply is not there yet, so I cannot match any rule! By the application's logic, even though the connection is not getting through, I still want to keep trying and wait for a rule I can match. Until it arrives I will keep trying. I may eventually give up on my own, but that is my freedom too!
Therefore, the logic of my new version of NAT is: as long as the connection has not reached the ESTABLISHED state, that is, it is still NEW, it always gets another chance at NAT. Once a reply packet has been received and the connection becomes ESTABLISHED, nothing more is done (this is a shortcut I took out of laziness).

3. Modify the Linux kernel code

Perhaps "kernel" is not quite the right word. In fact I did not need to modify the kernel proper; I only modified netfilter code, and netfilter was merged into the kernel only because it is so powerful. Two files were modified in total: not much, but very focused! I never shy away from writing large chunks of code, but I regard that as a last resort. If something ready-made exists, use it directly; achieving the greatest effect with the fewest code changes is also a skill. In my view, writing a large chunk of code and working out a large mathematical derivation are not on the same level at all, and I value the latter. Code is only the implementation of an idea: you must first have the idea and then implement it. If you have the idea and find that someone has already implemented it for you, just use it!
1) Modify the nf_nat_rule_find function in $K/net/ipv4/netfilter/nf_nat_rule.c:

    int nf_nat_rule_find(struct sk_buff *skb,
                         unsigned int hooknum,
                         const struct net_device *in,
                         const struct net_device *out,
                         struct nf_conn *ct)
    {
        struct net *net = nf_ct_net(ct);
        int ret;

        ret = ipt_do_table(skb, hooknum, in, out, net->ipv4.nat_table);

        if (ret == NF_ACCEPT) {
            if (!nf_nat_initialized(ct, HOOK2MANIP(hooknum))) {
                /* NUL mapping */
                ret = alloc_null_binding(ct, hooknum);
                // Linux is lazy here, but I am not! I do not regard
                // alloc_null_binding as a successful NAT, because it is
                // only a small trick for avoiding null pointers. So I
                // clear the done bit to indicate that NAT may be tried
                // again later (only SNAT has been tested!)
                clear_bit(IPS_SRC_NAT_DONE_BIT, &ct->status);
            }
        }
        return ret;
    }

2) Modify the nf_nat_fn function in $K/net/ipv4/netfilter/nf_nat_standalone.c:

    static unsigned int
    nf_nat_fn(unsigned int hooknum,
              struct sk_buff *skb,
              const struct net_device *in,
              const struct net_device *out,
              int (*okfn)(struct sk_buff *))
    {
        ......
        nat = nfct_nat(ct);
        if (!nat) {
            /* NAT module was loaded late. */
            // The original implementation: if the NAT module is loaded
            // after the conntrack has been confirmed, NAT never takes effect!
            if (/* in smooth-transition mode, set a switch to enable this */ 0 &&
                nf_ct_is_confirmed(ct))
                return NF_ACCEPT;
            nat = nf_ct_ext_add(ct, NF_CT_EXT_NAT, GFP_ATOMIC);
            if (nat == NULL) {
                pr_debug("failed to add NAT extension\n");
                return NF_ACCEPT;
            }
        }

        switch (ctinfo) {
        case IP_CT_RELATED:
        case IP_CT_RELATED + IP_CT_IS_REPLY:
            if (ip_hdr(skb)->protocol == IPPROTO_ICMP) {
                if (!nf_nat_icmp_reply_translation(ct, ctinfo, hooknum, skb))
                    return NF_DROP;
                else
                    return NF_ACCEPT;
            }
            /* Fall thru... (Only ICMPs can be IP_CT_IS_REPLY) */
        // As long as no reply packet has arrived, the connection is always NEW
        case IP_CT_NEW:
            /* Seen it before? This can happen for loopback, retrans,
               or local packets... */
            if (!nf_nat_initialized(ct, maniptype)) {
                unsigned int ret;

                if (hooknum == NF_INET_LOCAL_IN)
                    /* LOCAL_IN hook doesn't have a chain! */
                    ret = alloc_null_binding(ct, hooknum);
                else
                    ret = nf_nat_rule_find(skb, hooknum, in, out, ct);

                if (ret != NF_ACCEPT)
                    return ret;
                // The key addition starts here:
                // if the flow's packets have already passed completely
                // through this box but were never successfully NATed by an
                // iptables rule, keep matching the iptables NAT rules,
                // because a new iptables rule may have been added while the
                // packet was being retransmitted.
                if (nf_ct_is_confirmed(ct)) {
                    struct net *net = nf_ct_net(ct);
                    // If a new rule matched, update the position of the
                    // tuples in the hash chain.
                    hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
                    hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode);
                    // Without the delete and reinsert, the reply packet
                    // cannot be translated back to the original packet!
                    nf_conntrack_hash_insert(ct);
                }
                // The above is not optimized!! The optimization would be:
                // update the tuples' chain position only when a call other
                // than alloc_null_binding succeeded; otherwise the update
                // is pointless!
            } else
                pr_debug("Already setup manip %s for ct %p\n",
                         maniptype == IP_NAT_MANIP_SRC ? "SRC" : "DST",
                         ct);
        ......
    }

That is all there is to it! My test procedure is fairly simple, though, so there may be cases I have not thought of!

4. Test logic

1) Create a socket
2) Set reuseaddr
3) Bind the secondary IP address
4) while true; do connect server; if success; then break; fi; done
5) Read and write

While the client is still retrying, enter the iptables rule that translates the secondary IP address to the primary IP address. The connection then succeeds.
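
Steps 1) to 4) of the test could be written roughly as follows. This is a sketch, not the program actually used: `connect_from` and its parameters are hypothetical, and the fixed source port is an assumption made so that every attempt uses the same flow tuple and therefore keeps hitting the same conntrack entry. It also recreates and rebinds the socket on each attempt instead of calling connect again on the same socket, because POSIX leaves the state of a socket unspecified after a failed connect.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Try to connect from src_ip:src_port to dst_ip:dst_port, up to
 * max_tries times. Returns a connected socket, or -1 on failure.
 */
static int connect_from(const char *src_ip, unsigned short src_port,
                        const char *dst_ip, unsigned short dst_port,
                        int max_tries)
{
    struct sockaddr_in src, dst;
    int one = 1;

    assert(src_ip != NULL && dst_ip != NULL);
    memset(&src, 0, sizeof(src));
    memset(&dst, 0, sizeof(dst));
    src.sin_family = AF_INET;
    src.sin_port = htons(src_port);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(dst_port);
    if (inet_pton(AF_INET, src_ip, &src.sin_addr) != 1 ||
        inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
        return -1;                                    /* not an IPv4 address */

    for (int i = 0; i < max_tries; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);     /* step 1 */
        if (fd < 0)
            return -1;
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,  /* step 2 */
                       &one, sizeof(one)) == 0 &&
            bind(fd, (struct sockaddr *)&src, sizeof(src)) == 0 &&    /* step 3 */
            connect(fd, (struct sockaddr *)&dst, sizeof(dst)) == 0)   /* step 4 */
            return fd;                                /* success: leave the loop */
        close(fd);
        sleep(1);                                     /* pause before retrying */
    }
    return -1;
}
```

A call such as `connect_from("192.168.1.2", 40000, "203.0.113.10", 80, 1000)`, with made-up addresses, keeps retrying while the NAT rule is added on the Linux box; step 5) then reads and writes on the returned descriptor.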
