About dual-connection server load balancing

To save everyone's time, let's state the thesis up front:
  
Packet-level TCP/UDP load balancing and NAT (Network Address Translation) cannot coexist.
  
What does this mean? Put simply: suppose you have a LAN using private IP addresses that reaches the Internet through NAT over a dynamic-IP connection. You feel the speed is not enough, so you want to add a second uplink to double the bandwidth available to a single Internet host (a typical situation in mainland China today). Sorry, you are out of luck. This is impossible.
  
  
The "connections" mentioned here include Ethernet, PPPoE, PPPoA, PPP, and SLIP. The restriction is caused by the internal mechanism of the TCP/UDP protocols and has nothing to do with any particular software or hardware; no application built on top of TCP/UDP, whatever its architecture, can bypass it.
  
Note: this does not apply if your LAN uses public IP addresses instead of NAT.
  
If you want to know the reason, or to question my conclusion :), read on.
  
This article assumes that
  
1. You have read the Linux Advanced Routing & Traffic Control HOWTO (hereinafter lartc), especially Section 4.2 "Routing for multiple uplinks/providers", Section 4.2.1 "Split access", and Section 4.2.2 "Load balancing". That document clearly describes connection-level TCP/UDP load balancing.
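For reference, the load-balancing setup described in lartc Section 4.2.2 boils down to a single multipath default route. A sketch follows; the gateway addresses 100.0.0.254 and 150.0.0.254 are placeholders invented here for the two providers' routers, and if1/if2 are the interface names used throughout this article:

```shell
# Connection-level load balancing via a multipath default route
# (as in lartc Section 4.2.2). The two gateway addresses are
# placeholders for Provider 1's and Provider 2's routers.
ip route add default scope global \
    nexthop via 100.0.0.254 dev if1 weight 1 \
    nexthop via 150.0.0.254 dev if2 weight 1
```

The kernel then picks one of the two nexthops per destination, which is exactly the connection-level behavior analyzed below.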
  
  
  
                                      +------------+           ________
                                      |            |          /
                        +---NATed-----+ Provider 1 +---------+
                        |  IP1        | 100.0.0.1  |         |  remote host 1
     ___/\___           |             +------------+         |
    /        \   +------+-------+                            |
   |  Local   |  | if1          |                            |
   |  network +--+ Linux router |                  Internet  |
   | (local   |  | if2          |                            |
   | hosts    |  +------+-------+                            |
   |  1 & 2)  |         |             +------------+         |  remote host 2
    \________/          |  IP2        | 150.0.0.1  |         |
                        +---NATed-----+ Provider 2 +---------+
                                      |            |          \________
                                      +------------+
  
This figure is adapted from lartc (official site: http://lartc.org/); it is redrawn here with minor changes for ease of presentation.
2. For ease of understanding, the examples below use ADSL PPPoE dial-up Internet access on Linux, although, as noted above, the argument applies to other setups as well. A pair (IP address, port) denotes a connection endpoint. Assume that:
  
  
  
Host              IP             Port
Local host 1      192.168.0.1    5000
Linux router if1  100.0.0.1      6000
Linux router if2  150.0.0.1      7000
Remote host 1     200.0.0.1      80
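Under these assumptions, the NAT discussed in this article would typically be set up as source NAT on each uplink. A sketch using 2.4-era iptables; the interface names if1/if2 and the LAN prefix 192.168.0.0/24 are taken from the figure and table above:

```shell
# Source-NAT the private LAN out of whichever interface a packet leaves.
# Addresses match the table above; if1/if2 are assumed interface names.
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -o if1 -j SNAT --to-source 100.0.0.1
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -o if2 -j SNAT --to-source 150.0.0.1
```

Note that each rule rewrites the source address to match the interface the packet happens to leave through; this per-interface rewriting is precisely what clashes with per-packet balancing later on.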
  
So what is the difference between connection-level and packet-level load balancing?
  
1. Connection-level load balancing
  
The Linux 2.4 kernel supports connection-level load balancing. To understand it, first consider Linux's routing mechanism. When a Linux host receives an IP packet that needs to be forwarded, it first checks whether there is a matching entry in the route cache: if so, that entry is used directly; otherwise it searches the routing table. Cache entries that go unused for a period of time are discarded. If a multipath route is configured, the route chosen on a cache miss is random, but subsequent IP packets follow the same route until the cache entry is discarded.
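On a 2.4-era kernel you can observe this mechanism directly with standard iproute2 commands (the route cache was removed from much later kernels, so this is specific to the era discussed here):

```shell
# Show the kernel's routing cache; the nexthop chosen for each
# destination is visible in the cached entries.
ip route show cache

# Discard all cached routes, forcing the next packet to each
# destination to re-consult the routing table (and re-roll the dice
# on a multipath route).
ip route flush cache
```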
  
For example, when local host 1 on the LAN initiates a connection to remote host 1, the Linux router finds no cache entry for remote host 1, so it queries the routing table. Finding a multipath route, it randomly selects one path, say if1, and adds that choice to the cache. When the next IP packet from local host 1 to remote host 1 arrives, the router finds the cached entry and forwards directly. That is, all IP packets destined for remote host 1 are forwarded through if1 until the cache entry expires and is discarded, and none pass through if2, even if if2 is completely idle.
  
Therefore, communication with remote host 1 does not benefit from the second uplink at all. In practice, watching an online movie in RealPlayer gets no faster, except perhaps psychologically :)
  
If local host 1 simultaneously initiates a connection to remote host 2, if2 may be chosen this time. The practical effect is that you can watch an online movie in RealPlayer while using FlashGet with 8 concurrent threads to download a 600 MB Red Hat ISO from an FTP site, and the two will not interfere with each other.
  
So what happens when local host 2 initiates a connection to remote host 2 while the ISO download is running? Will it simply use if1, or something else? Think it over.
  
2. packet-level load balancing
  
In reality, one uplink is often busy while the other sits idle; I will not go into details here :). So how can this be improved? Naturally, by sending the IP packets of a single TCP/UDP connection out through both uplinks at the same time, using both hands at once, so to speak. PPP Multilink actually does this, but it requires support on the provider's side, which we don't have, so we must help ourselves. This being the Internet, someone has of course already thought of it and published a kernel patch; search for equalize_2.4.18.patch.
  
Please note that the correct patch is dated Fri Mar 22 2002, not Thu Mar 21 2002. In addition, in my experience this patch also applies to the 2.4.19 kernel.
  
After recompiling the kernel, you can use the equalize keyword. If your LAN uses public IP addresses, congratulations: you can now watch online movies in RealPlayer at double speed. But, as they say, life is rarely so kind. If you use private IP addresses and reach the Internet through NAT, let's look at what happens when local host 1 initiates a connection to remote host 1.
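With the patch applied, the equalize flag on a multipath route is what switches the router from per-connection to per-packet path selection. A sketch; the gateway addresses 100.0.0.254 and 150.0.0.254 are placeholders invented for illustration:

```shell
# Packet-level (equalize) load balancing over both providers.
# Requires the equalize kernel patch; gateways are placeholder
# addresses for Provider 1 and Provider 2.
ip route add default equalize \
    nexthop via 100.0.0.254 dev if1 \
    nexthop via 150.0.0.254 dev if2
```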
  
Local host 1 initiates a connection from (192.168.0.1, 5000) to remote host 1 at (200.0.0.1, 80). The first IP packet (the SYN) arrives at the Linux router. Since this is the first packet, the cache certainly contains no route, so the router searches the routing table and finds the multipath route marked equalize. It randomly decides how to forward the packet; this time, say, fate picks if2 :), and a cache entry for remote host 1 is added. Because local host 1's address is private, NAT kicks in, and the packet is sent with its source translated to (150.0.0.1, 7000). No problem: remote host 1 receives the SYN, accepts the connection, and sends a SYN+ACK back to (150.0.0.1, 7000). The Linux router receives the returned SYN+ACK on if2, which is prepared for it, and NAT translates the destination back to (192.168.0.1, 5000).
  
Round one complete. So far, everything is OK.
  
Next, local host 1 (192.168.0.1, 5000) dutifully sends the final ACK to (200.0.0.1, 80) to complete the handshake. This ACK arrives at the crossroads of the Linux router again. Which way will it go? Because equalize is set, Linux does not reuse the cache entry generated earlier by the SYN; on the contrary, it deletes that entry and searches the routing table again. The route is therefore re-selected, and the result may be either if1 or if2.
  
If it is if2 again, all is well: remote host 1 receives the ACK, the handshake completes, and the connection is established.
  
What if if1 is chosen? Unfortunately, NAT on if1 translates the packet's source address to (100.0.0.1, 6000), drops it onto the Internet, and it safely reaches remote host 1. But remote host 1 is waiting for an ACK from if2, that is, from (150.0.0.1, 7000); it never expected one from (100.0.0.1, 6000). What should it do with it? Accept it at once, just because it is an ACK? Obviously not a good idea. And even if it wanted to, how would remote host 1 determine which SYN this ACK is associated with? Therefore,
  
The only reasonable measure is to ignore it.
  
After the unrecognized ACK is discarded, remote host 1 continues to wait... the result is easy to imagine: the two sides cannot synchronize, and connection establishment fails. Here, TCP's retransmission mechanism cannot rescue the situation: although the ACK is a perfectly normal IP packet, remote host 1 does not regard it as belonging to any of its TCP connections.
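If you want to see this failure for yourself, a packet capture on each uplink will show it. The tcpdump invocation below is standard; 200.0.0.1 is the example remote host from the table, and if1/if2 are the assumed interface names:

```shell
# Watch the handshake on both uplinks. With equalize + NAT you will see
# the final ACK sometimes leave if1 with source 100.0.0.1 even though
# the SYN went out via if2 as 150.0.0.1 -- and the connection then stalls.
tcpdump -ni if1 'tcp and host 200.0.0.1'
tcpdump -ni if2 'tcp and host 200.0.0.1'
```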
  
As mentioned earlier, if if2 happens to be chosen, we are lucky and the connection is established. From then on, remote host 1 will receive data packets from both if1 and if2, but by the same logic, every TCP packet arriving from if1 is discarded. Fortunately, local host 1 retransmits the discarded packets, and sooner or later each one gets a chance to leave through if2, so the connection can barely be maintained, but it is slow. In extreme cases it may still break midway.
  
Therefore, when equalize load balancing and NAT work together, connections either fail outright or crawl. In other words, 1 + 1 < 1.
  
One last piece of good news: the ICMP protocol still works in this situation, so perhaps you can find some use for it. Of course, it also means that an attacker on a dynamic IP address has twice the available bandwidth for ICMP attacks against you :)
  
Note: in the kernel configuration, the multipath option is found under Networking options -> TCP/IP networking -> IP: advanced router -> IP: equal cost multipath; it has a corresponding option in the kernel config file.
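For concreteness, the menu path above should correspond to the following .config symbols; I am naming them from memory of the 2.4 config system, so verify against your own kernel tree:

```shell
# 2.4 kernel .config fragment enabling equal-cost multipath routing,
# which the equalize patch builds on.
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_ROUTE_MULTIPATH=y
```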