LVS load balancing: an informal introduction to the concepts

Source: Internet
Author: User

LVS (Linux Virtual Server) is a software load-balancing system. It consists of the IPVS framework, which does the actual work in kernel space, and ipvsadm, the user-space tool used to write rules. It works by distributing incoming requests to the back-end production servers according to the configured scheduling methods and rules. Dr. Wensong Zhang is the founder and main developer of LVS.

Load balancing is usually divided into network-layer and application-layer load balancing. LVS belongs to the network-layer category (it switches on IP addresses and ports, which is commonly called layer-4 switching), although application-layer functionality may be added in the future.

Time must be synchronized in the cluster.


Part one: the concepts

Scheduling algorithms

How LVS distributes incoming requests is determined by the scheduling algorithm.

The first group is static scheduling, which simply distributes requests without considering the load on the back-end servers.

    1. Round robin (Round Robin): written as rr when defining a service. The simplest method: requests are handed to the servers one after another, in the order in which the servers were defined.

    2. Weighted round robin (Weighted Round Robin): written as wrr. Requests are distributed in order according to weight; for example, a server with a weight of 2 is given 2 requests before requests move on to the next server.

    3. Source address hashing (Source Hashing): sh. A hash table records the source address of each request together with the server it was sent to. When the same source address connects again after its connection has been closed, LVS sends the request to the same server. Because this hurts the balancing effect, it is rarely used.

    4. Destination address hashing (Destination Hashing): dh. I don't know much about this one; it works like source hashing, except that the key is the destination address.
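To make the static methods more concrete, here is a minimal Python sketch. It follows the descriptions in this article, not the kernel code, and the names `weighted_round_robin` and `source_hash` are made up for illustration. Note also that `source_hash` computes the server from a digest of the address instead of recording a table, which is an assumption that gives the same "same client, same server" behavior.

```python
import hashlib
from itertools import cycle


def weighted_round_robin(servers):
    """Return an endless iterator of server names.

    `servers` is a list of (name, weight) pairs. As described in the
    text, a server with weight 2 receives 2 requests in a row before
    the next server is used. Plain round robin is the special case in
    which every weight is 1.
    """
    expanded = [name for name, weight in servers for _ in range(weight)]
    return cycle(expanded)


def source_hash(client_ip, servers):
    """Map a client address to one server, always the same one.

    Assumption: a digest of the address replaces the recorded hash
    table, so no state has to be kept in this sketch.
    """
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    wrr = weighted_round_robin([("rs1", 2), ("rs2", 1)])
    print([next(wrr) for _ in range(6)])
    print(source_hash("203.0.113.7", ["rs1", "rs2", "rs3"]))
```

Destination hashing would look the same with the destination address passed in as the key.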


Dynamic scheduling adjusts the distribution according to the number of connections on each server.

  1. Least connections (Least Connections): lc. As the name implies, requests always go to the server with the fewest connections. The calculation is overhead = active*256 + inactive (active connections times 256, plus inactive connections). The open question is what to do when the servers differ in performance.

  2. Weighted least connections (Weighted Least Connections): wlc. This is the default scheduling algorithm. overhead = (active*256 + inactive) / weight. This compensates for differences in performance, but when every server has 0 connections the results are all 0, and the request is assigned in the order in which the servers were defined, so it can land on a weak server. That does not seem to be a serious problem, and this is still the algorithm that is normally used (computation has a cost too: the more sophisticated the algorithm, the higher the overhead). There are, however, algorithms that address the problem.

  3. Shortest expected delay (Shortest Expected Delay): sed. It avoids the WLC case in which every overhead is 0 and a very weak server ends up being chosen. The formula here is overhead = (active + 1) * 256 / weight; with the added 1 the result is never 0, so servers with different weights no longer produce the same value. It has a problem of its own: if one server has a weight of 300 and another a weight of 50, then with only a few requests the weight-300 server does all the work while the weight-50 server stays idle. This is not a big problem either, but there is an algorithm that addresses it too.

  4. Never queue (Never Queue): nq. As long as some server has 0 connections, the request is sent to that server first; only when no server has 0 connections does scheduling fall back to the SED algorithm.

  5. Locality-based least connections (Locality-Based Least Connections): lblc.

  6. Locality-based least connections with replication (Locality-Based Least Connections with Replication): lblcr.
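The overhead formulas for items 1 to 4 can be tried out with a small Python sketch. It is only an illustration of the formulas as quoted in this article: the `RealServer` class and the function names are invented here, and treating "0 connections" in NQ as "0 active connections" is an assumption.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class RealServer:
    name: str
    weight: int
    active: int = 0
    inactive: int = 0


def lc_overhead(s):
    # overhead = active*256 + inactive
    return Fraction(s.active * 256 + s.inactive)


def wlc_overhead(s):
    # overhead = (active*256 + inactive) / weight
    return Fraction(s.active * 256 + s.inactive, s.weight)


def sed_overhead(s):
    # overhead = (active + 1) * 256 / weight, never 0
    return Fraction((s.active + 1) * 256, s.weight)


def pick(servers, overhead):
    """Return the server with the smallest overhead.

    min() keeps the first of several equal values, so ties go to the
    server that was defined first, as the text describes for WLC.
    """
    return min(servers, key=overhead)


def pick_nq(servers):
    """Never queue: prefer an idle server, otherwise fall back to SED."""
    for s in servers:
        if s.active == 0:  # assumption: "idle" means no active connections
            return s
    return pick(servers, sed_overhead)


if __name__ == "__main__":
    weak, strong = RealServer("weak", 1), RealServer("strong", 5)
    print(pick([weak, strong], wlc_overhead).name)  # all overheads are 0
    print(pick([weak, strong], sed_overhead).name)  # weights now matter
```

Fractions are used instead of floating-point division so that equal overheads compare as exactly equal.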

Some of the descriptions above are not very detailed, and I don't know how to explain 5 and 6. See the following two URLs, which explain them very clearly.

http://zh.linuxvirtualserver.org/node/2903

http://www.linuxvirtualserver.org/zh/lvs1.html

Note: as long as a TCP connection is not interrupted, it keeps being dispatched to the same server. LVS maintains a connection table so that packets belonging to a connection that is being established, or is already established, go to the same server. Only new connection requests may be dispatched to a different server.


The timeouts of the entries in this table can be configured. The figures listed below are not completely fixed and may be changed behind the scenes by other mechanisms: even if 15 minutes is displayed, an entry may not last even one minute.


By default, the timeout is 1 minute for the SYN state, 15 minutes for the ESTABLISHED state, 1 minute for the FIN state, and 5 minutes for UDP. When a connection terminates or times out, the scheduler removes it from the table.
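A connection table with per-state timeouts might be sketched in Python as follows. This is an illustration only: the `ConnectionTable` class, its key layout and its method names are hypothetical, and the timeout values are simply the defaults quoted in the text.

```python
import time

# Timeouts in seconds, taken from the defaults quoted in the text.
TIMEOUTS = {"SYN": 60, "ESTABLISHED": 15 * 60, "FIN": 60, "UDP": 5 * 60}


class ConnectionTable:
    def __init__(self, scheduler, clock=time.monotonic):
        self._scheduler = scheduler  # callable that picks a server for a new connection
        self._clock = clock          # injectable so the sketch can be tested
        self._entries = {}           # (cip, cport, vip, vport) -> [server, state, expires_at]

    def lookup(self, key, state="ESTABLISHED"):
        """Return the server for `key`; schedule only if no live entry exists."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or entry[2] <= now:
            entry = [self._scheduler(), state, 0.0]
            self._entries[key] = entry
        entry[1] = state
        entry[2] = now + TIMEOUTS[state]  # every packet refreshes the timer
        return entry[0]

    def expire(self):
        """Remove every entry whose timeout has passed."""
        now = self._clock()
        for key in [k for k, e in self._entries.items() if e[2] <= now]:
            del self._entries[key]

    def __len__(self):
        return len(self._entries)
```

Any of the scheduling methods can be plugged in as `scheduler`; the table only decides whether the scheduler is consulted at all.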


Types of LVS:

    • NAT model: LVS implemented with DNAT.

[Figure: LVS.JPG (http://s3.51cto.com/wyfs02/M01/6E/BB/wKioL1WEG4Swf0JJAADc9HqTBuk193.jpg)]

The VIP in the figure is the virtual IP, which is the externally visible IP. The director is the LVS scheduler, and the DIP is the director's IP. A real server is a server that actually provides the service to clients, the RIP is a real server's IP, and the CIP is the client's IP.

    • The director is the only externally visible host; the internal hosts are hidden from the outside, which improves security.

    • By the same token, all traffic passes through the director, which becomes a bottleneck under high load.

    • Port mapping is supported.

    • The internal hosts use private IPs, because the director is the only host exposed externally.

    • The DIP and the RIPs are in the same network.

    • The internal hosts must use the director as their gateway, so that data bound for the outside is sent to the director.

    • The client (CIP) sends a request to the VIP. The director changes the destination address and port of the request to the IP and port of a real server, records the mapping in its NAT table, and sends the packet to the internal network through the DIP network card. At this point the source address is the CIP and the destination address is the real server's RIP. The real server receives the request addressed to itself and replies; now the destination address is the CIP and the source address is the real server's RIP. The director receives the reply, looks up the NAT table, changes the source address to the VIP, and sends the packet out through the VIP network card.
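The address rewriting in the last point can be sketched in Python. The `Packet` and `NatDirector` names are invented for this illustration, the addresses are documentation examples, and the choice of real server is passed in by the caller rather than made by a scheduler.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class NatDirector:
    def __init__(self, vip, vport):
        self.vip, self.vport = vip, vport
        self.nat_table = {}  # (cip, cport) -> (rip, rport)

    def forward_request(self, pkt, rip, rport):
        """Client -> VIP: rewrite the destination to the chosen real server."""
        self.nat_table[(pkt.src_ip, pkt.src_port)] = (rip, rport)
        return replace(pkt, dst_ip=rip, dst_port=rport)

    def forward_reply(self, pkt):
        """Real server -> client: rewrite the source back to the VIP."""
        expected = self.nat_table.get((pkt.dst_ip, pkt.dst_port))
        if expected != (pkt.src_ip, pkt.src_port):
            raise LookupError("no NAT entry for this reply")
        return replace(pkt, src_ip=self.vip, src_port=self.vport)


if __name__ == "__main__":
    director = NatDirector("203.0.113.10", 80)
    request = Packet("198.51.100.9", 40000, "203.0.113.10", 80)
    print(director.forward_request(request, "192.168.10.11", 8080))
    reply = Packet("192.168.10.11", 8080, "198.51.100.9", 40000)
    print(director.forward_reply(reply))
```

Because the destination port is rewritten together with the address, the sketch also shows why port mapping is possible in this model.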

[Figure: NAT.JPG (http://s3.51cto.com/wyfs02/M01/6E/BE/wKiom1WEGPqgIn_zAAFerSl95Lw255.jpg)]



    • DR model:

[Figure: DR.JPG (http://s3.51cto.com/wyfs02/M02/6E/BB/wKioL1WEHHOzDQ1uAAEb_539YG4526.jpg)]


    • The figure above is a bit messy. The gist is: the LVS director and every real server all have the VIP address. A request travels from the client to the VIP on the LVS director, and the reply goes back to the client directly from the real server, with the VIP as its source.

    • Requests come in through the director, but replies must not go back through the director; the DR model exists precisely to avoid this.

    • Port mapping is not supported.

    • The gateway of a real server must not be the director.

    • The bottleneck problem of the NAT model does not exist.

    • The protection of internal hosts that the NAT model offers does not exist either.

    • The DIP and the RIPs can be either public or private addresses, but they must be in the same network.

    • If they are public addresses, the process is roughly:

    1. The client (CIP) sends the request to the VIP.

    2. When the data reaches the router in front of the switch, the router sends an ARP broadcast to obtain the MAC address of the VIP, and only the director answers. The real servers also hold the VIP address, but they are configured not to answer this request.

    3. After the router receives the MAC address, it encapsulates the data frame with the director's MAC address as the destination.

    4. The frame carrying the destination MAC arrives at the switch. The switch looks up its own MAC-to-port table (and floods the frame if there is no matching entry), sends the frame out of the corresponding port, and the frame arrives at the director.

    5. The data enters through the network card, and the driver sees that the destination MAC is its own. The data goes up to the network layer and, after the director's internal routing, reaches the INPUT chain. IPVS rules are attached at INPUT, so the packet is matched against them there. Once a rule matches, the packet is sent straight back out from INPUT (note: it does not go up to the upper layers), and the destination MAC address of the frame is changed to the MAC address of one of the real servers.

    6. The data goes back to the switch, and the switch sends it out of the port that corresponds to the destination MAC.

    7. The data arrives at the real server and reaches the network layer, which finds that the destination address is the VIP, one of its own IP addresses. After the host's internal routing the packet enters INPUT and the IP encapsulation is removed. Once the real server has processed the data, the network layer wraps the reply in an IP header whose source address is the VIP and whose destination address is the CIP. The reply then leaves through the OUTPUT chain and, after the host's internal routing, is sent to the gateway. Eventually it is delivered back to the client.
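Steps 5 and 7 are the heart of the DR model: the director touches only the Ethernet header, and the real server answers on its own. A minimal Python sketch, with an invented `Frame` type and hypothetical function names, could look like this. Setting the source MAC to the director's own MAC on the forwarded frame is an assumption of the sketch; the text only mentions the destination MAC.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


def director_forward(frame, director_mac, real_server_mac):
    """Step 5: leave the IP header alone and swap the destination MAC.

    Assumption: the frame leaves with the director's MAC as its source.
    """
    return replace(frame, src_mac=director_mac, dst_mac=real_server_mac)


def real_server_reply(frame, own_mac, gateway_mac):
    """Step 7: reply with the VIP as source, sent to the gateway, not the director."""
    return Frame(src_mac=own_mac, dst_mac=gateway_mac,
                 src_ip=frame.dst_ip, dst_ip=frame.src_ip)


if __name__ == "__main__":
    router, director, rs = "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:03"
    incoming = Frame(router, director, "198.51.100.9", "203.0.113.10")
    forwarded = director_forward(incoming, director, rs)
    print(forwarded)
    print(real_server_reply(forwarded, rs, router))
```

Since the IP header and ports are never rewritten, the sketch also illustrates why port mapping is not available in this model.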


When configuring a director with only one network card, make sure that the DIP is the card's primary address and that the VIP is placed on an alias of the card, for example eth0 for the DIP and eth0:0 for the VIP. The reason is ARP: if the primary address is the DIP, the source of the director's ARP broadcasts is the DIP, and a real server that receives one replies normally with the MAC address of its RIP. If the primary address were the VIP, the broadcast source would be the VIP, and a real server receiving it would direct its response to its own VIP address.



    • If they are private addresses, an intranet router has to be added on the return path. Compared with the public-address case, the only change in the data flow is the gateway: it used to be Route A and is now Route B.


    • In the diagram below, if Route B is not added, the main problem is that Route A and the RIPs are not in the same network segment. The reply never even leaves the real server's network card (it is dropped when the network layer does its route lookup), because the real server has no routing entry through which Route A can be reached.


    • If you add a route to Route A on the real server, and a route to the real server on Route A, that works as well, but adding route entries by hand is not as clean as adding a router. And if you have administrative access to the front-end router, it is simpler still to add an IP address to it directly.


    • So a gateway (Route B) is added to the intranet and the real servers point their gateway at Route B; they then have a default route through which the data can be forwarded out.


    • The reason for the different network segments mentioned above: the VIP is a public address, and the interface of Route A that faces the director must be in the same network segment as the VIP. Since it is also the director's gateway, it is of course a public address.


    • Although each real server has the VIP address, it is a hidden address that is not allowed to send ARP replies or broadcasts, and it has no routing entry. Its role is to be used when encapsulating packets, so that the source address is set to the VIP.



    • Low-level network communication relies on MAC addresses and ARP.

    • The access process is shown in the figure below. If Route B is in the same network segment as Route A, its next hop can also point to Route A.


[Figure: DR2.JPG (http://s3.51cto.com/wyfs02/M00/6E/BC/wKioL1WEtkSwFxlOAAFvZkLVPBk698.jpg)]
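Why a real server with a private RIP needs Route B as its gateway can be illustrated with a toy route lookup in Python. The `next_hop` function, the routing-table layout and all addresses are assumptions made for this sketch; it models only the "is there a matching route" decision, not a real kernel routing table.

```python
import ipaddress


def next_hop(routes, destination):
    """Longest-prefix match over (network, gateway) pairs.

    A gateway of None means the network is directly connected.
    Raises LookupError when no route matches, which corresponds to
    the case where the reply never leaves the network card.
    """
    dest = ipaddress.ip_address(destination)
    matches = [(ipaddress.ip_network(net), gw) for net, gw in routes
               if dest in ipaddress.ip_network(net)]
    if not matches:
        raise LookupError(f"no route to {destination}")
    _, gateway = max(matches, key=lambda m: m[0].prefixlen)
    return gateway if gateway is not None else destination


if __name__ == "__main__":
    # Real server with only its directly connected private network.
    without_route_b = [("192.168.10.0/24", None)]
    # Same server after its gateway is pointed at Route B (hypothetical address).
    with_route_b = without_route_b + [("0.0.0.0/0", "192.168.10.254")]
    print(next_hop(with_route_b, "198.51.100.9"))
    try:
        next_hop(without_route_b, "198.51.100.9")
    except LookupError as err:
        print(err)
```

With the default route in place, a reply to any client address is handed to Route B; without it, the lookup fails before anything is transmitted.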


In the DR model, the destination address of a request is always the VIP and the source address is always the CIP; neither changes. In the reply, the destination address is the CIP and the source address is the VIP.


TUN model: much the same as the DR model, except that the director and the real servers are not in the same network.


[Figure: TUN.JPG (http://s3.51cto.com/wyfs02/M01/6E/BF/wKiom1WEvBrjPTvmAAIF651PORw916.jpg)]

  • The director can no longer bounce the request to a real server by changing the destination MAC address. Instead, it relies on IP tunneling.

  • IP tunneling is the technique of encapsulating one IP packet inside another IP packet.

  • In transit, the routers in the network look only at the outer IP header to forward the packet; they do not open it to see what is inside. When the packet arrives at the target host, the host examines the outer header and finds its own RIP as the destination, so it strips that header. Inside it finds the VIP, which is configured on its IP tunnel interface, and so the request is finally processed as addressed to the VIP. The real server then builds the reply with the VIP as the source address and the CIP as the destination address, and sends it.


  • The VIP, DIP and RIPs all have to be public addresses.


  • The VIP on each real server is not visible on the network and cannot be routed to; the only VIP that can be routed to is the director's. Only the RIP of each real server is reachable.

  • Each server's gateway is the gateway of its own network and cannot be the DIP.

  • The director selects a real server according to its scheduling algorithm and then wraps the packet in two layers of IP: the inner destination is the VIP and the outer destination is the selected RIP.

  • A tunnel is a point-to-point link, so the tunneling protocol must be configured at both ends of the link.

  • I have not tested this model and do not understand it well. Also, in this model the DIP cannot be used.
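The double-layer packet described above can be modelled in a few lines of Python. This is only a conceptual sketch: `IPPacket`, `encapsulate`, `decapsulate` and `reply` are invented names, no real packets are built, and using the director's address as the outer source is an assumption (the text specifies only the outer destination).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IPPacket:
    src: str
    dst: str
    payload: Union[bytes, "IPPacket"]


def encapsulate(inner, director_ip, rip):
    """Director: wrap the client's packet (CIP -> VIP) in an outer header to the RIP.

    Assumption: the outer source is an address of the director.
    """
    return IPPacket(src=director_ip, dst=rip, payload=inner)


def decapsulate(outer, rip, vip):
    """Real server: strip the outer header, then accept the inner packet for the VIP."""
    if outer.dst != rip:
        raise ValueError("outer packet is not addressed to this real server")
    inner = outer.payload
    if not isinstance(inner, IPPacket) or inner.dst != vip:
        raise ValueError("inner packet is not addressed to the VIP")
    return inner


def reply(request, body):
    """Real server: answer directly, with the VIP as source and the CIP as destination."""
    return IPPacket(src=request.dst, dst=request.src, payload=body)


if __name__ == "__main__":
    request = IPPacket("198.51.100.9", "203.0.113.10", b"GET /")
    outer = encapsulate(request, "203.0.113.2", "192.0.2.21")
    inner = decapsulate(outer, "192.0.2.21", "203.0.113.10")
    print(reply(inner, b"OK"))
```

Routers between the director and the real server would only ever inspect `outer.src` and `outer.dst`, which is why the two hosts no longer need to share a network.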


I didn't mean to write much, yet it turned out this long. The actual configuration will be covered in the next article. The writing is rather messy and reflects only my own understanding, so mistakes are inevitable. As always, if you find errors, please point them out. Thank you.

This article is from the "Big Tomato" blog; please keep this source attribution: http://fanqie.blog.51cto.com/9382669/1663790
