Using server load balancing technology to build high-load websites
Source: Internet
Author: User
The rapid growth of the Internet has sharply increased the number of visitors that multimedia network servers, and Web servers in particular, must handle, so a network server must be able to serve a large number of concurrent accesses. Yahoo, for example, receives millions of access requests every day. As a result, CPU and I/O processing capacity quickly becomes the bottleneck for servers that provide heavily loaded Web services.
Simply improving hardware performance does not really solve this problem, because the performance of a single server is always limited. Generally, a PC server can handle about 1000 concurrent accesses, and high-end dedicated servers can support more, which still cannot meet the requirements of heavily loaded websites. In particular, network requests are bursty: when a major event occurs, network access rises sharply and causes a bottleneck. The publication of Clinton's book of * on the Internet is an obvious example. To handle a large number of concurrent accesses, you must use multiple servers to provide the network service and distribute the requests among them.
When multiple servers share the load, the simplest approach is to use different servers for different purposes. You can divide by content, with one server providing news pages and another providing game pages, or divide by function, with one server providing static pages and another providing dynamic pages that consume a large amount of resources, such as CGI. However, because network access is bursty, it is difficult to determine which pages will cause excessive load, and dividing the pages too finely wastes a great deal of capacity. In fact, the pages that cause excessive load change often. If you frequently have to move pages between servers as the load changes, management and maintenance become very difficult. This kind of division can therefore only be a coarse adjustment. For heavily loaded websites, the fundamental solution is to apply load balancing technology.
With load balancing, the servers are symmetric. Each server has equal status and can provide service to the outside independently, without the assistance of the other servers. A load-sharing technique then distributes external requests evenly across the servers in this symmetric structure, and the server that receives a request responds to the client on its own. Building Web servers with identical content is not complicated, because you can keep the servers in sync through synchronized updates or shared storage. Load balancing has therefore become a key technology for building high-load Web sites.
1. Load balancing based on specific server software (access redirection)
Many network protocols support "redirection". HTTP, for example, supports the Location header: a browser that receives it automatically redirects to the URL that Location specifies. Because sending a Location header puts much less load on a Web server than executing the service request, you can design a load-balanced server based on this function. When a Web server considers itself overloaded, it does not send back the page the browser requested but returns a Location header instead, so that the browser fetches the page from another server in the cluster.
With this approach the server software itself must support the function, and implementing it raises many difficulties. For example, how can a server ensure that the server it redirects to is relatively idle and will not send a Location header again? Neither the Location header nor the browser provides any such guarantee, so it is easy to create an endless loop in the browser. This method is therefore rarely used in practice, and little server cluster software is implemented this way. In some cases you can use CGI (including FastCGI or the mod_perl extension to improve performance) to simulate this method and share the load while the Web server itself stays simple and efficient; the user's CGI program is then responsible for avoiding Location loops.
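One way a CGI-style program might avoid Location loops is to carry a hop counter in the redirected URL and stop redirecting once a limit is reached. The following is a minimal sketch under that assumption; the peer URLs, the `lb_hops` query parameter, and the one-redirect limit are hypothetical and do not come from the original article.

```python
import random
from urllib.parse import urlsplit, parse_qs, urlencode

# Hypothetical cluster members; the article names no servers for this method.
PEERS = ["http://www2.example.test", "http://www3.example.test"]
MAX_HOPS = 1  # assumption: a request may be redirected at most once


def redirect_target(path, overloaded, peers=PEERS, max_hops=MAX_HOPS):
    """Return a URL for the Location header, or None to answer locally."""
    parts = urlsplit(path)
    query = parse_qs(parts.query)
    raw_hops = query.get("lb_hops", ["0"])[0]
    hops = int(raw_hops) if raw_hops.isdigit() else 0
    # Serve locally when idle, or when the request was already redirected:
    # this is what prevents an endless redirect loop in the browser.
    if not overloaded or hops >= max_hops:
        return None
    query["lb_hops"] = [str(hops + 1)]
    peer = random.choice(peers)
    return f"{peer}{parts.path}?{urlencode(query, doseq=True)}"
```

A handler would send a 302 response with `Location` set to the returned URL, and serve the page itself when the function returns None. The sketch still cannot know whether the chosen peer is idle, which is the weakness the text describes.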
2. DNS-based load balancing (multiple hosts under a single domain name)
Load balancing based on server software requires changing that software, so the cost often outweighs the benefit. It is better to balance the load outside the server software and so keep the advantages of existing servers. The earliest load balancing technology was implemented through random name resolution in the DNS service. In a DNS server you can configure the same name for several different addresses, and a client that queries the name gets one of these addresses when resolving it. For the same name, different clients therefore get different addresses and access the Web servers at those addresses, which achieves load balancing.
For example, to have three Web servers take turns answering HTTP requests for www.exampleorg.org.cn, the data in the domain's DNS server can include records similar to the following. Standard DNS permits only one CNAME record per name, so the shared name www is given one address record per server:
www1 IN A 192.168.1.1
www2 IN A 192.168.1.2
www3 IN A 192.168.1.3
www IN A 192.168.1.1
www IN A 192.168.1.2
www IN A 192.168.1.3
External clients then obtain different addresses for www more or less at random, and their subsequent HTTP requests go to those different addresses.
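The effect of random name resolution can be illustrated with a small simulation. This is only a sketch: it models the resolver as a uniform random choice among the three example addresses, which is an assumption, and the function names are made up for illustration.

```python
import random
from collections import Counter

# The three addresses from the example records.
RECORDS = {"www": ["192.168.1.1", "192.168.1.2", "192.168.1.3"]}


def resolve(name, rng=random):
    """Return one of the addresses registered for `name`, chosen at random."""
    return rng.choice(RECORDS[name])


def simulate(clients, seed=0):
    """Count how many of `clients` independent lookups land on each address."""
    rng = random.Random(seed)
    return Counter(resolve("www", rng) for _ in range(clients))


if __name__ == "__main__":
    for address, hits in sorted(simulate(3000).items()):
        print(address, hits)
```

Over many independent lookups each server receives roughly a third of the clients. The simulation ignores caching, so it says nothing about how long a client keeps using the address it was given.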
The advantage of DNS load balancing is that it is simple, and the servers can be located anywhere on the Internet. It is currently used on websites including Yahoo. However, it also has many disadvantages. First, to ensure that DNS data is updated promptly, the DNS refresh time is usually set to a small value, but a value that is too small causes a lot of extra network traffic, and even then changes to the DNS data do not take effect immediately. Second, DNS load balancing knows nothing about the differences between servers: it cannot send more requests to servers with better performance, it does not know the current state of each server, and it cannot even prevent client requests from occasionally concentrating on one server.
3. Reverse proxy load balancing (buffer pool)
A proxy server can forward requests to internal Web servers, and this acceleration mode can noticeably increase the access speed of static Web pages. You can therefore also use this technology to have the proxy server forward requests evenly to one of several internal Web servers and so achieve load balancing. This differs from the ordinary proxy mode. In the standard mode, a client uses the proxy to access many external Web servers; in this mode, many clients use the proxy to access internal Web servers, which is why it is called reverse proxy mode.
Implementing a reverse proxy is not a particularly complex task, but load balancing demands very high efficiency from it, and that is not so simple to achieve. For every proxied request, the proxy server must open two connections, one to the outside and one to the inside. When the number of connection requests is very large, the load on the proxy server is very high, and in the end the reverse proxy server itself becomes the bottleneck. For example, when the mod_rproxy module of Apache is used to implement load balancing, the number of concurrent connections it can provide is limited by the number of concurrent connections of Apache itself. In general, you can use it to balance the load of sites where each connection consumes a large amount of resources, such as search.
The advantage of a reverse proxy is that it combines load balancing with the caching of the proxy server, which improves performance, and it provides additional security because external clients cannot access the real servers directly. It can also apply a load balancing policy that distributes the load evenly across the internal servers, so that load does not occasionally concentrate on one server.
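The article does not say which balancing policy a reverse proxy should use. As one illustration, the sketch below implements a least-connections policy, chosen here as an assumption: the proxy tracks how many connections it currently holds open to each internal server and sends the next request to the least busy one. The class and method names are hypothetical.

```python
class LeastConnections:
    """Pick the internal server that currently has the fewest open connections."""

    def __init__(self, backends):
        # Number of connections the proxy currently holds to each backend.
        self.active = {backend: 0 for backend in backends}

    def acquire(self):
        """Choose a backend for a new request and count the connection."""
        # min() returns the first backend with the smallest count, so ties
        # go to the backend listed earliest.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        """Call when the proxied request to `backend` has finished."""
        self.active[backend] -= 1
```

A proxy would call `acquire()` before opening its internal connection and `release()` when the response has been relayed. Because the proxy sees every connection, it has the state needed for such a policy, which DNS-based balancing lacks.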
4. NAT-based load balancing (intranet cluster and layer-4 switching)
Network address translation converts between internal and external addresses so that computers with internal addresses can access the external network. When a computer on the external network accesses an external address owned by the address translation gateway, the gateway can forward the access to a mapped internal address. If the gateway translates each connection evenly to a different internal server address, every external computer then communicates with the server it was mapped to, and the load is balanced.
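The per-connection mapping described above can be sketched as a small translation table. This is an illustrative model only: it identifies a connection by client address and port, assigns internal servers in simple rotation, and uses made-up addresses; a real gateway rewrites packet headers, which is not shown.

```python
import itertools


class NatBalancer:
    """Map each client connection to one internal server address."""

    def __init__(self, internal_servers):
        self._next_server = itertools.cycle(internal_servers)
        self.table = {}  # (client_ip, client_port) -> internal server address

    def translate(self, client_ip, client_port):
        """Return the internal address for this connection's packets."""
        key = (client_ip, client_port)
        if key not in self.table:
            # New connection: assign the next internal server in rotation.
            self.table[key] = next(self._next_server)
        # Known connection: keep using the server it was first mapped to.
        return self.table[key]

    def close(self, client_ip, client_port):
        """Forget the mapping when the connection ends."""
        self.table.pop((client_ip, client_port), None)
```

Keeping the table is what lets every packet of one connection reach the same server, and it is the reason the gateway must store per-connection state.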
Address translation can be implemented in software or in hardware. Hardware-based operation is generally called switching. When the switching must keep TCP connection information, this operation on the OSI network layers is called layer-4 switching. Network address translation with load balancing support is an important feature of layer-4 switches. Because they are based on custom hardware chips, their performance is excellent, and many switches claim a layer-4 switching capability of up to 800 MB. However, some data shows that at such speeds most switches no longer provide layer-4 switching and support only layer-3 or even layer-2 switching.
For most sites, however, load balancing today mainly addresses the bottleneck in the Web server's processing capacity rather than in network transmission. The total Internet bandwidth of many sites is less than 10 MB, and only a few sites have high-speed connections, so there is generally no need for expensive devices such as these load balancers.
Implementing NAT-based load balancing in software is far more practical. Besides the solutions that some vendors provide, a more effective way is to use free software for the task, such as the NAT implementation in the Linux Virtual Server Project or the revised version of natd under FreeBSD. Generally speaking, when address translation is implemented in software, the central load balancer is subject to a bandwidth limit: on Fast Ethernet the maximum bandwidth it can reach is about 80 MB, while in practice only 40 MB to 60 MB may be available.
5. Extended load balancing technology
With network address translation, all network connections must pass through the central load balancer. If the load is especially high, so that the number of backend servers is no longer a few or a dozen but hundreds or more, even an excellent hardware switch becomes a bottleneck. The problem then becomes how to spread the servers across multiple locations on the Internet to spread the network burden. This can of course be achieved by combining the DNS and NAT methods, but a better way is to use a semi-central load balancing method.
In semi-central load balancing, when a client request arrives at the load balancer, the central load balancer packages the request and sends it to a server, and the server sends its response directly to the client rather than back through the central load balancer. The central load balancer is therefore responsible only for receiving and forwarding requests, and its network load is low.
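A rough calculation shows why returning responses directly lowers the load on the central load balancer. The request and response sizes below are hypothetical figures chosen only for illustration, and the model simply counts each byte once as it crosses the balancer.

```python
def balancer_traffic(requests, request_bytes, response_bytes, direct_return):
    """Bytes that cross the central load balancer for `requests` requests.

    With direct_return=False (the NAT style), both the request and the
    response pass through the balancer. With direct_return=True (the
    semi-central style), only the request does.
    """
    per_request = request_bytes if direct_return else request_bytes + response_bytes
    return requests * per_request


if __name__ == "__main__":
    # Hypothetical workload: 1000 requests of 500 bytes, 20000-byte responses.
    nat = balancer_traffic(1000, 500, 20000, direct_return=False)
    semi = balancer_traffic(1000, 500, 20000, direct_return=True)
    print(nat, semi, nat / semi)
```

Because Web responses are usually much larger than the requests that trigger them, removing the responses from the balancer's path removes most of its traffic under these assumptions.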
Hardware implementations of this method are also very expensive, although depending on the vendor they offer different special features, such as SSL support.
Because this method is complex, it is difficult to implement and has a high barrier to entry. At present, most websites do not need such a large processing capacity.
Comparing the load balancing methods above, DNS is the easiest and the most commonly used, and it meets general requirements. If you need further management and control, you can choose the reverse proxy or NAT method. The choice between the two depends mainly on conditions such as whether caching is important and what the maximum number of concurrent accesses is. If the CGI programs that put a severe load on the website were developed by the website itself, you can also consider using Location in those programs to support load balancing. The semi-central load balancing method is not required, at least in the current situation in China.