Article Title: Application of Linux cluster technology in Web Servers.
Introduction
The spread of the Internet has sharply increased traffic to e-commerce and e-government websites. System bottlenecks have become increasingly serious: they directly lengthen the response time of user requests, and some requests may even be rejected during peak periods. There are generally two solutions. The first is to upgrade the hardware and purchase higher-performance servers, but this is expensive and the improvement is not guaranteed, so it is rarely the best choice. The second is to use cluster technology: add several new servers and build a load-balancing server cluster on top of the original hardware investment. Cluster computing is a relatively economical model that lets users build a cluster from ordinary commodity hardware and add machines as needed, improving system performance while reducing cost. This article briefly discusses the design of a cluster system for enterprise Web services and describes the design and implementation of its key modules.
1. Cluster Technology
Cluster technology connects computer systems so that otherwise separate machines can together complete tasks that a single node could not. The earliest cluster systems were built for this kind of parallel processing. However, as computer performance has grown rapidly and networks have become less secure, system stability and reliability have become the main problems to solve. Two or more machines are therefore connected into a cluster, so that when one node or several nodes fail, other computers in the cluster automatically take over from the faulty machines. The most typical example is dual-host hot backup: two identical computer systems are connected by cluster software, one serving as the backup of the other, and when the primary host crashes the backup takes over its tasks. In addition, the high parallel performance of a cluster is very valuable for complex scientific and engineering computing.
Cluster technology can effectively address the stability, load pressure, and load balancing problems of open systems. The main purposes of establishing a cluster system are to:
(1) Safeguard existing applications.
(2) Improve cost effectiveness.
(3) Share resources.
(4) Improve flexibility and scalability.
(5) Enhance practicality and fault tolerance.
(6) Scale flexibly.
2. Design objectives and system architecture
The goal is a Linux-based cluster system that continuously monitors the load of each server in the cluster and forwards Web requests from the Internet to a real server on the intranet for execution. Specifically, it has the following features:
(1) Service forwarding. The dispatcher accepts TCP/IP-based service requests from the external network and forwards them to the least loaded machine for execution.
(2) Dynamic load balancing. The dispatcher monitors the load status of the real servers on the intranet and finds the machine with the lightest load.
(3) Connection continuity. All requests from the same client on the Internet must be forwarded to the same server on the intranet for processing.
The cluster system consists of a dispatcher (the scheduling machine) and the servers Server 1, Server 2, ..., Server N. The dispatcher is the interface between the intranet and the Internet. It receives user requests from the Internet and sends each request to one of Server 1 to Server N. After the server finishes processing the request, it sends the result to the dispatcher, which returns it to the user on the external network. In the test environment, the client machine on the Internet has the IP address 211.80.248.100, and the dispatcher has two IP addresses: 211.80.248.99 on the Internet side and 192.168.1.1 on the intranet side. The dispatcher runs a Linux 2.2.x kernel. The N servers on the intranet have the IP addresses 192.168.1.2, 192.168.1.3, ..., 192.168.1.N and also run Linux. Each server uses 192.168.1.1 as its gateway and has a route to the 211.80.248.0 network. The purpose of load balancing is to distribute the client requests that arrive at the dispatcher among the servers according to the current load of each intranet machine.
The system implements load balancing at the IP level. When an Internet client sends a request to the dispatcher, the dispatcher's IP layer replaces the destination address of the request with the IP address of the least loaded server on the intranet and then forwards the request. After the intranet server processes the request, it sends the result to the dispatcher. At the IP layer, the dispatcher replaces the destination address with the IP address of the Internet client that sent the request and then forwards the response to that client.
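The address replacement described above can be illustrated with a small user-space model. This is only a sketch, not the kernel code: the names Packet, Forwarder, and pick_server are hypothetical, and the connection table keyed by client address and port is an assumption about how packets of one connection keep reaching the same server. For the reply, the sketch assumes the server addresses its answer to the client and routes it through the dispatcher as its gateway, so the field it restores is the source address; if replies were instead addressed to the dispatcher itself, the same table lookup would restore the destination.

```python
from dataclasses import dataclass, replace

# Internet-side address of the dispatcher in the test environment.
DISPATCHER_PUBLIC_IP = "211.80.248.99"


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class Forwarder:
    """Toy model of IP-level request forwarding on the dispatcher."""

    def __init__(self, pick_server):
        # pick_server is a hypothetical callback returning the IP address
        # of the currently least loaded intranet server.
        self.pick_server = pick_server
        self.connections = {}  # (client_ip, client_port) -> server_ip

    def forward_request(self, pkt):
        key = (pkt.src_ip, pkt.src_port)
        # A known connection keeps its server; a new one gets a fresh choice.
        if key not in self.connections:
            self.connections[key] = self.pick_server()
        return replace(pkt, dst_ip=self.connections[key])

    def forward_response(self, pkt):
        key = (pkt.dst_ip, pkt.dst_port)
        if key not in self.connections:
            return None  # not a connection this dispatcher forwarded
        # Make the reply appear to come from the address the client contacted.
        return replace(pkt, src_ip=DISPATCHER_PUBLIC_IP)


fwd = Forwarder(lambda: "192.168.1.2")
request = Packet("211.80.248.100", 40000, DISPATCHER_PUBLIC_IP, 80)
print(fwd.forward_request(request))
```

A dictionary is used for the connection table because lookups happen on every packet and need to be cheap.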
The system consists of three modules: the IP masquerading (IP camouflage) module, the IP port forwarding module, and the scheduling module. The IP port forwarding module is relatively simple to implement: it only requires adding the appropriate forwarding rules to a Linux kernel file and calling functions of the IP masquerading module, so it is not described in detail here.
3. IP masquerading
Private IP addresses cannot communicate directly with machines on the Internet; to do so, they must go through a Network Address Translation (NAT) mechanism. IP masquerading is a form of dynamic many-to-one (M:1) network address translation: it maps multiple intranet IP addresses to a single Internet-facing IP address, so that intranet machines that cannot reach the Internet directly can communicate with the outside world through the mapping machine. Network address port translation (NAPT) is an extension of network address translation that translates many network addresses and their TCP/UDP ports into a single IP address and its TCP/UDP ports. The cluster system uses this network address and port translation mechanism. To enable IP masquerading in Linux, you must recompile the Linux kernel and then install and activate the IP masquerading module.
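As a small illustration of why translation is needed, the addresses from the test environment can be classified with Python's standard ipaddress module. The helper name needs_nat is hypothetical, and is_private also reports other non-routable ranges such as loopback, so this is only a rough check.

```python
import ipaddress


def needs_nat(address):
    """Return True if the address is private (or otherwise not globally
    routable) and therefore cannot be reached directly from the Internet."""
    return ipaddress.ip_address(address).is_private


# An intranet server and the dispatcher's Internet-side address.
for addr in ("192.168.1.2", "211.80.248.99"):
    print(addr, needs_nat(addr))
```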
The main tasks of the IP masquerading module are to:
(1) Receive all requests sent from the intranet to the Internet.
(2) Forward connection requests from the intranet to the Internet through the dispatcher.
(3) Hide the source address of all requests sent from the intranet to the Internet, so that all requests appear to come from the dispatcher.
(4) Create a hash table that records all established connections.
(5) Receive the responses that come back from the Internet and forward each one to the intranet machine that sent the request.
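The bookkeeping behind the tasks listed above can be sketched in user space as follows. This is an illustrative model under assumptions, not the kernel module: the class name MasqueradeTable, its method names, and the starting port 61000 are all hypothetical choices, and Python dictionaries stand in for the hash table of established connections.

```python
import itertools


class MasqueradeTable:
    """Toy model of many-to-one address and port translation."""

    def __init__(self, public_ip, first_port=61000):
        self.public_ip = public_ip
        # Source of unused public-side ports (starting value is arbitrary).
        self._ports = itertools.count(first_port)
        self._by_port = {}    # public port -> (intranet_ip, intranet_port)
        self._by_source = {}  # (intranet_ip, intranet_port) -> public port

    def outbound(self, src_ip, src_port):
        """Hide an intranet source behind the dispatcher's public address."""
        key = (src_ip, src_port)
        if key not in self._by_source:
            port = next(self._ports)
            self._by_source[key] = port
            self._by_port[port] = key
        return self.public_ip, self._by_source[key]

    def inbound(self, dst_port):
        """Find which intranet machine a reply from the Internet belongs to."""
        return self._by_port.get(dst_port)


table = MasqueradeTable("211.80.248.99")
public_ip, public_port = table.outbound("192.168.1.2", 1025)
print(public_ip, public_port, table.inbound(public_port))
```

Keeping two dictionaries makes both directions a single lookup: outbound packets are matched by their intranet source, and replies are matched by the public port they arrive on.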
The process of sending a connection request from the intranet to the Internet is as follows (the reverse process is very similar):
4. Scheduling module
The scheduling module is implemented at the application layer, where the scheduling policy can be controlled flexibly: either static or dynamic scheduling policies can be used. This also improves the scalability of the system: when website traffic grows and a new server has to be added, only one data entry needs to be added to the application. The main tasks of the scheduling module are:
(1) The dispatcher sends a command to each real server to collect load information.
(2) Each real server runs a program that obtains the length of its CPU run queue.
(3) Each server returns the length of its CPU run queue to the dispatcher.
(4) The dispatcher compares the CPU run queue lengths of all machines and selects the one with the shortest queue, which it treats as the least loaded machine.
(5) The dispatcher passes the IP address of the least loaded machine to the IP port forwarding module through a system call.
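Steps (2) to (4) above can be sketched as follows. The article does not say how the run queue length is obtained; this sketch assumes it is taken from the fourth field of Linux's /proc/loadavg, which has the form runnable/total. The function names and the sample readings are hypothetical, and the transport between dispatcher and servers and the final system call of step (5) are left out.

```python
def run_queue_length(loadavg_text):
    """Extract the runnable-task count from text in /proc/loadavg format.

    The fourth whitespace-separated field looks like "3/120":
    currently runnable scheduling entities / total entities.
    """
    fields = loadavg_text.split()
    runnable, _total = fields[3].split("/")
    return int(runnable)


def least_loaded(reports):
    """Given server IP -> run queue length, return the IP with the shortest queue."""
    return min(reports, key=reports.get)


# Sample readings, as each server might report them to the dispatcher.
reports = {
    "192.168.1.2": run_queue_length("0.75 0.40 0.30 3/120 4567"),
    "192.168.1.3": run_queue_length("0.10 0.05 0.01 1/98 4321"),
}
print(least_loaded(reports))  # 192.168.1.3
```

Adding a server then amounts to adding one more entry to the table of reports, which matches the scalability argument made for an application-layer scheduler.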
The process of the Server is shown in step 3.
5. Summary
The cluster system achieves load balancing at the IP level. The rewriting of the destination address of IP packets is done mainly in the kernel, which makes it very fast because it avoids communication between user space and the kernel. The load balancing system has the following functions:
(1) It can forward various TCP/IP-based services, such as telnet, ftp, and http.
(2) It finds the least loaded server on the intranet to respond to user requests, achieving dynamic load balancing.
(3) By matching hash table entries, multiple requests for the same service can be sent to the same real server on the intranet, ensuring continuity.
(4) Millisecond-level response time can be guaranteed.
(5) Users can control the number of intranet servers from within the application, which gives good scalability.
(6) When a server fails, the failure is detected promptly, which gives good fault tolerance.
Of course, the system still has much room for improvement. For example, the dispatcher should be protected by dual-machine backup. When implementing large-scale application-level cluster service systems, designers need to consider more factors: service resource design, resource monitoring, load balancing, failover management, reliability, availability, and performance.