Linux Server Cluster System (2) (1)

This article describes the architecture of the LVS cluster. It first presents the general architecture of the LVS cluster and discusses its design principles and corresponding features. It then applies the LVS cluster to building scalable Web, Media, Cache, Mail and other network services.
1. Introduction
Over the past decade, the Internet has evolved from a network connecting a few research institutions into an information-sharing network with a large number of applications and services, and it is becoming an indispensable part of people's lives. Although the Internet has developed quickly, building and maintaining large network services remains a challenging task: the system must offer high performance and high reliability, and, as the access load keeps growing, it must scale to meet increasing performance requirements. Because a framework and design methodology for building scalable network services is lacking, only institutions with excellent engineering and management talent can establish and maintain large-scale network services.
In view of this situation, this article first presents the general architecture of the LVS cluster and discusses its design principles and corresponding features. It then uses the LVS cluster to build scalable Web, Media, Cache, Mail, and other network services.
2. General architecture of the LVS Cluster
The LVS cluster uses IP load balancing technology and content-based request distribution technology. The scheduler (load balancer) has a high throughput and distributes requests evenly across different servers for execution. It also automatically masks server failures, so that a group of servers forms a high-performance, highly available virtual server. The structure of the server cluster is transparent to clients, and neither the client nor the server programs need to be modified.
 
Figure 1: Architecture of the LVS Cluster
To this end, transparency, scalability, high availability, and ease of management must be considered in the design. Generally, the LVS cluster adopts a three-tier structure. As shown in Figure 1, the three tiers are:
Load balancer: the front end of the entire cluster. It sends client requests to a group of servers for execution, while clients believe that the service comes from a single IP address, which we call the virtual IP address.
Server pool: a group of servers that actually execute client requests. The services they run include Web, Mail, FTP, and DNS.
Shared storage: a shared storage area for the server pool, which makes it easy for the servers in the pool to hold the same content and provide the same service.
The scheduler is the only entry point of the server cluster. It can use IP load balancing technology, content-based request distribution technology, or a combination of the two. With IP load balancing, the servers in the pool must hold the same content and provide the same service. When a client request arrives, the scheduler selects a server from the pool based on server load and the configured scheduling algorithm, forwards the request to the selected server, and records the decision; subsequent packets of the same request are forwarded to the same server. With content-based request distribution, servers can provide different services, and when a client request arrives the scheduler selects the server to execute it based on the content of the request. All of these operations are performed in the kernel space of the Linux operating system, so the scheduling overhead is very small and the throughput is high.
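The select-forward-record behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the IPVS implementation (which runs inside the Linux kernel): the names `RealServer`, `Scheduler`, `dispatch`, and `close` are hypothetical, and a simple least-connections rule stands in for whatever scheduling algorithm is actually configured.

```python
from dataclasses import dataclass, field


@dataclass
class RealServer:
    """One node of the server pool (hypothetical structure)."""
    address: str
    active_connections: int = 0


@dataclass
class Scheduler:
    servers: list
    # Records which server was chosen for each (client_ip, client_port).
    connections: dict = field(default_factory=dict)

    def dispatch(self, client_ip: str, client_port: int) -> RealServer:
        """Return the server that should receive this packet."""
        key = (client_ip, client_port)
        server = self.connections.get(key)
        if server is None:
            # First packet of a request: pick the least-loaded server
            # (assumed algorithm) and record the decision.
            server = min(self.servers, key=lambda s: s.active_connections)
            server.active_connections += 1
            self.connections[key] = server
        # Later packets of the same request reuse the recorded server.
        return server

    def close(self, client_ip: str, client_port: int) -> None:
        """Forget a finished connection and release its load."""
        server = self.connections.pop((client_ip, client_port), None)
        if server is not None:
            server.active_connections -= 1


pool = [RealServer("10.0.0.1"), RealServer("10.0.0.2")]
scheduler = Scheduler(pool)
first = scheduler.dispatch("192.0.2.7", 40001)   # new request
again = scheduler.dispatch("192.0.2.7", 40001)   # later packet, same request
other = scheduler.dispatch("192.0.2.8", 40002)   # a different client
```

In this run the second call returns the same server as the first, while the other client is sent to the node that is idle at that moment.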
The number of nodes in the server pool is variable. When the load on the system exceeds the processing capacity of all current nodes, servers can be added to the pool to handle the growing request load. For most network services there is no strong correlation between requests, so requests can be executed in parallel on different nodes, and the performance of the whole system grows roughly linearly with the number of nodes in the server pool.
Shared storage is usually a database, a network file system, or a distributed file system. Data that server nodes update dynamically is generally stored in a database system, which guarantees data consistency under concurrent access. Static data can be stored in a network file system such as NFS or CIFS, but network file systems have limited scalability: in general, one NFS/CIFS server can support only 3 to 6 busy server nodes. For large cluster systems, consider a distributed file system such as AFS [1], GFS [2, 3], Coda [4], or InterMezzo [5]. A distributed file system provides a shared storage area for all servers, which access it just as they would a local file system, and it also offers good scalability and availability. In addition, when applications on different servers read and write the same resource on the distributed file system at the same time, access conflicts must be resolved to keep the resource consistent. This requires a distributed lock manager, which may be provided inside the distributed file system or externally. When writing applications, developers can use the distributed lock manager to ensure consistency when applications on different nodes access data concurrently.
The load balancer, server pool, and shared storage are connected by a high-speed network, such as a 100 Mbps switched network, Myrinet, or a Gigabit network. The high-speed network keeps the interconnect from becoming the bottleneck of the whole system as the system grows.
The Graphic Monitor is a monitoring tool that the cluster provides to the system administrator for observing the state of the whole system. It is browser-based, so administrators can monitor the system either locally or remotely. For security reasons, the browser must connect over HTTPS (Secure HTTP) and pass identity authentication before it can be used for system monitoring, configuration, and management.
2.1. Why use a layered architecture?
A layered architecture keeps the layers independent of each other. Each layer provides a different function, and different existing software can be reused within a layer. For example, the scheduler layer provides load balancing, scalability, and high availability, while the server layer can run different network services such as Web, Cache, Mail, and Media to provide different scalable network services. A clear division of functions and a clear hierarchy make the system easy to build, easy to maintain later, and easy to scale in performance.
2.2. Why use shared storage?
Shared storage, such as a distributed file system, is optional in the LVS cluster. When the network service requires every server to offer the same content, shared storage is a good choice; otherwise each server must keep a copy of the same content on its local hard disk. The more content the system stores, the more expensive this shared-nothing structure becomes, because every server needs the same amount of storage space and every update involves every server, so the maintenance cost is very high.
Shared storage provides a unified storage space for the server group, which makes the system's content much easier to maintain. For example, the webmaster only needs to update pages in shared storage, and the change takes effect on all servers. A distributed file system provides good scalability and availability: when its storage space is expanded, the storage space available to all servers grows with it. Most Internet services are read-intensive applications. When the distributed file system uses part of the local hard disk on each server as a cache (for example, 2 GB), access to the distributed file system is nearly as fast as access to the local hard disk.
In addition, advances in storage hardware are encouraging migration from shared-nothing clusters to shared-storage clusters. Storage Area Network (SAN) technology lets every node in the cluster connect directly to, and share, a large disk array. Hardware vendors also offer a variety of disk-sharing technologies, such as Fibre Channel and shared SCSI. InfiniBand is a general high-performance I/O specification that allows a storage area network to carry I/O messages and cluster communication messages at lower latency and with good scalability. InfiniBand is supported by most major vendors, such as Compaq, Dell, Hewlett-Packard, IBM, Intel, Microsoft, and Sun Microsystems, and it is becoming an industry standard. The development of these technologies makes shared storage easier to adopt, and large-scale production will gradually reduce its cost.
2.3. High Availability
A cluster system has redundancy in both hardware and software. High availability is achieved by detecting node or service process failures and reconfiguring the system correctly, so that the requests it receives are handled by the surviving nodes.
Generally, a resource monitoring process runs on the scheduler and continuously monitors the health of each server node. When a server does not answer an ICMP ping, or its network service does not respond within the specified time, the resource monitoring process notifies the operating system kernel to delete the server from the scheduling list or mark it as unavailable. New service requests are then no longer scheduled to the failed node. The resource monitoring process can report the fault to the administrator by email or pager. Once the monitoring process detects that the server has recovered, it notifies the scheduler to add the server back to the scheduling list. In addition, using the management program provided by the system, the administrator can at any time issue commands to bring new machines into service to increase processing capacity, or to take existing servers out of service for maintenance.
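As a rough illustration of the monitoring loop described above, the following Python sketch probes each server and keeps a scheduling list up to date. It makes several assumptions: `ResourceMonitor`, `tcp_probe`, and `check_once` are hypothetical names, a plain TCP connection attempt stands in for the ICMP and service checks, and an in-memory set stands in for the scheduling list that a real monitor would update in the kernel.

```python
import socket


def tcp_probe(address: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if the service accepts a TCP connection within the timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


class ResourceMonitor:
    def __init__(self, servers, probe=tcp_probe, notify=print):
        self.servers = list(servers)               # all configured (address, port) pairs
        self.scheduling_list = set(self.servers)   # servers eligible for new requests
        self.probe = probe                         # health check, injectable for testing
        self.notify = notify                       # stands in for email or pager alerts

    def check_once(self) -> None:
        """Probe every server once and update the scheduling list."""
        for server in self.servers:
            healthy = self.probe(*server)
            if not healthy and server in self.scheduling_list:
                self.scheduling_list.discard(server)
                self.notify(f"{server} failed; removed from scheduling list")
            elif healthy and server not in self.scheduling_list:
                self.scheduling_list.add(server)
                self.notify(f"{server} recovered; added back to scheduling list")


# Usage with a fake probe so the example runs without real servers.
down = {("10.0.0.2", 80)}
messages = []
monitor = ResourceMonitor(
    [("10.0.0.1", 80), ("10.0.0.2", 80)],
    probe=lambda address, port: (address, port) not in down,
    notify=messages.append,
)
monitor.check_once()
after_failure = set(monitor.scheduling_list)
down.clear()                                       # the failed node comes back
monitor.check_once()
after_recovery = set(monitor.scheduling_list)
```

A real monitor would call `check_once` periodically and would pass each change on to the scheduler rather than only updating a local set.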
The front-end scheduler can become a single point of failure for the system. In general the scheduler is highly reliable, because it runs few programs and most of their code paths have already been well exercised. However, major faults such as hardware aging, network line failures, or human error cannot be ruled out. To prevent a scheduler failure from bringing down the whole system, a slave scheduler must be set up as a backup of the master scheduler. Two heartbeat processes [6] run on the master and slave schedulers respectively, and they regularly report their health to each other over heartbeat lines such as a serial line and UDP. When the slave scheduler no longer hears the heartbeat of the master scheduler, it uses ARP spoofing (gratuitous ARP) to take over the cluster's external virtual IP address, and it also takes over the master scheduler's work of providing the load scheduling service. When the master scheduler recovers, there are two options: either the master scheduler automatically becomes the slave scheduler, or the slave scheduler releases the virtual IP address and the master scheduler reclaims it and resumes the load scheduling service. Using multiple heartbeat lines minimizes the chance of a misjudgment caused by a heartbeat line failure, that is, the slave scheduler concluding that the master scheduler has failed while the master is in fact still working normally.
Generally, when the master scheduler fails, all state information about established connections on the master scheduler is lost and existing connections are interrupted. Clients must reconnect, and the slave scheduler then schedules the new connections to the servers, which causes some inconvenience to clients. For this reason, the IPVS scheduler implements an efficient state synchronization mechanism in the Linux kernel that synchronizes the master scheduler's state information to the slave scheduler in a timely manner. When the slave scheduler takes over, most established connections can continue.
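The takeover decision on the slave scheduler can be sketched as follows. This is a simplified model, not the Heartbeat software itself: the class `SlaveScheduler`, its methods, the `dead_time` parameter, and the link names are all hypothetical, timestamps are passed in explicitly to keep the example deterministic, and the actual takeover of the virtual IP address (gratuitous ARP) is reduced to setting a flag.

```python
class SlaveScheduler:
    def __init__(self, links, dead_time: float):
        # Seconds of silence after which a link is considered quiet (assumed parameter).
        self.dead_time = dead_time
        # Time at which a heartbeat was last heard on each heartbeat line.
        self.last_heard = {link: 0.0 for link in links}
        self.owns_virtual_ip = False

    def on_heartbeat(self, link: str, now: float) -> None:
        """Record a heartbeat received from the master on one line."""
        self.last_heard[link] = now

    def tick(self, now: float) -> None:
        """Periodic check: take over only if every heartbeat line is silent."""
        master_silent = all(
            now - heard > self.dead_time for heard in self.last_heard.values()
        )
        if master_silent and not self.owns_virtual_ip:
            self.take_over()

    def take_over(self) -> None:
        # Placeholder: a real slave would announce the virtual IP address
        # with gratuitous ARP and start providing the load scheduling service.
        self.owns_virtual_ip = True


slave = SlaveScheduler(["serial", "udp"], dead_time=3.0)
slave.on_heartbeat("serial", 10.0)
slave.on_heartbeat("udp", 10.0)
slave.on_heartbeat("udp", 12.5)    # the serial line breaks, UDP still delivers
slave.tick(14.0)
one_line_down = slave.owns_virtual_ip
slave.tick(16.0)                   # now both lines have been silent too long
both_lines_down = slave.owns_virtual_ip
```

Because the slave requires silence on every line before acting, the failure of the serial line alone does not trigger a takeover in this run, which is the benefit of multiple heartbeat lines noted above.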
3. Scalable Web Services
Figure 2 shows the architecture of an LVS-based Web cluster. The first layer is the load balancer, which generally uses IP load balancing technology so that the whole system has a high throughput. The second layer is the Web server pool, in which each node can run the HTTP service, the HTTPS service, or both. The third layer is shared storage, which can be a database, a network file system, a distributed file system, or a mix of the three. The nodes in the cluster are connected by a high-speed network.
 
Figure 2: LVS-based Web Cluster
For dynamic pages such as PHP, JSP, and ASP, the dynamic data they access is generally stored on a database server. The database service runs on a separate server and is shared by all Web servers. Whether several dynamic pages on the same Web server access the same data, or dynamic pages on different Web servers do so, the database server serializes these accesses through locking, which ensures data consistency.
Static pages and files, such as HTML documents and images, can be stored in a network file system or a distributed file system; the choice depends on the scale and requirements of the system. Through the shared network file system or distributed file system, the webmaster sees a single, unified file storage space, which makes pages easy to maintain and update. Changes to pages in shared storage take effect on all servers.
In this structure, when all server nodes are overloaded, the administrator can quickly add new server nodes to process requests without copying Web documents to the node's local hard disk.
Some Web services use HTTP cookies, a mechanism that stores data in the client's browser in order to track and identify clients. Once HTTP cookies are in use, different connections from the same client are related to each other, and they must be sent to the same Web server. Some Web services use the secure HTTPS protocol, which is HTTP combined with the SSL (Secure Sockets Layer) protocol. When a client accesses an HTTPS service (the default HTTPS port is 443), an SSL connection is established first, in which certificates based on public-key cryptography are exchanged and an SSL key is negotiated to encrypt the session. During the lifetime of that SSL key, all subsequent HTTPS connections use it, so different HTTPS connections from the same client are related as well. To meet these requirements, the IPVS scheduler provides a persistent service feature, which sends different connections from the same IP address to the same server node in the cluster within a configured time. This effectively handles the correlation between a client's connections.
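The idea behind the persistent service function can be illustrated with a small Python sketch. It is a model under assumptions rather than the IPVS code: `PersistentScheduler` and its members are hypothetical names, round-robin stands in for the configured scheduling algorithm, and the sketch chooses to restart the timer on every new connection, which is a simplification and not a statement about IPVS internals.

```python
class PersistentScheduler:
    def __init__(self, servers, timeout: float):
        self.servers = list(servers)
        self.timeout = timeout     # persistence time in seconds (assumed parameter)
        self.templates = {}        # client_ip -> (server, expires_at)
        self._next = 0             # round-robin position

    def _pick(self):
        """Fallback choice for clients without a valid entry (round-robin)."""
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server

    def dispatch(self, client_ip: str, now: float):
        """Return the server for a new connection arriving at time `now`."""
        entry = self.templates.get(client_ip)
        if entry is not None and now < entry[1]:
            server = entry[0]      # same client within the set time: same server
        else:
            server = self._pick()  # unknown client or expired entry
        # Simplification of this sketch: each connection restarts the timer.
        self.templates[client_ip] = (server, now + self.timeout)
        return server


scheduler = PersistentScheduler(["web1", "web2", "web3"], timeout=300)
a1 = scheduler.dispatch("198.51.100.4", now=0)     # first connection of client A
a2 = scheduler.dispatch("198.51.100.4", now=100)   # within 300 s: same server
b1 = scheduler.dispatch("198.51.100.9", now=120)   # client B gets the next server
a3 = scheduler.dispatch("198.51.100.4", now=1000)  # entry expired: scheduled afresh
```

Here the HTTP and HTTPS connections that client A opens within the persistence time all reach `web1`, so cookie-based sessions and a negotiated SSL key remain usable on that node.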

