Linux server cluster system

LVS cluster architecture

This article introduces the architecture of the LVS cluster. It first presents the general architecture of the LVS cluster and discusses its design principles and corresponding features; it then applies the LVS cluster to building scalable Web, Media, Cache, Mail, and other network services.

1. Introduction
Over the past decade, the Internet has evolved from a network connecting a handful of research institutions into an information-sharing network carrying a large number of applications and services, and it is becoming an indispensable part of people's lives. Although the Internet is growing quickly, building and maintaining large network services is still a challenging task: the system must offer high performance and high reliability, and, especially as the access load keeps growing, it must be scalable enough to meet ever-increasing performance requirements. Because there has been no framework or design methodology for building scalable network services, only institutions with excellent engineering and management talent have been able to establish and maintain large-scale network services.

In view of this situation, this paper first presents the general architecture of the LVS cluster and discusses its design principles and corresponding features; it then uses the LVS cluster to build scalable Web, Media, Cache, Mail, and other network services.

2. General architecture of the LVS Cluster
The LVS cluster uses IP load balancing technology and content-based request distribution technology. The scheduler has a very good throughput rate: it distributes requests evenly across the servers for execution and automatically shields server failures, so that a group of servers forms a single high-performance, highly available virtual server. The structure of the whole server cluster is transparent to clients, and neither client nor server programs need to be modified.

To this end, transparency, scalability, high availability, and manageability must all be considered in the design. The LVS cluster generally adopts a three-tier structure, shown in Figure 1, whose main components are:

* The load balancer is the front-end server of the whole cluster. It dispatches client requests to a group of servers for execution, while clients see the service as coming from a single IP address (which we can call the virtual IP address).
* The server pool is the group of servers that actually execute client requests; the services they run include Web, Mail, FTP, and DNS.
* Shared storage provides a shared storage area for the server pool, which makes it easy for all servers in the pool to hold the same content and provide the same services.

The scheduler is the single entry point of the server cluster system. It can use IP load balancing technology, content-based request distribution technology, or a combination of the two. With IP load balancing, the server pool must hold the same content and provide the same services. When a client request arrives, the scheduler selects a server from the pool according to the servers' load and the configured scheduling algorithm, forwards the request to that server, and records the scheduling decision; when other packets of the same connection arrive, they are forwarded to the same server. With content-based request distribution, the servers may provide different services; when a client request arrives, the scheduler can choose the server according to the content of the request. Because all of these operations are performed inside the Linux kernel, the scheduling overhead is very small, and the scheduler achieves high throughput.
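To make the forwarding and connection-recording behavior concrete, here is a minimal Python sketch of what the scheduler does at this level: it picks a real server with a weighted least-connections rule (one of several algorithms IPVS offers) and keeps a connection table so later packets of the same connection reach the same server. The real scheduler runs inside the Linux kernel; the server names, weights, and packet tuples below are made up for illustration.

```python
# Minimal sketch of IP load balancing at the scheduler: pick a server by
# weighted least-connections, then remember the choice so later packets of
# the same connection go to the same real server. Illustration only; the
# real IPVS code lives in the Linux kernel.

class Scheduler:
    def __init__(self, servers):
        # servers: {name: weight}; active connection counts start at zero
        self.weights = dict(servers)
        self.active = {name: 0 for name in servers}
        self.conn_table = {}          # (client_ip, client_port, vport) -> server

    def _pick(self):
        # weighted least-connections: minimize active / weight
        return min(self.weights, key=lambda s: self.active[s] / self.weights[s])

    def dispatch(self, client_ip, client_port, vport):
        key = (client_ip, client_port, vport)
        server = self.conn_table.get(key)
        if server is None:            # first packet of a new connection
            server = self._pick()
            self.conn_table[key] = server
            self.active[server] += 1
        return server                 # forward the packet to this real server

sched = Scheduler({"rs1": 2, "rs2": 1, "rs3": 1})
print(sched.dispatch("203.0.113.7", 51000, 80))   # new connection scheduled
print(sched.dispatch("203.0.113.7", 51000, 80))   # same connection, same server
```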

The number of nodes in the server pool is variable. When the load on the system exceeds the processing capacity of all current nodes, more servers can be added to the pool to handle the growing request load. For most network services there is no strong correlation between requests, so requests can be executed concurrently on different nodes; the performance of the whole system therefore grows roughly linearly with the number of nodes in the server pool.

Shared storage is usually a database, a network file system, or a distributed file system. Data that server nodes update dynamically is generally stored in a database system, and the database guarantees data consistency under concurrent access. Static data can be stored in a network file system (such as NFS or CIFS), but a network file system has limited scalability: an NFS/CIFS server can generally support only about 3 to 6 busy server nodes. For large-scale cluster systems, a distributed file system can be used instead, such as AFS [1], GFS [2,3], Coda [4], or InterMezzo [5]. A distributed file system provides a shared storage area for the servers, which access it just as they access a local file system, while the distributed file system itself provides good scalability and availability. In addition, when applications on different servers read and write the same resource on the distributed file system at the same time, the access conflicts between them must be resolved so that the resource stays consistent. This requires a distributed lock manager, which may be internal or external to the distributed file system. When writing applications, developers can use the distributed lock manager to ensure the consistency of concurrent access from different nodes.
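As a rough illustration of the "lock, then update" pattern the distributed lock manager enables, the sketch below uses a POSIX advisory file lock as a stand-in; on a real distributed file system the application would call that file system's own lock API instead, and the path shown is hypothetical.

```python
# Sketch of the lock-then-update pattern described above. A POSIX advisory
# lock stands in for the distributed lock manager; paths are hypothetical.
import fcntl

def update_shared_file(path, new_text):
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)     # exclusive lock: other writers wait
        try:
            f.seek(0)
            f.truncate()
            f.write(new_text)             # update is atomic w.r.t. other lockers
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN) # release so other nodes can proceed

update_shared_file("/shared/www/index.html", "<html>updated page</html>\n")
```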

The load scheduler, server pool, and shared storage system are connected by a high-speed network, such as 100 Mbps switched Ethernet, Myrinet, or Gigabit Ethernet, so that the network does not become the bottleneck when the system is scaled up.

The graphic monitor provides system administrators with a monitor for the whole cluster system, letting them observe its status. Because the graphic monitor is browser-based, administrators can monitor the system either locally or remotely. For security reasons, the browser must use the HTTPS (secure HTTP) protocol and pass identity authentication before it can perform system monitoring, configuration, and management.

2.1. Why use a hierarchical architecture?

A hierarchical architecture makes the layers independent of each other. Each layer provides different functionality, and existing software can be reused within a layer. For example, the scheduler layer provides load balancing, scalability, and high availability, while different network services such as Web, Cache, Mail, and Media can run on the server layer to provide different scalable network services. The clear division of functions and the clear hierarchy make the system easy to build, easy to maintain later, and easy to scale in performance.

2.2. Why shared storage?

Shared storage, such as a distributed file system, is optional in this LVS cluster system. When a network service requires all servers to hold the same content, shared storage is a good choice; otherwise every server has to copy the same content to its local hard disk. The more content the system stores, the higher the cost of this shared-nothing structure becomes, because each server needs the same amount of storage space, every server is involved in every update, and system maintenance costs rise sharply.

Shared storage provides a unified storage space for the server group, which makes it easier to maintain the system's content. For example, the webmaster only needs to update pages in shared storage and the change takes effect on all servers. A distributed file system also provides good scalability and availability: when its storage space grows, the storage space seen by all servers grows with it. Most Internet services are read-intensive applications, and if the distributed file system uses part of each server's local hard disk as a cache (for example, 2 GB of space), access to the distributed file system can approach the speed of the local hard disk.

In addition, the development of storage hardware technology is also promoting the migration from shared-nothing clusters to shared-storage clusters. Storage Area Network technology lets every node in the cluster connect directly to and share a large disk array, and hardware vendors offer a variety of disk-sharing technologies, such as Fibre Channel and shared SCSI. InfiniBand is a general high-performance I/O specification that lets a Storage Area Network carry both I/O messages and cluster communication messages at lower latency, and it provides good scalability. InfiniBand is supported by most major vendors, such as Compaq, Dell, Hewlett-Packard, IBM, Intel, Microsoft, and Sun Microsystems, and is becoming an industry standard. The development of these technologies makes shared storage easier to build, and large-scale production will gradually reduce its cost.

2.3. High Availability

The cluster system has redundancy in both hardware and software. High availability can be achieved by detecting node or service-process failures and reconfiguring the system correctly, so that requests received by the system are handled by the surviving nodes.

Generally, a resource-monitoring process runs on the scheduler and checks the health of each server node at all times. When a server stops responding to ICMP ping, or its network service fails to respond within the configured time, the resource-monitoring process notifies the operating system kernel to remove the server from the scheduling list or mark it invalid, so that new service requests are no longer scheduled to the failed node. The resource-monitoring process can also report the failure to the administrator by email or pager. Once the monitoring process detects that the server has recovered, it notifies the scheduler to add the server back to the scheduling list. In addition, through the management programs provided by the system, the administrator can at any time add new machines to the service to improve processing capacity, or take existing servers out of service for maintenance.
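The following Python sketch mirrors that monitoring loop under simple assumptions: it probes each real server's service port, quiesces servers that stop answering, and restores them when they recover. A production setup would drive the kernel table through ipvsadm or a tool such as keepalived; the host addresses and the weight-setting stub here are hypothetical.

```python
# Sketch of the resource-monitoring loop: probe each real server, remove
# failed servers from scheduling, add them back on recovery. Hosts and the
# weight-setting stub are hypothetical.
import socket, time

SERVERS = ["192.168.0.11", "192.168.0.12", "192.168.0.13"]
PORT, INTERVAL = 80, 5
in_service = set(SERVERS)

def service_alive(host, port, timeout=2):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def set_weight(host, weight):
    # placeholder for telling the kernel scheduler about the change,
    # e.g. by invoking ipvsadm or a library binding
    print(f"server {host}: weight set to {weight}")

while True:
    for host in SERVERS:
        alive = service_alive(host, PORT)
        if not alive and host in in_service:
            in_service.discard(host)
            set_weight(host, 0)          # quiesce: no new requests go here
            print(f"ALERT: {host} failed, administrator notified")
        elif alive and host not in in_service:
            in_service.add(host)
            set_weight(host, 1)          # back in the scheduling list
    time.sleep(INTERVAL)
```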

At present, the front-end scheduler may become a single point of failure. In general, the scheduler is highly reliable, because it runs few programs and most of their code paths have already been well exercised, but we cannot rule out major failures such as hardware aging, broken network lines, or human error. To keep a scheduler failure from bringing down the whole system, a backup (slave) scheduler is set up for the master scheduler. Heartbeat processes [6] run on the master and slave schedulers respectively and report their health to each other periodically over heartbeat lines using UDP. When the slave scheduler can no longer hear the master scheduler's heartbeat, it uses Gratuitous ARP to take over the cluster's external Virtual IP Address and takes over the master's load-scheduling work. When the master scheduler recovers, there are two options: either the recovered master automatically becomes the new slave scheduler, or the slave releases the Virtual IP Address and the master reclaims it and resumes load scheduling. Using multiple heartbeat lines minimizes the chance of a false takeover caused by a heartbeat-line failure (that is, the slave scheduler deciding that the master has failed while the master is in fact still working normally).
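A minimal sketch of the slave scheduler's side of this protocol, under the assumption of a plain UDP heartbeat: it waits for periodic heartbeats from the master and, after a dead-time interval with none, takes over the virtual IP (the gratuitous ARP step is left as a placeholder). The port number and timeouts are illustrative.

```python
# Sketch of the slave scheduler waiting on UDP heartbeats from the master
# and taking over after a dead-time interval. Values are illustrative.
import socket, time

HEARTBEAT_PORT = 694          # port conventionally used by heartbeat
DEAD_TIME = 10                # seconds without a heartbeat before takeover

def take_over_vip():
    # placeholder: configure the VIP locally and send a gratuitous ARP
    print("master silent: taking over virtual IP and load scheduling")

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", HEARTBEAT_PORT))
sock.settimeout(1.0)

last_beat = time.time()
active = False
while not active:
    try:
        data, addr = sock.recvfrom(64)      # heartbeat datagram from master
        last_beat = time.time()
    except socket.timeout:
        pass
    if time.time() - last_beat > DEAD_TIME:
        take_over_vip()
        active = True
```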

Normally, when the master scheduler fails, all state information about connections established through it is lost and the existing connections are interrupted; clients must reconnect so that the slave scheduler can schedule the new connections to the servers, which may inconvenience them. To address this, the IPVS scheduler implements an efficient state synchronization mechanism in the Linux kernel that synchronizes the master scheduler's state information to the slave scheduler in a timely manner, so that when the slave takes over, most established connections can continue.

3. scalable Web Services

The architecture of an LVS-based Web cluster is shown in Figure 2. The first layer is the load balancer, which generally uses IP load balancing technology to give the whole system high throughput; the second layer is the Web server pool, where each node can run HTTP, HTTPS, or both; the third layer is shared storage, which can be a database, a network file system, a distributed file system, or a mix of the three. The nodes in the cluster are connected by a high-speed network.

For dynamic pages (such as PHP, JSP, and ASP), the dynamic data they access is generally stored on a database server. The database service runs on an independent server and is shared by all Web servers. Whether multiple dynamic pages on the same Web server or on different Web servers access the same data, the database server serializes these accesses with locks, which guarantees data consistency.

Static pages and files (such as HTML documents and images) can be stored in a network file system or a distributed file system, depending on the system's scale and requirements. Through the shared network file system or distributed file system, the webmaster sees a unified file storage space, which makes pages easy to maintain and update; a change to a page in shared storage takes effect on all servers.

In this structure, when all server nodes are overloaded, the administrator can quickly add new server nodes to process requests without copying Web documents to the node's local hard disk.

Some Web services use HTTP cookies, which store data in the client's browser to track and identify the client. Once HTTP cookies are used, different connections from the same client are related and must be sent to the same Web server. Some Web services use the secure HTTPS protocol, which is the HTTP protocol on top of SSL (Secure Socket Layer). When a client accesses the HTTPS service (the default HTTPS port is 443), an SSL connection is first established to exchange certificates and negotiate an SSL session key, which is then used to encrypt the session. During the lifetime of that SSL key, all subsequent HTTPS connections reuse it, so different HTTPS connections from the same client are also related. For these needs, the IPVS scheduler provides a persistence feature, which sends different connections from the same IP address to the same server node in the cluster within a configured time window, effectively handling the affinity between a client's connections.
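A small sketch of the persistence idea, with illustrative values: connections from the same client IP are bound to one real server for a configurable window, which is what keeps cookie- and SSL-related connections together. In IPVS this is a kernel-side template with a timeout (set, for example, with ipvsadm's -p option); the table below only imitates that behavior in user space.

```python
# Sketch of persistent scheduling: bind a client IP to one real server
# for a configurable window. Values are illustrative.
import time

PERSIST_TIMEOUT = 300         # seconds of persistence, e.g. ipvsadm's -p 300
persist = {}                  # client_ip -> (server, expiry)

def schedule(client_ip, pick_server):
    now = time.time()
    entry = persist.get(client_ip)
    if entry and entry[1] > now:
        server = entry[0]                     # still within the persistence window
    else:
        server = pick_server()                # any scheduling algorithm
    persist[client_ip] = (server, now + PERSIST_TIMEOUT)   # refresh the template
    return server

servers = iter(["rs1", "rs2", "rs3"])
print(schedule("203.0.113.7", lambda: next(servers)))   # rs1
print(schedule("203.0.113.7", lambda: next(servers)))   # rs1 again, persisted
```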

4. scalable media services

The architecture of an LVS-based media cluster is shown in Figure 3. The first layer is the load balancer, which generally uses IP load balancing technology to give the whole system high throughput; the second layer is the media server pool, where the corresponding media service runs on each node; the third layer is shared storage, which stores media programs through a network file system or distributed file system. The nodes in the cluster are connected by a high-speed network.

The IPVS load balancer generally uses direct routing (the VS/DR method, described in detail in later articles) to build the media cluster system. The scheduler distributes media service requests evenly to the servers, and each media server returns its response data directly to the client, which gives the whole media cluster system very good scalability.

The media servers can run various media service software. At present, the LVS cluster has good support for Real Media, Windows Media, and Apple QuickTime media services, and real deployments of each are in production. In general, a streaming media service uses a TCP connection (for example, RTSP, the Real-Time Streaming Protocol) for bandwidth negotiation and flow-rate control, while the streaming data is returned to the client over UDP. Here, the IPVS scheduler can treat the TCP and UDP connections from the same client as related, so that they are forwarded to the same media server in the cluster and the media service works correctly.

Shared storage is the most critical issue in a media cluster system, because media files are often very large (a film can take several hundred megabytes to several gigabytes of storage space), which demands both high storage capacity and high read speed. For a small media cluster, say 3 to 6 media server nodes, the storage system can simply be a Linux server with a gigabit NIC, software RAID, and a journaling file system running the in-kernel NFS service, which works well. For a large media cluster, it is better to choose a distributed file system that supports segmented file storage and file caching: media files are stored in segments across multiple storage nodes of the distributed file system, which improves file system performance and balances load among the storage nodes, while media files are automatically cached on the media servers, which speeds up file access. Alternatively, suitable tools can be developed on the media servers. For example, a cache tool can periodically gather statistics on recently hot media files, copy the hot files to the local hard disk, replace files in the cache that are no longer hot, and tell the other media server nodes which media files it has cached and how loaded it is. An application-layer dispatch tool on the media server then receives the client's media service request: if the requested media file is cached on the local hard disk, the request is handed directly to the local media service process; otherwise the tool checks whether the file is cached on another media server, and if it is and that server is not busy, the request is transferred to the media service process on that server; otherwise the request goes to the local media service process, which reads the media file from the backend shared storage.
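To make the dispatch decision at the end of that paragraph concrete, here is a small sketch of the order in which a media server might answer a request: local cache first, then a peer that has the file cached and is not busy, then the backend shared storage. The file names, peer list, and busy threshold are all hypothetical.

```python
# Sketch of the application-layer dispatch on a media server:
# local cache -> idle peer with a cached copy -> backend shared storage.
# All names, thresholds, and data structures are hypothetical.

LOCAL = "media1"
local_cache = {"intro.rm"}                       # files cached on this node's disk
peer_cache = {"media2": {"film-a.rm"}, "media3": set()}
peer_load = {"media2": 0.4, "media3": 0.9}       # load reported by the other nodes
BUSY = 0.8

def dispatch(filename):
    if filename in local_cache:
        return (LOCAL, "serve from local cache")
    for peer, files in peer_cache.items():
        if filename in files and peer_load[peer] < BUSY:
            return (peer, "hand the request to a peer that has it cached")
    return (LOCAL, "serve locally, reading from shared storage")

print(dispatch("intro.rm"))     # local cache hit
print(dispatch("film-a.rm"))    # cached on media2, which is not busy
print(dispatch("film-b.rm"))    # fall back to shared storage
```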

The advantage of shared storage is that the media file administrator sees a unified storage space, which makes media files easier to maintain. When client traffic overloads the whole system, the administrator can quickly add new media server nodes to handle the requests.

Real is famous for its highly compressed audio and video formats, its Real media servers, and its RealPlayer media player. Real uses the structure described above to run an LVS-based scalable Web and media cluster of more than 20 servers, providing Web and audio/video services to its users around the world. Real's senior technical director states that LVS beat all the commercial load balancing products they had tried [7].

5. Scalable Cache service

An effective network cache system can greatly reduce network traffic, response latency, and server load. However, if the cache server is overloaded and cannot process requests in time, it instead increases response latency, so the scalability of the cache service is very important: when system load grows, the whole system should be able to scale up its cache processing capacity. In particular, a cache service on a backbone network may require several Gbps of throughput, which a single server (even one such as Sun's top-end Enterprise 10000) cannot deliver. Building a scalable cache service from a cluster of PC servers is therefore a very effective approach and the one with the best cost-performance ratio.

The architecture of an LVS-based Cache cluster is shown in Figure 4. The first layer is the load balancer, which generally uses IP load balancing technology to give the whole system high throughput; the second layer is the Cache server pool; the cache servers are generally placed close to the backbone Internet connection and may be distributed across different networks. There can be more than one scheduler, placed close to the clients.

The IPVS load balancer generally uses IP tunneling (the VS/TUN method, described in detail in later articles) to build the Cache cluster system, because the cache servers may be placed in different locations (for example, near the backbone Internet connection) and the scheduler and the Cache server pool may not be on the same physical network. With the VS/TUN method, the scheduler only schedules the Web cache requests, and the cache server returns the response data directly to the client. When the requested object is not hit locally, the cache server sends a request to the origin server, retrieves the result, and finally returns it to the client. With a commercial scheduler based on NAT, the traffic would have to pass through the scheduler four times to complete such a request; with the VS/TUN method (or the VS/DR method) the scheduler schedules the request only once, and the other three transfers go directly between the cache server and the Internet. This method is therefore particularly effective for cache cluster systems.

The Cache servers use their local hard disks to store cacheable objects, because writing cached objects accounts for a fair share of operations and the local hard disk keeps this I/O fast. The Cache servers share a dedicated multicast channel and exchange information using ICP (Internet Cache Protocol). When a Cache server misses the current request on its local hard disk, it can ask via ICP whether other Cache servers hold a copy of the requested object; if so, it fetches the copy from that neighboring Cache server, which further improves the hit rate of the Cache service.
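The cache-miss path described above can be sketched as follows. Real ICP is a compact binary protocol on UDP port 3130 (as implemented by Squid, for example); this simplified stand-in just sends the URL as text so the query/reply flow stays visible. The sibling addresses are hypothetical.

```python
# Sketch of the ICP-style miss path: on a local miss, ask sibling caches
# whether they hold the object before going to the origin server.
# Simplified text messages instead of the real binary ICP format.
import socket, time

SIBLINGS = [("10.0.0.2", 3130), ("10.0.0.3", 3130)]

def query_siblings(url, timeout=0.2):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    for peer in SIBLINGS:
        sock.sendto(url.encode(), peer)          # simplified ICP_QUERY to each sibling
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            reply, peer = sock.recvfrom(1024)    # siblings answer HIT or MISS
            if reply == b"HIT":
                return peer                      # fetch the copy from this sibling
        except socket.timeout:
            break
    return None                                  # nobody has it: go to the origin server

def handle_miss(url):
    sibling = query_siblings(url)
    source = f"sibling {sibling}" if sibling else "the origin server"
    print(f"{url}: fetching from {source}, then caching locally")

handle_miss("http://www.example.com/logo.png")
```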

In November 1999, the UK's national JANET Web Cache network, which serves more than 150 universities and regions, used the LVS structure above to build a scalable Cache cluster [8], using only half of its original 50-plus independent Cache servers. Users reported that the network felt as fast as it does in summer, when students are on vacation. This shows that load scheduling can smooth out the access bursts seen by individual servers and improve the resource utilization of the whole system.

6. scalable email service

As the number of Internet users keeps growing, many ISPs face the problem of overloaded email servers. When a mail server can no longer accommodate more user accounts, some ISPs buy more powerful servers to replace the old ones, but migrating the original server's information (such as user mail) to the new server is cumbersome and interrupts service. Other ISPs set up new servers and new mail domains and place new mail users on the new servers; for example, Shanghai Telecom currently spreads users' email accounts over different mail servers, from public1.sta.net.cn and public2.sta.net.cn up to public9.sta.net.cn. Statically dividing users among servers in this way leaves the load on the mail servers unbalanced, keeps resource utilization low, and gives users server names that are hard to remember.

The LVS framework can be used to build a highly scalable, highly available email service system. Its architecture is shown in Figure 5: the front end is a load scheduler using IP load balancing technology; the second layer is the server pool, consisting of an LDAP (Lightweight Directory Access Protocol) server and a group of email servers; the third layer is data storage, which holds users' mail on a distributed file system. The nodes in the cluster are connected by a high-speed network.

User information such as user name, password, home directory, and mail quota is stored on the LDAP server, and the administrator can manage users over HTTPS. Each mail server runs SMTP (Simple Mail Transfer Protocol), POP3 (Post Office Protocol version 3), IMAP4 (Internet Message Access Protocol version 4), and HTTP/HTTPS services. SMTP accepts and forwards users' emails; the SMTP service process queries the LDAP server for user information and then stores the mail. POP3 and IMAP4 also obtain user information from the LDAP server and, after password verification, handle the user's mail-access requests. A mechanism is needed here to avoid read/write conflicts among the SMTP, POP3, and IMAP4 service processes running on different servers. HTTP/HTTPS lets users access their mail through a browser.
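As a sketch of the lookup each mail service process performs before touching a mailbox, the code below asks the LDAP server for a user's record and then decides whether to deliver. It assumes the third-party ldap3 Python package; the directory layout, attribute names, and credentials are illustrative only.

```python
# Sketch of the LDAP lookup performed by the SMTP/POP3/IMAP4 processes.
# Uses the third-party ldap3 package; directory layout, attribute names,
# and credentials are assumptions for illustration.
from ldap3 import Server, Connection, ALL

def lookup_user(uid):
    server = Server("ldap://ldap.cluster.local", get_info=ALL)
    conn = Connection(server, user="cn=mailer,dc=example,dc=com",
                      password="secret", auto_bind=True)
    if conn.search("ou=users,dc=example,dc=com", f"(uid={uid})",
                   attributes=["homeDirectory", "mailQuota"]):
        entry = conn.entries[0]
        return {"home": entry.homeDirectory.value,
                "quota": int(entry.mailQuota.value)}
    return None                      # unknown user: SMTP rejects, POP3/IMAP deny login

def deliver(uid, message_bytes):
    user = lookup_user(uid)
    if user is None:
        return "550 user unknown"
    # write the message into the user's mail directory on shared storage,
    # locking it so readers on other nodes see a consistent mailbox
    print(f"storing {len(message_bytes)} bytes under {user['home']}/Maildir/new/")
    return "250 OK"

print(deliver("alice", b"Subject: hello\r\n\r\nhi"))
```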

The IPVS scheduler distributes SMTP, POP3, IMAP4, and HTTP/HTTPS request traffic evenly across the mail servers. From the way these services work, it does not matter which mail server a request is dispatched to: the result is the same. Running SMTP, POP3, IMAP4, and HTTP/HTTPS on every mail server and scheduling them together helps improve the resource utilization of the whole system.

A possible bottleneck in this system is the LDAP server; tuning the B+ tree parameters of the LDAP service and running it on a high-end server yields higher performance. If the distributed file system does not balance load across its storage nodes, a corresponding mail migration mechanism is needed to avoid skewed mail access.

In this way, the cluster system behaves like a single high-performance, highly reliable mail server (for example, Shanghai Telecom would only need one mail domain name, public.sta.net.cn). As the number of email users grows, only server nodes and storage nodes need to be added to the cluster. Storing user information centrally makes user management easy, and the cluster helps improve resource utilization.

7. Summary

This article has presented the general architecture of the LVS cluster and discussed its design principles and corresponding features; it then applied the LVS cluster to building scalable Web, Media, Cache, and Mail network services and pointed out the issues to watch for when setting up such systems. Subsequent articles will explain the technology, implementation, and applications of the LVS cluster in detail.

About the author
Zhang Wensong (wensong@linux-vs.org) is an open source and Linux kernel developer, and the founder and main developer of the well-known Linux cluster project LVS (Linux Virtual Server). He currently works at the National Key Laboratory of Parallel and Distributed Processing, where his research focuses on cluster technology, operating systems, object storage, and databases. He has long devoted much of his time to developing free software and is happy doing so.

This article is published by IBM developerWorks
