Linux Server Cluster System (Part 1)




Added by Zhj: Although this is a 2002 article, it is still well worth reading. In "Zhang Wensong: on LVS and the stories behind Alibaba's open source", the initiator and main contributor of LVS talks about the development of LVS and some stories about Alibaba's open source work.



Original: http://www.linuxvirtualserver.org/zh/lvs1.html



This paper introduces the background and goals of the Linux server cluster system, the LVS (Linux Virtual Server) project; describes the framework of the LVS server cluster and the software it currently provides; enumerates the features of the LVS cluster system and some of its practical applications; and finally discusses the future development of the LVS project.



1. Background



Computing today has entered a network-centric era. Because of the simplicity, manageability, and maintainability of the client/server model, it is widely used on the Internet. In the mid-1990s the advent of the World Wide Web (WWW), with its ease of use, brought richly illustrated online information to the general public, and the Web has been growing from a content-delivery mechanism into a service platform: a great number of services and applications (news services, Internet banking, e-commerce, and so on) are built around the Web. This has driven dramatic growth in the number of Internet users and an explosion of Internet traffic. Figure 1 shows how the number of Internet-connected hosts changed from 1995 to 2000 [1]; the growth trend is steeper than ever.



Figure 1: Changes in the number of Internet hosts in 1995-2000


The rapid development of the Internet poses a huge challenge to network bandwidth and to servers. Network bandwidth is growing much faster than processor speed and memory access speed: 100M Ethernet, ATM, and Gigabit Ethernet have emerged one after another, 10 Gigabit Ethernet is on the way, and dense wavelength division multiplexing (DWDM) will become the mainstream broadband IP technology on backbone networks [2,3]. Lucent has already launched its WaveStar OLS 800G product, which runs 800 Gb/s over a single fiber [4]. We are therefore convinced that more and more bottlenecks will appear on the server side. Many studies show that it is difficult for a server equipped with Gigabit Ethernet to drive its throughput to 1 Gb/s; the causes are the inefficiency of the protocol stack (TCP/IP) and the operating system, and of the processor itself, which calls for deeper study of protocol processing methods, operating system scheduling, and I/O handling. On high-speed networks, redesigning the network service programs on a single server is likewise an important task.



Popular sites attract unprecedented access traffic; for example, according to Yahoo's news releases, Yahoo serves 625 million pages per day [5]. Some network services also receive enormous traffic: America Online's Web cache system, for instance, handles 5.02 billion Web accesses per day, with an average response length of 5.5 Kbytes per request. At the same time, many network services are overwhelmed by exploding access volumes and cannot process user requests in time, making users wait a long time and greatly reducing service quality. How to build scalable network services to meet ever-growing load has become a pressing problem.



Most sites need to provide service around the clock, 24 hours a day, 7 days a week; for e-commerce sites in particular, any interruption of service or loss of critical data causes direct business losses. For example, according to a Dell press release [6], Dell now takes in $14 million per day through its website, and one hour of service outage would cause an average loss of $580,000. Demands on the reliability of network services are therefore growing ever higher.



Meanwhile, more and more CPU-intensive applications such as CGI and dynamic pages are being used in Web services, placing high demands on server performance. Future Web services will offer richer content, better interactivity, and higher security, demanding still more CPU and I/O processing power from servers. For example, fetching a static page over HTTPS (Secure HTTP) requires an order of magnitude more processing than over HTTP, and HTTPS is being widely adopted by e-commerce sites. Network traffic alone therefore does not tell the whole story; the evolution of the applications themselves also demands ever more processing power.



As a result, the demand for building highly scalable, highly available network services through hardware and software is growing. This demand can be summarized as the following requirements:


    • Scalability: when the service load grows, the system can be scaled up to meet demand without degrading service quality.
    • High availability: even if parts of the hardware and software fail, the system as a whole must provide service 24 hours a day, 7 days a week.
    • Manageability: the system may be physically large, but it should be easy to manage.
    • Cost-effectiveness: the system should be economical to build and affordable.


2. Server cluster system



A symmetric multi-processing (SMP) system is a computer composed of multiple symmetric processors that share memory and I/O over a bus. SMP is a low-parallelism, "tightly coupled multiprocessing" architecture. Its scalability is limited, but its advantage is the single system image: shared memory and I/O make it easy to program.



Because the scalability of SMP is limited, SMP servers clearly cannot meet the growing load-handling demands of highly scalable, highly available network services: as load keeps growing, the server must be upgraded again and again. Such upgrades have the following drawbacks: first, the upgrade process is cumbersome, switching machines interrupts service temporarily, and the original computing resources are wasted; second, the higher the server's grade, the higher its cost; third, an SMP server is a single point of failure, and a failure of the server or of the application brings the entire service down.



Server clusters interconnected by high-performance networks or LANs are becoming an effective architecture for highly scalable, highly available network services. This loosely coupled architecture has the following advantages:


    • Performance
      The workload of a network service consists largely of independent tasks; dividing the work among a set of servers achieves high overall performance.

    • Performance/price ratio
      The PC or RISC servers and standard network equipment that make up a cluster are mass-produced and therefore cheap, giving the highest performance/price ratio. If overall performance grows linearly with the number of nodes, the system's performance/price ratio stays close to that of a single PC server, so this loosely coupled architecture has a much better performance/price ratio than a tightly coupled multiprocessor system.

    • Scalability
      The number of nodes in a cluster can grow into the thousands or even tens of thousands; its scalability far exceeds that of a single supercomputer.

    • High availability
      With redundancy in both hardware and software, high availability is achieved by detecting hardware and software failures, masking the faults, and continuing service on the surviving nodes.


Of course, building scalable network services on server clusters also raises many challenges:


    • Transparency
      How to make a loosely coupled cluster of independent computers appear, efficiently, as a single virtual server: client applications should interact with the cluster exactly as with a single high-performance, highly available server, without any modification. Adding or removing servers should not disrupt service and should likewise be transparent to users.

    • Performance
      Performance should approach linear speedup. This requires a well-designed hardware and software architecture that eliminates the system's potential bottlenecks and dispatches load onto the servers in a balanced way.

    • High availability
      A good subsystem for monitoring and handling system resources and faults must be designed and implemented. When a module is found to have failed, the services it provides must be migrated to other modules; ideally this migration is instantaneous and automatic.

    • Manageability
      The cluster should be as manageable as a single-image system; ideally, hardware and software modules can be plugged in and out like Plug & Play devices.

    • Programmability
      It should be easy to develop applications on the cluster.


3. Linux Virtual Server Project



To meet the requirements of highly scalable, highly available network services, we propose load-balancing scheduling solutions based on the IP layer and on content-based request distribution, and we implement these methods in the Linux kernel, turning a set of servers into a virtual server that provides scalable, highly available network services.



The architecture of the virtual server is shown in Figure 2: a group of servers is interconnected by a high-speed LAN or by a geographically distributed WAN, with a load balancer (load scheduler) at their front end. The load scheduler seamlessly dispatches network requests onto the real servers, so the structure of the cluster is transparent to clients: a client accesses the cluster's network services just as it would a single high-performance, highly available server. Client programs are unaffected by the cluster and need no modification. The system's scalability is achieved by transparently adding or removing nodes in the service cluster, and high availability by detecting node or service-process failures and reconfiguring the system appropriately. Because our load-scheduling techniques are implemented in the Linux kernel, we call the system the Linux Virtual Server (LVS).



Figure 2: Structure of the virtual server


In May 1998 I founded the Linux Virtual Server free software project to develop Linux server clustering. The Linux Virtual Server project was also one of the earliest free software projects started in China.



The goal of the Linux Virtual Server project is to use cluster technology and the Linux operating system to implement a high-performance, highly available server with good scalability, reliability, and manageability.



The LVS project currently provides a Linux Virtual Server framework for implementing scalable network services, as shown in Figure 3. The framework provides the IP virtual server software IPVS, with its three IP load-balancing techniques; the kernel Layer-7 switch KTCPVS, which distributes requests based on content; and the related cluster management software. On the LVS framework one can build highly scalable, highly available network services such as Web, cache, mail, and media services, and on top of these develop highly scalable, highly available e-commerce applications supporting huge numbers of users.



Figure 3: The Linux Virtual Server framework


3.1 IP Virtual Server Software IPVS



Among scheduler implementation techniques, IP load balancing is the most efficient. In existing IP load-balancing technology, a set of servers is commonly composed into one high-performance, highly available virtual server through network address translation; we call this VS/NAT (Virtual Server via Network Address Translation), and most commercial IP load-balancing scheduler products use this approach, such as Cisco's LocalDirector, F5's BIG/IP, and Alteon's ACEDirector. After analyzing the drawbacks of VS/NAT and the asymmetry of network services, we proposed implementing a virtual server via IP tunneling, VS/TUN (Virtual Server via IP Tunneling), and via direct routing, VS/DR (Virtual Server via Direct Routing), both of which can greatly improve the system's scalability. The IPVS software thus implements these three IP load-balancing techniques. Their general principles are as follows (later articles in this series describe in detail how they work):


    1. Virtual Server via Network Address Translation (VS/NAT)
      Through network address translation, the scheduler rewrites the destination address of a request packet and dispatches the request to a back-end real server according to the preset scheduling algorithm; the real server's response packets pass back through the scheduler, which rewrites their source address before returning them to the client, completing the load-scheduling process. A simplified sketch of this rewriting appears after this list.

    2. Virtual Server via IP Tunneling (VS/TUN)
      With NAT, both request and response packets must have their addresses rewritten by the scheduler, so as client requests grow, the scheduler's processing capacity becomes a bottleneck. To solve this, VS/TUN forwards request packets to the real servers through IP tunnels, and the real servers return their responses directly to the clients, so the scheduler handles only request packets. Since the response of a typical network service is much larger than the request, VS/TUN can raise the maximum throughput of the cluster by a factor of ten.

    3. Virtual Server via Direct Routing (VS/DR)
      VS/DR forwards a request to a real server by rewriting the MAC address of the request packet, and the real server returns the response directly to the client. Like VS/TUN, VS/DR can greatly improve the scalability of the cluster; it avoids the overhead of IP tunneling and does not require the real servers to support the IP tunneling protocol, but it does require that the scheduler and the real servers each have a network interface on the same physical network segment.
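
To make the VS/NAT packet flow concrete, the sketch below simulates the two rewrites the scheduler performs, in user-space C. This is an illustration only, not IPVS kernel code; the packet structure, the addresses, and the helper names are hypothetical.

```c
/* A minimal user-space simulation of VS/NAT-style rewriting.
 * Not IPVS kernel code: struct packet, the addresses, and the
 * nat_request()/nat_response() helpers are made up for illustration. */
#include <stdio.h>

struct packet {
    const char *src;    /* source address, "ip:port" */
    const char *dst;    /* destination address, "ip:port" */
    const char *data;
};

/* Inbound: rewrite the destination from the virtual service address
 * (VIP) to the real server (RIP) chosen by the scheduling algorithm. */
static void nat_request(struct packet *p, const char *rip)
{
    p->dst = rip;
}

/* Outbound: rewrite the source from the real server back to the VIP,
 * so the client only ever sees the virtual server. */
static void nat_response(struct packet *p, const char *vip)
{
    p->src = vip;
}

int main(void)
{
    const char *vip = "10.0.0.1:80";      /* virtual service address */
    const char *rip = "192.168.1.2:80";   /* chosen real server */

    struct packet req = { "client:4321", vip, "GET /" };
    nat_request(&req, rip);
    printf("request  forwarded to dst %s\n", req.dst);

    struct packet rsp = { rip, "client:4321", "200 OK" };
    nat_response(&rsp, vip);
    printf("response returned with src %s\n", rsp.src);
    return 0;
}
```

In the real system both rewrites happen on IP headers inside the kernel and checksums are recomputed; the point of the sketch is only the symmetry that makes VS/NAT transparent to the client.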


For different network services and server configurations, the IPVS scheduler implements the following eight load-scheduling algorithms (a small working sketch of the weighted round-robin variant follows the list):


  1. Round Robin
     The scheduler uses the "round-robin" algorithm to distribute external requests to the real servers in the cluster in turn, treating every server equally regardless of its actual number of connections or system load.

  2. Weighted Round Robin
     The scheduler schedules requests according to the differing processing capacities of the real servers, ensuring that more capable servers handle more traffic. The scheduler can automatically query the load of the real servers and adjust their weights dynamically.

  3. Least Connections
     The scheduler dynamically dispatches requests to the server with the fewest established connections. If the real servers in the cluster have similar performance, this algorithm balances load well.

  4. Weighted Least Connections
     When server performance in the cluster varies significantly, the scheduler uses "weighted least connections" to optimize load balancing: servers with higher weights bear a larger share of the active connections. The scheduler can automatically query the load of the real servers and adjust their weights dynamically.

  5. Locality-Based Least Connections
     A load-balancing algorithm keyed on destination IP addresses, used mainly in cache cluster systems. Based on the request's destination IP address, the algorithm finds the server most recently used for that address; if that server is available and not overloaded, the request is sent to it. Otherwise, if that server no longer exists, or it is overloaded while some server is at half its workload, a server is chosen by the "least connections" principle and the request is sent there.

  6. Locality-Based Least Connections with Replication
     Also a load-balancing algorithm keyed on destination IP addresses, used mainly in cache cluster systems. It differs from LBLC in that it maintains a mapping from a destination IP address to a set of servers, whereas LBLC maps a destination IP address to a single server. Based on the request's destination IP address, the algorithm finds the corresponding server set and selects a server from it by "least connections"; if that server is not overloaded, the request is sent to it. If it is overloaded, a server is chosen from the whole cluster by "least connections", added to the set, and the request is sent to it. When the server set has not been modified for a while, the busiest server is removed from the set to reduce the degree of replication.

  7. Destination Hashing
     Using the request's destination IP address as a hash key, the scheduler looks up the corresponding server in a statically assigned hash table; if that server is available and not overloaded, the request is sent to it, otherwise nothing is returned.

  8. Source Hashing
     Using the request's source IP address as a hash key, the scheduler looks up the corresponding server in a statically assigned hash table; if that server is available and not overloaded, the request is sent to it, otherwise nothing is returned.
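
As mentioned above the list, here is a small user-space C sketch of the classic weighted round-robin selection loop, written in the spirit of the algorithm; the server names and weights are made-up examples, and this simplification is not the IPVS kernel implementation.

```c
/* A user-space sketch of classic weighted round-robin selection.
 * The servers and weights below are hypothetical examples. */
#include <stdio.h>

struct server { const char *name; int weight; };

static struct server servers[] = {
    { "real1", 4 }, { "real2", 3 }, { "real3", 2 },
};
#define NSERVERS ((int)(sizeof(servers) / sizeof(servers[0])))

static int gcd2(int a, int b) { return b ? gcd2(b, a % b) : a; }

static int weights_gcd(void)
{
    int g = servers[0].weight;
    for (int i = 1; i < NSERVERS; i++)
        g = gcd2(g, servers[i].weight);
    return g;
}

static int max_weight(void)
{
    int m = 0;
    for (int i = 0; i < NSERVERS; i++)
        if (servers[i].weight > m)
            m = servers[i].weight;
    return m;
}

/* Pick the next server: only servers whose weight is at least the
 * current threshold cw are eligible; cw drops by gcd(weights) on each
 * full pass, so higher-weight servers are chosen proportionally more. */
static struct server *wrr_next(void)
{
    static int i = -1, cw = 0;

    for (;;) {
        i = (i + 1) % NSERVERS;
        if (i == 0) {
            cw -= weights_gcd();
            if (cw <= 0) {
                cw = max_weight();
                if (cw == 0)
                    return NULL;    /* all weights zero: no server */
            }
        }
        if (servers[i].weight >= cw)
            return &servers[i];
    }
}

int main(void)
{
    /* With weights 4:3:2, nine consecutive picks yield real1 four
     * times, real2 three times, and real3 twice. */
    for (int k = 0; k < 9; k++)
        printf("%s ", wrr_next()->name);
    printf("\n");
    return 0;
}
```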


3.2 Kernel Layer-7 Switch KTCPVS



In IP load scheduling, when the first SYN packet of a TCP connection arrives, the scheduler chooses a server and forwards the packet to it; thereafter it checks the IP and TCP headers of incoming packets to make sure that subsequent packets of the same connection are forwarded to the same server. IPVS therefore cannot examine the content of a request before choosing a server; it requires that the back-end servers provide the same service, so a request returns the same result no matter which server receives it. In some applications, however, the back-end servers differ in function: some serve HTML documents, some serve images, some run CGI. These call for content-based scheduling. A simplified sketch of the per-connection table that keeps packets on the chosen server follows.
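
The following user-space C sketch illustrates the kind of per-connection hash table this implies: packets of an established connection are looked up by their addresses and sent to the server chosen when the connection began. The key fields, table size, and hash function are illustrative simplifications, not the IPVS implementation.

```c
/* A hypothetical sketch of a connection-tracking hash table; the
 * fields, hashing, and table size are simplified for illustration. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define TAB_SIZE 256

struct conn {
    uint32_t caddr;     /* client IP */
    uint16_t cport;     /* client port */
    uint32_t vaddr;     /* virtual service IP */
    uint16_t vport;     /* virtual service port */
    int server;         /* real server chosen at SYN time */
    struct conn *next;  /* collision chain */
};

static struct conn *table[TAB_SIZE];

static unsigned hash_key(uint32_t caddr, uint16_t cport)
{
    /* Simple multiplicative hash over client address and port. */
    return ((caddr ^ (caddr >> 16)) * 2654435761u + cport) % TAB_SIZE;
}

/* Look up an established connection so follow-up packets go to the
 * same real server that handled the SYN. */
static struct conn *conn_get(uint32_t caddr, uint16_t cport,
                             uint32_t vaddr, uint16_t vport)
{
    for (struct conn *c = table[hash_key(caddr, cport)]; c; c = c->next)
        if (c->caddr == caddr && c->cport == cport &&
            c->vaddr == vaddr && c->vport == vport)
            return c;
    return NULL;
}

/* Record a new connection once the scheduler has picked a server. */
static void conn_put(uint32_t caddr, uint16_t cport,
                     uint32_t vaddr, uint16_t vport, int server)
{
    unsigned h = hash_key(caddr, cport);
    struct conn *c = malloc(sizeof(*c));
    c->caddr = caddr; c->cport = cport;
    c->vaddr = vaddr; c->vport = vport;
    c->server = server;
    c->next = table[h];
    table[h] = c;
}

int main(void)
{
    conn_put(0x0a000002, 4321, 0x0a000001, 80, 1);  /* SYN: server 1 */
    struct conn *c = conn_get(0x0a000002, 4321, 0x0a000001, 80);
    printf("follow-up packet -> server %d\n", c ? c->server : -1);
    return 0;
}
```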



Because user-space TCP gateways incur heavy overhead, we proposed implementing Layer-7 switching inside the operating system kernel, avoiding the cost of switching between user space and kernel space and of copying memory. We implemented this Layer-7 switching in the Linux kernel and call it KTCPVS (Kernel TCP Virtual Server). KTCPVS can already perform content-based scheduling of HTTP requests, but it is not yet very mature; much work remains on its scheduling algorithms and on support for the various protocols. The sketch below illustrates the basic idea of content-based dispatch.
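
As a rough picture of what content-based scheduling means, the user-space C sketch below routes requests by URL prefix to different server pools. The rules, pool names, and match_rule() helper are hypothetical; this only illustrates the idea behind KTCPVS, not its implementation.

```c
/* A hypothetical sketch of Layer-7, content-based dispatch: pick a
 * server pool by URL prefix. Rules and pool names are made up. */
#include <stdio.h>
#include <string.h>

struct rule { const char *prefix; const char *pool; };

static const struct rule rules[] = {
    { "/images/",  "image-servers" },
    { "/cgi-bin/", "cgi-servers" },
    { "/",         "document-servers" },   /* default rule */
};

/* Return the pool of the first rule whose prefix matches the URL;
 * order the rules from most to least specific. */
static const char *match_rule(const char *url)
{
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
        if (strncmp(url, rules[i].prefix, strlen(rules[i].prefix)) == 0)
            return rules[i].pool;
    return NULL;
}

int main(void)
{
    const char *urls[] = { "/images/logo.gif", "/cgi-bin/search", "/index.html" };
    for (int i = 0; i < 3; i++)
        printf("%-20s -> %s\n", urls[i], match_rule(urls[i]));
    return 0;
}
```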



Application-layer switching is more complex to process and its scalability is limited, but it brings the following benefits:


    • Requests for the same page are sent to the same server, which can raise that server's cache hit rate.
    • Some studies [5] indicate that Web access streams exhibit locality. Layer-7 switching can exploit this locality by sending requests of the same type to the same server, so that each server sees a more homogeneous stream of requests, further raising its cache hit rate.
    • Back-end servers can run different kinds of services, such as document services, image services, CGI services, and database services.


4. Features of the LVS cluster



The features of the LVS cluster can be summarized as follows:


  1. Functionality
     The IPVS software implements three IP load-balancing techniques and eight connection-scheduling algorithms. Internally, IPVS uses an efficient hash function and a garbage-collection mechanism, and it correctly handles ICMP messages related to scheduled packets (which some commercial systems do not). There is no limit on the number of virtual services that can be configured, and each virtual service has its own server set. IPVS supports persistent virtual services (needed for HTTP cookies, HTTPS, and the like) and provides detailed statistics such as connection processing rates and packet traffic. Three defense strategies are implemented against large-scale denial-of-service attacks.
     There is also the application-layer switching software KTCPVS, based on content request distribution, likewise implemented in the Linux kernel. With the related cluster management software monitoring resources, faults can be masked promptly to keep the system highly available; the master and slave schedulers can synchronize state periodically for still higher availability.

  2. Applicability
     The back-end servers can run any operating system that supports TCP/IP, including Linux, various Unix systems (FreeBSD, Sun Solaris, HP-UX, and so on), Mac OS, and Windows NT/2000.
     The load scheduler supports the vast majority of TCP and UDP protocols:

     Protocol   Services
     TCP        HTTP, FTP, PROXY, SMTP, POP3, IMAP4, DNS, LDAP, HTTPS, SSMTP, etc.
     UDP        DNS, NTP, ICP, video and audio streaming protocols, etc.

     No modification is needed on either clients or servers, so LVS can be applied to most Internet services.

  3. Performance
     The LVS server cluster system has good scalability and can support millions of concurrent connections. With 100M network cards and the VS/TUN or VS/DR scheduling technique, the throughput of the cluster can reach 1 Gb/s; with Gigabit network cards, the maximum throughput of the system approaches 10 Gb/s.

  4. Reliability
     The LVS server cluster software has been deployed on many large, critical sites, so its reliability has been proven in real-world use. Many schedulers have run for more than a year without a restart.

  5. Software license
     The LVS cluster software is free software released under the GPL (GNU General Public License): you can obtain its source code and you have the right to modify it, but you must ensure that your modifications are also released under the GPL.


5. Applications of the LVS cluster



Since its founding, the LVS project has attracted a great deal of attention, and LVS cluster systems have been deployed on many heavily loaded sites; as far as I know, the system is in production use at dozens of sites in the United States, the United Kingdom, Germany, Australia, and other countries.



We do not have hundreds of machines and a high-speed network with which to test the ultimate performance of LVS ourselves, so we use deployment examples to illustrate its performance and stability. Some of the large LVS deployments we know of are as follows:


  • The UK National JANET Web Cache Service (wwwcache.ja.net) provides Web caching for more than 150 universities in the UK. They replaced their original 50-plus independent cache servers with a 28-node LVS cluster; in their words, the service now runs as fast as it does in summer, when few people use the network during the holidays.
  • The Linux portal site www.linux.com uses LVS to combine many VA Linux SMP servers into a high-performance Web service, which has been in use for almost a year.
  • SourceForge (sourceforge.net) provides Web, FTP, mailing-list, and CVS services to open source projects worldwide; it likewise uses LVS to dispatch its workload across more than ten machines.
  • One of the world's largest PC manufacturers runs two LVS cluster systems, one in the Americas and one in Europe, for its online direct-sales system.
  • The Real company (www.real.com), known for the audio and video services of RealPlayer, uses an LVS cluster of 20 servers to provide audio and video services to its users worldwide. In March 2000 the whole cluster was receiving an average of 20,000 connection requests per second.
  • NetWalk (www.netwalk.com) built an LVS system from multiple servers, providing 1024 virtual services, one of which is the American mirror of this project (www.us.linuxvirtualserver.org).
  • RedHat (www.redhat.com) has included the LVS code since its 6.1 release and has developed an LVS cluster management tool called Piranha, which controls the LVS cluster and provides a graphical configuration interface.
  • VA Linux (www.valinux.com) offers customers LVS-based server cluster systems together with related services and support.
  • TurboLinux's "world-class Linux cluster product" TurboCluster was in fact based on LVS ideas and code, though its press releases and product demonstrations forgot to acknowledge this.
  • Red Flag Linux and ZhongSoft both offer LVS-based clustering solutions, which were shown at Linux World 2000 in September 2000.


Here we quote the comments of two LVS users to further illustrate the performance and reliability of LVS.

"We tried virtually all of the commercial load balancers, LVS beats them all for reliability, cost, manageability, You-nam  E-it. "-jerry glomph Black, Director, Internet & Technical Operations, Real Networks, Se attle Washington, USA
Http://archive.linuxvirtualserver.org/html/lvs-users/2000-03/msg00180.html 
http://marc.theaimsgroup.com/?1=linux-virtual-server&m=95385809030794&w=2
"I can say without a doubt that LVs toasts F5/BIGIP solutions, at least, we are real world implementations.  I wouldn ' t trade a good the LVs box for a Cisco Local Director either. "-drew Streib, information Architect, VA Linux Systems, USA
Http://archive.linuxvirtualserver.org/html/lvs-users/2000-03/msg00178.html 
http://marc.theaimsgroup.com/?1=linux-virtual-server&m=95385694529750&w=2

6. Development of the LVS project and some reflections



The LVS project went public in May 1998, when the first version of the IPVS program was published on the website, and it has received encouragement and support from users and developers on the Internet ever since. The initially released program was admittedly very simple. Users' adoption, feedback, and expectations made me feel the value of the work, so I kept finding time to add new features and improve the software, also receiving help from other developers, and the software gradually grew into a relatively complete and useful system, far beyond what I imagined when the project began. Here I would like to thank Julian Anastasov for his many bug fixes and improvements to the system, and Dr. Joseph Mack for writing the LVS HOWTO document; thanks also to the vendors who sponsored my development (with hardware and the like) and sponsored my trips abroad to give technical talks.



Currently, development work in progress in the LVS project includes:


    • Extending IPVS to support more transport protocols, such as AH (Authentication Header) and ESP (Encapsulating Security Payload), so that the IPVS scheduler can implement IPSec server clusters.

    • Providing unified cluster management software for LVS that is more functional and more flexible.

    • Extending and improving KTCPVS's scheduling algorithms and its support for various protocols, to make it more complete and more stable.

    • Doing exploratory work on TCP splicing and TCP handoff to further improve application-layer scheduling in LVS clusters.


Finally, let me share a few feelings from my years of free software development. First, the free software approach gives software tenacious vitality. I have built several standalone systems before, such as a knowledge base system developed in Common Lisp on Sun workstations and an object database system based on C++; parts of them were quite elegant (such as the meta-level reflection mechanism and the relational handling of objects), but for various reasons they were never released as open source. They now sit in my advisor's software repository, and I have forgotten their implementation details. The LVS project, by contrast, started as a very simple program, and through free software release and development it grew into a useful, fairly complete system, embodying the powerful vitality of free software. Second, developing free software in the open lets the initial ideas keep deepening, and you learn a great deal along the way. Third, working on free software makes your time more productive: users' feedback and expectations push you to keep improving and refining the system, leaving no time for games or online chat. Fourth, free software gives you a small sense of accomplishment: whenever you receive a user's thanks and think of your software running in production systems, there is a little satisfaction. So take action: spend some time on free software, and you will reap an unexpected harvest.



7. Network resources for LVS projects



If you are interested in the LVS project, please visit the home page of the Linux Virtual Server project (http://www.linuxvirtualserver.org/ or http://www.linux-vs.org/), where you can obtain the latest LVS source code and related running software, as well as the latest documentation.
If you run into problems using LVS, please subscribe to our mailing list [email protected], ask questions, answer others' questions, or post your comments.



8. References



[1] Information Navigators, Internet Growth Charts, http://navigators.com/stats.html
[2] Srinivasan Seetharaman, IP over DWDM, http://www.cis.ohio-state.edu/~jain/cis788-99/ip_dwdm/
[3] Lucent Technologies, Web ProForum Tutorial: DWDM, October 1999, http://www.webproforum.com/acrobat/dwdm.pdf
[4] Lucent Technologies, Lucent Technologies announces record-breaking 320-channel optical networking system, April 2000, http://www.lucent.com/press/0400/000417.nsb.html
[5] Yahoo! Inc., The Yahoo! Directory and Web Services, http://www.yahoo.com/
[6] Dell Inc., http://www.dell.com/



9. About the author
Dr. Zhang Wensong is an open source and Linux kernel developer, and the founder and principal developer of the well-known Linux cluster project LVS (Linux Virtual Server). He currently works at the National Key Laboratory of Parallel and Distributed Processing, where he researches cluster technology, operating systems, object storage, and databases. He has spent a great deal of spare time developing free software and is happy doing so.




