Research on high availability schemes under Linux

Source: Internet
Author: User

  Keeping systems running continuously and stably is becoming more and more important, yet traditional minicomputer systems are prohibitively expensive for ordinary users, who need higher availability at lower cost. High availability (HA) technology automatically detects errors and failures in server nodes and service processes and, when one occurs, automatically reconfigures the system so that other nodes in the cluster take over the affected services and service continues uninterrupted.

Clusters can be divided into three types: high-availability (HA) clusters, load-balancing clusters, and scientific clusters. Hybrids of these three basic types are common. A high-availability cluster may also balance the user load among its nodes while still trying to maintain a high degree of availability. Similarly, a parallel cluster built for a particular application may also perform load balancing between its nodes. This article focuses on Linux-based HA solutions.

HA schemes based on LVS

For Linux to enter the high-end market it must have suitable measures in this area, so many companies have stepped up their research here. Today we can use existing software to build a highly available LVS system. Two scenarios are listed below for reference.

[Scenario one] mon + heartbeat + fake + coda

We can use four pieces of software, "mon", "heartbeat", "fake", and "coda", to build a highly available virtual server. "mon" is a popular resource monitoring system used to monitor server nodes and network services on the network. "heartbeat" exchanges heartbeat messages between two computers over a serial line and over UDP. "fake" is software that uses ARP spoofing to implement IP takeover.
When a server fails, the process is as follows. The "mon" process runs on the load balancer and is responsible for monitoring the server nodes and service processes across the cluster. The server nodes to be checked are listed in the configuration for "fping.monitor", and the "mon" process then checks every t seconds whether each of those nodes is still alive. Monitors for the related services are configured in the same way, so that "mon" checks the corresponding service process on all nodes every m seconds. For example, http.monitor is used to monitor the HTTP service and ftp.monitor is used to monitor the FTP service. Once this is configured, a notification is sent whenever a server node fails or comes back, or a service process stops or resumes, so the load balancer knows whether each server node can accept requests.

At this point the load balancer becomes the single point of failure of the entire system. To prevent this, we must install a backup server for the load balancer. With "fake", the backup server automatically takes over the IP address and continues to provide service when the load balancer fails. "heartbeat" automatically starts or stops the "fake" process on the backup server according to the state of the load balancer.

A "heartbeat" process runs on both the load balancer and the backup server, and the two periodically exchange "I'm alive" messages over the serial line. If the backup server does not receive an "I'm alive" message from the load balancer within a predetermined time, it automatically starts the "fake" process, takes over the IP address of the load balancer, and begins providing load balancing services. When "I'm alive" messages from the load balancer are received again, the backup server automatically stops the "fake" process and releases the IP address it took over, and the load balancer resumes work.
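The takeover and release decision described above can be sketched in a few lines of Python. This is only an illustration of the timeout logic, not the actual "heartbeat" or "fake" code: the class name `BackupNode`, the `deadtime` parameter, and the `holding_ip` flag are all assumptions made for the example, and the flag stands in for starting and stopping the "fake" process.

```python
from dataclasses import dataclass


@dataclass
class BackupNode:
    """Backup load balancer deciding when to hold the service IP (hypothetical model).

    deadtime: seconds of silence after which the primary is presumed dead.
    """
    deadtime: float
    last_alive: float = 0.0
    holding_ip: bool = False  # True while the takeover is active

    def on_alive(self, now: float) -> None:
        """Record an "I'm alive" message and release the IP if we hold it."""
        self.last_alive = now
        if self.holding_ip:
            self.holding_ip = False  # primary is back: hand the address back

    def tick(self, now: float) -> None:
        """Periodic check: take over once the primary has been silent too long."""
        if not self.holding_ip and now - self.last_alive > self.deadtime:
            self.holding_ip = True  # stand-in for starting the "fake" process


backup = BackupNode(deadtime=3.0)
backup.on_alive(now=0.0)
backup.tick(now=2.0)       # still within deadtime, nothing happens
backup.tick(now=5.0)       # silence exceeded deadtime, backup takes over
took_over = backup.holding_ip
backup.on_alive(now=6.0)   # primary recovered, backup releases the address
released = not backup.holding_ip
```

Passing the current time in as an argument, rather than reading the system clock inside the class, keeps the decision logic easy to exercise with fixed timestamps.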
Note that if the load balancer fails while a client request is in progress, that request fails and the client must send it again.

"coda" is a fault-tolerant distributed file system descended from the Andrew File System. Directories on the servers can be stored on "coda", so the files are highly available and easy to manage.

[Scenario two] ldirectord + heartbeat

"ldirectord" (Linux Director Daemon) is a stand-alone daemon written by Jacob Rief to monitor services and physical servers, and it is widely used with HTTP and HTTPS services. "ldirectord" is simple to install and works very well together with "heartbeat". The "ldirectord" program is included in the "contrib" directory of the "ipvs" package.

Some of the advantages of "ldirectord" are:

- "ldirectord" is a monitoring program written specifically for LVS.
- It reads all configuration information about the IPVS routing table from the /etc/ha.d/xxx.cf file. Once "ldirectord" is running, the IPVS routing table is configured accordingly.
- The virtual service configuration can be split across multiple configuration files, so the parameters of one service can be modified individually without affecting other services.
- "ldirectord" can easily be managed (started and stopped) by "heartbeat". Place "ldirectord" in the /etc/ha.d/resource.d/ directory and add a line to /etc/ha.d/haresources:

    node1 IPaddr::10.0.0.3 ldirectord::www ldirectord::mail

- "ldirectord" can also be started and stopped manually, so it can be used in an LVS cluster that has no backup load balancer.

Xlinux's LATCH HA scheme

As mentioned earlier, high availability (HA) solutions are extremely important, and many vendors have invested heavily in research in this area. The Xlinux distribution provides the LATCH HA solution, which we look at now.

The most typical system structure of the LATCH HA solution is this: two hosts, A and B, share a disk array; A is the working machine and B is the backup machine.
The two hosts are connected by a heartbeat line, and heartbeat detection is done primarily over an RS232 link. LATCH HA also uses ping to verify that a system is down. The HA software installed on each host monitors the other host's operating status in real time over the heartbeat line, and host B is put into operation as soon as working host A fails because of a hardware fault. This is somewhat similar to IBM's HACMP.

LATCH HA implements a "high-reliability shared storage" architecture. The architecture consists of two or three redundant servers, a shared redundant disk array, an optional DBMS, and the LATCH HA system software. Under the protection of LATCH HA, enterprise computer systems can provide uninterrupted information services and avoid downtime caused by hardware failure or routine maintenance, which improves reliability and minimizes downtime.

Application scenarios

LATCH HA can be applied in a variety of centralized, client/server, or OLTP systems. It is compatible with the mainstream database systems and OLTP software on the market, such as Oracle, SYBASE, Informix, Tuxedo, and so on. LATCH HA also provides a variety of application interfaces, so customers can integrate its functions into their own proprietary software to ensure high system reliability.

LATCH HA/HS2000 online standby mode

In this mode, one server acts as the primary server and, under normal circumstances, carries all services. The other server acts as the standby server and normally does nothing except monitor the state of the primary server. Once the primary server goes down, the standby server takes over and becomes the new primary server. Customers still have the same server IP address, NFS, data, database, and other ...
This mode is similar to the typical structure described above (the two servers effectively perform the same function). The HA software installed on each host monitors the other's operating status in real time over the heartbeat line. Once working host A fails because of a hardware fault, such as a power failure, failure of a major component, or failure of the boot disk, host B is put into operation immediately.

LATCH HA/DA2000 dual-machine ready mode

In this mode both hosts are primary servers. They share a disk array and each carries part of the services. For example, server A runs application A and server B runs application B. Under normal conditions each host runs its own application logic independently, and at the same time each host acts as the other's standby server, monitoring the other's status over the heartbeat line. Once one server goes down, the other server takes on all the services and serves all customers: if server A fails, server B immediately takes over the application that was running on server A, and if server B fails, server A takes over the application that was running on server B. This is a mutually redundant arrangement. Clearly, once one server is down the workload of the other becomes heavier, which is why a three-host mode also exists.

LATCH HA/HC2000 three-host mode

This is the highest-end HA mode. It provides system redundancy, avoids system downtime, and also ensures that sufficient system resources remain available once a failure occurs. In this mode a standby server C monitors the state of both primary servers, A and B. Once server A or B goes down, server C takes over its services and serves its customers. This structure guarantees not only the safe operation of the system but also its resources.
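The three modes differ mainly in which host absorbs the services of a failed host. The sketch below models that choice in Python under stated assumptions: the function name `plan_after_failure`, the dictionary layout, and the service names are hypothetical and are not part of any LATCH HA interface.

```python
def plan_after_failure(assignments, standby, failed):
    """Return the service placement after `failed` goes down (illustrative only).

    assignments: dict mapping host name -> list of services it normally runs.
    standby: name of an idle standby host, or None if every host is active.
    failed: name of the host that has gone down.
    """
    if failed not in assignments:
        raise ValueError(f"unknown host: {failed}")
    orphaned = list(assignments[failed])
    survivors = {host: list(svcs)
                 for host, svcs in assignments.items() if host != failed}
    if not survivors:
        raise RuntimeError("no surviving host to take over")
    if standby is not None and standby in survivors:
        target = standby  # online standby and three-host modes
    else:
        # dual-machine ready mode: the least loaded surviving peer takes over
        target = min(survivors, key=lambda host: len(survivors[host]))
    survivors[target].extend(orphaned)
    return survivors


# Online standby: B is idle and takes over everything from A.
online = plan_after_failure({"A": ["app"], "B": []}, standby="B", failed="A")

# Dual-machine ready: B already runs app_b and also picks up app_a.
dual = plan_after_failure({"A": ["app_a"], "B": ["app_b"]}, standby=None, failed="A")

# Three-host: C absorbs app_a, so B's load is unchanged.
three = plan_after_failure(
    {"A": ["app_a"], "B": ["app_b"], "C": []}, standby="C", failed="A")
```

In the dual-machine case the survivor ends up with two applications, while in the three-host case each running host still carries one, which is the resource guarantee the three-host mode is meant to provide.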
Linux HA solutions are certainly not limited to the two described above, but the core idea is the same: providing uninterrupted service. In recent years the Linux operating system has continued to mature and its features have continued to improve. In particular, its adherence to the GPL, its support for the standardized PVM and MPI message-passing mechanisms, and its good support for high-performance networking on commodity PCs provide a solid technical foundation for developing Linux-based cluster systems. As the technology is turned into concrete applications, high-end HA applications on Linux stand out for their stable, reliable performance and their price advantage over UNIX. As Intel-based servers become the mainstream servers for key business and applications, Linux HA cluster technology will be applied more and more widely.


HA cluster structure diagram

HA essentially means two (or more) computers monitoring each other in some way to achieve hot backup. When there is a problem with the primary server, the standby server automatically and immediately takes over the work, so that users do not notice any downtime. After the primary server returns to normal, the standby server hands the work back to the primary server. (Source: Sadie net)
