Our business systems, whether internal enterprise applications or Internet-facing services, need to be both scalable and highly available. Scalability and high availability are not isolated concerns; only by combining them can we achieve the desired result.
Scalability is a desirable attribute of a system, network, or process. It means the system can handle a growing amount of work gracefully, or can be enlarged in a straightforward way. For example, it can describe a system's ability to increase throughput as resources (typically hardware) are added.
Vertical scaling means adding resources to a single node in the system, typically adding CPUs or memory to a machine. Vertical scaling provides more shared resources to the operating system and application modules, and therefore makes virtualization (running multiple virtual machines on one physical machine) more effective.
Horizontal scaling means adding more nodes to the system, for example adding new machines to a distributed software system, or more concretely, growing from one web server to three. As computer prices keep falling and performance keeps rising, high-performance computing applications such as seismic analysis and biological computing, which used to depend on supercomputers, can now run on clusters of hundreds of low-cost commodity machines and achieve the computing power of traditional RISC-based scientific computers.
Scalability here refers mainly to horizontal scalability: when user traffic grows rapidly, the system's capacity is expanded without terminating the existing service. For example, when a web server can no longer serve additional users, you can add a second (or more) server without stopping the service, and the new servers must not negatively affect the existing ones.
There is no way to guarantee that a system will never fail, yet users expect to access it normally at any time, 24x7; this is the requirement for high availability. Typically a service runs on a single system or machine; once that system or machine fails, users can no longer access the service. If the same service is deployed on two different systems/machines, the service remains accessible even when one of them fails. Another benefit is that the pressure of fault recovery is reduced. The industry now tends to quantify availability in terms of the number of nines; the most common target is "four nines" (99.99%). Table 1 makes this more intuitive.
Table 1: Availability levels and annual downtime

| Description | Popular term | Availability level | Annual downtime |
|---|---|---|---|
| Basic availability | 2 nines | 99% | 87.6 hours |
| High availability | 3 nines | 99.9% | 8.8 hours |
| Availability with automatic fault recovery | 4 nines | 99.99% | 53 minutes |
| Extremely high availability | 5 nines | 99.999% | 5 minutes |
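As a sanity check on these figures, the annual downtime allowed by a given availability level follows directly from the percentage. The minimal Python sketch below (assuming 8,760 hours per year and ignoring leap years) reproduces the numbers in Table 1.

```python
# Annual downtime allowed by a given availability level ("N nines").
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years

for label, availability in [("2 nines", 0.99),
                            ("3 nines", 0.999),
                            ("4 nines", 0.9999),
                            ("5 nines", 0.99999)]:
    downtime_hours = (1 - availability) * HOURS_PER_YEAR
    if downtime_hours >= 1:
        print(f"{label} ({availability * 100:g}%): {downtime_hours:.1f} hours/year")
    else:
        print(f"{label} ({availability * 100:g}%): {downtime_hours * 60:.0f} minutes/year")
```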
According to Murphy's Law ("anything that can go wrong will go wrong"), there is no 100% reliable web site in the world (unless it is not running at all).
There are two possible solutions. One is to upgrade a single server, but the upgrade process is complicated and expensive, and it usually still leaves a single point of failure. The other is to build a server cluster, that is, to organize the network service into an effective cluster structure. The benefits are: high availability through the redundancy of the cluster, high performance and high throughput through divide-and-conquer, and high scalability with a good price/performance ratio through the dynamic addition and removal of nodes.
The most common technology here is clustering, and load balancing is the technique most commonly used within clusters. Load-balancing options include the built-in cluster service and NLB on Windows, LVS on Linux, and third-party solutions such as F5 hardware load balancers.
Many of the application systems we develop run on the Windows platform, whose reliability and stability are comparatively poor. Although Windows dominates the desktop market, its share in the server field is still small: organizations running tens of thousands of servers, such as Google, Yahoo, Tencent, and Baidu, choose Linux as the platform to support enormous volumes of traffic. When each service runs on a single system or machine, a failure of that system or machine inevitably stops the service: if the server fails, the service running on it can no longer provide valid responses to users.
NLB is the inexpensive cluster solution on Windows; for details on building a cluster with NLB, see "Implementation of Network Load Balancing in Windows Server 2003". The inexpensive cluster solution on Linux is LVS, which has many advantages over NLB. LVS makes it much easier to build highly scalable network services, and it has proved to be extremely stable and is being deployed by more and more sites and systems. For more information about LVS, see the Chinese documentation of the LVS project. The main advantages of LVS include:
1. It is implemented inside the Linux kernel; the 2.6 kernel already integrates the IPVS code, so there is no need to patch and recompile the kernel (a quick check is sketched after this list);
2. It offers three IP load-balancing techniques: virtual server via NAT (network address translation), virtual server via IP tunneling, and virtual server via direct routing (DR);
3. It provides ten load-scheduling algorithms;
4. Both IPv4 and IPv6 are supported.
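As mentioned in item 1, kernels from the 2.6 series onward already ship IPVS, so the only thing to verify is that the module is available. The following minimal Python sketch (an illustration, not part of the LVS toolset) looks for the /proc/net/ip_vs entry that appears once the ip_vs module is loaded and, if it is missing, tries to load the module with modprobe.

```python
# Minimal check that the IPVS kernel support required by LVS is present.
# Assumes a Linux host and root privileges for modprobe.
import os
import subprocess

def ipvs_available() -> bool:
    """Return True if the ip_vs module is loaded (or can be loaded)."""
    if os.path.exists("/proc/net/ip_vs"):  # present once ip_vs is loaded
        return True
    # Try to load the module; requires root, fails harmlessly otherwise.
    result = subprocess.run(["modprobe", "ip_vs"], capture_output=True)
    return result.returncode == 0 and os.path.exists("/proc/net/ip_vs")

if __name__ == "__main__":
    print("IPVS available:", ipvs_available())
```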
With LVS you can build highly available web clusters, cache clusters, mail clusters, media clusters, DNS clusters, and MySQL clusters. As for the hardware platform for LVS:
- Any hardware platform that runs Linux can run LVS;
- LVS places low demands on CPU speed for load balancing and packet forwarding;
- LVS can run on low-power hardware, such as an Intel Atom CPU at 1.6 GHz drawing only 2-3 W, with Gigabit Ethernet.
In the traditional model, a user's request is resolved by the DNS server and forwarded to the web server, which fetches the data and returns it to the user. This model has two problems: once the number of concurrent users reaches a certain level, the server can no longer provide normal access, and when a fault occurs, all requests fail. LVS is an excellent choice for solving this kind of problem. With an LVS deployment, the DNS record is changed so that user requests first reach the server running the LVS director; the director then forwards each request to a real server according to a chosen scheduling algorithm. What about the return path? In a cluster using DR mode, the real server returns the data directly to the user without passing back through the LVS director. This design simplifies the structure and reduces the load on the director.
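To illustrate the request path just described, the director side of a DR-mode cluster is usually set up with a handful of ipvsadm commands. The Python sketch below simply wraps those commands; the VIP 192.168.0.100, the two real-server addresses, and the round-robin scheduler are chosen purely as examples and are not taken from this article.

```python
# Hypothetical director-side setup for LVS/DR: one virtual HTTP service on
# the VIP, two real servers added in direct-routing (-g) mode. Run as root.
import subprocess

VIP = "192.168.0.100"                               # example virtual IP
REAL_SERVERS = ["192.168.0.101", "192.168.0.102"]   # example real servers

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Clear any existing IPVS rules, then add the virtual HTTP service
# with round-robin scheduling.
run(["ipvsadm", "-C"])
run(["ipvsadm", "-A", "-t", f"{VIP}:80", "-s", "rr"])

# Add each real server in DR mode (-g = gatewaying / direct routing).
for rip in REAL_SERVERS:
    run(["ipvsadm", "-a", "-t", f"{VIP}:80", "-r", f"{rip}:80", "-g"])

# Show the resulting IPVS table.
run(["ipvsadm", "-L", "-n"])
```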
LVS/DR consists of two parts: the director and the real servers, and both must be configured before normal service can be provided. The two most important pieces for implementing LVS/DR are the IPVS kernel module and the ipvsadm toolkit. Fortunately, current distributions already include the IPVS kernel module, so you no longer need to patch the kernel as with older versions; ipvsadm still needs to be downloaded and installed.
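On a Linux real server in DR mode, the usual preparation is to bind the VIP to the loopback interface with a host (/32) mask and to suppress ARP replies for it, so that only the director answers for the VIP on the network. The sketch below shows one common way to do this, reusing the same example VIP as above and standard ip/sysctl commands rather than anything specific to this article.

```python
# Hypothetical real-server-side preparation for LVS/DR on Linux (run as root):
# bind the VIP to loopback with a 255.255.255.255 mask and stop the machine
# from answering ARP requests for it.
import subprocess

VIP = "192.168.0.100"   # example VIP, must match the director's configuration

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# VIP on loopback with a /32 (255.255.255.255) mask.
run(["ip", "addr", "add", f"{VIP}/32", "dev", "lo"])

# Standard ARP suppression settings for DR real servers.
for key, value in [("net.ipv4.conf.all.arp_ignore", "1"),
                   ("net.ipv4.conf.lo.arp_ignore", "1"),
                   ("net.ipv4.conf.all.arp_announce", "2"),
                   ("net.ipv4.conf.lo.arp_announce", "2")]:
    run(["sysctl", "-w", f"{key}={value}"])
```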
LVS can forward service requests to real servers running a variety of operating systems. On Windows this is harder: the subnet mask must be set to 255.255.255.255, and the only way to give the TCP/IP properties of the loopback connection a mask of four 255s is to modify the registry. In addition, Windows has no loopback interface by default, so this "device" must be installed before configuration.