Linux cluster and system expansion basics


1. What is a server cluster

In a nutshell, a server cluster is a collection of servers combined to achieve a specific function for a particular purpose. When all of the servers run Linux, we call it a Linux cluster.

2. Why use a cluster

For example, suppose you have one server deployed with a LAMP or LNMP stack and hosting a site. At first, with only a few concurrent connections, the server answers users within 3 seconds; as connections grow, the response time stretches to 5 seconds or more. The longer the response time, the worse the user experience: when a page takes more than 3 seconds to open you may lose around 40% of users, and past 10 seconds you may lose more than 75%. If at this point the server is taking more than 5 seconds to respond and CPU load and bandwidth are already tight, you need to expand the server's capacity to keep the service usable. Let's talk about how to expand.

3. Cluster scaling

(1) Scale up (vertical scaling): replace the existing, lower-performance server with a better-performing one. However, investing proportionally more money does not bring a proportional performance improvement; the higher the performance, the steeper the cost, so this is a low cost-effectiveness solution.

(2) Scale out (horizontal scaling): add more servers of the same or a similar class to gain capacity while providing a consistent service; each user request is served by one of these servers. This approach can deliver a performance improvement roughly proportional to the investment, but it brings management problems: keeping the systems consistent, network topology, system selection, coupling, resource sharing, and so on.

Since a scale-out solution uses two or more servers to provide the service, there has to be a unified distributor that decides which server handles each incoming connection, scheduling user requests across the servers as evenly and fairly as possible according to some rule. The model is similar to DNS, where one domain name has multiple A records corresponding to multiple IP addresses so that accesses are polled across them, except that here the distribution follows explicitly defined rules. A scheduler therefore sits in front of these servers: it receives user requests at the front end and dispatches them to the back-end servers. This scheduler is the load balancer (Director | Dispatcher | Load balancer).
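
To make the idea concrete, here is a minimal Python sketch of round-robin scheduling, assuming a hypothetical pool of back-end addresses; a real load balancer would also forward the traffic, not just pick a target.

    import itertools

    # Hypothetical pool of back-end servers; the scheduler only decides
    # which server should handle the next request.
    BACKENDS = ["192.0.2.11:80", "192.0.2.12:80", "192.0.2.13:80"]
    _next_backend = itertools.cycle(BACKENDS)

    def pick_backend():
        """Return the next back-end server in round-robin order."""
        return next(_next_backend)

    # Ten incoming requests end up spread evenly across the three servers.
    for request_id in range(10):
        print("request", request_id, "->", pick_backend())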

Scale-out does increase capacity, but only up to a limit. If the servers compete for shared resources, hot spots can appear, so as the number of servers grows the internals become more and more chaotic, and once a certain point is passed performance degrades more and more severely as servers are added. Think of it this way: 2-3 people make a team, while 200-300 people are just a mob.

A cluster may also be built for other reasons. For example, a single server providing a service in an enterprise will inevitably go down at some point, and that outage interrupts the business. If instead two or more servers form a cluster providing the same service, the business stays available no matter which one fails. RAID (a disk array) is a helpful analogy here: it can be seen as a cluster of hard disks. Although having multiple servers raises the probability that some server is down at any given time, the probability that enough of them are down at once to interrupt the business is much smaller, which guarantees the availability of the service. A cluster combined for this purpose is a high-availability cluster, that is, a cluster built to improve availability.

For example, with a cluster of two servers, you can use either the active-standby model (Active-Standby) or the dual-master model (Active-Active).

In the active-standby model, host A provides all of the services while host B continuously checks host A's heartbeat information (the detection interval is determined by a combination of factors). If host A fails, the services, IP address, data resources, and so on that host A provided are switched over to host B, and the front-end router distributes all requests to host B. Users may see a brief interruption, but once the switchover succeeds all services return to normal. After host A is repaired, it can either take the primary role back from host B, or remain as the new standby while host B stays primary.
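
Below is a minimal Python sketch of the heartbeat-and-failover idea from the standby's point of view; the addresses, thresholds, and the take_over() placeholder are all hypothetical, and real HA software (e.g. keepalived or Pacemaker) does far more.

    import socket
    import time

    # Hypothetical setup: this script runs on standby host B and probes host A.
    PRIMARY = ("192.0.2.10", 80)   # address and port host A is known to answer on
    CHECK_INTERVAL = 2             # seconds between heartbeat checks
    MAX_MISSES = 3                 # consecutive missed heartbeats before failover

    def primary_alive():
        """Return True if a TCP connection to the primary succeeds."""
        try:
            with socket.create_connection(PRIMARY, timeout=1):
                return True
        except OSError:
            return False

    def take_over():
        # Placeholder: a real standby would claim the service IP, attach the
        # data resources, and start the services host A was providing.
        print("primary unreachable, standby taking over")

    misses = 0
    while True:
        misses = 0 if primary_alive() else misses + 1
        if misses >= MAX_MISSES:
            take_over()
            break
        time.sleep(CHECK_INTERVAL)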

Several metrics for measuring a system:

(1) Scalability: the ability to take on more work by adding resources, ideally gaining performance in proportion to the cost of those resources;

(2) Availability: can be understood as the proportion of time the service is usable, a value between 0 and 1. For example, 99% availability over a year means roughly 3.65 days of downtime; at 99.9%, downtime is under nine hours a year; the highest level commonly reached today is about 99.999%, i.e. roughly 5 minutes of downtime a year, but getting there takes a much larger investment of manpower and resources (see the short calculation after this list);

(3) Performance: response time;

(4) Capacity: the throughput that can be achieved while performance stays acceptable;

(5) Maximum throughput: the largest value of the capacity metric, as measured by a benchmark performance test.
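
As a quick illustration of item (2), a short Python calculation of the downtime each availability target allows per year; the targets are just the figures mentioned above.

    # Allowed downtime per year for a few availability targets.
    HOURS_PER_YEAR = 365 * 24

    for availability in (0.99, 0.999, 0.99999):
        downtime_hours = HOURS_PER_YEAR * (1 - availability)
        print("%.3f%% available -> %.2f hours (%.1f minutes) of downtime per year"
              % (availability * 100, downtime_hours, downtime_hours * 60))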

Typically, scaling brings a performance gain, but the gain is rarely proportional to the scale of the expansion.

As operations personnel, the most basic goal, and the most important one, is availability. Even if performance, capacity, and scalability are not great, as long as the service is available, do not change things arbitrarily: stability outweighs everything. A stable system matters more than a perfect design; perfection is relative, and there is always a more perfect scheme, so guaranteeing availability is the most fundamental task. On top of stability, if you want to scale out while reducing the work of installing and configuring many hosts, you need to standardize operations: unify software versions (for example the operating system), application versions, and configuration paths, which removes a lot of unnecessary trouble. With stability and standardization in place, the next step is automation, for example building an automated operations platform so that common problems can be recognized and resolved automatically the moment they appear.

Then how do you put a new version of the system online? A large company like BAT cannot take the whole site down for maintenance, so grayscale (rolling) release is used. Suppose traffic is heavy and there are five or more servers: during the low-traffic early-morning hours the remaining hosts can still serve users normally, so the front-end scheduler takes one host offline, the new version is installed and tested, the host goes back online, then the next host is taken offline, updated, and brought back, and so on until every host is updated. This process is called grayscale release. If this manual procedure is automated, handing the new version to a platform that releases it step by step in the same order, far fewer people can manage far more servers.
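
A minimal Python sketch of that rolling loop; the host names and helper functions are hypothetical stand-ins for real scheduler and deployment commands.

    import time

    HOSTS = ["web1", "web2", "web3", "web4", "web5"]

    def drain(host):
        print("taking", host, "out of the scheduler pool")

    def deploy(host, version):
        print("installing", version, "on", host)

    def health_check(host):
        print("smoke-testing", host)
        return True                      # a real check would probe the service

    def enable(host):
        print("putting", host, "back into the pool")

    def rolling_release(version):
        for host in HOSTS:
            drain(host)                  # no new requests reach this host
            deploy(host, version)
            if not health_check(host):   # stop the rollout if the new version is broken
                raise RuntimeError(host + " failed checks, aborting release")
            enable(host)
            time.sleep(1)                # let the host warm up before the next one

    rolling_release("v2.0")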

4. Linux cluster types

Load balancing cluster (LB): the main goal is to sustain more user requests by adding more hosts, so the primary purpose is capacity expansion rather than availability. Because there are multiple hosts, the cluster does offer better availability, but the scheduler itself is unique and is therefore a single point of failure (SPOF): once the scheduler goes down, none of the services can be reached. That risk leads to the next type of cluster.

Highly available cluster (HA): since the scheduler is a single point of failure, you can add another host at the scheduler layer to form an active-standby pair, thereby eliminating the single point of failure. Clusters of this kind, built to improve availability, are called high-availability clusters. Their goal is not to expand capacity or improve performance, but to keep the service available within an acceptable time frame. Availability is measured as A = MTBF / (MTBF + MTTR), i.e. mean time between failures divided by the sum of mean time between failures and mean time to repair; in general, reaching 99.9% is enough.
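
As a quick check of the formula, a tiny Python example with hypothetical failure and repair times:

    # Hypothetical numbers: the service fails on average every 1000 hours of
    # operation and takes 1 hour to repair.
    mtbf, mttr = 1000.0, 1.0
    availability = mtbf / (mtbf + mttr)
    print("A = %.4f%%" % (availability * 100))   # about 99.90%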

High-performance cluster (HP): a cluster that organizes a large number of hosts to work together on larger and more complex computations is called a high-performance cluster. (The world's 500 fastest computers can be looked up at www.top500.org.)

Distributed systems: a single MySQL server, for example, may face such a high volume of concurrent reads and writes that one machine cannot carry the traffic. Load-balancing reads across MySQL servers is feasible, but load-balancing writes is not practical, so the data itself can be split: with two servers, each holds part of the data, and together they make up a complete system, which relieves the pressure on each server. Servers that are organized together but do not each serve the same purpose form a distributed system, and such a system usually needs a scheduler (often called a query router) to give users a unified entry point, dispatching each request to whichever server holds the relevant data. Many of today's NoSQL databases work in this distributed mode, and distributed file systems are implemented in a similar way.
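
A minimal Python sketch of such a query router, assuming two hypothetical shards; a stable hash of the record key decides which shard owns it.

    import zlib

    # Hypothetical database shards that together hold one complete data set.
    SHARDS = ["db-shard-0.example.internal", "db-shard-1.example.internal"]

    def route(key):
        """Return the shard responsible for this key."""
        return SHARDS[zlib.crc32(key.encode()) % len(SHARDS)]

    # Reads and writes for the same user always land on the same shard.
    for user in ("alice", "bob", "carol"):
        print(user, "->", route(user))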

5. Highly scalable systems

As an ops engineer, you need to design architectures that can elastically expand and shrink, so building a highly scalable system follows these principles:

Avoid serialization and interaction inside the system. Interaction means one part of the system has to talk to another: for example, if the items in an e-commerce shopping cart live only on the first server and the two servers do not communicate, the cart appears empty when the user's next request lands on the second server; making the cart visible on every visit then forces the two servers to communicate and synchronize, which is really just sharing state inside the cluster. Serialization means the internal components of the system must run in sequence because they depend on one another.
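
A minimal Python sketch of the usual way out of the shopping-cart interaction: keep the state in one shared store instead of per-server memory. A plain dict stands in here for a shared service such as Redis, so the sketch runs in a single process.

    # Shared store reachable by every web server (a dict stands in for it here).
    SHARED_STORE = {}

    def add_to_cart(user_id, item):
        SHARED_STORE.setdefault(user_id, []).append(item)

    def get_cart(user_id):
        # Whichever server handles the request reads the same data, so no
        # server ever has to ask another server for the cart contents.
        return SHARED_STORE.get(user_id, [])

    add_to_cart("u42", "keyboard")
    print(get_cart("u42"))   # -> ['keyboard']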

These are the basics. If anything here is wrong, corrections are welcome.

