Large Web site Technology Architecture (v)--site high-availability architecture

Last Update:2014-06-20 Source: Internet

Author: User

Tags failover

Large Web site technology Architecture (i)--large-scale website architecture evolution

Large Web site technology Architecture (ii)--Architecture mode

Large Web site technology Architecture (iii)--Architecture core elements

Large Web site technology Architecture (iv)--high-performance architecture of the website

Site availability describes the degree to which the site can be accessed effectively.

1. Measuring and assessing site availability

Site downtime (failure time) = time the fault was repaired - time the fault was discovered (reported)

Annual site availability = (1 - site downtime / total time in the year) x 100%

Availability is an important metric in site architecture design. Externally it is a service commitment; internally it is an assessment metric. When it comes to assessing individual engineers, fault points are more commonly used.
Fault points are a way of classifying site faults and assigning a weight to each class. Here is an example:

Classification	Description	Weight
Accident-level failure	Serious failure; the whole site is unavailable	100
Class A failure	Site access is not smooth, or core functionality is unavailable	20
Class B failure	Non-core functionality is unavailable, or a small number of users cannot access core functionality	5
Class C failure	Other faults	1

The fault score is calculated as:

Fault score = fault duration (minutes) * fault weight
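
The two calculations in this section can be sketched in a few lines of Python. This is only an illustration: the weights come from the example table, while the names `Fault`, `fault_score`, `annual_availability` and the sample faults are assumptions made up for the sketch, and a year is assumed to have 365 days.

```python
from dataclasses import dataclass

# Weights taken from the example table in this section.
FAULT_WEIGHTS = {"accident": 100, "A": 20, "B": 5, "C": 1}

# Assumption: a non-leap year.
MINUTES_PER_YEAR = 365 * 24 * 60


@dataclass
class Fault:
    level: str      # one of the keys in FAULT_WEIGHTS
    minutes: float  # repair time minus discovery (report) time


def fault_score(fault: Fault) -> float:
    """Fault score = fault duration (minutes) * fault weight."""
    return fault.minutes * FAULT_WEIGHTS[fault.level]


def annual_availability(downtime_minutes: float) -> float:
    """Availability in percent: (1 - downtime / time in the year) x 100."""
    return (1 - downtime_minutes / MINUTES_PER_YEAR) * 100


# Hypothetical faults: a 30-minute class A fault and a 120-minute class C fault.
faults = [Fault("A", 30), Fault("C", 120)]
total_score = sum(fault_score(f) for f in faults)  # 30*20 + 120*1
availability = annual_availability(sum(f.minutes for f in faults))
```

Here the short class A fault contributes far more to the score than the longer class C fault, which is the point of weighting by severity.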

2. The site's high-availability architecture

A typical site design follows a basic three-tier layering model: an application layer, a service layer, and a data layer.

In the actual deployment of a large site the layering is more fine-grained and detailed, but the servers can usually still be divided into these three tiers.

Application-layer servers usually have to handle highly concurrent access, so a load balancer puts a group of servers together as a cluster that serves requests jointly. When the load balancer detects through heartbeat checks that a server is unavailable, it removes that server from the cluster list and distributes requests to the other available servers. The cluster as a whole remains available, which makes the application highly available.
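
The heartbeat-and-remove behavior described above can be sketched as follows. This is a minimal in-process model, not a real load balancer: the class `LoadBalancer`, the `is_alive` probe, the round-robin policy, and the server names are all assumptions chosen for illustration.

```python
from typing import Callable, List


class LoadBalancer:
    """Toy round-robin balancer that drops servers failing a heartbeat."""

    def __init__(self, servers: List[str], is_alive: Callable[[str], bool]):
        self.servers = list(servers)
        self.is_alive = is_alive          # hypothetical heartbeat probe
        self.available = list(servers)
        self._next = 0

    def heartbeat(self) -> None:
        # Rebuild the available list from the probe results. As a side
        # effect (an assumption of this sketch), recovered servers rejoin.
        self.available = [s for s in self.servers if self.is_alive(s)]

    def route(self) -> str:
        if not self.available:
            raise RuntimeError("no application server available")
        server = self.available[self._next % len(self.available)]
        self._next += 1
        return server


# Simulated server health, standing in for real network probes.
status = {"app-1": True, "app-2": True, "app-3": True}
lb = LoadBalancer(list(status), lambda s: status[s])

status["app-2"] = False  # simulate a crash
lb.heartbeat()
routed = [lb.route() for _ in range(4)]
```

After the heartbeat, requests are spread over the two remaining servers only, so callers never see the failed one.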

Servers in the service layer are similar to those in the application layer and also achieve high availability through clustering. The difference is that they are accessed by the application layer through a distributed service invocation framework, and that framework implements load balancing inside the application-layer client.

Servers in the data layer are a special case. Because they store data, every write must be replicated synchronously to multiple servers to provide redundant backups, so that data is not lost and the data access service is not interrupted.
Sites are generally upgraded very frequently, and each release requires shutting down the service and restarting the system, which is equivalent to server downtime. The site's availability architecture therefore also has to account for the downtime caused by releases.

3. High-availability applications

The application layer mainly handles the business logic of the site, so it is also called the business logic layer. One of its notable features is that the application is stateless, which makes load balancing relatively simple to implement.
In practice, however, business operations often carry state across multiple requests. In a web application the context shared by these requests is called a session. On a single machine, the session can be managed by the web container deployed on the server. In a load-balanced cluster, the load balancer may send a request to any application server in the cluster, so making sure the correct session is available for every request is considerably more complicated. In a cluster environment, sessions are mainly managed in the following ways.

1. Session replication

Session replication is a cluster session management mechanism used by early enterprise application systems. The application servers enable the web container's session replication feature and synchronize session objects among the servers in the cluster, so that every server holds the session information of all users.

This scheme is simple, and reading session information from the local machine is fast. However, when the cluster is large, replicating sessions among servers consumes a great deal of server and network resources, and with a large number of users a server may even run out of memory for sessions.

2. Session binding

Session binding can be implemented with the load balancer's source address hash algorithm: the load balancer always sends requests from the same IP address to the same server. Throughout the session, all of the user's requests are then processed on the same server. In other words, the session is bound to a specific server, which guarantees that it can always be found there. This approach is also called session stickiness.
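
Source address hashing can be illustrated with a short sketch. The function name `pick_server`, the use of MD5 as the hash, and the server and IP values are assumptions for illustration; real load balancers implement this internally and may use a different hash.

```python
import hashlib
from typing import List


def pick_server(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a server so the same IP always lands on the same one."""
    # hashlib gives a hash that is stable across processes; Python's
    # built-in hash() for strings is randomized per process by default.
    digest = hashlib.md5(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


servers = ["app-1", "app-2", "app-3"]       # hypothetical server names
first = pick_server("203.0.113.7", servers)
second = pick_server("203.0.113.7", servers)  # same IP, same server
```

Note that with a plain modulo, adding or removing a server changes the mapping for many clients, so sessions bound to the old server are no longer found.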

3. Recording the session in cookies

Another way to manage sessions is to record them on the client. With each request the client sends the session to the server, and after processing the request the server returns the modified session to the client in the response.

4. Session server

With a session server, session management is handled by a separately deployed server. The web servers do not store user session information and instead fetch it from the session server on every request.

This solution effectively splits the application server's state out, yielding stateless application servers and a stateful session server. For the stateful session server, a relatively simple implementation is to build on a distributed cache, a database, or a similar store.
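
The split between stateless application servers and a stateful session server can be sketched as below. `SessionStore` is an in-memory stand-in for whatever distributed cache or database would actually be used, and `AppServer`, `add_to_cart`, and the session ID are hypothetical names invented for this example.

```python
from typing import Any, Dict, List


class SessionStore:
    """Stand-in for a central session server (a plain dict in this sketch)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Dict[str, Any]] = {}

    def load(self, session_id: str) -> Dict[str, Any]:
        # Return a copy so callers must save() to persist changes.
        return dict(self._sessions.get(session_id, {}))

    def save(self, session_id: str, data: Dict[str, Any]) -> None:
        self._sessions[session_id] = dict(data)


class AppServer:
    """Stateless application server: keeps no session data of its own."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> List[str]:
        session = self.store.load(session_id)       # fetch on every request
        cart = session.get("cart", []) + [item]
        session["cart"] = cart
        self.store.save(session_id, session)        # write the state back
        return cart


store = SessionStore()
app1, app2 = AppServer("app-1", store), AppServer("app-2", store)

# Two requests of the same session handled by different servers.
app1.add_to_cart("sess-42", "book")
cart = app2.add_to_cart("sess-42", "pen")
```

Because neither application server holds state, the load balancer is free to send each request to any of them.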

4. High-availability services

Reusable service modules provide basic shared services for business products. In a large site these services are usually deployed in a distributed fashion and called remotely by specific applications. Reusable services, like applications, are stateless, so a load-balancing failover strategy can be used to make them highly available. In addition, several other high-availability service strategies are used in practice.

1. Tiered management
2. Timeout settings
3. Asynchronous invocation
4. Service downgrade: during peak periods, the site can turn off some unimportant services, such as comments.

5. Highly available data

The main means of ensuring highly available data storage are data backup and failover mechanisms. Highly available data involves data persistence, data accessibility, and data consistency. The CAP principle describes the trade-off a distributed store faces among consistency, availability, and partition tolerance.

6. Quality assurance for a high-availability website

This mainly concerns the site release process.

[Figure: site release process]

7. Website operation monitoring

"Systems without monitoring are not allowed to go online." Operation monitoring is essential for running the site and for optimizing its architecture design; operating a site without monitoring is like flying a plane without instruments. The data to monitor mainly includes:
1. User behavior log collection (server side and browser side)
2. Server performance monitoring (CPU, memory, etc.)
3. Runtime data monitoring (cache hit rate, average response latency, number of messages sent per minute, total number of pending tasks, etc.)
Once monitoring data has been collected, it can be used for system performance evaluation and for forecasting cluster size and scalability. Real-time monitoring data can also drive risk warnings, server failover, and automatic load adjustment, so that the resources of every machine in the cluster are used as fully as possible.
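
A risk warning based on real-time monitoring data can be as simple as comparing each sample against thresholds. The sketch below assumes hypothetical metric names and threshold values; real limits depend on the site and would normally live in the monitoring system's configuration.

```python
from typing import Dict, List

# Hypothetical upper limits; alert when a metric rises above them.
THRESHOLDS = {
    "cpu_percent": 90.0,
    "memory_percent": 85.0,
    "avg_response_ms": 500.0,
}
# Hypothetical lower limit; alert when the hit rate falls below it.
MIN_CACHE_HIT_RATE = 0.80


def check_metrics(server: str, metrics: Dict[str, float]) -> List[str]:
    """Return one warning string per metric that crosses its threshold."""
    alerts = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"{server}: {name}={value} exceeds {limit}")
    hit_rate = metrics.get("cache_hit_rate")
    if hit_rate is not None and hit_rate < MIN_CACHE_HIT_RATE:
        alerts.append(
            f"{server}: cache_hit_rate={hit_rate} below {MIN_CACHE_HIT_RATE}"
        )
    return alerts


# One made-up sample: CPU is too high and the cache hit rate is too low.
sample = {"cpu_percent": 97.0, "memory_percent": 60.0, "cache_hit_rate": 0.65}
alerts = check_metrics("app-1", sample)
```

The returned warnings could then feed an on-call notification, or trigger the failover and load adjustment mentioned above.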

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
