Site high-availability architecture -- Part Two


Reusable service modules provide basic shared services for the business products. They are usually deployed independently on their own servers and are accessed by applications through remote invocation. Like the application servers, reusable services are stateless, so they can be made highly available with a failover strategy much like the one used for load-balanced application servers; a rough sketch of such a failover call follows below. Beyond failover itself, several additional strategies are used in practice to keep services highly available, described in the numbered points that follow.
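A minimal sketch of such a failover call: the client tries one instance of the stateless service and, on failure, simply retries the same request against another instance. The instance addresses, call_service, and the payload are hypothetical stand-ins rather than a real RPC framework.

```python
import random

# Hypothetical list of peer instances of a stateless reusable service.
SERVICE_INSTANCES = ["10.0.0.11:8080", "10.0.0.12:8080", "10.0.0.13:8080"]

class ServiceUnavailable(Exception):
    """Raised when every instance of the service has failed."""

def call_service(instance, payload):
    """Stand-in for a real RPC/HTTP call; it fails at random here
    purely to exercise the failover loop."""
    if random.random() < 0.3:
        raise ConnectionError(f"{instance} unreachable")
    return f"handled by {instance}: {payload}"

def call_with_failover(payload):
    # Because the service is stateless, any instance can serve the request,
    # so a failed call is simply retried against another instance.
    instances = SERVICE_INSTANCES[:]
    random.shuffle(instances)            # naive load spreading across instances
    last_error = None
    for instance in instances:
        try:
            return call_service(instance, payload)
        except Exception as exc:         # network error, timeout, 5xx, ...
            last_error = exc             # remember the failure, try the next one
    raise ServiceUnavailable("all service instances failed") from last_error

print(call_with_failover({"user_id": 42}))
```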

1. Tiered management

In operations, core applications and services are managed with higher priority: they get better hardware, and incidents affecting them are responded to faster. Services are also isolated at deployment time to keep a failure from cascading. Low-priority services can be isolated by running them in separate threads or on separate virtual machines, while high-priority services are deployed on separate physical machines, and core services and data may even be deployed in data centers in different geographic regions.

2. Timeout settings

A service that hangs can tie up the application's resources indefinitely, so the application should set a timeout on every service call. Once a call exceeds the timeout, the communication framework raises an error, and the application can then, according to its service-management policy, retry the call or fail over to another service instance.
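A minimal sketch of the idea using only a plain TCP socket; real RPC and HTTP frameworks expose an equivalent timeout setting. The host, port, and request below are placeholders.

```python
import socket

def call_with_timeout(host, port, request: bytes, timeout_s=0.5):
    """Send a request over TCP and give up if the service does not answer in time."""
    try:
        # Bound both the connection attempt and the wait for the response.
        with socket.create_connection((host, port), timeout=timeout_s) as conn:
            conn.settimeout(timeout_s)
            conn.sendall(request)
            return conn.recv(4096)
    except OSError:
        # Timed out or failed: the caller can retry, fail over to another
        # instance, or degrade gracefully instead of hanging the whole request.
        return None

# Usage sketch: 192.0.2.1 is a reserved documentation address that never answers,
# so this returns None after roughly half a second instead of blocking forever.
print(call_with_timeout("192.0.2.1", 9000, b"PING\n"))
```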

3. Asynchronous invocation

Where possible, the application should invoke services asynchronously, for example through a message queue, so that the failure of a single service does not cause the entire application request to fail.
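A minimal sketch of the pattern, using Python's standard-library queue as a stand-in for a real message broker: the user-facing request only enqueues a message and returns, and a separate consumer delivers it to the downstream service, so a downstream failure no longer fails the original request. All names are illustrative.

```python
import queue
import threading

# Stand-in for a real message broker / distributed message queue.
message_queue = queue.Queue()

def handle_user_request(order):
    # The application only enqueues the work and returns immediately, so the
    # user-facing request succeeds even if the email service is currently down.
    message_queue.put({"type": "send_confirmation_email", "order": order})
    return "order accepted"

def email_service_consumer():
    # Runs independently of the requests that produced the messages; a failure
    # here is retried without failing the original request.
    while True:
        msg = message_queue.get()
        try:
            print("sending confirmation email for order", msg["order"])  # pretend call
        except Exception:
            message_queue.put(msg)       # naive retry: re-enqueue on failure
        finally:
            message_queue.task_done()

threading.Thread(target=email_service_consumer, daemon=True).start()
print(handle_user_request({"id": 1001}))
message_queue.join()                     # demo only: wait for the async work
```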

4. Service downgrade

During peak traffic, a service hit by a large number of concurrent calls may slow down and, in the worst case, become unavailable. To keep the core applications and functions working, services can be deliberately downgraded. There are two ways to do this: denying service and shutting features down.

Denial of service: reject calls from low-priority applications to reduce the number of concurrent callers and ensure the core applications can still use the service, or randomly refuse a fraction of incoming requests.

Feature shutdown: turn off unimportant services, or unimportant features within a service, to save system resources and free capacity for the important services and functions.
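The sketch below illustrates both techniques with hypothetical degradation switches: low-priority callers (or a random fraction of requests) are refused, and a non-essential feature is turned off entirely. The switch names and thresholds are illustrative; in practice they would be driven by an operations or configuration system.

```python
import random

# Hypothetical switches, normally flipped by an operations/config system.
DEGRADED_MODE = True                     # set during traffic peaks
REJECT_RATIO = 0.3                       # randomly refuse 30% of requests
DISABLED_FEATURES = {"product_recommendations", "review_images"}

class ServiceDegraded(Exception):
    pass

def check_admission(caller_priority: str):
    """Denial of service: refuse low-priority callers or a random share of calls."""
    if DEGRADED_MODE:
        if caller_priority == "low":
            raise ServiceDegraded("low-priority callers are refused during peaks")
        if random.random() < REJECT_RATIO:
            raise ServiceDegraded("request shed to protect core services")

def feature_enabled(name: str) -> bool:
    """Feature shutdown: unimportant features are switched off entirely."""
    return name not in DISABLED_FEATURES

# Usage sketch
try:
    check_admission("low")
except ServiceDegraded as exc:
    print("degraded:", exc)
print("recommendations enabled:", feature_enabled("product_recommendations"))
```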

High-availability data

High availability of data storage rests mainly on data backup and a failover mechanism. Data backup keeps multiple copies of the data, so that the failure of any single copy does not cause permanent data loss and the data remains fully persistent. The failover mechanism ensures that when one copy becomes inaccessible, access can quickly switch to another copy so that the system stays available.

Data persistence: data is stored durably and is not lost under any circumstances.

Data accessibility: when one storage server fails, access can be switched to another copy of the data quickly enough that, from the application's point of view, the data remains available.

Data consistency: when data is kept in multiple copies, updates must keep those copies consistent with one another, even in the presence of failures and concurrent writes.

1. CAP principle

The CAP principle holds that a storage system providing data services cannot simultaneously satisfy data Consistency, data Availability, and Partition tolerance; at most two of the three can be guaranteed. Large websites must tolerate network partitions and stay available, so in practice they usually relax strong consistency.

In practice, data consistency can be divided into the following levels:

Strong consistency: all copies of the data are always kept physically identical.

User-perceived consistency: the copies may diverge internally, but the data returned to users, for example after comparing and merging the copies, is always consistent and correct.

Eventual consistency: the copies, and even the data seen by users, may be temporarily inconsistent, but over time the copies converge to a consistent state.

2. Data backup

Traditionally, data backup meant cold backup: the data is periodically copied to other storage media. The advantages of cold backup are simplicity and low cost, in both money and technical difficulty. The disadvantages are that it guarantees neither eventual consistency (data written since the last copy is missing when the backup is restored) nor availability: recovering data from cold-standby storage takes a long time, during which the data is inaccessible and the system is unavailable. Cold backup therefore has to be complemented by hot standby for online data.

Data hot standby comes in two modes: asynchronous hot standby and synchronous hot standby.

In asynchronous mode, the storage servers are divided into a primary (master) storage server and slave storage servers, and the application normally connects only to the master. When data is written, the master's write agent writes the data to its local storage system and immediately returns a successful write response; an asynchronous thread then synchronizes the written data to the slave storage servers.

In synchronous mode, by contrast, the write is sent to all storage servers at the same time, and the application receives a successful write response only after every copy has been written.
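A rough sketch of the asynchronous write path described above: the write agent commits to the local (master) store, acknowledges the client immediately, and replicates to the slave from a background thread. The in-memory dictionaries stand in for real storage servers, and all names are illustrative.

```python
import queue
import threading

master_store = {}      # stands in for the primary (master) storage server
slave_store = {}       # stands in for the slave storage server
replication_log = queue.Queue()

def write(key, value):
    """Write agent on the master: local write, immediate ack, async replication."""
    master_store[key] = value            # 1. write to the local storage system
    replication_log.put((key, value))    # 2. hand the write to the async replicator
    return "ok"                          # 3. respond before the slave has the data

def replicator():
    # Background thread that ships writes to the slave. If it falls behind or
    # the master dies first, the slave can miss the most recent writes; that is
    # the usual trade-off of asynchronous replication.
    while True:
        key, value = replication_log.get()
        slave_store[key] = value
        replication_log.task_done()

threading.Thread(target=replicator, daemon=True).start()

print(write("user:42", {"name": "alice"}))   # returns before the slave is updated
replication_log.join()                       # demo only: wait for replication
print(slave_store)
```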

The hot-standby mechanism of relational databases is the familiar master-slave replication (synchronization) mechanism. Master-slave replication not only solves the backup problem but also improves database performance, since writes go to the master while reads can be spread across the slaves.
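A minimal sketch of the read/write splitting that master-slave replication enables: writes are routed to the master, reads are spread round-robin across the slaves. The endpoints are hypothetical and no real database driver is involved.

```python
import itertools

# Hypothetical connection endpoints.
MASTER = "db-master:3306"
SLAVES = ["db-slave-1:3306", "db-slave-2:3306"]
_slave_cycle = itertools.cycle(SLAVES)

def route(sql: str) -> str:
    """Route a statement: writes go to the master, reads round-robin over slaves."""
    is_read = sql.lstrip().lower().startswith("select")
    return next(_slave_cycle) if is_read else MASTER

print(route("SELECT * FROM orders WHERE id = 1"))              # -> one of the slaves
print(route("UPDATE orders SET status = 'paid' WHERE id = 1")) # -> the master
```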

3. Failover

If any server in the data service cluster goes down, all application reads and writes aimed at that server must be rerouted to other servers so that data access does not fail; this process is called failover. Failover has three parts: failure confirmation, access transfer, and data recovery.

3.1 Failure Confirmation

Determining that a server is down is the first step of failover. There are two ways to detect it: heartbeat detection by a control center and access-failure reports from the applications.

Even when an application reports an access failure, the control center still sends a heartbeat probe to confirm that the server is really down, because once data access is failed over the copies of the data become inconsistent, and repairing that requires a series of complex operations.
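A rough sketch of both detection paths: the control center probes each storage server periodically (heartbeat), and when an application reports an access failure it re-probes before declaring the server down. The probe here is a plain TCP connect; the addresses and retry policy are illustrative.

```python
import socket
import time

HEARTBEAT_RETRIES = 3
HEARTBEAT_TIMEOUT_S = 1.0

def heartbeat(host, port):
    """One heartbeat probe: can the control center open a connection?"""
    try:
        with socket.create_connection((host, port), timeout=HEARTBEAT_TIMEOUT_S):
            return True
    except OSError:
        return False

def confirm_down(host, port):
    # Used both for periodic checks and for double-checking an application's
    # failure report: declare the server down only after several misses,
    # because a premature failover leaves the data copies inconsistent.
    for _ in range(HEARTBEAT_RETRIES):
        if heartbeat(host, port):
            return False
        time.sleep(1.0)
    return True

def on_access_failure_report(host, port):
    """An application reported a failed access; verify before failing over."""
    if confirm_down(host, port):
        print(f"{host}:{port} confirmed down, start access transfer")
    else:
        print(f"{host}:{port} is still alive, ignore the report")

# Usage sketch: the address is illustrative and unreachable, so after the
# retries it will be reported as down.
on_access_failure_report("10.0.1.21", 9000)
```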

3.2 Access Transfer

Once a server is confirmed down, data reads and writes must be rerouted to other servers. For fully peer storage servers (every server stores the same data), the application simply switches to a peer server according to its configuration. If the storage servers are not peers, the route must be recalculated to select the new storage server.
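A toy sketch of access transfer: for peer servers the application just switches to another live entry in its configuration, and for non-peer storage the route is recalculated, shown here with a simple hash over the remaining servers. All names are illustrative.

```python
import hashlib

LIVE_SERVERS = ["store-a", "store-b", "store-c"]     # illustrative storage group

def mark_down(server):
    """Remove a confirmed-down server so that new requests route elsewhere."""
    if server in LIVE_SERVERS:
        LIVE_SERVERS.remove(server)

def route_key(key: str) -> str:
    # Recomputing the route over the remaining servers is the "recalculate"
    # case for non-peer storage; for fully peer servers any live entry works.
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return LIVE_SERVERS[digest % len(LIVE_SERVERS)]

print(route_key("user:42"))      # routed before the failure
mark_down("store-b")             # the control center confirmed store-b is down
print(route_key("user:42"))      # may now land on a different live server
```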

3.3 Data Recovery

Because a server going down reduces the number of copies of the data it stored, the number of copies must be restored to the value configured for the system. Otherwise, another server failure could leave no copy to transfer access to, and the data would be permanently lost.
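A toy sketch of the recovery step: after a failure, the control center scans for data whose replica count has fallen below the configured factor and copies it from a surviving replica to another server. The data structures and server names are illustrative.

```python
REPLICATION_FACTOR = 3

# key -> set of servers currently holding a copy (illustrative state after a crash)
replica_map = {
    "user:42": {"store-a", "store-c"},                 # lost a copy when store-b died
    "order:7": {"store-a", "store-c", "store-d"},
}
ALL_SERVERS = {"store-a", "store-c", "store-d", "store-e"}

def recover_replicas():
    """Bring every key back up to REPLICATION_FACTOR copies."""
    for key, holders in replica_map.items():
        while len(holders) < REPLICATION_FACTOR:
            source = next(iter(holders))                # any surviving copy
            target = next(iter(ALL_SERVERS - holders))  # a server that lacks one
            print(f"copy {key}: {source} -> {target}")
            holders.add(target)                         # pretend the copy succeeded

recover_replicas()
```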

Software quality assurance for highly available websites

To keep its online system available, a website has to adopt quality assurance practices that differ from traditional software development.

Website publishing: a release effectively takes servers down and restarts them, so its impact on system availability is similar to a server outage. The difference is that a release is a planned, predictable "outage", so the process can be controlled and its impact on users kept small. Releases are usually performed with publishing scripts.

Automated testing

Pre-release verification

When the website is released, the code package that passed testing is not published directly to the production servers. It is first deployed to a pre-release machine, where development and test engineers verify it on that server; only when no problems are found is the code formally released.

Code control

Automated Publishing

Grayscale Publishing

Large websites use a grayscale (staged) release model: the cluster is divided into several parts, and only one part of the servers is released each day. The release is observed for stability, and if no failures appear, another part is released the next day, until the whole cluster has been updated over a period of time. If a problem is found, only the part of the servers already released needs to be rolled back.
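A rough sketch of a grayscale release driver: the cluster is split into batches, each batch is deployed and observed, and a detected problem rolls back only the servers already released. The deploy, rollback, and healthy functions are hypothetical hooks into real release tooling and monitoring.

```python
import time

CLUSTER = [f"web-{i:02d}" for i in range(1, 13)]   # illustrative 12-server cluster
BATCH_SIZE = 3
OBSERVE_SECONDS = 0        # would be hours or days in a real grayscale release

def deploy(server, version):
    print(f"deploy {version} to {server}")          # hook into real release tooling

def rollback(server, version):
    print(f"rollback {server} to {version}")

def healthy(servers):
    return True                                     # hook into monitoring/alerting

def grayscale_release(new_version, old_version):
    released = []
    for i in range(0, len(CLUSTER), BATCH_SIZE):
        batch = CLUSTER[i:i + BATCH_SIZE]
        for server in batch:
            deploy(server, new_version)
        released.extend(batch)
        time.sleep(OBSERVE_SECONDS)                 # observe this batch in production
        if not healthy(released):
            # Only the part of the cluster already released has to be rolled back.
            for server in released:
                rollback(server, old_version)
            return False
    return True

grayscale_release("v2.3.1", "v2.3.0")
```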
