Architect Express 8.3-availability

Source: Internet
Author: User

For a software system, availability comes first: if the system is not available, then no matter how well everything else is done, it is all for nothing.

What generally makes software unavailable?

Failures on our side can make the system unavailable. These range from a single machine being unavailable to all N machines in a cluster being unavailable.

    1. Program failure: function errors, program exits
    2. System failure: CPU overload, memory overload, network overload
    3. Physical failure: machine crashes, power outages, network outages
    4. Unrecoverable failure: earthquakes, tsunamis, etc.

The same kinds of failure can occur on the client side and make the system unavailable, either for individual users or for all users in a region.

Problems on our side must be solved through architecture. For problems on the customer's side, we do what we can: address regional problems first, then individual users' problems. Every solution has to weigh cost against strategy. For example, an early-stage startup without much money is basically unable to protect against unrecoverable failures.

We start by addressing our own failures through architecture. The approaches are similar to design patterns and are called architectural patterns.

Single-machine unavailability has a technical name: single point of failure. The best remedy is to deploy multiple machines and balance load across them, so that no single machine is a point of failure.

    1. Distributed
    2. Load Balancing
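
As a rough illustration of how load balancing removes a single point of failure, here is a minimal sketch of a round-robin balancer that skips machines marked as down. The class name `RoundRobinBalancer`, its methods, and the backend names are all hypothetical and chosen only for this example; a real deployment would normally rely on a dedicated load balancer with active health checks rather than hand-written code.

```python
class RoundRobinBalancer:
    """Rotate requests across backends, skipping ones marked unhealthy."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._next = 0  # rotation cursor into self._backends

    def mark_down(self, backend):
        # Called when a health check (not shown here) decides a machine failed.
        self._healthy.discard(backend)

    def mark_up(self, backend):
        # Called when a previously failed machine recovers.
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self):
        # Try each backend at most once, starting from the rotation cursor.
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if backend in self._healthy:
                return backend
        raise RuntimeError("no healthy backend available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")                 # simulate one machine failing
picks = [lb.pick() for _ in range(4)]
print(picks)                          # requests keep flowing to app-1 and app-3
```

With one of three machines down, requests are still served by the other two. Only when every machine is down does `pick` fail, which is the multi-machine case.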

For multi-machine unavailability, the solution depends on the type of failure:

    1. Program failure: function errors, program exits. Some will say that unit tests and functional tests can catch these errors. That is true, but it belongs to the development process, which is not the topic here. From an architectural perspective, the main solutions are:
      • Canary (grayscale) release
      • Exception monitoring
    2. System failure: CPU overload, memory overload, network overload
      • Rate limiting (flow control)
      • Feature degradation
      • Exception monitoring
    3. Physical failure: machine crashes, power outages, network outages
      • Multi-site active-active deployment
      • Hot or cold standby
      • Cross-site data synchronization
    4. Unrecoverable failure: earthquakes, tsunamis, etc.
      • Same as for physical failure
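
To make the flow control and downgrade ideas from item 2 more concrete, here is a minimal sketch that combines them: a token bucket admits requests at a bounded rate, and requests over the limit get a cheaper fallback response instead of the full one. The token bucket is one common flow-control technique, not necessarily the one the author has in mind, and the names `TokenBucket` and `handle_request`, along with the handler functions, are hypothetical.

```python
import time


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum number of stored tokens
        self._clock = clock         # injectable so the example is testable
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        # Refill according to the time elapsed since the previous call.
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


def handle_request(bucket, full_handler, degraded_handler):
    """Serve the full feature while under the limit, else a degraded one."""
    if bucket.allow():
        return full_handler()
    return degraded_handler()


# Hypothetical handlers: the degraded one returns a cheap, static answer.
def full_page():
    return "full page with recommendations"


def basic_page():
    return "basic page without recommendations"


bucket = TokenBucket(rate=5, capacity=2)
for _ in range(4):
    print(handle_request(bucket, full_page, basic_page))
```

The design choice here is that overload never turns into an error for the user: the system sheds the expensive part of the work and keeps the core function available.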

I will explain each of these topics in detail later.

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
