A Hypothesis About the Blog Park Outages on Alibaba Cloud: High Demand Meets Low Configuration (With a Supplementary Clock-Throttling Theory)

Source: Internet
Author: User

Background:

Since Blog Park (cnblogs.com) moved onto Alibaba Cloud, outages have become so frequent that everyone now reads a fault report almost every week and has struck up a deep friendship with HTTP 503.
Suspects have ranged from disk I/O to the Server Load Balancer to the application itself; some were investigated and confirmed, yet the problem persists.
Blog Park's response has been to keep buying higher-end configurations, but it still cannot escape the curse of the 503.
In the end, Blog Park pointed the finger at Alibaba Cloud's CPUs.
Alibaba Cloud, in turn, suspected a problem in Blog Park's own program.
Given this standoff, the issue caught my attention.

 

Note: The following is purely my own speculation and hypothesis. It does not claim to match the facts and is offered for reference only:

 

Because Alibaba Cloud and Blog Park hold different views and each pushes back on the other, I will offer an alternative guess: both sides may have a problem, or neither does.

 

My overall hypothesis: Alibaba Cloud's low configuration cannot meet Blog Park's high requirements.

 

I. Assume that Blog Park is a high-demand program:

Suppose Blog Park's program mixes in several fashionable components whose principles and internals the team has not fully mastered, so they cannot be tuned effectively.
As a result, the site over-relies on caching while the native code paths are slow on average. Requests that miss the cache cannot sustain heavy concurrency, which puts a heavy load on the system and demands a high-end configuration.

 

II. Assume that Alibaba Cloud does not provide a high configuration:

First, assume that Alibaba Cloud's virtualization technology has not surpassed Xen, the industry leader.

Next, look at the following material (from the Internet), which explains two concepts: the physical CPU and the virtual CPU (vCPU):

 

1: Physical CPU and virtual VCPU

When a Xen guest starts, dom0 determines its virtual CPUs and pins each one to a physical CPU core, and this allocation is random. For example, our machine has two dual-core processors, that is, 4 CPU cores. If we allocate 4 cores to our guest, the guest also sees 4 CPU cores; however, these four vCPUs do not necessarily correspond to the four cores of the physical machine. The four vCPUs may map onto anywhere from 1 to 4 cores, giving four possibilities:
4 vcpu = 1 CPU
4 vcpu = 2 CPU
4 vcpu = 3 CPU
4 vcpu = 4 CPU
Vcpu refers to the virtual CPU Core
Cpu refers to the physical CPU Core
Because of this, when we run compute-intensive tasks in the guest, we must check how its vCPUs map onto physical CPUs and, if necessary, manually pin the vCPUs to physical CPUs so that the virtual machine can use all the physical cores. For I/O-intensive tasks, it is better to give dom0 a hyper-thread or an entire core, and pin the other domains so that they cannot use CPU 0.
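To see how likely each of the four cases is, the sketch below enumerates every way 4 vCPUs can land on 4 physical cores. It assumes, as the excerpt states, that each vCPU is placed independently and uniformly at random; real schedulers may migrate vCPUs instead. The function names are hypothetical and only illustrate the counting.

```python
from collections import Counter
from itertools import product


def distinct_physical_cores(mapping: dict[int, int]) -> int:
    """Number of distinct physical cores backing a guest, given vcpu -> pcpu."""
    return len(set(mapping.values()))


def placement_distribution(num_vcpus: int, num_pcpus: int) -> dict[int, int]:
    """Count all vcpu -> pcpu placements, grouped by how many cores they use.

    Assumption: every vCPU is placed independently and uniformly at random.
    """
    counts: Counter[int] = Counter()
    for assignment in product(range(num_pcpus), repeat=num_vcpus):
        mapping = dict(enumerate(assignment))
        counts[distinct_physical_cores(mapping)] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    dist = placement_distribution(4, 4)
    total = sum(dist.values())
    for cores, n in dist.items():
        print(f"4 vcpu = {cores} CPU: {n}/{total} placements ({n / total:.1%})")
```

Under this model, only 24 of the 256 placements give the guest all four physical cores. On a real Xen host, the actual mapping can be inspected with `xl vcpu-list` and fixed with `xl vcpu-pin`.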

 

In other words, although Blog Park bought 8 cores, it is unknown whether they actually correspond to 8 cores on the physical machine; the real number is <= 8.

So if Blog Park buys a high-end 8-core (vCPU) instance but is actually given only four physical cores, performance is cut in half and the configuration is effectively low.
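The halving above is simple arithmetic. A minimal sketch, assuming each physical core delivers one core's worth of work and vCPUs sharing a core split it evenly (hyper-threading, turbo, and scheduler overhead are ignored); `effective_fraction` is a hypothetical helper name:

```python
def effective_fraction(vcpus: int, physical_cores: int) -> float:
    """Fraction of the purchased vCPU capacity that is physically backed.

    Simplified model: one physical core = one vCPU's worth of work;
    extra physical cores beyond the vCPU count add nothing.
    """
    if vcpus <= 0 or physical_cores <= 0:
        raise ValueError("vcpus and physical_cores must be positive")
    return min(vcpus, physical_cores) / vcpus


if __name__ == "__main__":
    print(effective_fraction(8, 4))  # 8 vCPUs on 4 physical cores
    print(effective_fraction(8, 8))  # 8 vCPUs on 8 physical cores
```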

 

So let's consider how lucky Blog Park might be:

 

Blog Park bought four 8-core instances separately and is billed as a 32-core site. By the theory above, the actual number of physical cores could be anywhere from 1 to 32, depending on luck.
If the four instances happen to land on 4 physical cores, the loss is severe and the effective configuration is low. If they land on 8 cores, there is still a loss; on 16 cores, half the capacity is lost.
That is the first hypothesis: Blog Park is running on a low configuration.
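The same reasoning extends to the four 8-core instances. The sketch below assumes the instances do not share physical cores with each other and that each instance is backed by a given number of cores; the scenario lists are illustrative, not measured data, and `cluster_capacity_loss` is a hypothetical name.

```python
def cluster_capacity_loss(cores_per_instance: list[int], vcpus_per_instance: int = 8) -> float:
    """Fraction of purchased vCPU capacity lost across several instances.

    Assumption: instances do not share physical cores, and an instance
    cannot use more physical cores than it has vCPUs.
    """
    backed = sum(min(c, vcpus_per_instance) for c in cores_per_instance)
    purchased = vcpus_per_instance * len(cores_per_instance)
    return 1 - backed / purchased


if __name__ == "__main__":
    # Four 8-vCPU instances (32 vCPUs) landing on 4, 8, 16 or 32 physical cores.
    scenarios = {
        "4 cores": [1, 1, 1, 1],
        "8 cores": [2, 2, 2, 2],
        "16 cores": [4, 4, 4, 4],
        "32 cores": [8, 8, 8, 8],
    }
    for label, cores in scenarios.items():
        print(f"{label}: {cluster_capacity_loss(cores):.1%} of capacity lost")
```

With 16 physical cores behind the 32 vCPUs, exactly half the capacity is lost, which matches the estimate in the text.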

 

If Blog Park is lucky enough to get 32 real physical cores, or if Alibaba Cloud values it highly and helps it adjust the parameter settings, consider the next hypothesis.

 

Supplementary hypothesis: Alibaba Cloud throttles the CPU clock speed, or averages it out algorithmically:

 

Suppose each 2.4 GHz core of Blog Park's 8-core purchase is shared among eight users in Alibaba Cloud, so each vCPU is allotted 300 MHz. Then even if all eight users drive their virtual CPUs to 100%, the total physical load is exactly 100%.
In theory, then, as long as the base frequency is divided up and the number of users is capped, isolation can be achieved.
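The slicing arithmetic can be checked directly. This is a sketch under the paragraph's own assumption that one 2.4 GHz core is divided evenly among a fixed number of users; expressing the cap as a percentage mirrors how Xen's credit scheduler states a per-domain cap (percent of one physical CPU), but `slice_core` itself is a hypothetical helper.

```python
def slice_core(core_mhz: float, sharers: int) -> tuple[float, float, bool]:
    """Split one physical core evenly among `sharers` vCPUs.

    Returns (cap in MHz, cap as a percent of the core, isolated?).
    "Isolated" means that even if every sharer runs at 100% of its cap,
    the total never exceeds the physical core.
    """
    cap_mhz = core_mhz / sharers
    cap_percent = 100 / sharers
    worst_case_total = cap_mhz * sharers
    return cap_mhz, cap_percent, worst_case_total <= core_mhz


if __name__ == "__main__":
    cap, pct, isolated = slice_core(2400, 8)
    print(f"{cap} MHz per vCPU ({pct}% of a core), isolated: {isolated}")
```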
 
In reality, however, such a low frequency cripples CPU performance and leaves capacity badly wasted, so IDCs generally allocate 600 MHz per vCPU. That is usually enough for small sites.

With 600 MHz each, 4 × 600 MHz = 2.4 GHz fully isolates 4 users; with 8 users, each is guaranteed only 50% on average.
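The 50% figure follows from the oversubscription ratio. A minimal sketch, assuming a 2.4 GHz core, a 600 MHz cap per user, and that busy users share the core evenly when all of them are saturated; `oversubscription` is a hypothetical name.

```python
def oversubscription(core_mhz: float, cap_mhz: float, users: int) -> tuple[float, float]:
    """Return (committed / physical ratio, guaranteed fraction of each user's cap).

    Assumption: when every user is busy, the core is shared evenly,
    and nobody receives more than their cap.
    """
    committed = cap_mhz * users
    ratio = committed / core_mhz
    fair_share = min(cap_mhz, core_mhz / users)
    return ratio, fair_share / cap_mhz


if __name__ == "__main__":
    for users in (4, 6, 8):
        ratio, guaranteed = oversubscription(2400, 600, users)
        print(f"{users} users: {ratio:.2f}x committed, {guaranteed:.0%} of cap guaranteed")
```

The same function covers the "1/4 CPU limit" practice: a 600 MHz cap is a quarter of a 2.4 GHz core, and selling it to 6 or 8 users gives a 1.5x or 2x commitment.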

If four users max out their CPUs, the remaining four will be starved no matter how little they use. Through management, the only remedies are to shut down the offending sites or move them out of the group.
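The starvation case above corresponds to a host that serves busy users first-come, first-served up to each cap, with no fair-share step. That allocation policy is an assumption made to illustrate the worst case described in the text, not a description of any specific hypervisor; `first_come_allocation` is a hypothetical helper.

```python
def first_come_allocation(core_mhz: float, cap_mhz: float, demands_mhz: list[float]) -> list[float]:
    """Grant CPU in arrival order: each user gets min(demand, cap, what is left).

    Assumption: no fair sharing; whoever asks first keeps the capacity.
    """
    remaining = core_mhz
    grants = []
    for demand in demands_mhz:
        grant = min(demand, cap_mhz, remaining)
        grants.append(grant)
        remaining -= grant
    return grants


if __name__ == "__main__":
    # Eight users on a 2.4 GHz core, each capped at 600 MHz, all fully busy.
    print(first_come_allocation(2400, 600, [600] * 8))
```

Here the first four users receive their full 600 MHz and the last four receive nothing.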
 
So if Alibaba Cloud throttles the clock speed, Blog Park ends up with a low configuration.

 

Setting the clock speed high or leaving it unlimited is not an option, because the algorithm must guarantee every user an average share.
The question is what clock speed is appropriate. Common industry practice is a 1/4 CPU limit, i.e., capacity sized for 4 users, which in practice is sold to 6-8 users.

 

My guess is that in the Greater China market, the way to make money is to keep the number of physical CPU cores fixed while adding more and more users.
So you cannot see how many people are sharing your clock speed.

 

 

Under these practical assumptions, Blog Park itself is running on a throttled CPU.

 

Hypothesis 2: CPU resource contention

First of all, there is no magic fix, and there is no such thing as absolute CPU isolation.

"Fully independent CPU isolation" is a common line from customer service. Read up on how CPUs are virtualized and you will see that CPU allocation is done purely by scheduling algorithms.

If you don't believe it, see the following excerpt (from the Internet):

 

Background:
In a Xen environment, memory and CPU allocations can be changed dynamically, which lets you tune virtual machine performance.
Normally, physical CPU resources are allocated to virtual machines automatically. When several virtual machines run on one physical machine, their combined vCPUs exceed the host's physical CPUs, and every VM is under high load, a heavily loaded VM will take CPU time from the other VMs.
The problem we found this time: one of our company's systems added three new VM servers. After they went online, even though traffic was evenly distributed, the load on the new machines was higher than on the old ones: around 8 on the new machines versus around 4 on the old. We checked the configuration and parameter settings of the new machines, and then, using the iostat command, found that the steal value on the new machines was very high, about 40, while on the old machines it was about 0.1. After consulting the boss, we learned that a high steal value indicates high CPU load on the physical host. The boss then found that the three new VMs were on the same physical machine, each allocated four CPUs, while the host was an 8-CPU server, so the three VMs were competing for CPU. (CPU hyper-threading needs to be enabled!!!)
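The steal value in the excerpt is the share of time a guest's vCPU was ready to run but the hypervisor was running something else; iostat reports it as %steal. On Linux it comes from the aggregate `cpu` line of `/proc/stat`, where steal is the eighth counter. The sketch below computes it from two samples; the function names are illustrative, and it assumes a kernel recent enough to report the steal column.

```python
import os
import time

# Column order of the aggregate "cpu" line in /proc/stat (Linux).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into its first 8 counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu" or len(parts) < 1 + len(FIELDS):
        raise ValueError("expected an aggregate 'cpu' line with a steal column")
    # guest/guest_nice (if present) are already counted in user/nice, so skip them.
    return [int(x) for x in parts[1 : 1 + len(FIELDS)]]


def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [new - old for new, old in zip(a, b)]
    total = sum(deltas)
    return 100.0 * deltas[FIELDS.index("steal")] / total if total else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()  # the first line is the aggregate "cpu" line


if __name__ == "__main__":
    if os.path.exists("/proc/stat"):
        first = read_cpu_line()
        time.sleep(1)
        second = read_cpu_line()
        print(f"steal: {steal_percent(first, second):.1f}%")
```

In the excerpt's setup, three VMs with four vCPUs each (12 vCPUs) shared an 8-CPU host, a 1.5x oversubscription, which is the kind of situation where this number climbs.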

 

As the example shows, virtual machines compete with each other for CPU resources.

IDC providers generally evict VPS tenants that hog the CPU for long periods, because they affect other users.

As for Alibaba Cloud, I estimate that it has many users but few staff to manage them, and during its promotion period it effectively lets users consume CPU without limit.

So Alibaba Cloud users are more likely to grab each other's resources.

As a result, Blog Park's program may run fine most of the time, but when its CPU is starved, possibly because CPU resources are being fought over, all anyone else sees is a 503.

 

Why can't Blog Park win this contention? I have another hypothesis here:

 

The virtualization layer tags each vCPU with one of two priority states, UNDER or OVER, and keeps track of how much CPU each vCPU has been consuming.
A vCPU whose usage is habitually high gets lower priority under contention; one whose usage is low gets higher priority.
Looking at Blog Park's CPU usage, it is not low even at ordinary times, so it has no advantage when contention arises.
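Xen's credit scheduler does use UNDER and OVER priorities: a vCPU that still has credits left is UNDER, one that has used up its credits is OVER, and UNDER vCPUs are scheduled first. The sketch below is a heavily simplified model of that rule on a single physical CPU. It omits periodic credit refills by weight, the BOOST state, and cross-CPU load balancing, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

UNDER, OVER = 0, 1  # smaller value is scheduled first


@dataclass
class VCpu:
    name: str
    credits: int  # > 0: still within its share; <= 0: share used up

    @property
    def priority(self) -> int:
        return UNDER if self.credits > 0 else OVER


def pick_next(runqueue: list[VCpu]) -> VCpu:
    """Remove and return the first vCPU in the best (lowest) priority class."""
    best = min(range(len(runqueue)), key=lambda i: runqueue[i].priority)
    return runqueue.pop(best)


def run(runqueue: list[VCpu], ticks: int, cost_per_tick: int = 10) -> list[str]:
    """Run one physical CPU for `ticks` slices; every vCPU is always runnable."""
    timeline = []
    for _ in range(ticks):
        vcpu = pick_next(runqueue)
        vcpu.credits -= cost_per_tick  # running burns credits
        timeline.append(vcpu.name)
        runqueue.append(vcpu)  # round-robin within a priority class
    return timeline


if __name__ == "__main__":
    # "busy" has already overspent its share; "quiet" still has credits.
    queue = [VCpu("busy", credits=-20), VCpu("quiet", credits=30)]
    print(run(queue, ticks=6))
```

The habitually busy vCPU waits until the quiet one has also exhausted its credits; only then do the two alternate as equals.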

 

Therefore, based on the assumptions above:

1: If Blog Park optimizes its program, it will no longer need a high configuration;

2: Alibaba Cloud could provide a true 32-core high-end setup (all hosts opened under the same account backed by real cores) and control how many users actually share each allocation;

3: Or Alibaba Cloud users could be considerate and avoid running CPU-hogging websites;

Maybe...

 

Let me stress again that the above is purely my own fanciful speculation and may differ considerably from the facts. Discussion is welcome.

 
