On the Virtual Machine Manager (SCVMM) cluster overload (over-committed) state detection algorithm

Source: Internet
Author: User

When using SCVMM 2012, we often see the cluster flagged with a "!" warning symbol.

Looking at the cluster properties, we find that the state is Over-committed, which can be translated as resource overload or resource overuse. So how does this state arise?

What happens once a cluster is in this state, and how can it be resolved?

Today we are going to talk about the over-committed algorithm in SCVMM. Once you know how SCVMM decides that a cluster is overcommitted, you know how to avoid the state, and the problems above can be solved.

Part 1. Algorithm overview

The over-commit check of an SCVMM 2012 cluster mainly confirms that the cluster can lose the maximum allowed number of nodes (we will call this value R for now) and that the VMs running on those nodes can be restarted on the other available nodes. R is configured in the cluster setting Cluster Reserve (nodes). VMM first presets the cluster state to overcommitted, and then confirms the actual state by running the check algorithms.

So what is the danger of an overcommitted cluster?

1. For a perfectionist, that "!" symbol simply looks uncomfortable, so I want to fix it ... Okay, I'm obsessive-compulsive.

2. Once the system is overcommitted, a node failure may prevent workloads from failing over normally. In other words, important VMs may have no resources to start on. Think about it: whoever manages that system will have a very bad day.

3. VMM Dynamic Optimization does not work properly while the cluster is in the overcommitted state. I will post another article on VMM Dynamic Optimization later.

4. ... I can't think of any more for the moment.

Part 2. Introduction to Algorithms

There are four algorithms in total. As soon as any one of them passes its check, the cluster status is set to "OK"; otherwise it remains "Overcommitted". Note that the overcommitted state concerns memory only; CPU usage is not considered.

The four algorithms are Slot Simple, Slot Full, Proof Simple and Proof Full.

There are two evaluation methods, Proof and Slot, and two detection strategies, Simple and Full. The four algorithms are the 2 × 2 = 4 combinations of method and strategy.

The evaluation methods and strategies are described below. The descriptions alone may not make my meaning clear, so skim them first, read the algorithm explanations that follow, and then come back; it should make sense by then.

1. Proof Method

This method evaluates whether the cluster can absorb the following worst case: the R nodes carrying the largest memory overhead are removed, their VMs have failed over to the other nodes in the cluster, and the VM with the largest memory overhead is the one left without a host to fail over to.

2. Slot Method

The largest VM on the R nodes is used as the standard slot size. The number of slots available on the H nodes is then calculated, and the check evaluates whether the H nodes can hold all the slots used on the R nodes.

3. Simple Check

The simple check does not look at specific nodes; it only makes an assumption about the cluster as a whole. The largest VM in the cluster is taken as the standard (memory or slot size). Likewise, when the headroom (slots or amount of memory) is calculated, the nodes with the smallest headroom across the whole cluster are the ones counted. Note that the largest VM may sit on the very node that has the fewest slots, but the simple check does not take this into account.

4. Full Complexity Check

The full check exhaustively enumerates every combination N(R, H) in the cluster. For each combination, the slot count, the largest VM, and the memory and slot statistics of the H nodes are recalculated. The biggest risk of this check is that the number of N(R, H) combinations can become very large, on the order of N^R. To avoid such a large number of operations, the full check is only used when N^R < 5000.
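The size guard and the enumeration can be sketched in a few lines of Python. This is only an illustration: `full_check_allowed` and `failure_combinations` are made-up names, not VMM APIs. The text gives the bound as N^R; enumerating unordered sets of R failed hosts, as this sketch assumes, yields fewer combinations, so N^R acts as a conservative estimate here.

```python
from itertools import combinations

MAX_COMBINATIONS = 5000  # threshold quoted in the text


def full_check_allowed(n: int, r: int) -> bool:
    """Return True when the full check is cheap enough to run (N^R < 5000)."""
    return n ** r < MAX_COMBINATIONS


def failure_combinations(hosts, r):
    """Yield (failed, remaining) host lists for every choice of R failed hosts.

    Assumption: a combination is an unordered set of R hosts; the H hosts
    are simply everything that is left.
    """
    for failed in combinations(hosts, r):
        remaining = [h for h in hosts if h not in failed]
        yield list(failed), remaining


if __name__ == "__main__":
    names = ["node1", "node2", "node3", "node4"]
    if full_check_allowed(len(names), 2):
        for failed, remaining in failure_combinations(names, 2):
            print("R =", failed, "H =", remaining)
```

For a 4-node cluster with R = 2 the guard passes easily (4^2 = 16), while a 16-node cluster with R = 4 would already be skipped.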

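For reference, the cluster composition is described with three values: N (the total number of nodes), R (the cluster reserve, that is, the nodes assumed to fail) and H (the nodes that remain).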

Part 3. Algorithm details

Here is a list of the predefined values and how they are calculated; they are used in the algorithms below.

Cluster value

Host node value

These values are calculated on each host in advance and then substituted into the formulas below.

There are a few points to note:

1. Each VM requires an additional 64MB for Hyper-V hypervisor overhead, but to keep the calculations simple the algorithms below do not take this value into account.

2. VMs in the stopped, saved, paused and running states are all included in the calculation.

3. If a virtual machine uses dynamic memory, its currently assigned memory is used in the formulas.
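The three notes above can be captured in a small helper. This is a sketch under stated assumptions: the `VM` class and its field names are hypothetical and do not correspond to any VMM object model.

```python
from dataclasses import dataclass

HYPERV_OVERHEAD_MB = 64  # per-VM hypervisor overhead mentioned in note 1


@dataclass
class VM:
    # Hypothetical record; field names are illustrative only.
    name: str
    state: str                    # "Stopped", "Saved", "Paused" or "Running"
    configured_memory_mb: int
    dynamic_memory: bool = False
    assigned_memory_mb: int = 0   # only meaningful when dynamic_memory is True


def memory_for_check(vm: VM, include_overhead: bool = False) -> int:
    """Memory value (MB) that a VM contributes to the over-commit formulas.

    The VM state is deliberately ignored: every state counts (note 2).
    Dynamic-memory VMs use the currently assigned value (note 3).
    The 64MB overhead is left out by default, as in the article (note 1).
    """
    base = vm.assigned_memory_mb if vm.dynamic_memory else vm.configured_memory_mb
    return base + (HYPERV_OVERHEAD_MB if include_overhead else 0)
```

Passing `include_overhead=True` shows what the figures would look like if the 64MB were counted after all.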

Part 4. Algorithm

Finally, the algorithms themselves. They look computationally complex, but once you read them they are not particularly troublesome. The four algorithms are described below.

4.1 Slot Simple

- SlotSize = the largest configured VM memory in the cluster
- Calculate on each host: AvailableSlots, UsedSlots, TotalSlots
- Calculate TotalSlotsRemaining = SUM(the H smallest TotalSlots in the host group). All nodes are sorted by TotalSlots, and the H nodes are taken from the smallest upward.
- If SUM(UsedSlots) <= TotalSlotsRemaining, the cluster is not overcommitted and the status is set to OK.
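A minimal Python sketch of the Slot Simple comparison follows. It assumes the per-host TotalSlots and UsedSlots values have already been computed, since the article's table defining them is not available; the function name and the sample numbers are hypothetical.

```python
from typing import Sequence


def slot_simple_ok(total_slots: Sequence[int],
                   used_slots: Sequence[int],
                   h: int) -> bool:
    """Slot Simple: compare all used slots with the H smallest TotalSlots.

    total_slots / used_slots hold one precomputed value per host.
    Returns True when the check proves the cluster is not overcommitted.
    """
    # Sort ascending and keep the H hosts with the fewest total slots.
    total_slots_remaining = sum(sorted(total_slots)[:h])
    return sum(used_slots) <= total_slots_remaining


if __name__ == "__main__":
    # Hypothetical per-host numbers: 7 used slots against 1 + 3 = 4 remaining.
    print(slot_simple_ok([1, 3, 3, 4], [2, 2, 2, 1], h=2))  # prints False
```

Sorting first makes the pessimistic choice explicit: the check always assumes the surviving hosts are the ones with the least room.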

4.2 Slot Full

The following is calculated for every combination of R and H:

- SlotSize = the VM with the largest memory overhead on the R hosts
- Calculate AvailableSlots, UsedSlots and TotalSlots on each host
- TotalSlotsRemaining = the sum of TotalSlots on the H hosts
- If SUM(UsedSlots) > TotalSlotsRemaining, the cluster may be overcommitted.
- If SUM(UsedSlots) <= TotalSlotsRemaining for every combination, the cluster status is OK.
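The steps above could be sketched in Python as follows. Several things here are assumptions, because the article's definitions of the per-host values were lost: TotalSlots is taken as usable memory divided by the slot size (rounded down), UsedSlots as each VM rounded up to whole slots and summed over all hosts, and a combination whose R hosts carry no VMs is skipped. The function name and data layout are hypothetical.

```python
from itertools import combinations
from math import ceil


def slot_full_ok(hosts, r):
    """Slot Full sketch.

    hosts: list of (usable_memory_gb, [vm_memory_gb, ...]) tuples.
    Returns False as soon as one R/H combination fails, True otherwise.
    """
    indices = range(len(hosts))
    for failed_idx in combinations(indices, r):
        failed_vms = [vm for i in failed_idx for vm in hosts[i][1]]
        if not failed_vms:
            continue  # assumption: nothing to fail over, combination passes
        slot_size = max(failed_vms)  # largest VM on the R hosts
        # Assumption: used slots are counted across the whole cluster.
        used = sum(ceil(vm / slot_size) for _, vms in hosts for vm in vms)
        # Assumption: TotalSlots = floor(usable memory / slot size).
        total_remaining = sum(
            int(hosts[i][0] // slot_size) for i in indices if i not in failed_idx
        )
        if used > total_remaining:
            return False  # stop at the first failing combination
    return True
```

Returning on the first failing combination mirrors the early stop described for the full complexity check.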

4.3 Proof Simple

- LargestClusterVM = the VM with the largest memory overhead in the cluster
- Calculate AdditionalMemory and the number of VMs on all hosts
- TotalAdditionalSpace = SUM(the H smallest AdditionalMemory values in the host group), chosen the same way as TotalSlotsRemaining in the Slot Simple algorithm
- TotalOrphanedVMs = SUM(largest VM × R) - LargestClusterVM
- If TotalOrphanedVMs <= TotalAdditionalSpace, the cluster status is OK.

Special note: if TotalOrphanedVMs = 0, LargestClusterVM > 0 and TotalAdditionalSpace = 0, the cluster may be overcommitted.
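As a rough Python sketch of Proof Simple, assuming AdditionalMemory has already been computed per host (its definition table is not available) and reading "SUM(largest VM × R)" as the largest VM multiplied by R. The function name and the sample values are hypothetical.

```python
from typing import Sequence


def proof_simple_ok(additional_memory: Sequence[float],
                    largest_cluster_vm: float,
                    r: int,
                    h: int) -> bool:
    """Proof Simple: True when the check proves the cluster is not overcommitted."""
    # The H hosts with the least additional memory are the pessimistic choice.
    total_additional_space = sum(sorted(additional_memory)[:h])
    # Assumption: every failed host carried a VM as large as the largest one.
    total_orphaned_vms = largest_cluster_vm * r - largest_cluster_vm
    # Special note from the text: nothing orphaned, but no room for the largest VM.
    if total_orphaned_vms == 0 and largest_cluster_vm > 0 and total_additional_space == 0:
        return False
    return total_orphaned_vms <= total_additional_space


if __name__ == "__main__":
    # Hypothetical hosts with 0, 5, 7 and 9 GB spare and an 8 GB largest VM.
    print(proof_simple_ok([0, 5, 7, 9], largest_cluster_vm=8, r=2, h=2))  # prints False
```

With these sample values the orphaned memory is 16 - 8 = 8 GB against 0 + 5 = 5 GB of space, so the check does not pass.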

4.4 Proof Full

The following is calculated for every combination of R and H:

- LargestClusterVM = the VM with the largest memory overhead on the R hosts
- Calculate AdditionalMemory and the number of VMs on all hosts
- TotalAdditionalSpace = SUM(AdditionalMemory of the H hosts)
- TotalOrphanedVMs = SUM(VM memory on the R hosts) - LargestClusterVM
- If TotalOrphanedVMs > TotalAdditionalSpace, the cluster may be overcommitted.
- If TotalOrphanedVMs = 0, LargestClusterVM > 0 and TotalAdditionalSpace = 0, the cluster may be overcommitted.

If TotalOrphanedVMs <= TotalAdditionalSpace for every combination, the cluster status is OK.
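The per-combination test can be sketched in Python like this. AdditionalMemory is taken as a precomputed per-host input, since the article's definition of it is not available; the function name and the (additional memory, VM list) layout are hypothetical.

```python
from itertools import combinations


def proof_full_ok(hosts, r):
    """Proof Full sketch.

    hosts: list of (additional_memory_gb, [vm_memory_gb, ...]) tuples.
    Returns False at the first R/H combination that may be overcommitted.
    """
    indices = range(len(hosts))
    for failed_idx in combinations(indices, r):
        failed_vms = [vm for i in failed_idx for vm in hosts[i][1]]
        largest = max(failed_vms, default=0)          # largest VM on the R hosts
        orphaned = sum(failed_vms) - largest          # TotalOrphanedVMs
        space = sum(hosts[i][0] for i in indices if i not in failed_idx)
        if orphaned > space:
            return False
        # Special case from the text: no room at all for the largest VM.
        if orphaned == 0 and largest > 0 and space == 0:
            return False
    return True
```

A caller would run this only when the combination count is small enough for the full check to be used.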

That is all of the algorithms. One last thing: none of the above algorithms will ever mark the cluster status as overcommitted.

............

Do you feel you wasted your time reading this article? Was I talking in my sleep? This was obviously supposed to be about how overload is detected!

In fact, as stated at the beginning, VMM presumes the cluster state to be overcommitted. In other words, the default value is overcommitted, and the algorithms are only responsible for setting the status to OK. So if none of the algorithms can prove that the cluster is not overcommitted, SCVMM shows the cluster status as Overcommitted; if any one of them proves that it is not, the status is set to OK.
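This default-then-disprove logic is easy to express in code. The sketch below is illustrative only: `cluster_status` is a made-up name, and each check is modeled as a callable that returns True when it proves the cluster is not overcommitted.

```python
from typing import Callable, Iterable


def cluster_status(checks: Iterable[Callable[[], bool]]) -> str:
    """Start from 'Overcommitted' and switch to 'OK' once any check passes."""
    status = "Overcommitted"      # presumed state before any check runs
    for check in checks:
        if check():               # one passing check is enough
            status = "OK"
            break
    return status


if __name__ == "__main__":
    # Stand-ins for Slot Simple, Slot Full, Proof Simple and Proof Full.
    results = [lambda: False, lambda: False, lambda: False, lambda: True]
    print(cluster_status(results))  # prints OK
```

No branch ever assigns "Overcommitted" after a check has run, which is exactly the point made above: the checks can only clear the state, never set it.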

There is one more piece of processing logic. In the full complexity check, as soon as one combination is found to be overcommitted, or possibly overcommitted (which is why the algorithm descriptions above say "may be" overcommitted in several places), the check stops immediately and the cluster status stays overcommitted.

Part 5. Demo

And now, the demo everyone has been waiting for. Explaining the algorithms without a demonstration would leave the concept rather abstract.

Scenario: a 4-node cluster of 4 × 32GB hosts. The system reserves 9GB of memory. The 64MB of memory that Hyper-V needs per VM is left out of the calculation (purely for my own convenience). Cluster Reserve (R) is 2, so the cluster composition is R=2, H=2, N=4.

Slot Simple Example

- SlotSize = 8GB
- TotalSlotsRemaining = SUM(the 2 smallest slot counts) = (1 + 3) = 4
- TotalUsedSlots = 7

Result: TotalUsedSlots > TotalSlotsRemaining (7 > 4), so this check fails.

Slot Full Example

- TotalUsedSlots = 7

Each row of the table represents the intermediate values and result for one R/H combination.

Result: some combinations have TotalUsedSlots > TotalSlotsRemaining, so this check fails. (In practice not every combination is calculated: as soon as one combination fails, the check is over.)

Proof Simple Example

- LargestClusterVM = 8GB
- TotalAdditionalSpace = SUM(the H smallest AdditionalMemory values) = 0GB + 5GB = 5GB
- TotalOrphanedVMs = (8GB + 8GB) - 8GB = 8GB

Result: TotalOrphanedVMs > TotalAdditionalSpace (8GB > 5GB), so this check fails.

Proof Full Example

Each row of the table represents the intermediate values and result for one R/H combination.

Result: for every combination the condition (Orphaned - LargestVM <= AdditionalMemory) is satisfied, so the cluster status is set to "OK".
