In the previous chapters, we described most of the fundamental concepts of HA. We showed you that vSphere 5.0 introduces several new mechanisms that increase the resiliency and reliability of HA. In this section, reliability mostly refers to restarting virtual machines, which remains HA's primary task.
HA responds when the state of a host changes or, more precisely, when the state of one or more virtual machines changes. HA will attempt to restart virtual machines in several scenarios; the most common are:
Failed host
Isolated host
Failed guest operating system
Depending on the type of failure, and also on the role of the host, the process differs slightly, which results in slightly different recovery timelines. There are many different scenarios and there is no point in covering all of them, so we will describe the most common ones and include timelines where possible.
Before we dive into the different failure scenarios, we want to highlight some substantial changes in restart priority and restart retries compared to versions prior to vSphere 5.0. These changes apply to every scenario we describe.
Restart Priority and Order
Prior to vSphere 5.0, HA took the priority of a virtual machine into account when multiple virtual machines had to be restarted. This by itself has not changed; HA still takes the configured restart priority of each virtual machine into account. However, vSphere 5.0 introduces a new type of virtual machine: the agent virtual machine. Agent virtual machines typically provide a service to other virtual machines and therefore take precedence during the restart procedure, because the "regular" virtual machines may rely on them. A good example of an agent virtual machine is a vShield Endpoint virtual machine, which offers anti-virus services. Agent virtual machines are considered top priority virtual machines.
Prioritization is done per host, not globally. Each host that has been asked to initiate restart attempts will try to restart all top priority virtual machines before it starts any other virtual machines. If the restart of a top priority virtual machine fails, it is retried after a delay; in the meantime, however, HA continues powering on the remaining virtual machines. Keep in mind that some virtual machines may depend on agent virtual machines. Document which virtual machines depend on which agent virtual machines, and document the procedure for starting these services in the correct order in case the automatic restart of an agent virtual machine fails.
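The per-host behavior described above can be sketched as a simple queue. This is a hypothetical model, not HA's actual implementation: `VM`, `restart_on_host`, the `power_on` callback and the `max_retries` parameter are illustrative names, and the "delay" before a retry is modeled simply by moving a failed virtual machine to the back of the queue so that the remaining virtual machines are powered on first.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class VM:
    name: str
    top_priority: bool  # True for agent VMs, e.g. a vShield Endpoint VM


def restart_on_host(vms: list[VM], power_on: Callable[[VM], bool], max_retries: int = 1):
    """Hypothetical per-host restart loop (illustration only)."""
    # Top-priority VMs go to the front; sorted() is stable, so the original
    # order is preserved within each group.
    queue = deque(sorted(vms, key=lambda vm: not vm.top_priority))
    attempts = {vm.name: 0 for vm in vms}
    started, failed = [], []
    while queue:
        vm = queue.popleft()
        attempts[vm.name] += 1
        if power_on(vm):
            started.append(vm.name)
        elif attempts[vm.name] <= max_retries:
            # Do not retry immediately: defer this VM so that the remaining
            # VMs are powered on in the meantime.
            queue.append(vm)
        else:
            failed.append(vm.name)
    return started, failed


# Usage: the agent VM fails its first attempt and is retried after the others.
remaining_failures = {"vshield-ep": 1}


def flaky_power_on(vm: VM) -> bool:
    if remaining_failures.get(vm.name, 0) > 0:
        remaining_failures[vm.name] -= 1
        return False
    return True


vms = [VM("app01", False), VM("vshield-ep", True), VM("db01", False)]
print(restart_on_host(vms, flaky_power_on))
# (['app01', 'db01', 'vshield-ep'], [])
```

The sketch also shows why the documentation advice matters: while the agent virtual machine waits for its retry, the virtual machines that depend on it may already be running without the service it provides.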
Basic Design Principle
Virtual machines can depend on the availability of agent virtual machines or other virtual machines. Although HA will do its best to start all virtual machines in the correct order, this is not guaranteed. Document the proper recovery process.
Besides agent virtual machines, HA also prioritizes FT secondary virtual machines. The full order in which virtual machines are restarted is:
Agent virtual machines
FT secondary virtual machines
Virtual machines configured with a high restart priority
Virtual machines configured with a medium restart priority
Virtual machines configured with a low restart priority
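The restart order above can be expressed as a sort key. `RestartClass` and `restart_order` are hypothetical names used for illustration, not part of any vSphere API; keep in mind that HA applies this ordering per host and, as the design principle notes, does not guarantee it.

```python
from enum import IntEnum


class RestartClass(IntEnum):
    """Restart order from the list above; a lower value restarts first."""
    AGENT = 0
    FT_SECONDARY = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4


def restart_order(vms: list[tuple[str, RestartClass]]) -> list[str]:
    """Return VM names in the order a host would attempt to restart them."""
    # sorted() is stable: VMs in the same class keep their relative order.
    return [name for name, _ in sorted(vms, key=lambda vm: vm[1])]


vms = [
    ("web01", RestartClass.LOW),
    ("vshield-ep", RestartClass.AGENT),
    ("db01", RestartClass.HIGH),
    ("ft-secondary", RestartClass.FT_SECONDARY),
    ("app01", RestartClass.MEDIUM),
]
print(restart_order(vms))
# ['vshield-ep', 'ft-secondary', 'db01', 'app01', 'web01']
```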
Note that HA will not place any virtual machine on a host if the required number of agent virtual machines is not running on that host at the time placement is done.
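A minimal sketch of that placement check, assuming each host reports how many agent virtual machines it is running at the moment placement is done. `eligible_hosts` and the host names are hypothetical; this is not a vSphere API.

```python
def eligible_hosts(running_agent_vms: dict[str, int], required: int) -> list[str]:
    """Hosts on which a VM may be placed: only hosts that are already
    running the required number of agent VMs qualify."""
    return sorted(host for host, count in running_agent_vms.items() if count >= required)


# esx02 has not (yet) started its agent VM, so it is skipped.
cluster = {"esx01": 1, "esx02": 0, "esx03": 1}
print(eligible_hosts(cluster, required=1))
# ['esx01', 'esx03']
```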
Now that we have briefly touched on it, we also want to address restart retries and the parallelization of restarts, as these more or less determine how long it can take before all virtual machines of a failed or isolated host are restarted.
Restart Retries
As of vCenter 2.5 U4, the number of restart retries for a virtual machine can be configured with the advanced option "das.maxvmrestartcount"; the default value is 5. Prior to vCenter 2.5 U4, HA would keep retrying indefinitely, which could lead to serious problems: multiple virtual machines could end up registered on multiple hosts simultaneously, resulting in a confusing and inconsistent state. See VMware KB 1009625 (http://kb.vmware.com/kb/1009625) for details.
Note
Prior to vSphere 5.0, "das.maxvmrestartcount" did not include the initial restart, so with the default value the total number of restart attempts was 6. As of vSphere 5.0, the initial restart is included in the value.
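The difference in how the option is counted comes down to one line of arithmetic. `total_power_on_attempts` is a hypothetical helper; only the option name `das.maxvmrestartcount` and its default of 5 come from the text.

```python
def total_power_on_attempts(maxvmrestartcount: int = 5, pre_vsphere5: bool = False) -> int:
    """Total power-on attempts HA makes for one VM.

    Before vSphere 5.0 the option counted only retries, so the initial
    restart came on top of it; from vSphere 5.0 on it is included.
    """
    if pre_vsphere5:
        return maxvmrestartcount + 1
    return maxvmrestartcount


print(total_power_on_attempts(pre_vsphere5=True))   # 6
print(total_power_on_attempts(pre_vsphere5=False))  # 5
```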
HA will try to start the virtual machine on one of the hosts in the affected cluster; if this is unsuccessful on that host, the restart count is increased by 1. Before we go into the exact timeline, note that T0 is the point at which the master initiates the first restart attempt, which by itself could be 30 seconds after the virtual machine failed. The elapsed time between the failure of the virtual machine and its restart depends on the failure scenario, which we discuss in this chapter.
As mentioned, the default number of restart attempts is 5, and as of vSphere 5.0 this includes the initial restart. Each attempt is associated with a specific point in time, as the following list shows; "m" stands for minutes.
T0--Initial restart
T2m--Restart retry 1
T6m--Restart retry 2
T14m--Restart retry 3
T30m--Restart retry 4
Figure 17: High Availability restart timeline
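The listed times follow a simple pattern: the wait between attempts starts at 2 minutes and doubles after each failure (2, 4, 8, 16 minutes). The sketch below reproduces the list under that assumption; the doubling rule is inferred from the listed values rather than taken from VMware documentation, and `restart_schedule` is a hypothetical name.

```python
def restart_schedule(attempts: int = 5, first_wait_min: int = 2) -> list[int]:
    """Minutes after T0 at which each restart attempt starts.

    Assumes the wait starts at first_wait_min and doubles after every
    failed attempt, which matches the T0/T2m/T6m/T14m/T30m list.
    """
    schedule, t, wait = [], 0, first_wait_min
    for _ in range(attempts):
        schedule.append(t)
        t += wait
        wait *= 2
    return schedule


print(restart_schedule())  # [0, 2, 6, 14, 30]
```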
As Figure 17 clearly shows, when multiple power-on attempts fail, it can take up to approximately 30 minutes before a virtual machine is started successfully. This is not an exact science, however. For example, there is a 2-minute wait between the initial restart and the first restart retry, but HA starts that 2-minute wait only once it has detected that the initial attempt failed; in practice, T2m could therefore be 2 minutes and 8 seconds. Another important point we want to emphasize is that there is no coordination between masters: if multiple masters are involved in restarting a virtual machine, each retains its own restart sequence. In vSphere 5.0 U1, multiple masters could attempt to restart the same virtual machine; although only one will succeed, this may still change the timeline.
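To see why T2m can become 2 minutes and 8 seconds, the schedule can be extended with a detection delay, assuming that every wait begins only once HA has noticed the previous attempt failed and that the doubling pattern of the listed times still applies. `actual_schedule` and the fixed 8-second `detect_delay_s` are illustrative assumptions; real detection times vary per attempt, and with multiple masters each would keep its own schedule.

```python
def actual_schedule(detect_delay_s: int, attempts: int = 5, first_wait_s: int = 120) -> list[int]:
    """Seconds after T0 at which each attempt starts when every wait begins
    only after the previous attempt's failure has been detected."""
    times, t, wait = [], 0, first_wait_s
    for _ in range(attempts):
        times.append(t)
        # The wait starts once the failure is detected, so the delay accumulates.
        t += detect_delay_s + wait
        wait *= 2
    return times


print(actual_schedule(0))  # [0, 120, 360, 840, 1800]  (the ideal T0..T30m)
print(actual_schedule(8))  # [0, 128, 376, 864, 1832]
```

Because the detection delay is added before every wait, the drift grows with each retry: 8 seconds at the first retry, 32 seconds by the fourth in this example.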