Author: Fan Jun (Frank Fan), Sina Weibo: @frankfan7
I. Why HA?
High Availability (HA) is one of the most prominent features of a virtualization platform. It is simple to configure and maintain, and the technology is mature. Some very critical applications have extremely strict disaster-tolerance requirements; for these, consider HA at the application layer or at the operating system layer, such as MSCS. HA at the virtualization layer provides high availability in the underlying infrastructure and is a good choice when the recovery time is acceptable.
Compared with application-layer and operating-system-layer HA, vSphere HA provides high availability for the entire cluster at low cost and is very simple to implement and maintain. No settings or changes are needed in the application or the VM.
II. How does HA work?
HA Agent
vSphere 5.0 and later versions changed the architecture considerably. The Primary Node and Secondary Node concepts of the original cluster design were replaced by the Master HA Agent and the Slave HA Agent. Normally there is only one Master HA Agent in a cluster. The HA Agent provides the following functions:
-Exchanges information with vCenter.
-The Master HA Agent monitors the status of the VMs and restarts them when a problem occurs.
-The Slave HA Agent reports VM status information to the Master HA Agent and restarts VMs at the Master HA Agent's command.
-Checks the status of applications running on the VM.
When the host running the Master HA Agent fails, the agents on the other hosts hold an election for a new Master. The host connected to the largest number of datastores becomes the Master. If two hosts are connected to the same number of datastores, the host with the higher Managed Object ID becomes the Master.
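The election rule described above can be sketched in a few lines of Python. This is only an illustration of the tie-breaking logic, not the agent's real implementation: the `Host` class, the host names, and the MOID values are hypothetical, and comparing MOIDs as plain strings is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    moid: str             # Managed Object ID, e.g. "host-35" (format assumed)
    datastore_count: int  # number of datastores the host is connected to


def elect_master(candidates):
    """Pick the host with the most datastores; break ties by higher MOID."""
    if not candidates:
        raise ValueError("no candidate hosts")
    # Assumption: MOIDs are compared as plain strings.
    return max(candidates, key=lambda h: (h.datastore_count, h.moid))


hosts = [
    Host("esx01", "host-21", 4),
    Host("esx02", "host-35", 6),
    Host("esx03", "host-28", 6),
]
master = elect_master(hosts)
print(master.name)
```

Here esx02 and esx03 see the same number of datastores, so the higher Managed Object ID decides the election.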
Heartbeating
Heartbeating is used to determine whether a host is still running normally.
Network Heartbeating
Each Slave sends Heartbeat information to the Master.
Datastore Heartbeating
In some cases the Management Network is interrupted but the VMs can still access their other networks and storage. No action then needs to be taken on the VMs on the isolated host. Datastore heartbeating is used to verify this situation.
For converged infrastructure systems such as Cisco UCS, datastore heartbeating plays a smaller role. Because the management network and storage share the same physical links, storage may also be inaccessible when the management network is interrupted.
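How the two heartbeat types combine can be illustrated with a small decision function. This is a simplified sketch under the assumption that the Master looks only at these two signals; the state names and the function are hypothetical, not part of any VMware API.

```python
from enum import Enum


class HostState(Enum):
    RUNNING = "running"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"
    FAILED = "failed"


def classify_host(network_heartbeat_ok, datastore_heartbeat_ok):
    """Classify a slave host from the Master's point of view."""
    if network_heartbeat_ok:
        return HostState.RUNNING
    # Network heartbeats are missing: use the datastore heartbeat to verify.
    if datastore_heartbeat_ok:
        # The host is alive but unreachable over the management network.
        return HostState.ISOLATED_OR_PARTITIONED
    # Both heartbeats are missing: treat the host as failed.
    return HostState.FAILED


print(classify_host(False, True).value)
```

Only the last case, where both heartbeats are missing, is treated here as a host failure whose VMs must be restarted elsewhere.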
Host isolation
Detection: A host is considered isolated when it cannot communicate over the Management Network and pings to its isolation address also fail. By default the isolation address is the Management Network gateway. To increase reliability and avoid false positives, you can configure multiple isolation addresses.
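The detection logic can be sketched as follows. It assumes that a host declares itself isolated only when none of its isolation addresses respond; the `ping` callable, the function name, and the IP addresses are hypothetical stand-ins used for illustration.

```python
def is_isolated(receives_ha_traffic, isolation_addresses, ping):
    """Return True when the host should declare itself isolated.

    ping is any callable that takes an address and returns True on a reply.
    """
    if receives_ha_traffic:
        # The management network still works, so the host is not isolated.
        return False
    # Isolated only if every configured isolation address fails to respond.
    return not any(ping(addr) for addr in isolation_addresses)


# Hypothetical example: the default gateway is down, a second address replies.
reachable = {"10.0.1.1"}


def fake_ping(addr):
    return addr in reachable


print(is_isolated(False, ["10.0.0.254"], fake_ping))
print(is_isolated(False, ["10.0.0.254", "10.0.1.1"], fake_ping))
```

With a single isolation address, the failed gateway alone is enough to declare isolation; with a second reachable address, the false positive is avoided.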
Response:
The following analysis helps you select the appropriate response action after you confirm that the host is isolated.
| Host can access the VM datastore | Host can access the VM network | Recommended response | Reason |
|---|---|---|---|
| Possible | Possible | Leave Powered On | The VM may still run normally. |
| Possible | Impossible | Leave Powered On or Shut Down | Because the VM can still access the datastore, you can select Shut Down so that the VM is restarted on another host. |
| Impossible | Possible | Power Off | This prevents two copies of the same VM from running at the same time, especially when iSCSI or NFS is used. |
The above is for reference only. Leave Powered On suits most cases: because most virtualization designs include network redundancy, host isolation is rare.
When iSCSI or NFS is used and you expect that an interruption of the management network would also interrupt the storage network, consider Power Off. When the isolated host cannot access the storage, the HA Agent starts a second instance of the VM on another host while the first instance is still running on the isolated host. Once all networks are restored, having two instances of the same VM running at the same time can cause a lot of trouble.
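The recommendations in this section can be condensed into a small lookup function. This is a sketch of the guidance given here, not a VMware API; the function name is hypothetical, and the case where neither the datastore nor the VM network is reachable is mapped to Power Off based on the iSCSI/NFS discussion.

```python
def recommend_isolation_response(datastore_accessible, vm_network_accessible):
    """Map the expected state of an isolated host to a response action."""
    if datastore_accessible and vm_network_accessible:
        # The VM may still be running normally.
        return "Leave Powered On"
    if datastore_accessible:
        # The VM is unreachable but can shut down cleanly and restart elsewhere.
        return "Leave Powered On or Shut Down"
    # Storage is lost: avoid two instances of the same VM running at once.
    return "Power Off"


print(recommend_isolation_response(True, False))
```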
Admission Control
Admission Control ensures that the cluster has sufficient resources to restart the VMs of a failed host.
The following policies are available:
1 Define failover capacity by static number of hosts
2 Use dedicated failover hosts
3 Define failover capacity by reserving a percentage of the cluster resources
The third policy, which defines failover capacity as a percentage, suits most cases. It uses resources as efficiently as possible and lets you run a larger number of VMs.
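To illustrate the percentage-based policy, the sketch below checks whether powering on one more VM would leave enough spare capacity. It is a simplified model under stated assumptions: capacity is measured as total resources minus the reservations of powered-on VMs, CPU and memory are checked separately, and all names and figures are hypothetical.

```python
def failover_capacity_pct(total, reserved):
    """Percentage of a resource that is still unreserved."""
    return (total - reserved) / total * 100


def can_power_on(cluster, vm, failover_pct):
    """Admit the VM only if CPU and memory both stay above the reserved percentage."""
    cpu_after = failover_capacity_pct(
        cluster["cpu_mhz"], cluster["cpu_reserved_mhz"] + vm["cpu_mhz"])
    mem_after = failover_capacity_pct(
        cluster["mem_mb"], cluster["mem_reserved_mb"] + vm["mem_mb"])
    return cpu_after >= failover_pct and mem_after >= failover_pct


# Hypothetical cluster with 25% of its resources reserved for failover.
cluster = {"cpu_mhz": 40000, "cpu_reserved_mhz": 20000,
           "mem_mb": 256000, "mem_reserved_mb": 150000}

print(can_power_on(cluster, {"cpu_mhz": 2000, "mem_mb": 8000}, 25))
print(can_power_on(cluster, {"cpu_mhz": 2000, "mem_mb": 50000}, 25))
```

The second VM is rejected because its memory reservation would push the spare memory capacity below 25%, even though CPU capacity is still sufficient.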
III. Basic Design Principles
In the HA policy, give vCenter and important VMs such as DNS, AD, and MS SQL the highest restart priority so that they start first. Note that HA does not fully guarantee the VM restart order. If VMs have complex dependencies and a strict startup order is required, consider VMware SRM.
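As a rough illustration of restart priority, the sketch below orders a list of VMs so that high-priority ones are queued first. The VM names and the three priority levels are assumptions for the example, and the result is only a queueing order: as noted above, HA itself does not strictly guarantee the order in which VMs come up.

```python
# Lower number means the VM is queued for restart earlier.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def restart_queue(vms):
    """vms is a list of (name, priority) pairs; return names, high priority first."""
    ordered = sorted(vms, key=lambda vm: PRIORITY_ORDER[vm[1]])
    return [name for name, _ in ordered]


vms = [
    ("web01", "low"),
    ("dns01", "high"),
    ("app01", "medium"),
    ("ad01", "high"),
    ("vcenter", "high"),
]
print(restart_queue(vms))
```

Because Python's sort is stable, VMs with the same priority keep their original relative order.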
If maintenance may interrupt the Management Network, consider temporarily disabling HA to avoid triggering the HA isolation response.
It is recommended that all hosts have the same CPU and RAM configuration. If one host in a cluster has a much higher configuration, the HA policy must reserve enough resources to restart the VMs running on that host.
Although multiple clusters can share the same datastore, it is best to allocate dedicated datastores to each cluster. This simplifies management and makes it easier for HA to restart VMs after a host is isolated.
Pay attention to the redundancy of the Management Network, because HA network heartbeating depends on it.
In a stretched cluster, the hosts and storage are distributed across two data centers that are far apart. We recommend configuring at least four heartbeat datastores, two in each data center.
References:
VMware vSphere 5.1 Clustering Deepdive, by Duncan Epping
HA Architecture demo, by Josh Odgers
vSphere Availability Guide
VMware vSphere High Availability 5.0 Deployment Best Practices
This article is from the "sit up and watch the cloud" blog. Please keep this source link when reposting: http://frankfan.blog.51cto.com/6402282/1329945