[Virtualization practice] Cluster Design 3: HA


Author: Fan Jun (Frank Fan), Sina Weibo: @frankfan7

I. Why HA?

High availability (HA) is one of the most prominent features of a virtualization platform. It is simple to configure and maintain, and the technology is mature. Some very critical applications have extremely strict availability requirements; for these, consider application-layer HA or operating-system-layer HA such as Microsoft Cluster Service (MSCS). HA at the virtualization layer provides high availability in the underlying infrastructure and is a good choice when its recovery time is acceptable.

Compared with application-layer and operating-system-layer HA, vSphere HA provides high availability for the entire cluster at a low cost, and it is simple to implement and maintain. You do not need to configure or change anything in the application or the VM.

II. How does HA work?

HA Agent

vSphere 5.0 and later versions made many changes to the HA architecture. The Primary Node and Secondary Node concepts of the original cluster are replaced by the Master HA Agent and the Slave HA Agent. Normally there is only one Master HA Agent in a cluster. The HA agents provide the following functions:

- Exchange information with vCenter.

- The Master HA Agent monitors the status of the VMs and restarts them when a problem occurs.

- The Slave HA Agent sends VM status information to the Master HA Agent and restarts VMs when the Master HA Agent instructs it to.

- Check the status of applications running on the VMs.

When the host running the Master HA Agent fails, the agents on the other hosts hold an election for a new Master. The host connected to the largest number of datastores becomes the Master. If two hosts are connected to the same number of datastores, the host with the higher Managed Object ID becomes the Master.
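The election rule described above can be illustrated with a minimal Python sketch. It is not the HA agent's actual implementation: the `Host` class, the host names, and the ID values are hypothetical, and comparing Managed Object IDs as plain strings is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    moid: str             # Managed Object ID, e.g. "host-42" (hypothetical value)
    datastore_count: int  # number of datastores the host is connected to


def elect_master(candidates):
    """Return the host that wins the election among the surviving candidates."""
    if not candidates:
        raise ValueError("no hosts left to elect")
    # Rank by datastore count first; break ties with the higher Managed Object ID.
    # Comparing the IDs as plain strings is an assumption of this sketch.
    return max(candidates, key=lambda h: (h.datastore_count, h.moid))


hosts = [
    Host("esx01", "host-21", 4),
    Host("esx02", "host-35", 6),
    Host("esx03", "host-48", 6),
]
print(elect_master(hosts).name)  # esx03: same datastore count as esx02, higher ID
```

Because the datastore count is the first element of the sort key, the Managed Object ID only matters when two hosts see the same number of datastores.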


Heartbeating

Heartbeating is used to determine whether a host is still running normally.

Network Heartbeating

Each Slave sends Heartbeat information to the Master.

Datastore Heartbeating

In some cases the management network is interrupted while the VMs can still access their other networks and storage. No action then needs to be taken on the VMs of the isolated host. Datastore heartbeating provides the additional check needed to verify this.

For converged infrastructure systems such as Cisco UCS, datastore heartbeating does not play a major role. The management network and storage share the same physical links, so when the management network is interrupted, storage may be inaccessible as well.
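How the two heartbeat types combine can be sketched roughly as follows. This is a simplified illustration, not VMware's algorithm; the function name and the state labels are hypothetical.

```python
def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> str:
    """Classify a host from the two heartbeat signals (labels are illustrative)."""
    if network_heartbeat:
        # The Master still receives heartbeats over the management network.
        return "running"
    if datastore_heartbeat:
        # No network heartbeat, but the host still updates its heartbeat datastore:
        # the host is alive and only cut off from the management network.
        return "alive-but-unreachable"
    # Neither signal is present: the host is treated as failed.
    return "failed"


print(classify_host(True, True))    # running
print(classify_host(False, True))   # alive-but-unreachable
print(classify_host(False, False))  # failed
```

The middle case is the one datastore heartbeating exists for: without it, a lost management network would be indistinguishable from a dead host.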

Host isolation

Detection: A host is considered isolated when it cannot communicate over the Management Network and its ping to the isolation address also fails. The Management Network gateway is the isolation address by default. To increase reliability and avoid misjudgment, you can set multiple isolation addresses.
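The detection rule can be sketched as below. It is a simplified model under stated assumptions: the function and parameter names are hypothetical, the addresses are example values, and the host is declared isolated only if every configured isolation address fails to answer.

```python
def is_isolated(management_traffic_seen: bool, ping_results: dict) -> bool:
    """Decide whether a host should declare itself isolated.

    ping_results maps each isolation address to True (reply) or False (no reply).
    """
    if management_traffic_seen:
        # The host can still talk to other agents, so it is not isolated.
        return False
    # Isolated only if no isolation address answered the ping.
    return not any(ping_results.values())


# One reachable address is enough to avoid a false isolation verdict.
print(is_isolated(False, {"192.168.1.1": False, "192.168.1.254": True}))   # False
print(is_isolated(False, {"192.168.1.1": False, "192.168.1.254": False}))  # True
```

This shows why multiple isolation addresses reduce misjudgment: a single unresponsive gateway no longer triggers the isolation response on its own.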

Response:

The following analysis helps you select the appropriate response action after you confirm that the host is isolated.


| Host can access VM datastore | Host can access VM network | Recommended response action | Reason |
|---|---|---|---|
| Possible | Possible | Leave Powered On | The VM may still run normally. |
| Possible | Impossible | Leave Powered On or Shut Down | Because the VM can still access the datastore, you can select Shut Down so that the VM is restarted on another host. |
| Impossible | Possible | Power Off | This prevents two copies of the same VM from running at the same time, especially when iSCSI or NFS is used. |



The above is only a reference. Leave Powered On suits most cases: most virtualization designs include network redundancy, so host isolation is rare.

When iSCSI or NFS is used and you expect that an interruption of the management network would also interrupt the storage network, consider using Power Off. When the isolated host cannot access the storage, the HA agent starts a second instance of the VM on another host while the first instance is still running on the isolated host. Once all networks are restored, the same VM has two instances running at the same time, which can cause a lot of trouble.
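The recommendations in this section can be condensed into a small lookup. This is only a sketch of the guidance above, not a vSphere API; the function name is hypothetical. The case where neither the datastore nor the VM network is reachable is not listed in the analysis, so mapping it to Power Off is an assumption drawn from the iSCSI/NFS discussion.

```python
def recommend_isolation_response(datastore_reachable: bool,
                                 vm_network_reachable: bool) -> str:
    """Return the suggested isolation response for an isolated host."""
    if datastore_reachable and vm_network_reachable:
        # The VM may still run normally.
        return "Leave Powered On"
    if datastore_reachable:
        # Storage works but the VM network does not: a clean shutdown lets
        # HA restart the VM on another host.
        return "Leave Powered On or Shut Down"
    # Storage is unreachable: avoid two instances of the same VM.
    # The (unreachable, unreachable) case is an assumption of this sketch.
    return "Power Off"


print(recommend_isolation_response(True, True))   # Leave Powered On
print(recommend_isolation_response(True, False))  # Leave Powered On or Shut Down
print(recommend_isolation_response(False, True))  # Power Off
```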


Admission Control

Admission control ensures that the cluster keeps sufficient resources to restart the VMs of a failed host.

The following policies are available:

1. Define failover capacity by static number of hosts

2. Use dedicated failover hosts

3. Define failover capacity by reserving a percentage of the cluster resources

The third method, which defines the failover capacity as a percentage, suits most cases. It uses resources most efficiently and lets you run more VMs.
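The idea behind the percentage policy can be sketched as a simple check. This is a simplified model, not the exact vSphere calculation: the function name, the parameters, and the figures are hypothetical, and it assumes that the unreserved share of both CPU and memory must stay at or above the configured failover percentage after the new VM is powered on.

```python
def can_power_on(total_cpu_mhz: int, reserved_cpu_mhz: int,
                 total_mem_mb: int, reserved_mem_mb: int,
                 vm_cpu_mhz: int, vm_mem_mb: int,
                 failover_pct: int) -> bool:
    """Check whether powering on a VM keeps the failover capacity intact."""
    free_cpu = total_cpu_mhz - reserved_cpu_mhz - vm_cpu_mhz
    free_mem = total_mem_mb - reserved_mem_mb - vm_mem_mb
    # Integer comparison avoids floating-point rounding at the threshold.
    cpu_ok = free_cpu * 100 >= failover_pct * total_cpu_mhz
    mem_ok = free_mem * 100 >= failover_pct * total_mem_mb
    return cpu_ok and mem_ok


# After power-on, 30% of CPU and 29.5% of memory would remain unreserved.
print(can_power_on(100_000, 60_000, 400_000, 250_000, 10_000, 32_000, 25))  # True
print(can_power_on(100_000, 60_000, 400_000, 250_000, 10_000, 32_000, 30))  # False
```

In the second call the CPU check passes but memory falls just below 30%, so the power-on would be refused in this model.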



III. Basic Design Principles

  • In the HA policy, start vCenter and important VMs such as DNS, AD, and MS SQL first. Note, however, that HA does not fully guarantee the order in which VMs restart. If VMs have complex dependencies and a strict startup order is required, consider VMware SRM.

  • If maintenance work may interrupt the Management Network, consider temporarily disabling HA to avoid triggering the HA isolation response.

  • Give every host the same CPU and RAM configuration where possible. If one host in a cluster has a much higher configuration than the others, the HA policy must reserve enough resources to restart the VMs running on that host.

  • Although multiple clusters can share the same datastore, it is best to allocate dedicated datastores to each cluster. This simplifies management and makes it easier for HA to restart VMs after a host is isolated.

  • Pay attention to the redundancy of the Management Network, because HA network heartbeating depends on it.

  • In a stretched cluster, the hosts and storage are distributed across two data centers that are far apart. We recommend that you set up at least four heartbeat datastores, two in each data center.
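The first principle above, starting important VMs first, can be pictured with a short sketch. It only models the ordering of restart attempts by priority level; the VM names and the `restart_order` helper are hypothetical, and as noted above HA itself does not guarantee that VMs actually come up in this order.

```python
# Lower rank means the restart is attempted earlier.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def restart_order(vms):
    """Sort (name, priority) pairs so that higher-priority VMs come first."""
    # sorted() is stable, so VMs with the same priority keep their input order.
    return sorted(vms, key=lambda vm: PRIORITY_RANK[vm[1]])


vms = [
    ("web01", "low"),
    ("dns01", "high"),
    ("app01", "medium"),
    ("ad01", "high"),
    ("sql01", "high"),
]
print([name for name, _ in restart_order(vms)])
# ['dns01', 'ad01', 'sql01', 'app01', 'web01']
```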


References:

VMware vSphere 5.1 Clustering Deepdive, by Duncan Epping

HA Architecture demo, by Josh Odgers

vSphere Availability Guide

VMware vSphere High Availability 5.0 Deployment Best Practices

This article is from the "sit up and watch the cloud" blog. Please keep this source attribution: http://frankfan.blog.51cto.com/6402282/1329945
