VMware vSphere 5.1 Cluster Walkthrough (III): Basic Concepts

Last Update:2017-02-27 Source: Internet

Author: User

Now that we have looked at the components of HA, let's discuss some of the basic concepts of HA clusters:

Master/Slave Agents

Heartbeating

Isolation vs. Network Partition

Virtual Machine Protection

Anyone who has built a vSphere environment knows that multiple hosts can be configured into a cluster. A cluster is best seen as a collection of resources. These resources can be carved into separate resource pools with vSphere DRS (Distributed Resource Scheduler), or used to increase availability by enabling HA.

In vSphere 5.0, much of what involves HA changed. For example, an HA cluster used to consist of two types of nodes: a node was either a primary node or a secondary node. This concept, which was part of AAM, was introduced to allow a cluster to scale to 32 hosts. FDM has completely changed the rules of the game and removed the whole concept of primary and secondary nodes. (For more detail on the legacy AAM node mechanism, see the vSphere 4.1 HA and DRS Technical Deepdive; the vSphere 4.1 HA mechanism is also described at http://virtualbox.blog.51cto.com/531002/1127451.)

Another very important part of an HA design used to be the primary nodes. Each cluster had a maximum of 5 primary nodes, and these were the core of the HA implementation. At least one primary node had to remain available when a failure occurred, otherwise virtual machines would not be restarted. From an architectural standpoint, this constraint was at least one of the drivers for rewriting HA.

The vSphere 5.0 architecture introduces the concept of master and slave HA agents. Except during a network partition, which we discuss later, there is only one master HA agent in a cluster. Any agent can serve as the master, and all other agents act as its slaves. The master agent is responsible for monitoring the health of virtual machines and restarting them if they fail. The slave agents forward information to the master agent and restart the virtual machines that the master tells them to restart. One more thing has changed: the HA agent, whether master or slave, now also implements the VM/App monitoring feature, which under AAM was part of VPXA.

Master Agent

As mentioned earlier, the master agent is primarily responsible for tracking the state of virtual machines and taking action when appropriate. Under normal circumstances there is only one master agent in a cluster. We will discuss scenarios with multiple masters in a cluster later; for now, consider a cluster with a single master. The master claims responsibility for a virtual machine by taking "ownership" of the datastore on which the virtual machine's configuration file is stored.

Basic Design Principle

To maximize the chance of restarting virtual machines after a failure, we recommend masking datastores on a per-cluster basis, so that each cluster sees only its own datastores. Sharing datastores across clusters will work, but from an administrative perspective it increases the complexity of the architecture.

That is not the master's only responsibility. The HA master also exchanges state information with vCenter: it not only receives information from vCenter but also sends information back. When a host fails, the HA master initiates the restart of the virtual machines that were running on that host. You may immediately want to ask: what happens when the master fails? Or, put more simply, which host becomes the master, and when?

Election

A master is elected by a set of HA agents whenever those agents cannot reach a master over the network. An election therefore occurs when HA is first enabled on a cluster, and when the host on which the master is running:

Fails

Becomes network partitioned or isolated

Is disconnected from vCenter Server

Has HA reconfigured

The HA master election takes about 15 seconds and uses UDP. HA does not react to failures while the election is in progress. Once a master has been elected, however, failures detected before and during the election are handled. The election process is simple but robust.
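The deferral described above can be pictured with a small model. This is only an illustrative sketch, not how the HA agent is implemented: the `FailureHandler` class and its method names are hypothetical, and "handling" a failure is reduced to recording the host name.

```python
from collections import deque


class FailureHandler:
    """Toy model: failures seen before a master exists are queued, not dropped."""

    def __init__(self):
        self.master_elected = False
        self.pending = deque()  # failures detected before or during the election
        self.handled = []       # failures that have been acted on, in order

    def report_failure(self, host):
        if self.master_elected:
            # A master exists, so the failure is handled right away.
            self.handled.append(host)
        else:
            # No master yet: remember the failure for later.
            self.pending.append(host)

    def election_finished(self):
        # The new master works through everything that piled up.
        self.master_elected = True
        while self.pending:
            self.handled.append(self.pending.popleft())


handler = FailureHandler()
handler.report_failure("esx02")   # arrives while the election is running
handler.election_finished()
handler.report_failure("esx03")   # arrives after a master exists
print(handler.handled)
```

The point of the queue is that nothing detected during the roughly 15-second election window is lost; it is only delayed until a master exists.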

The host that is connected to the greatest number of shared datastores is elected master. If two or more hosts are connected to the same number of shared datastores, the one with the highest Managed Object ID is chosen. For each host, the HA state is displayed on the Summary tab. This includes the role, as shown in Figure 7, where the host is the master host.
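The selection rule can be sketched in a few lines of Python. This is a minimal illustration of the rule as stated, not the actual election code: the `Candidate` type, the host names, and the ID format are hypothetical, and the sketch assumes Managed Object IDs are compared as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                  # hypothetical host name
    moid: str                  # Managed Object ID (format assumed for illustration)
    connected_datastores: int  # number of shared datastores the host can reach


def elect_master(candidates):
    """Pick the host with the most datastores; ties go to the highest ID."""
    if not candidates:
        raise ValueError("no candidates to elect from")
    # Tuples compare element by element: datastore count first, then the ID.
    # ASSUMPTION: IDs are compared as plain strings in this sketch.
    return max(candidates, key=lambda c: (c.connected_datastores, c.moid))


hosts = [
    Candidate("esx01", "host-21", 4),
    Candidate("esx02", "host-35", 6),
    Candidate("esx03", "host-28", 6),
]
print(elect_master(hosts).name)  # esx02: tied with esx03 on datastores, higher ID
```

Using a tuple as the sort key keeps the tie-break in one place: the ID only matters when the datastore counts are equal.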

Once a master is elected, each slave that can reach it over the management network establishes a single secure, encrypted TCP connection to the master. This connection is SSL-based. It is worth stressing that after the master has been elected, the slaves do not communicate with each other unless a new master election is needed.

Figure 7: Master agent

As mentioned earlier, when a master is elected it tries to take ownership of all the datastores it can access, either directly or by proxying requests through one of the slaves connected to it. It does this by locking a file called "protectedlist" that is stored on the datastores in an existing cluster. The master also tries to take ownership of any datastore it discovers along the way, and it periodically retries any datastore it could not take ownership of previously.
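The claim-and-retry behavior can be modeled roughly as follows. This is a simplified sketch under stated assumptions: `claim_datastores` and `try_lock` are hypothetical names, and the real lock on the "protectedlist" file is replaced by a callback that simply reports success or failure.

```python
def claim_datastores(datastores, try_lock, owned=None):
    """Try to take ownership of every datastore that is not owned yet.

    try_lock(ds) stands in for locking the ownership file on datastore ds
    and returns True on success. Returns (owned, pending) as two sets.
    """
    owned = set(owned or ())
    pending = set()
    for ds in datastores:
        if ds in owned:
            continue            # already ours, nothing to do
        if try_lock(ds):
            owned.add(ds)
        else:
            pending.add(ds)     # retry on the next periodic pass
    return owned, pending


# Simulated environment: "ds2" is currently locked by someone else.
locked_elsewhere = {"ds2"}


def try_lock(ds):
    return ds not in locked_elsewhere


owned, pending = claim_datastores(["ds1", "ds2", "ds3"], try_lock)
print(sorted(owned), sorted(pending))

# Later pass: the other lock is gone, so the retry succeeds.
locked_elsewhere.clear()
owned, pending = claim_datastores(["ds1", "ds2", "ds3"], try_lock, owned)
print(sorted(owned), sorted(pending))
```

Calling the function again with the previous `owned` set plays the role of the periodic retry: only datastores that are still unowned are attempted.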

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
