ESX storage failover mechanism for NFS FAQ


ESX hosts that use NFS storage are typically configured with a backup link. When the primary link fails, the host automatically switches to the backup link; this is called failover.

Q: When is failover triggered? A: When the host detects that a storage link has failed.

Q: How does the host tell that a storage link is interrupted? A: It stops receiving heartbeat responses.

Q: What is the heartbeat, and how does the host check it?

A: Normally the ESX host sends a heartbeat to the storage at a regular interval (HeartbeatFrequency), and each heartbeat must receive a response within a set time (HeartbeatTimeout); otherwise it counts as a heartbeat failure. Once the number of successive failures reaches a threshold (HeartbeatMaxFailures), the link is considered failed.
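The counting rule described above can be sketched in a few lines of Python. This is only an illustration, not how ESX is implemented: the function name and the input format (a list of response times, with None for no response) are hypothetical, and resetting the counter after a successful heartbeat is an assumption drawn from the word "successive".

```python
def heartbeat_declaring_failure(response_times, timeout_s=5.0, max_failures=10):
    """Return the 1-based index of the heartbeat at which the link is
    considered failed, or None if the threshold is never reached.

    response_times: one entry per heartbeat, in seconds; None = no response.
    """
    consecutive_failures = 0
    for index, response_time in enumerate(response_times, start=1):
        if response_time is None or response_time > timeout_s:
            # No answer within the timeout: one more successive failure.
            consecutive_failures += 1
        else:
            # Assumption: a successful heartbeat resets the counter.
            consecutive_failures = 0
        if consecutive_failures >= max_failures:
            return index
    return None


# Two good heartbeats around one miss, then the storage goes silent.
history = [0.2, None, 0.3] + [None] * 10
print(heartbeat_declaring_failure(history))
```

In this example the single early miss does not count toward the threshold, because the heartbeat after it succeeds.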

These parameters should be changed to the following recommended values (for both NetApp and EMC NAS devices):

NFS.HeartbeatDelta (NFS.HeartbeatFrequency in ESX 3.x) 12

NFS.HeartbeatTimeout 5

NFS.HeartbeatMaxFailures 10

These recommended values mean the following: NFS.HeartbeatFrequency=12 means a heartbeat is sent every 12 seconds. A heartbeat that gets no response within 5 seconds counts as timed out, and only after 10 successive heartbeats go unanswered is the NFS storage considered lost and failover initiated. In total this takes 12 s × 10 + 5 s = 125 seconds. In other words, the ESX host waits 125 seconds before it actually triggers a failover event.
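The arithmetic above can be written as a small Python helper, which makes it easy to see how other values would change the wait. The function and parameter names are illustrative only; the formula is the one stated in the text.

```python
def failover_wait_seconds(frequency_s=12, timeout_s=5, max_failures=10):
    """Time the host waits before declaring the NFS storage lost,
    using the formula frequency * max_failures + timeout."""
    return frequency_s * max_failures + timeout_s


print(failover_wait_seconds())  # 125 with the recommended values
```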

Q: So what happens from the VM's point of view during those 125 seconds?

A: The VM sees the disk attached to its virtual SCSI controller stop responding. What happens next depends on how long the guest OS tolerates an unresponsive disk (delayed write error). If the I/O error hits the guest's system disk, the OS crashes. The default disk timeout in Windows is 60 seconds, so the guest OS crashes while the ESX host is still inside its 125-second wait and has not yet performed the failover. With guest-level HA enabled, the guest OS reboots once NFS storage is available again.

A better option is to reconfigure the guest OS so that it also waits 125 seconds. To do that, use regedit to set TimeOutValue under HKLM\SYSTEM\CurrentControlSet\Services\Disk to 125. (Editing the registry carries risk; back it up before making changes.)
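As an alternative to editing the value by hand, the same change can be put in a .reg file and imported on each Windows guest. The Python sketch below only generates the file text and does not touch the registry; the function name is hypothetical. It assumes TimeOutValue is a DWORD measured in seconds, so 125 is written as hexadecimal 0000007d.

```python
def disk_timeout_reg_text(timeout_s):
    """Build the text of a .reg file that sets the Windows disk timeout.

    Assumes TimeOutValue is a REG_DWORD holding a number of seconds.
    """
    if not 0 <= timeout_s <= 0xFFFFFFFF:
        raise ValueError("timeout must fit in a 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk]",
        # .reg files store DWORD values as 8 hexadecimal digits.
        f'"TimeOutValue"=dword:{timeout_s:08x}',
        "",
    ]
    return "\n".join(lines)


print(disk_timeout_reg_text(125))
```

Save the output to a file with a .reg extension and review it before importing; the backup advice above still applies.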

This article is from the "Delxu Live notepad" blog; please keep this source attribution: http://delxu.blog.51cto.com/975660/277510
