Windows R2 + SQL R2 cluster fault handling

"Problem":

The application system cannot connect to the database, and the shared disk that holds the repository cannot be read.

"Cause":

The HA cluster of the DB servers failed, and the Cluster service could not start.

"Contingency plan":

1. Try to restore the cluster;

2. Establish a stand-alone DB server and restore the backup data to the new DB server;

3. Recover the latest database files from the DB cluster's shared disk, which means solving the problem that the failed cluster keeps the shared disk locked and unreadable.

"Collect log information and analyze problems":

1. Check the event log on the DB server for error messages:

[Screenshot: Eventlog.png]

"FailoverClustering error": The failover cluster database could not be unloaded. If restarting the Cluster service does not fix the problem, restart the computer.

"Service Control Manager error": The Cluster Service service terminated unexpectedly. It has done this 2,451 times. The following corrective action will be taken in 960000 milliseconds: Restart the service.

"Service Control Manager error": The Cluster Service service terminated with a service-specific error: the system cannot find the file specified.

2. Open the Cluster Manager, verify the configuration, and look for the problem:

The screenshots below are borrowed for illustration; after the failure my cluster looked much the same. The Cluster service would not start, and a forced restart did not help either.

[Screenshot: Cluster1.png]

[Screenshot: Cluster2.png]


According to the validation results, errors occurred while validating the storage and checking that the shared disk could be mounted. (The screenshot is borrowed and differs from my actual validation results; it is for reference only.)

[Screenshot: Cluster3.png]

3. Check the shared disk:

Open Disk Management. The cluster disk shows an abnormal status: "The disk is offline because of policy set by an administrator."

[Screenshot: Disk1.png]

On both cluster node servers the disks actually appear locked and inaccessible, as follows (reference screenshot borrowed from the Internet):

[Screenshot: Disk2.png]

I searched the Internet for the string "the disk is offline because of policy set by an administrator". The articles I found all use the DiskPart command to clear the policy, but when I tried it, it did not work.

For reference, these are the DiskPart steps:

1. Run: cmd
2. Enter: diskpart.exe
3. DISKPART> san
4. DISKPART> san policy=OnlineAll
5. DISKPART> list disk
6. DISKPART> select disk 1
7. DISKPART> attributes disk clear readonly
8. DISKPART> online disk
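
The DiskPart steps can also be saved to a text file and run non-interactively with `diskpart /s <file>`. The following is a minimal Python sketch that only generates the script text; the function name `build_diskpart_script` is hypothetical, and the disk number must be the one that `list disk` reports on your server. Note that in the incident described here these commands did not bring the cluster disk back.

```python
def build_diskpart_script(disk_number: int) -> str:
    """Return DiskPart script text mirroring the interactive steps.

    The result is meant to be saved to a file and run with
    `diskpart /s <file>` from an elevated command prompt.
    """
    if disk_number < 0:
        raise ValueError("disk_number must be zero or positive")
    lines = [
        "san",                             # show the current SAN policy
        "san policy=OnlineAll",            # bring newly discovered disks online
        "list disk",                       # list disks to confirm the number
        f"select disk {disk_number}",      # focus on the disk to change
        "attributes disk clear readonly",  # clear the read-only attribute
        "online disk",                     # bring the selected disk online
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the script for disk 1, the disk number used in the steps.
    print(build_diskpart_script(1), end="")
```

Generating the text first makes it easy to review the disk number before anything touches the disk.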


Here you can confirm if the disk is a cluster disk:

[Screenshot: Disk3.png]

Find another Win2008 server, try to mount the cluster's shared disk on it, and see whether the data can be read:

1). My storage device is a NetApp NAS; add this server to the LUN's initiator group.

2). Configure iSCSI and connect to the storage device.


[Screenshot: Iscsi.png]


3). After the connection succeeds, open Disk Management to check whether the disk is mounted.

The disk did mount, but it could not be read: clicking the mounted disk prompted me to format it!

4). Try copying the volume with SnapMirror on the NAS and then mounting the copy. The result was the same: still inaccessible.

It seems this approach does not work; the cluster failure is most likely what keeps the shared disk locked and inaccessible.


4. Try to repair or rebuild the cluster:

After looking up some approaches, I followed the two blog posts below and solved the problem!

https://blogs.technet.microsoft.com/askcore/2010/06/08/windows-server-2008-and-2008r2-failover-cluster-startup-switches/

http://jackprivate.blog.51cto.com/77144/1114650

Workaround:

1). Run the command: cluster node /force

Reference information:

Clearing the cluster configuration of a node:

When a cluster fails and the Cluster service cannot start, the cluster can be rebuilt, but the cluster environment must first be restored to its original state; otherwise the rebuild will not succeed.

You can use the following command:

cluster node /force

    • /force[cleanup] [/wait[:timeout_in_seconds]] manually restores the Cluster service configuration of the specified node to its initial state.
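
As an illustration of the syntax above, here is a small Python sketch that assembles the command line without running it. The function name `build_forcecleanup_command` is hypothetical, and placing an optional node name right after `node` is an assumption based on the general form of the `cluster node` command; check `cluster node /?` on your server before relying on it.

```python
from typing import List, Optional


def build_forcecleanup_command(node_name: Optional[str] = None,
                               wait_seconds: Optional[int] = None) -> List[str]:
    """Build the argument list for the cluster node cleanup command.

    node_name:    optional node to clean up (assumed position, see lead-in).
    wait_seconds: optional timeout for the /wait switch quoted in the syntax.
    """
    command = ["cluster", "node"]
    if node_name:
        command.append(node_name)
    # /forcecleanup is the long form of /force, per the syntax /force[cleanup].
    command.append("/forcecleanup")
    if wait_seconds is not None:
        if wait_seconds < 0:
            raise ValueError("wait_seconds must be zero or positive")
        command.append(f"/wait:{wait_seconds}")
    return command


if __name__ == "__main__":
    # Show the command that would be run on the failed node.
    print(" ".join(build_forcecleanup_command()))
```

The list form can be passed to `subprocess.run` on the affected node; printing it first leaves room to review a command that wipes the node's cluster configuration.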


2). After the command runs, the Cluster service is set to Disabled:

[Screenshot: Service.png]

3). Restart the server. After it boots, open Disk Management or Explorer and check whether the shared disk has reappeared, then copy the data off it right away!

4). What remains is trying to restore the cluster itself. I had no time to keep trying; my main goal was to copy the latest data from the shared disk and put it to use on the new stand-alone server, and I have decided to abandon the cluster architecture in favor of a single machine.
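
Since the copy in step 3) is the whole point of the recovery, it may be worth verifying it. The sketch below copies a directory tree and compares SHA-256 checksums of every file; it is an illustration only, the function names are hypothetical, and it assumes the database files are not in use (for example, SQL Server is stopped) while they are copied.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_dir: Path, target_dir: Path) -> List[Path]:
    """Copy every file under source_dir into target_dir.

    Returns the source files whose copy has a different checksum;
    an empty list means every file was copied intact.
    """
    mismatched = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        target_file = target_dir / source_file.relative_to(source_dir)
        target_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_file, target_file)  # keeps timestamps as well
        if sha256_of(source_file) != sha256_of(target_file):
            mismatched.append(source_file)
    return mismatched
```

Calling `copy_and_verify` with the shared disk's data folder and a folder on the new server's local disk (both paths are yours to supply) returns an empty list when every file arrived intact.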


When I rebuild the cluster, I'll post the results.

This article is from the "IT Operations Experience consolidation" blog. When reposting, please retain this source: http://milton.blog.51cto.com/833904/1745223
