Background
Last night a friend encountered the cluster of AG to switch back and forth unstable situation, the feeling of desperation, friends at the command line restart the WSFC cluster with commands
Results after restarting the WSFC cluster, the entire AG cannot be started, and the primary and secondary replicas are in the state of being resolved.
So the friend called me for help and asked about the situation and the environment.
Environment
System: WINDOWS2012R2
Database: SQL Server2014 SP2
Three machines, one domain control, two database nodes
Process
So I looked at the WSFC logs and the SQL Server logs and did not find useful information, see the downtime more and more long, had to restore business, but there is AG in the state of being resolved
Unable to do anything, including: Backing up the database, detaching the database, deleting the AG, etc.
Continue to ask friends database backup situation, the database is a complete daily, every hour a day, when the situation is the last day is 40 minutes away
If you restore the database to restore the business, it will result in 40 minutes of data loss
At that time, witted may directly copy the MDF file and the LDF file and attach the ability to recover the database, so the two database node SQL Server services are stopped, and then directly all the database MDF files and
LDF file is copied and relocated to another SQL Server server, the SQL Server server is a stand-alone database and does not make any highly available cluster
After all the database has been relocated, the database to carry out additional operations, unexpectedly is incredibly able to attach success!
After all the databases have been attached, create a login account, modify the program connection, verify the connection, verify the data, reopen the business, business recovery, the whole process took about 2 hours
Postscript
A day later, the AG cluster was repaired, how do you re-add the current business library from the machine on the standalone SQL Server to the AG cluster?
The average person will move the business library from the standalone SQL Server back to the node of the AG and redo the AG in a variety of ways
Today walk June did an experiment, the experiment environment is exactly the same as a friend's environment, found that only need to separate all the business libraries on the single-machine SQL Server,
Then stop the SQL Server service for all nodes in the AG, and then copy the MDF file and the LDF file back to all the AG nodes overwriting the original database file (note that backup is done)
Then start the SQL Server service for each node in the AG, the AG has no error, everything is back to normal, of course, this method of downtime will be longer than the usual method
Note the point:
1. When copying a database file to a standalone SQL Server, select the secondary replica in the primary copy or synchronous mode
2, copy the database file from the stand-alone SQL Server to the AG node, copy to all the nodes of the AG
Summarize
SQL Server should not validate the database, that is, whether the database has been clustered without validation, so this approach is successful
Starting with the start of SQL Server2012 AlwaysOn, AlwaysOn This DB cluster technology relies on the OS wsfc for failover, all the way to SQL Server2017.
For the WSFC problem, even the experienced SQL Server DBA may not be able to fix it, because of the deep-rooted principles of windows, some issues also send a dump file to Microsoft Analytics to solve,
Always think that Microsoft's technology is too closed, anyway, there is a temporary solution is better than no
If there is a wrong place, welcome everyone to shoot brick O (∩_∩) o
The copyright of this article is owned by the author and cannot be reproduced without the author's consent.
SQL Server AG cluster not starting up temporary self-help big strokes