Case study: recovering a SQL Server database from SUSPECT mode

Source: Internet
Author: User
Tags: failover

Production Environment:

SQL Server 2008 R2 Active/Passive Nodes, Windows Server 2008 R2 SP1 Cluster, vSphere 5.x

Start

At 6 am, the application team reported that BiztalkMsgBoxDb had entered SUSPECT mode and could not be accessed.

Report the incident to relieve pressure on users

I had a quick phone call with the App Manager to learn about the downtime at the application layer, entered the approximate time of occurrence and a description in the ticket, and checked whether any change had been made recently. If you do not have a ticket system, send a group email to the relevant personnel. Call the Incident Manager, who will manage all incident updates. The advantage of doing so is that people who are panicking know what is going on, which reduces their stress.

Collect your thoughts

Many people calling at once makes you nervous. Do a quick brainstorm: what could have caused the suspect state? Corruption of the file groups (data and log)? A full disk or a SAN disk error? Is the backup still there?
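Two of these questions can be answered quickly from the system catalog. The queries below are a minimal sketch, assuming the database name from the incident report and that the backup history in msdb has not been purged. Note that msdb only records that a backup was taken; it does not prove the backup file still exists on disk.

```sql
-- Current state of the database (ONLINE, SUSPECT, EMERGENCY, ...).
SELECT name, state_desc, user_access_desc, recovery_model_desc
FROM sys.databases
WHERE name = N'BiztalkMsgBoxDb';

-- Most recent backup of each type recorded in msdb:
-- D = full, I = differential, L = transaction log.
SELECT bs.type,
       MAX(bs.backup_finish_date) AS last_backup_finish
FROM msdb.dbo.backupset AS bs
WHERE bs.database_name = N'BiztalkMsgBoxDb'
GROUP BY bs.type;
```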

Check the error log to locate the initial error

The initial error was found in the error log. One minute after a successful log backup, an error message showed that an OS error had caused the LogWriter's log flush (the log write) to fail. A database can become suspect when its log cannot be written.

2014-03-17 03:15:56.05 spid5s Error: 17053, Severity: 16, State: 1.

2014-03-17 03:15:56.05 spid5s LogWriter: Operating system error 1117 (failed to retrieve text for this error. Reason: 15105) encountered.

2014-03-17 03:15:56.05 spid5s Write error during log flush.

03:15:56.05 spid79 Error: 9001, Severity: 21, State: 4.

03:15:56.05 spid79 The log for database 'biztalkmsgboxdb' is not available. Check the event log for related error messages. Resolve any errors and restart the database.

03:15:56.05 spid85 Error: 9001, Severity: 21, State: 4.
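Entries like these can be searched for instead of scrolling through the whole log. The calls below are a sketch, assuming sysadmin-level access; the search strings are simply taken from the messages above. SQL Server starts a new error log at every instance start, so after a restart or failover the entries from before the restart are in log number 1 rather than 0.

```sql
-- sp_readerrorlog arguments: log number (0 = current), log type
-- (1 = SQL Server error log, 2 = SQL Server Agent log), then up to
-- two search strings that must both match.
EXEC sp_readerrorlog 0, 1, 'LogWriter';
EXEC sp_readerrorlog 0, 1, 'Error: 9001';

-- Same search in the previous log, for entries written before a restart.
EXEC sp_readerrorlog 1, 1, 'LogWriter';
```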

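Error analysis:

OS error 1117 is a disk-related error (ERROR_IO_DEVICE: the request could not be performed because of an I/O device error). The log file is still in use and the disk is not full. Migrating the log file is an option to consider.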
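The file locations and the free space on their volumes can be checked from inside SQL Server. This is a sketch, assuming SQL Server 2008 R2 SP1 or later, where sys.dm_os_volume_stats is available (the source does not state the SQL Server service pack level). On an older build the free space would have to be checked at the OS level instead.

```sql
-- Where the data and log files live, their state, and how much room
-- is left on the volume that holds each file.
SELECT mf.name AS logical_name,
       mf.type_desc,
       mf.physical_name,
       mf.state_desc,
       vs.volume_mount_point,
       vs.available_bytes / 1048576 AS free_mb,
       vs.total_bytes / 1048576 AS total_mb
FROM sys.master_files AS mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs
WHERE mf.database_id = DB_ID(N'BiztalkMsgBoxDb');
```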

First DBCC Repair Attempt

(The basis for any attempt is making sure that your actions will not make the situation worse.)

Command: ALTER DATABASE [xxxxxx] SET EMERGENCY;

The command failed: the database was locked and could not be altered. I therefore abandoned the repair, DBCC CHECKDB (N'xxxxxxx', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS, ALL_ERRORMSGS; outright.

Why abandon it: the DBCC repair requires the database to be in emergency mode, and it tries to use the existing log to bring the database back to a consistent state. If the log is damaged, it rebuilds the log (I think this is part of what REPAIR_ALLOW_DATA_LOSS means). For a database larger than 100 GB, rebuilding the log may take several hours. Given the recovery time objective (RTO) and the service level agreement (SLA), the database cannot be down for that long (a reflection made afterwards). Fortunately, the database could not be altered. The error message pointed directly at a locked database log, which suggests that the log may not be corrupt, so there was no need to resort to the DBCC repair.
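For reference, the full emergency-mode repair usually looks like the sequence below. This is a sketch of the path that was abandoned here, not something that was run in this incident, and [xxxxxx] is the placeholder name used in the text. The repair options of DBCC CHECKDB require the database to be in single-user mode, which is why the second statement is included.

```sql
-- Last-resort sequence only: it can discard transactions and data.
ALTER DATABASE [xxxxxx] SET EMERGENCY;
ALTER DATABASE [xxxxxx] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- In emergency mode this may rebuild the transaction log and
-- deallocate damaged pages.
DBCC CHECKDB (N'xxxxxx', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Reopen the database to applications once the repair has finished.
ALTER DATABASE [xxxxxx] SET MULTI_USER;
```

If the first statement fails, as it did here, none of the later statements should be attempted.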

In hindsight, assuming that the log file was corrupt and making DBCC repair the first step was the wrong methodology. I had not confirmed with the customer whether the active transactions of the past 15 minutes could be lost (the customer was still asleep, and transaction log backups run every 15 minutes). What is more, the repair would have kept the database down for longer, it might not have succeeded, and it might not have been faster than restoring from a database backup. The first step should be to check whether the file is corrupt and to ask the server team about I/O exceptions.
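Some of that evidence can be gathered from SQL Server itself before calling the server team. The queries below are a sketch, assuming the database name from the incident report. msdb.dbo.suspect_pages tracks damaged data pages, so an empty result does not prove that the log file is intact, and the I/O statistics are cumulative since the instance started.

```sql
-- Pages that SQL Server has recorded as damaged (for example after
-- 823/824 errors, bad checksums or torn pages).
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages
WHERE database_id = DB_ID(N'BiztalkMsgBoxDb');

-- Cumulative I/O counts and stall times per file; unusually high write
-- stalls on the log file point towards the storage layer.
SELECT mf.physical_name,
       fs.num_of_reads,
       fs.io_stall_read_ms,
       fs.num_of_writes,
       fs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(DB_ID(N'BiztalkMsgBoxDb'), NULL) AS fs
JOIN sys.master_files AS mf
  ON mf.database_id = fs.database_id
 AND mf.file_id = fs.file_id;
```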

Second attempt: migrating the log file

When a resource lock problem occurs, the first response is to kill the process holding the lock or to restart the resource. Owing to insufficient skill or a wrong methodology, I could not find what was holding the lock right away, so I chose to restart the resource.

Because this is a Windows cluster, there is no need to detach and attach the database; I failed over directly to the passive node. A failover is equivalent to restarting the instance, so the database goes through instance recovery afterwards. The log file could now be written, and the database came back online.
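After a failover it is worth confirming which node now hosts the instance and that the database really is online. The queries below are a sketch, assuming a failover cluster instance like the one described in the environment section and the database name from the incident report.

```sql
-- Is the instance clustered, and which physical node is it running on now?
SELECT SERVERPROPERTY('IsClustered') AS is_clustered,
       SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS current_node;

-- Nodes that belong to the failover cluster instance.
SELECT NodeName
FROM sys.dm_os_cluster_nodes;

-- State of the database after instance recovery has completed.
SELECT name, state_desc
FROM sys.databases
WHERE name = N'BiztalkMsgBoxDb';
```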

With the problem temporarily resolved, I failed the database back to the original active node. No errors occurred, which indicates that the disk itself is not the problem; it may be a disk interface problem. Apart from the log backup entries, Event Viewer showed no other information. sp_who2 did not reveal any suspicious database lock, nor any process locking the database or the log file.
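The lock and blocking DMVs give a more targeted view than sp_who2. The queries below are a sketch, assuming the database name from the incident report. Every session connected to a database normally holds a shared (S) DATABASE lock, so only stronger modes or waiting requests are of interest.

```sql
-- Sessions holding or waiting for locks on the database itself.
SELECT tl.request_session_id,
       tl.request_mode,
       tl.request_status,
       es.login_name,
       es.host_name,
       es.program_name
FROM sys.dm_tran_locks AS tl
JOIN sys.dm_exec_sessions AS es
  ON es.session_id = tl.request_session_id
WHERE tl.resource_type = N'DATABASE'
  AND tl.resource_database_id = DB_ID(N'BiztalkMsgBoxDb');

-- Requests in this database that are currently blocked by another session.
SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE database_id = DB_ID(N'BiztalkMsgBoxDb')
  AND blocking_session_id <> 0;
```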

Open a Problem record

At 7 am, I asked the server team to check the disks; the EVA SAN is suspected of having a problem. For now we only know the initial error and the workaround. The issue is handed over as a Problem to the Problem Manager for further investigation, so that the same Problem does not recur.

Conclusion: when a database becomes suspect because of its log file, the methodology is to (1) check for a disk problem, then (2) check whether a process is locking the database, and then (3) check whether the file is corrupt. Only after these checks should a solution be proposed for whichever of (1), (2), or (3) applies. Severity rises from (1) to (3), and so does the likelihood of data loss after recovery. Confirm the risks of an online repair with the customer. The last line of defence is, naturally, a complete backup solution and a database recovery plan that is exercised regularly.
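As a small first step towards such a recovery plan, backup files can be checked for readability. This is a sketch only: the path below is hypothetical, and RESTORE VERIFYONLY confirms that the backup set is complete and readable, not that the data inside it is consistent. Only an actual restore to a test server proves that the database can be recovered.

```sql
-- Hypothetical backup path; substitute a real backup file.
RESTORE VERIFYONLY
FROM DISK = N'X:\Backup\BiztalkMsgBoxDb_full.bak';

-- Lists the backup sets contained in the same (hypothetical) file.
RESTORE HEADERONLY
FROM DISK = N'X:\Backup\BiztalkMsgBoxDb_full.bak';
```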
