How to handle page corruption or checksum errors in the database
Author: nzperfect/perfectaction
Date: 2009.09.27
I have been learning more about database Fault Handling recently. As a database maintenance personnel, I expect all database error cases to increase my experience, however, we are also worried about data loss caused by database faults that cannot be handled in one way or another.
A few days agoArticleWhen recruiting a DBA, a website administrator asks the following question: "What should I do if I find a page corruption or a checksum error in the SQL server log ?" As described by the website administrator, about 90% of applicants will adopt a solution. Use DBCC checkdb to add a repair option, however, no one can explain the repair process, working principle, and extent of DBCC checkdb.
With the help of online documents and some personal understandings and experiences, I will explain how to deal with this problem: "What should I do when the database page is corrupted or the checksum fails? "
First, you must first understand DBCC checkdb. online document URL:
Http://technet.microsoft.com/zh-cn/library/ms176064.aspx
Through the online documentation, we can know that there are three repair options: repair_allow_data_loss | repair_fast | repair_rebuild. The actual function is provided only by the repair and repair_rebuild options, in which the repair attempts to fix all errors reported, these fixes may cause some data loss, and the repair_rebuild execution will not repair data loss, including quick fixes (such as fixing missing rows in non-clustered indexes) and more time-consuming repairs (such as re-indexing). We can see that repair_rebuild is what we expect.
When you log from SQL Server orProgram When querying the database or regularly performing health check for the database through DBCC checkdb, page corruption or checksum error information appears, such: -- Bytes -------------------------------------------------------------------------------------------------------------------------------
M8928sg, Level 16 , State 1 , Line 1
Object ID 2088535921 , Index ID 0 , Partition ID 72345201021503994 , Alloc unit ID 72345201051571606 (Type In- Row data): Page ( 1 : 94299 ) Cocould Not Be processed. See other errors For Details.
MSG 8939 , Level 16 , State 98 , Line 1
Table Error: Object ID 2088535921 , Index ID 0 , Partition ID 72345201021503994 , Alloc unit ID 72345201051571606 (Type In- Row data), page ( 1 : 94299 ). Test (is_off (buf_ioerr, pbuf -> Bstat) failed.
Checkdb found 0 Allocation errors And 2 Consistency errors In Table ' Yourtable ' (Object ID 2088535921 ).
Checkdb found 0 Allocation errors And 2 Consistency errors In Database ' Yourdb ' .
Repair_allow_data_loss Is The minimum repair Level For The errors found By DBCC Checkdb (yourdb ).
-- Bytes -------------------------------------------------------------------------------------------------------------------------------
What should we do now?
1. The above prompt tells us that the object 2088535921 has an error. It is a table and the page is 1: 94299.
2. Next, let's determine whether the damaged page is clustered or non-clustered on the stack. the SQL Server method is: DBCC Traceon ( 3604 , - 1 )
Go
DBCC Page ( ' Yourdb ' , 1 , 94299 , 3 )
Go
In the output result (an error is reported, but the page header information is displayed), you can see Metadata: indexid=N
If n is 0, it indicates a heap, 1 indicates a clustered index, and> 1 indicates a non-clustered index.
PS: in fact, from the Object ID 2088535921 and index ID 0 of the prompt information, you can also easily determine that it is a heap.
3. according to the above step 2nd, we know that this page is a heap. This is not good news for us, because if it is> 1, we can delete the non-clustered index and re-create the index, no data will be lost, and 0 or 1 is damaged by metadata, which means there is a possibility of metadata loss.
So how can we fix this data page? Here we assume that the database is in full mode and there are good backup policies, including full backup and log backup.
You can perform page-level restoration as follows:
A. Perform a log backup first. If you are not at ease, you can perform a full backup;
Backup Log Yourdb To Disk = ' D: \ dbbak \ yourdb_a.trn '
B. Restore the page through full backup. (yourdb. Bak is a full backup .);
Restore Database Yourdb page = ' 1: 94299 ' From Disk = ' D: \ dbbak \ yourdb. Bak ' With Norecovery
C. Restore the difference after the full backup (assuming there is a difference yourdb. DIF). If there is no difference between the full backup, go directly to step d;
Restore Database Yourdb From Disk = ' D: \ dbbak \ yourdb. dif ' With Norecovery
D. There may be multiple log backups after recovery (assume yourdb_1.trn, yourdb_2.trn );
Restore Log Yourdb From Disk = ' D: \ dbbak \ yourdb_1.trn ' With Norecovery
Restore Log Yourdb From Disk = ' D: \ dbbak \ yourdb_2.trn ' With Norecovery
Restore Log Yourdb From Disk = ' D: \ dbbak \ yourdb_a.trn ' With Norecovery
E. Prepare the latest log;
Backup Log Yourdb To Disk = ' D: \ dbbak \ yourdb_e.trn '
F. Restore the last (Step E) log backup;
Restore Log Yourdb From Disk = ' D: \ dbbak \ yourdb_e.trn ' With Recovery
G. End
4 . After step 3, let's check whether there are any errors in the Table. From the prompt message object ID 2088535921, we can find the table name tbname;
Tbname: Select Object_name ( 2088535921 )
Then DBCC Checktable ( ' Yourtable ' ) Detection. If no error is reported, the repair is completed.
5 . Finally, perform a DBCC checkdb check for the entire database;
PS: the page-level recovery of SQL Server is in the Enterprise and Development editions. It supports online recovery of page data and can only be repaired offline in the Standard Edition;
In the DBCC checkdb Repair Option, use repair_rebuild to repair data. The online documentation says no data is lost, but data may also be lost in some environments. However, I have never met either of them :)
When the repair_allow_data_loss option is used, online documents may lose data. If the heap or clustered index page is damaged, SQL Server releases the page, causing data loss, however, the repair_allow_data_loss option does not cause data loss in two cases: Non-clustered index page errors and LOB page data errors.
General introduction:
There must be good database backup policies, which are more important than everything;
The backup file must be synchronized from time to time;
When the database fails, Do not worry too much. Analyze the error calmly;
If you are not sure how to do this, you can use Google. If your error message contains Chinese characters, translate it into English and then Google. This makes it more likely to find a solution;
The database must be retained during restoration;
The repair_allow_data_loss option of DBCC checkdb is always the final choice.
End. If any error occurs, please correct it.