How to handle page corruption or checksum errors in the database

Source: Internet
Author: User
How to handle page corruption or checksum errors in the database

Author: nzperfect/perfectaction
Date: 2009.09.27

I have been learning more about database Fault Handling recently. As a database maintenance personnel, I expect all database error cases to increase my experience, however, we are also worried about data loss caused by database faults that cannot be handled in one way or another.

A few days agoArticleWhen recruiting a DBA, a website administrator asks the following question: "What should I do if I find a page corruption or a checksum error in the SQL server log ?" As described by the website administrator, about 90% of applicants will adopt a solution. Use DBCC checkdb to add a repair option, however, no one can explain the repair process, working principle, and extent of DBCC checkdb.

With the help of online documents and some personal understandings and experiences, I will explain how to deal with this problem: "What should I do when the database page is corrupted or the checksum fails? "

First, you must first understand DBCC checkdb. online document URL:
Http://technet.microsoft.com/zh-cn/library/ms176064.aspx

Through the online documentation, we can know that there are three repair options: repair_allow_data_loss | repair_fast | repair_rebuild. The actual function is provided only by the repair and repair_rebuild options, in which the repair attempts to fix all errors reported, these fixes may cause some data loss, and the repair_rebuild execution will not repair data loss, including quick fixes (such as fixing missing rows in non-clustered indexes) and more time-consuming repairs (such as re-indexing). We can see that repair_rebuild is what we expect.
When you log from SQL Server orProgram When querying the database or regularly performing health check for the database through DBCC checkdb, page corruption or checksum error information appears, such: -- Bytes -------------------------------------------------------------------------------------------------------------------------------
M8928sg, Level   16 , State 1 , Line 1
Object ID 2088535921 , Index ID 0 , Partition ID 72345201021503994 , Alloc unit ID 72345201051571606 (Type In- Row data): Page ( 1 : 94299 ) Cocould Not Be processed. See other errors For Details.
MSG 8939 , Level   16 , State 98 , Line 1
Table Error: Object ID 2088535921 , Index ID 0 , Partition ID 72345201021503994 , Alloc unit ID 72345201051571606 (Type In- Row data), page ( 1 : 94299 ). Test (is_off (buf_ioerr, pbuf -> Bstat) failed.
Checkdb found 0 Allocation errors And   2 Consistency errors In   Table   ' Yourtable ' (Object ID 2088535921 ).
Checkdb found 0 Allocation errors And   2 Consistency errors In   Database   ' Yourdb ' .
Repair_allow_data_loss Is The minimum repair Level   For The errors found By   DBCC Checkdb (yourdb ).
-- Bytes -------------------------------------------------------------------------------------------------------------------------------

What should we do now?

1. The above prompt tells us that the object 2088535921 has an error. It is a table and the page is 1: 94299.
2. Next, let's determine whether the damaged page is clustered or non-clustered on the stack. the SQL Server method is: DBCC Traceon ( 3604 , - 1 )
Go
DBCC Page ( ' Yourdb ' , 1 , 94299 , 3 )
Go

In the output result (an error is reported, but the page header information is displayed), you can see Metadata: indexid=N

If n is 0, it indicates a heap, 1 indicates a clustered index, and> 1 indicates a non-clustered index.
PS: in fact, from the Object ID 2088535921 and index ID 0 of the prompt information, you can also easily determine that it is a heap.
3. according to the above step 2nd, we know that this page is a heap. This is not good news for us, because if it is> 1, we can delete the non-clustered index and re-create the index, no data will be lost, and 0 or 1 is damaged by metadata, which means there is a possibility of metadata loss.
So how can we fix this data page? Here we assume that the database is in full mode and there are good backup policies, including full backup and log backup.
You can perform page-level restoration as follows:

A. Perform a log backup first. If you are not at ease, you can perform a full backup;
Backup   Log Yourdb To   Disk = ' D: \ dbbak \ yourdb_a.trn '
B. Restore the page through full backup. (yourdb. Bak is a full backup .);
Restore   Database Yourdb page =   ' 1: 94299 '   From   Disk = ' D: \ dbbak \ yourdb. Bak '   With Norecovery
C. Restore the difference after the full backup (assuming there is a difference yourdb. DIF). If there is no difference between the full backup, go directly to step d;
Restore   Database Yourdb From   Disk = ' D: \ dbbak \ yourdb. dif ' With Norecovery
D. There may be multiple log backups after recovery (assume yourdb_1.trn, yourdb_2.trn );
Restore   Log Yourdb From   Disk = ' D: \ dbbak \ yourdb_1.trn '   With Norecovery
Restore   Log Yourdb From   Disk = ' D: \ dbbak \ yourdb_2.trn '   With Norecovery
Restore   Log Yourdb From   Disk = ' D: \ dbbak \ yourdb_a.trn '   With Norecovery
E. Prepare the latest log;
Backup   Log Yourdb To   Disk = ' D: \ dbbak \ yourdb_e.trn '
F. Restore the last (Step E) log backup;
Restore   Log Yourdb From   Disk = ' D: \ dbbak \ yourdb_e.trn '   With Recovery
G. End
4 . After step 3, let's check whether there are any errors in the Table. From the prompt message object ID 2088535921, we can find the table name tbname;
Tbname: Select   Object_name ( 2088535921 )
Then DBCC Checktable ( ' Yourtable ' ) Detection. If no error is reported, the repair is completed.
5 . Finally, perform a DBCC checkdb check for the entire database;

PS: the page-level recovery of SQL Server is in the Enterprise and Development editions. It supports online recovery of page data and can only be repaired offline in the Standard Edition;
In the DBCC checkdb Repair Option, use repair_rebuild to repair data. The online documentation says no data is lost, but data may also be lost in some environments. However, I have never met either of them :)
When the repair_allow_data_loss option is used, online documents may lose data. If the heap or clustered index page is damaged, SQL Server releases the page, causing data loss, however, the repair_allow_data_loss option does not cause data loss in two cases: Non-clustered index page errors and LOB page data errors.

General introduction:
There must be good database backup policies, which are more important than everything;
The backup file must be synchronized from time to time;
When the database fails, Do not worry too much. Analyze the error calmly;
If you are not sure how to do this, you can use Google. If your error message contains Chinese characters, translate it into English and then Google. This makes it more likely to find a solution;
The database must be retained during restoration;
The repair_allow_data_loss option of DBCC checkdb is always the final choice.

End. If any error occurs, please correct it.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.