Help! My database is broken. What should I do?

Source: Internet
Author: User
Document directory
  • Don't panic
  • Do not separate databases
  • Do not restart the Database Service
  • Do not rush to run REPAIR
  • Run integrity check
  • Root Cause Analysis

Database File Corruption may be the biggest headache for DBAs. In this article, I will explain some operations that should not be performed on databases when database files are damaged, then, we will explain some operations that should be performed based on the actual situation to help you solve this problem.

How do I determine file corruption?

It is easy to determine whether the file is damaged. When a query accesses an invalid page number, the query will end with a high severity error. Jobs that back up and rebuild indexes will fail. Some typical error messages are as follows:

SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0xfdff74c9; actual: 0xfdff74cb ). it occurred during a read of page (1: 69965) in database ID 13 at offset 0x0000002229a000 in file 'd: \ Develop \ Databases \ Broken1.mdf '.
Attempt to fetch logical page 1: 69965 in database 13 failed. It belongs to allocation unit 72057594049069056 not to 281474980642816.

 

The more terrible thing is that if there is no regular integrity check, the error may exist in hours, days, or even last month, and the subsequent problems will be difficult to solve.

This article will not discuss the status of the database question, because it can be discussed in a whole article or book to explain why the database question is questioned, find the cause of the database question, and solve the question.

What should we do when data files are corrupted?
  1. Don't be alarmed ..)
  2. Do not separate databases
  3. Do not restart the Database Server
  4. Do not rush to run REPAIR
  5. Run integrity check
  6. Finally, perform an in-depth analysis to analyze the root cause of the problem.
Don't panic

The consequence of panic is often irrational thinking or even not thinking.

Do not separate databases

When the server reports that the data file is corrupted, some damaged pages do exist in the SQL data file. Therefore, the separation of attaching, backup and restoration, and the restart of database services cannot be ruled out.

Do not restart the Database Service

Similarly, restarting the database service does not improve the situation.

If the SQL statement encounters Data File Corruption during restart-recovery implementation, then the database will be labeled as a questioning State, making it more difficult to fix the problem in the future.

Do not rush to run REPAIR

Some engineers may use CheckDB and the Allow Date Loss option and believe this will solve many problems. In fact, in many cases, running DBCC Allow Data Loss is not a recommended method, this cannot fix all errors and may cause data loss.

In many cases, this is a last resort and should be used only when other methods cannot be solved, rather than as the preferred method.

Run integrity check

To determine how to fix data errors or which part of the data has an error, you should run CheckDB and use the All_ErrorMsgs option. In addition, the No_Infomsgs option can disable the display of specific page errors, which does not help us in troubleshooting.

For large databases, CheckDB may take a long time, but we should wait for it to run completely to report all errors. Our repair policies should be formulated based on all errors.

Root Cause Analysis

 

 

When the repair is completed, we should continue to investigate the more fundamental causes. Otherwise, errors may occur again. Most of the problems may be caused by IO systems, user misoperations or anti-virus software.

Fix errors

Based on the CheckDB results, we can specify the corresponding repair policy. Here I will list some common errors and their repair methods.

This list lists the severity of errors in the ascending order of severity. Errors of each level are displayed with typical error prompts and solutions.

Inaccurate Space MetaData

Msg 2508, Level 16, State 3, Line 1
The In-row data RSVD page count for object "Broken1", index ID 0, partition ID 76911687695381, alloc unit ID 76911687695381 (type In-row data) is incorrect. Run dbcc updateusage.

If the index ID is 0, the partition ID is 1, and the allocation unit ID is 1 (type is), the count 1 is incorrect. Run dbcc updateusage.

This error indicates that the data page contains incorrect values in the reserved space. In SQL 2000, this error may also be caused by inconsistent numbers of data entries or data pages with indexes or table records.

CheckDB does not fix similar errors. in SQL 2005, CheckDB only provides one warning message.

This is not a serious error. Follow the prompts to run the dbcc updateusage line. This usually happens after SQL 2000 is upgraded to 2005/2008, and will not be encountered in SQL 2005/2008.

Msg 8914, Level 16, State 1, Line 1
Incorrect PFS free space information for page (1: 26839) in object ID 181575685, index ID 1, partition ID 293374720802816, alloc unit ID 76911687695381 (type LOB data ). expected value 0_PCT_FULL, actual value 100_PCT_FULL.

The PFS available space information on page 6 in Object ID 1, index ID 3, partition ID 3, allocation unit ID 4 (type 5) is incorrect. The required value is 7, but the actual value is 8.

This error indicates that the PFS Page (Page Free Space) contains incorrect values. It is not a serious error like an error. You can use the CheckDB + Repair_Allow_Date_Loss option to fix the error. If the error exists only in the database, the data will not be lost.

Upload uption only in the nonclustered indexes

If the error message ID returned by CheckDB is 2 or a number greater than 2, it indicates that the error part is a non-clustered index. Because non-clustered indexes can be regenerated from table data, there is no data loss to fix these errors.
If all errors returned by CheckDB are of this type, we recommend that you use the CheckDB Repair_Rebuild option to fix them.
Msg 8941, Level 16, State 1, Line 1
Table error: Object ID 181575685, index ID 4, page (3: 224866 ). test (sorted [I]. offset> = PAGEHEADSIZE) failed. slot 159, offset 0x1 is invalid.
Table error: Object ID 181575685, index ID 4, partition ID (3: 224866). Page (3: 224866). Test (sorted [I]. offset> = PAGEHEADSIZE) failed. The slot is 159, and the offset 0x1 is invalid.

Msg 8942, Level 16, State 1, Line 1
Table error: Object ID 181575685, index ID 4, page (3: 224866 ). test (sorted [I]. offset> = max) failed. slot 0, offset 0x9f overlaps with the prior row.
Table error: Object ID 181575685, index ID 4, partition ID (3: 224866) test (sorted [I]. offset> = max) failed. Slot 159, offset 0x1 and the previous line overlap.

These common errors can be solved by deleting and re-indexing. Online Index creation will read the original index to create a new index, so there will be problems with the new index. In this case, you can only delete the old one and then create a new one.

In fact, this is what the CheckDB + Repair_Rebuild option does, but you must use the single-user mode when using CheckDB. Therefore, manual reconstruction is very appropriate, so that the database can be kept online.

Upload uption in the LOB pages
Msg 8964, Level 16, State 1, Line 1
Table error: Object ID 181575685, index ID 1, partition ID 72057594145669120, alloc unit ID 72057594087800832 (type LOB data ). the off-row data node at page (rows 50), slot 0, text ID 901891555328 is not referenced.
Table error: Object ID 181575685, index ID 1, partition ID 72057594145669120, allocation unit ID 72057594087800832 (type: LOB ). The data nodes that are located on the page (,), slot 0, and text ID 901891555328 are not referenced.

This indicates that some LOB data nodes are not referenced by any row. This can be caused by corruption of clustered indexes, corruption of the heap structure, and reallocation of damaged pages.
If CheckDB returns these errors, running CheckDB Allow_Data_Loss will simply discard these pages and the corresponding data will become incomplete.

Data Purity errors
Msg 2570, Sev 16, State 3, Line 17
Page (1: 1103587), slot 24 in object ID 34, index ID 1, partition ID 281474978938880, alloc unit ID 281474978938880 (type "In-row data "). column "modified" value is out of range for data type "datetime ". update column to a legal value.
Page (1: 1103587), slot 24 is located In Object ID 34, index ID 1, partition ID 23432123423234, allocation unit ID 1234345667567875 (type: "In-row data. The value of column "modified" exceeds the range of data type "datetime. Update this column to a valid value.

This indicates that the data in some columns is out of the valid range of values. For example, it may be a value of the datetime type, but the minute count is greater than 1440, A Unicode value but its byte value cannot be divisible by 2. These errors will not be checked when SQL2000 is upgraded to a later version, we recommend that you use the Data_purity option to run CheckDB for one upgrade, but CheckDB cannot fix these errors (because the database cannot know how to fix these errors, is to manually modify its value, the difficult place is how to find the rows containing the error value, KB in this article has a more detailed discussion: http://support.microsoft.com/kb/923247
Corruption in the clustered index or heap

If the error occurs on the page or heap table of the clustered index, data is lost when the error occurs, because the page-level data in the clustered index stores the specific data, data is not redundant.

If this type of error is found in CheckDB, we recommend that you use the Repair_Allow_Data_loss option to run CheckDB.

The classic error information for such errors is generally:
Server: Msg 8976, Level 16, State 1, Line 2
Table error: Object ID 181575685, index ID 1, partition ID 76911687695381, alloc unit ID 76911687695381 (type In-row data ). page () was not seen in the scan although its parent (1: 479) and previous (1: 715544) refer to it.
Server: Msg 8939, Level 16, State 1, Line 2
Table error: Object ID 181575685, index ID 0, page (1: 168576 ). test (m_freeData> = PAGEHEADSIZE & m_freeData <= (UINT) PAGESIZE-m_slotCnt * sizeof (Slot) failed. values are 44 and 8028.

These are only some examples, and there are more errors. The main reason for determining such errors is that the Index ID is 0 or 1. If the Index ID returned by CheckDB is 0 or 1, the data in the base table is damaged.

This can also be fixed, but the cost of fixing these errors is the loss of data on some rows or even pages. It is worth noting that when the data is deleted, CheckDB does not check the foreign key and does not trigger the Delete trigger. This may cause integrity issues of references and logical data inconsistency. Therefore, the solution is not recommended.

If a complete database backup is saved, restoring from the backup will solve these problems. It will be better if there is any log backup to cut off the log chain, because it can be restored to the perfect state, no data is lost. (Improving the backup policy is the king of data security ~)

If there is no complete database backup, you only need to run CheckDB in Repair_allow_data_loss mode, which requires the database to run in single-user mode.

To know how much data is deleted in secondary mode in clustered index, you can view this log: http://sqlskills.com/BLOGS/PAUL/post/CHECKDB-From-Every-Angle-Using-DBCC-PAGE-to-find-what-repair-will-delete.aspx

Upload uption in the Metadata
Msg 3853, Level 16, State 1, Line 1
Attribute (object_id = 181575685) of row (object_id = 181575685, column_id = 1) in sys. columns does not have a matching row (object_id = 181575685) in sys. objects.
This type of error usually appears in a database upgraded from SQL 2000, where someone did direct updates to the system tables.

There is no mandatory primary/foreign key constraint in the SQL System table. Therefore, in SQL2000, it is entirely possible to delete a row (such as a table) from Sysobjects (in 05 and 08, for the sake of security, SQL prohibits direct operations on system tables and can only be performed through corresponding commands ), however, the Syscolumns and Sysindexes tables contain the columns and indexes of the table.

In SQL 2000, CheckDB is not expensive to Detect System directories, so such errors cannot be found. But in SQL 2005, CheckDB performs these checks. Therefore, only in SQL2005, these errors may occur.
It is not a simple task to fix these errors. CheckDB will not fix these errors. If a complete SQL2000 database backup was recently performed before the upgrade to 05, you can recover the data to SQL 2000 and then run the upgrade.

If there is no SQL 2000 backup, or the backup time is too long, there are only two possible ways to fix these errors. One is to directly edit the system table, which is a very complicated and dangerous operation, because the system table is not documented and the 05 system table is more complex than before, more details can be seen in this Blog: http://www.sqlskills.com/BLOGS/PAUL/post/TechEd-Demo-Using-the-SQL-2005-Dedicated-Admin-Connection-to-fix-Msg-8992-corrupt-system-tables.aspx; another solution is to generate and export a script to generate a database, create a database, and export the data of the original database.

Irreparable upload uption

CheckDB does not solve all problems. All data of this type can only be recovered through full backup.
In addition, you can only restore part of the data by generating a script. A typical error is as follows:

Msg 7985, Level 16, State 2, Line 1
System table pre-checks: Object ID 4. cocould not read and latch page () with latch type SH.
Check statement terminated due to unrepairable error.
Msg 8921, Level 16, State 1, Line 1
Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent.

CheckDB requires some key system tables to determine the health status of the database. If these key tables are damaged, the CheckDB analysis error is an impossible task, not to mention repair.

Damaged allocation pages
Msg 8946, Level 16, State 12, Line 1
Table error: Allocation page (1: 2264640) has invalid PFS_PAGE header values. Type is 0. Check type, alloc unit ID and page ID on the page.
Msg 8998, Level 16, State 2, Line 1
Page errors on the GAM, SGAM, or PFS pages prevent allocation integrity checks in database ID 13 pages from (1: 2264640) to (1: 2272727)

When these errors occur, the storage ing pages of the database are damaged. These pages are used to mark those pages and zones on the hard disk that are used and not used, checkDB does not fix these errors, and these errors are tricky. It is not advisable to delete these pages because each page is mapped to 4 GB of storage data.

You are welcome to reprint it. If possible, please attach this annotation:
Fu Bo translated from: http://www.sqlservercentral.com/articles/65804/

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.