One SQL Server 2000 repair Practice

Source: Internet
Author: User
Tags mssqlserver system log backup
Server I told a story of the background is this, in a certain POS project using SQL SERVER 2000 to do the foreground database, IBM's DB2 do background database. The environment for the foreground database is such that the operating system is Windows Server (USERS), and the database is SQL Server (E) +sp3,application is a POS system (a real-time trading system). Hardware configuration is: P4 xron 2.4g*2,36g hdd*5 RAID5, 1G memory,hp tape drive, the database capacity is generally kept at 5G or so.
Because the data is more important and the data capacity is small, the backup strategy we require is to do a full backup of the pos_db on the tape drive every day (a cycle of one weeks 7 days) and a full backup on the hard drive at night (master,msdb,pos_db). This keeps double insurance.

1. Failure outbreak:

2003-12-26 13:00

The customer reports that all Pos crashes and server runs very slowly. After restarting the server (starting the alert when checking the raid card) we found this information in the system log of Windeows server:

error:823, severity:24, State:2
I/O error (torn page) detected during read at offset 0x0000001bf96000 in file D: \data\pos_db.mdf '.
This information is available in the error log for SQL Server:
2003-12-10 03:34:22.23 spid56 error:823, severity:24, State:2
2003-12-10 03:34:22.23 spid56 I/O error (torn page) detected during read at offset 0x00000074964000 in file ' D:\DATA\POS_D B.mdf '..

An explanation from MSDN:

I/O logical check FAILURE:IF a read Windows API call/a write Windows API call to a database file is successful, but s Pecific logical checks on the "data are not successful" (a torn page, for example), and a 823 error is raised. The following error message was an example of a 823 error for a I/O logical check failure:
2003-09-05 16:51:18.90 spid17 error:823, severity:24, State:2
2003-09-05 16:51:18.90 spid17 I/O error (torn page) detected during read at offset 0x00000094004000 in file ' F:\SQLData\my Db. MDF '.
To resolve this problem, the "the" CHECKDB statement on the database, the ' is ' associated with the ' file in ' Erro R message. If the DBCC CHECKDB statement reports errors, correct those errors before you troubleshoot this problem. If the problem persists even after the DBCC CHECKDB errors have been corrected, or if the DBCC CHECKDB statement Errors, review the Microsoft Windows NT system event log for any system errors or disk-related errors. You can also contacts your hardware vendor to run any appropriate diagnostics.

I/O logic check failed: If a Windows program succeeds in reading and writing database files, but does not succeed in detailed data logic checking (for example, incomplete pages), SQL Server returns MSG 823 error. The following is an example of an I/O logic check failure msg 823:

2003-09-05 16:51:18.90 spid17 error:823, severity:24, State:2
2003-09-05 16:51:18.90 spid17 I/O error (torn page) detected during read at offset 0x00000094004000 in file ' F:\SQLData\my Db. MDF '.

To resolve this problem, first execute the DBCC CHECKDB (the database file for the error message prompt) in the database. If DBCC CHECKDB errors, correct them before you fix them. If these error messages are persisted until the execution of the DBCC CHECKDB run, or if DBCC CHECKDB does not report any errors, check the Event Viewer of the Windows NT system for information related to system errors or disk errors. You can also contact your hardware manufacturer to run the correct diagnostics tool.

Bad, there is a problem with the database file, and in checking the OS's Event Viewer, we found that there was an error message one weeks ago (the offset address is different).

Quickly check the HDD, it was found in RAID5 's first fast hdd lit a red light (too much dust, it is difficult to see)

Perform DBCC CHECKDB (' pos_db ') Check discovery:

Server:msg 8909, level, State 1, line 1
Table error:object ID 26342838, index ID 35207, page ID (1:50,978). The PageID in the page Header = (32230:-2048732002).

Server:msg 8939, level, State 1, line 1
Table error:object ID 859150106, index ID 255, page (1:238,770). Test (is_on (Buf_ioerr, Bp->bstat) && Bp->berrcode) failed. Values are 2057 and-1.

Server:msg 8928, level, State 1, line 1
Object ID 861246123, index ID 0:page (1:57291) could not is processed. Errors for details.

Server:msg 2511, level, State 1, line 1
Table error:object ID 862626116, Index ID 0. Keys out of Order on page (1:269310), slots 0 and 1.

Aha, so many of the tables have bad associations (please note the object ID of each error table).

Check from MSDN:

Error number MSG 823: SQL Server detected a problem with a hardware device or a system problem while reading and writing data.

Torn page: It means incomplete pages.

0x0000001bf96000: This is the number of bytes from the beginning of the data file to the torn PAGE.

Wrong number msg 8939: You can see: http://support.microsoft.com/default.aspx?kbid=320434
FIX: Bulk inserts with TABLOCK prompts (bulk inserts, BCP, etc.) can cause errors 8929 and 8965 when running CHECKDB.

Error number MSG 8928: Is the information associated with 8939,

Error number MSG 8965: Is the information associated with 8939,

You can find the relevant information at the following address:

http://support.microsoft.com/default.aspx?scid=kb;en-us;826433
Prb:additional SQL Server Diagnostics Added to detect unreported I/O Problems
http://support.microsoft.com/default.aspx?scid=kb;en-us;828339
Prb:error message 823 may indicate hardware problems or system problems
http://support.microsoft.com/default.aspx?scid=kb;en-us;308795
FIX:CHECKDB may not Fix Error 8909 or Error 8905

Fault diagnosis: RAID has a bad HDD, causing database file damage

2. Replace HDD

2003-12-28 23:00

Now embodies the benefits of RAID5, a bad hdd, the system can run as usual, but the system's log and SQL Server log still have MSG823 error information.

Follow the rebuild steps of the RAID card to bind the new HDD to the original RAID5 and complete it successfully.

To check the integrity of the database with DBCC

DBCC CHECKDB (' pos_db ') with ALL_ERRORMSGS

There is still a problem with the database file if you find that you have the same error information as before the HDD replacement.

There is a strange question 1, since it is the RAID5 of 5 hdd, why is there a piece of HDD bad affect the database file damage, puzzled?



3. Restoring the Database

2003-12-29 00:30

There is no way to recover a database with a backed-up dataset (it seems to be how important the backup is)

Use MASTER
Go
RESTORE DATABASE pos_db from disk= ' D:\DATABASEBACKUP\POS_DB_BACKUP. DAT '

Restart the MS SQL Sercver service.

NET STOP mssqlserver/net START MSSQLServer

To check the integrity of the database with DBCC

DBCC CHECKDB (' pos_db ') with ALL_ERRORMSGS

Consistent with the error message before recovery, unchanged.

--Strange problem 2,sqlserver backup does not verify the integrity of the database, and the full backups of the database are problematic. Angry!!

It seems that the database can only be repaired by means of a tool (-the number of records of the error table logged before you modify it to compare after the database is repaired).

Running in Query Analyzer:

ALTER DATABASE pos_db SET singl_user
Go
DBCC CHECKDB (' pos_db ', repair_allow_data_loss) with TABLOCK
Go
ALTER DATABASE pos_db SET multi_user
Go

CHECKDB has 3 parameters:

Repair_allow_data_loss

Perform all repairs performed by repair_rebuild, including assigning and unassign rows and pages to correct assignment errors, structural rows or pages, and deleting corrupted text objects. These fixes can cause some data loss. A repair operation can be completed under a user transaction to allow the user to roll back the changes. If you roll back the repair, the database will still contain errors and should be recovered from the backup. If an incorrect fix is omitted due to the level of repair provided, any fixes that depend on the repair will be omitted. After the repair is complete, back up the database.
  
Repair_fast for small, time-consuming fixes, such as fixing additional keys in a nonclustered index. These fixes can be done quickly, and there is no risk of losing data.

Repair_rebuild performs all the repairs performed by Repair_fast, including the need for a longer period of repair, such as rebuilding an index. There is no risk of losing data when performing these repairs.

The first time we run, we will find:

DBCC results for ' table_name '.
There are 1 rows in 1 pages for object ' table_name '.
The error has been repaired.
CHECKDB found 0 allocation errors and 1 consistency errors in table ' (Object ID 26342838) ' (Object ID 26342838).
CHECKDB fixed 0 allocation errors and 1 consistency errors in table ' (Object ID 26342838) ' (Object ID 26342838).

There are a lot of such information, and there are "the error has been repaired" tips. But in the end there is this message:

CHECKDB found 0 allocation errors and consistency errors in database ' pos_db '.
CHECKDB fixed 0 Allocation errors and consistency errors in database ' pos_db '.

Run again, or have the same error. Bad: =) It seems that this method is unable to repair such errors.

Failed!!!

Take a closer look at the SQL SERVER Bol found CHECKDB also has a very useful parameter physical_only

Physical_only

Only check the integrity of the physical structure of the page and record headers, and the consistency between the Page object ID and the index ID and the allocation structure. This check is designed to check the physical consistency of the database at a lower cost, while also detecting torn pages and common hardware failures that can compromise user data security. Physical_only always means no_infomsgs and cannot be used with any repair options.

Run again:

DBCC CHECKDB (' pos_db ') with No_infomsgs,physical_only
  
And then run:

DBCC CHECKDB (' pos_db ', repair_allow_data_loss) with TABLOCK

This time, you will return some 8952.8956 error messages:

Server:msg 8952, level, State 1, line 1
Table error:database ' pos_db ', index ' pos_refer. Idx2_pos_refer ' (ID 861246123) (index ID 2). Extra or invalid key for the keys:

Server:msg 8956, level, State 1, line 1
Index Row (1:26315:23) with values (plu_id = ' 6922825200240 ' and prd_aggr_id = 10006 and evnt_id = NULL and rgst_mde = 0 A nd SUBPRD_NBR = 0 and str_id = and prd_aggr_id = 10006 and SUBPRD_NBR = 0 and str_id = and plu_id = ' 6922825200240 ' and evnt_id = NULL and Rgst_mde = 0) points to the "data row identified by" ().

According to the instructions on MSDN:

This problem does is not cause any data or index corruption. The problem is in the metadata which are corrected only by dropping and re-creating the indexes.

These issues do not cause corruption in data or indexes, and the metadata for these issues is correct, but only deleted and then indexed.

It seems that the problem has been modified.

Run DBCC CHECKDB (' pos_db ') again, run again: DBCC CHECKDB (' pos_db ') with no error message.

Successfully repaired.

4. Check the repaired database and back up the database

Check the related tables for DBCC CHECKDB errors and compare the number of records before DBCC, and find a table with 40 missing records. Depressed.

5. Summary

1.RAID5 does not guarantee the integrity of the data files for SQL Server 2000 databases;

2.SQLERVER 2000 of the Backup program does not verify the data integrity of the database file, if you have problems with the data file, the backup is not illustrated;

3.DBCC checkdb Repair_allow_data_loss is not very safe to fix all errors, even if the incomplete page (torn page) will be the restoration of data loss;

4.DBCC checkdb repair_allow_data_loss parameters cannot fix all errors;



Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.