Practical experience in restoring sqlserver2000 Databases

Source: Internet
Author: User
Tags bulk insert ibm db2 mssqlserver

Practical experience in restoring sqlserver2000 Databases

**************************************** ****************************************

Author: Bright Summit of Huangshan

Mail: leimin@jxfw.com

Version: 1.0.0

Date: 2004-1-30

(If You Need To reprint it, please indicate the source !, If you have any questions, please mail me :-))

**************************************** ***************************************

The background of my story is that in a pos project, sqlserver 2000 is used as the front-end database, and IBM DB2 is used as the back-end database. The operating system of the front-end database is Windows2000 Server (10 users), and the database is sqlserver2000 (e) + SP3, application is the POS cash register system (a real-time transaction system ). The hardware configuration is: P4 xron 2.4g * g HDD * 5 RAID 5, 1g memory, HP dds4 tape drive, the database capacity is generally kept at about 5 GB.
Because the data is important and the data capacity is not big, the backup policy we require is to make full backups of pos_db on the tape drive every day (7 days a week ), back up all data on the hard disk at night (master, MSDB, pos_db ). in this way, double insurance is maintained.

1. Fault Outbreak:

The customer reported that all POS crashes and servers run very slowly. After restarting the server (an alarm is triggered when the RAID card is checked), we find that the following information is found in the "System Log" of windeows 2000 Server:
Error: 823, severity: 24, state: 2
I/O error (torn page) detected during read at offset 0x0000001bf96000 in file D:/data/pos_db.mdf '.
The "error log" of sqlserver contains the following information:
2003-12-10 03:34:22. 23 spid56 error: 823, severity: 24, state: 2
03:34:22. 23 spid56 I/O error (torn page) detected during read at offset 0x00000074964000 in file 'd:/data/pos_db.mdf '..
Explanations from msdn:
I/O logical check failure: If a read Windows API call or a write Windows API call for a database file is successful, but specific logical checks on the data are not successful (a torn page, for example), an 823 error is raised. the following error message is an example of an 823 error for an I/O logical check failure:
2003-09-05 16:51:18. 90 spid17 error: 823, severity: 24, state: 2
16:51:18. 90 spid17 I/O error (torn page) detected during read at offset 0x00000094004000 in file 'f:/sqldata/mydb. MDF '..
To resolve this problem, first run the DBCC checkdb Statement on the database that is associated with the file in the error message. if the DBCC checkdb statement reports errors, correct those errors before you troubleshoot this problem. if the problem persists even after the DBCC checkdb errors have been corrected, or if the DBCC checkdb statement does not report any errors, review the Microsoft Windows NT System Event Log for any system errors or disk-related errors. you can also contact your hardware vendor to run any appropriate diagnostics.
I/O logic check failed: If a Windows program is successful in reading and writing database files, but the detailed data logic check fails (for example: ), sqlserver returns the MSG 823 error. The following is an I/O logic check failure MSG 823 instance:
2003-09-05 16:51:18. 90 spid17 error: 823, severity: 24, state: 2
16:51:18. 90 spid17 I/O error (torn page) detected during read at offset 0x00000094004000 in file 'f:/sqldata/mydb. MDF '..
To solve this problem, you must first execute DBCC checkdb in the database (the database file with the error message ). If DBCC checkdb reports errors, you can correct these errors before fixing them. If these error messages are retained until DBCC checkdb is executed, or DBCC checkdb does not report any errors, check the event viewer of Windows NT for information related to system errors or disk errors. You can also contact the hardware vendor to run the correct diagnostic tool.

Bad:-(there is a problem with the database file. when checking the OS Event Viewer, we found that there was an error message one week ago (except that the offset address is different ).

Check the HDD and find that the first HDD in RAID5 has a red light (too much dust, it is difficult to see clearly)

RunDBCC checkdb ('pos _ db ')Check found:
Server: MSG 8909, level 16, state 1, line 1
Table error: Object ID 26342838, index ID 35207, page ID (1: 50978). The pageid in the page header = (32230:-2048732002 ).

Server: MSG 8939, level 16, state 1, line 1
Table error: Object ID 859150106, index ID 255, page (1: 238770 ). test (is_on (buf_ioerr, BP-> bstat) & BP-> berrcode) failed. the values are 2057 and-1.

Server: MSG 8928, level 16, state 1, line 1
Object ID 861246123, index ID 0: page (1: 57291) cocould not be processed. See other errors for details.

Server: MSG 2511, level 16, state 1, line 1
Table error: Object ID 862626116, index ID 0. Keys out of order on page (1: 269310), slots 0 and 1.

Aha, many tables have error Associations (Please record the Object ID of each error table)
From msdn:
Error Code MSG 823: indicates that sqlserver detects a hardware device or system error when reading or writing data.
Torn page: incomplete page
0x0000001bf96000: the number of bytes from the data file to the torn page.
Error No. MSG 8939: Can you see: http://support.microsoft.com/default.aspx? Kbid = 320434
Fix: when running checkdb, large-capacity insertion (bulk insert, BCP, etc.) with tablock prompts may cause errors 8929 and 8965
Error Code MSG 8928: Is the information associated with 8939,
Error Code MSG 8965: Is the information associated with 8939,

You can find the relevant information at the following address:
Http://support.microsoft.com/default.aspx? SCID = KB; en-US; 826433
PRB: Additional SQL Server diagnostics added to detect unreported I/O Problems
Http://support.microsoft.com/default.aspx? SCID = KB; en-US; 828339
PRB: error message 823 may indicate hardware problems or system problems
Http://support.microsoft.com/default.aspx? SCID = KB; en-US; 308795
Fix: checkdb may not fix error 8909 or error 8905

Fault Diagnosis: raid has a bad HDD, causing damage to database files

2. Change HDD

Now we can see the benefits of RAID5. If an HDD is broken, the system can run as usual. However, the system log and SQL server log still have msg823 error messages.
Bind the new HDD to the original RAID5 according to the rebuild steps of the RAID card, and the new HDD is successfully completed :-)
Use DBCC to check database integrity
DBCC checkdb ('pos _ db') with all_errormsgs
The error message is the same as before HDD replacement. It seems that the database file is still faulty.

-- There is a strange problem 1. Since it is RAID5 of five hdds, why does a bad HDD affect the damage of database files ??? :-(

3. Restore the database

There is no way to recover the database with the backup data set (it seems how important backup is)
Use master
Go
Restore database pos_db from disk = 'd:/databasebackup/pos_db_backup.dat'
Restart the mssqlsercver service,
Net stop MSSQLServer/Net start MSSQLServer
Use DBCC to check database integrity
DBCC checkdb ('pos _ db') with all_errormsgs

The error message is consistent with that before the restoration, and has not changed.
-- Strange problem 2: SQL Server backup does not verify the integrity of the database, and the full backup of the database is actually faulty. Angry !!

It seems that the database can only be repaired through tools (-- the number of records in the error table before modification, so that the database can be repaired and compared ).
Run the following command in the query Analyzer:
Alter database pos_db set singl_user
Go
DBCC checkdb ('pos _ db', repair_allow_data_loss) with tablock
Go
Alter database pos_db set multi_user
Go

Checkdb has three parameters:
Repair_allow_data_loss
Execute all repairs completed by repair_rebuild, including allocating and unassigning rows and pages to correct allocation errors, structure row or page errors, and deleting corrupt text objects. These fixes may cause some data loss. The repair operation can be completed under the user transaction to allow the user to roll back the changes. If the rollback is fixed, the database will still contain errors and the backup should be restored. If an error fix is missing due to the severity of the severity level provided, it will omit any fix that depends on the fix. After the restoration, back up the database.
Repair_fast Performs small and time-consuming repair operations, such as fixing additional keys in non-clustered indexes. These repairs can be completed quickly without the risk of data loss.
Repair_rebuild performs all repairs completed by repair_fast, including repairs that require a long period of time (such as re-indexing ). There is no risk of data loss when performing these repairs.

 

During the first running, we will find that:
DBCC results for 'table _ name '.
There are 1 rows in 1 pages for object 'table _ name '.
The error has been retries red.
Checkdb found 0 allocation errors and 1 consistency errors in table '(Object ID 26342838)' (Object ID 26342838 ).
Checkdb fixed 0 allocation errors and 1 consistency errors in table '(Object ID 26342838)' (Object ID 26342838 ).
There is a lot of such information, and there is a "the error has been retries red" prompt. However, there is still such information at the end:
Checkdb found 0 allocation errors and 19 consistency errors in database 'pos _ db '.
Checkdb fixed 0 allocation errors and 19 consistency errors in database 'pos _ db '.
Again, the same error still exists. Bad: =) it seems that this method cannot fix such a test error.

Failed !!!

Take a closer look at Bol BOL and find that checkdb has a very useful parameter physical_only.

Physical_only
Only checks the integrity of the physical structure of the page and record title, and the consistency between the Page Object ID and index ID and the allocation structure. The purpose of this check is to check the physical consistency of the database at a low cost. It also detects incomplete pages and common hardware faults that may compromise user data security. Physical_only always means no_infomsgs and cannot be used with any repair options.

Run again:
 DBCC checkdb ('pos _ db') with no_infomsgs, physical_only
Then run:
 DBCC checkdb ('pos _ db', repair_allow_data_loss) with tablock
This time, some 8952.8956 error messages will be returned:
Server: MSG 8952, level 16, state 1, line 1
Table error: Database 'pos _ db', index 'pos _ refer. idx2_pos_refer '(ID 861246123) (index ID 2). Extra or invalid key for the keys:


Server: MSG 8956, level 16, state 1, line 1
Index row (1: 26315: 23) with values (plu_id = '2013' and prd_aggr_id = 6922825200240 and evnt_id = NULL and rgst_mde = 0 and subprd_nbr = 0 and str_id = 12 and prd_aggr_id = 10006 and subprd_nbr = 0 and str_id = 12 and plu_id = '2013' and evnt_id = NULL and rgst_mde = 0) points to the data row identified ().

According to the description on msdn:
This problem does not cause any data or index upload uption. The problem is in the metadata which is corrected only by dropping and re-creating the indexes.
These problems do not cause data or index corruption. The metadata of these problems is correct, but they are deleted and re-indexed.
It seems that the problem is modified.

Run DBCC checkdb ('pos _ db') again and DBCC checkdb ('pos _ db') again. The message has no error message.

OK. fixed the problem :-)

4. Check the repaired database and back up the database.
Check the table related to DBCC checkdb errors and compare the number of records before DBCC is executed. It is found that 40 records are missing in one table. Depressing:-<

5. Summary

1. RAID5 does not guarantee the integrity of data files in the sqlserver 2000 database;
2. The sqlerver 2000 backup program does not verify the data integrity of the database files. If your data files have problems, they are not shown during Backup;
3. The repair_allow_data_loss of DBCC checkdb is not very secure and cannot fix all the errors. Even the repair of the incomplete page (torn page) may result in data loss;
4. The repair_allow_data_loss parameter of DBCC checkdb cannot fix all errors;

References:
Http://support.microsoft.com/default.aspx? SCID = KB; en-US; 298806
Http://support.microsoft.com/default.aspx? SCID = KB; en-US; 284440
Http://support.microsoft.com/default.aspx? Kbid = 320434
Http://support.microsoft.com/default.aspx? SCID = KB; en-US; 828339
Http://support.microsoft.com/default.aspx? SCID = KB; en-US; 308795
Http://support.microsoft.com/default.aspx? SCID = KB; en-US; 826433

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.