How Does Oracle mark Bad blocks and recover data once?

Source: Internet
Author: User
Tags ultraedit

Bad blocks of Oracle data files can be divided into physical and logical Bad blocks. Physical Bad blocks (or media Bad blocks) mean that the block format itself is bad, and the data in the block has no meaning. The logical bad block refers to the logic of data in the block. For example, the index values of an index block are not listed in ascending order. Physical Bad blocks are generally caused by memory problems, OS problems, IO subsystem problems, and hardware. Logical Bad blocks are generally caused by Oracle bugs.

Each block of an Oracle data file consists of 20 bytes. The definition is as follows: (from DSI401)

Struct kcbh
{
Ub1 type_kcbh;/* block type */
Ub2 frmt_kcbh;
Ub1 sparejavaskcbh;
Ub1 spare2_kcbh;
Krdba rdba_kcbh;/* relative DBA */
Ub4 bas_kcbh;/* base of SCN */
Ub2 wrp_kcbh;/* wrap of SCN */
Ub1 seq_kcbh;/* sequence # of changes at the same scn */
Ub1 flg_kcbh;
Ub2 chkval_kcbh;
};

In the block header, seq_kcbh (occupies 1 byte and the block header offset is 14) has a special meaning. If the value is 0xff, it indicates that the block is marked as uption.

Next we will perform a test:

SQL> create table test. t1 as select * from dba_objects;

The table has been created.

SQL> select header_file, header_block from dba_segments where segment_name = 'T1 'and owner = 'test ';

HEADER_FILE HEADER_BLOCK
-----------------------
10 1445

Modify the value of db_block_checksum to TRUE to close the database. Use ultraedit to modify the check sum (a random number> 0) and flag = 0 × 04 of file 10. Then open the database. Then execute the following query:

SQL> select count (*) from test. t1;
Select count (*) from test. t1
*
ERROR is located in row 1st:
ORA-01578: ORACLE data block corruption (file number 10, block number 1447)
ORA-01110: data file 10: 'd: \ ORACLE \ ORADATA \ XJ \ TEST01.DBF'

The checksum check is ignored when the db_block_checksum parameter is set to FALSE. Therefore, set this parameter to TRUE for test convenience.
From the above error message, the block number 1447 has been broken, the error is a classic ORA-01578 error.

We use dbv to check this file:

D: \ oracle \ oradata \ XJ> dbv file = TEST01.dbf blocksize = 2048

DBVERIFY: Release 9.2.0.1.0-Production on Monday February 23 17:20:43 2009

Copyright (c) 1982,200 2, Oracle Corporation. All rights reserved.

DBVERIFY-verification starting: FILE = TEST01.dbf
Page marked as corrupt 1447
***
Corrupt block relative dba: 0 × 028005a7 (file 10, block 1447)
Bad check value found during dbv:
Data in bad block-
Type: 6 format: 2 rdba: 0x028005a7
Last change scn: 0 × 0000. 0023b43e seq: 0 × 2 flg: 0 × 04
Consistency value in tail: 0xb43e0602
Check value in block header: 0xf0f0, computed block checksum: 0 × 3a4f
Spare1: 0x0, spare2: 0x0, spare3: 0x0
***

DBVERIFY-verification completed

Total number of checked pages: 56660
Total number of processed pages (data): 53947
Total number of failed pages (data): 0
Total number of processed pages (INDEX): 30
Total number of failed pages (INDEX): 0
Total number of processed pages (Others): 2669
Total number of processed pages (segments): 0
Total number of failed pages (segments): 0
Total number of blank pages: 13
Total number of pages marked as damaged: 1
Total number of imported pages: 0

Dbv check finds Bad blocks (check error ).
If you use the analyze command to check whether there are bad blocks:

SQL> analyze table test. t1 validate structure;
Analyze table test. t1 validate structure
*
ERROR is located in row 1st:
ORA-01578: ORACLE data block corruption (file number 10, block number 1447)
ORA-01110: data file 10: 'd: \ ORACLE \ ORADATA \ XJ \ TEST01.DBF'

We use dbms_repair to process this bad block (in fact, if the checksum is broken, we can change the checksum to the correct value. However, when the checksum is broken, it usually means that the data in the bad condition is broken and can only be discarded in most cases ):

SQL> begin
2 dbms_repair.admin_tables (
3 table_name => 'repair _ table ',
4 table_type => dbms_repair.repair_table,
5 action => dbms_repair.create_action,
6 tablespace => 'system ');
7 end;
8/

The PL/SQL process is successfully completed.
SQL> set serveroutput on
SQL> declare
2 rpr_count int;
3 begin
4 rpr_count: = 0;
5 dbms_repair.check_object (
6 schema_name => 'test ',
7 object_name => 't1 ′,
8 repair_table_name => 'repair _ table ',
9 corrupt_count => rpr_count );
10 dbms_output.put_line ('repair count: '| to_char (rpr_count ));
11 end;
12/
Repair count: 1

The PL/SQL process is successfully completed.
SQL> select object_name, block_id, corrupt_type, marked_corrupt, corrupt_description,
2 repair_description from repair_table;

OBJECT_NAME BLOCK_ID CORRUPT_TYPE MARKED_COR CORRUPT_DESCRIPTION REPAIR_DESCRIPTION
-------------------------------------------------------------------------------------
T1 1447 6148 TRUE mark block software
Corrupt

T1 1447 6148 TRUE mark block software
Corrupt
SQL> declare
2 fix_count int;
3 begin
4 fix_count: = 0;
5 dbms_repair.fix_corrupt_blocks (
6 schema_name => 'test ',
7 object_name => 't1 ′,
8 object_type => dbms_repair.table_object,
9 repair_table_name => 'repair _ table ',
10 fix_count => fix_count );
11 dbms_output.put_line ('fix count: '| to_char (fix_count ));
12 end;
13/
Fix count: 0

The PL/SQL process is successfully completed.
SQL> begin
2 dbms_repair.skip_corrupt_blocks (
3 schema_name => 'test ',
4 object_name => 't1 ′,
5 object_type => dbms_repair.table_object,
6 flags => dbms_repair.skip_flag );
7 end;
8/

The PL/SQL process is successfully completed.

SQL> select table_name, skip_corrupt from dba_tables where table_name = 'T1 'and owner = 'test ';

TABLE_NAME SKIP_COR
--------------------------------------
T1 ENABLED

SQL> select count (*) from test. t1;

COUNT (*)
----------
28762

SQL> alter system checkpoint;

The system has been changed.

We can see from the above that dbms_repair.fix_corrupt_blocks does not fix the checksum error, nor does it make a bad block mark. Dbv and ultraedit are used to check the block header. No changes are found. However, when you set the table to skip Bad blocks in the data dictionary through dbms_repair.skip_corrupt_blocks, this block is skipped during query.

If RMAN is used to back up the file and then restore the file, the seq_kcbh of the Bad block is set to 0xff. When dbv is used to check the file, the error message displayed is:

DBVERIFY-verification starting: FILE = TEST01.dbf

DBV-00200: Block, dba 41944487, marked as crash

DBVERIFY-verification completed

Total number of checked pages: 56655
Total number of processed pages (data): 53948
Total number of failed pages (data): 0
Total number of processed pages (INDEX): 30
Total number of failed pages (INDEX): 0
Total number of processed pages (Others): 2669
Total number of processed pages (segments): 0
Total number of failed pages (segments): 0
Total number of blank pages: 8
Total number of pages marked as damaged: 0
Total number of imported pages: 0

Note that the "mark as the total number of damaged pages" is different from the previous check. Here it is "0 ".

Note that using skip_corrupt_blocks only allows oracle to skip the blocks that can be read by Oracle. If the read call fails at the operating system layer, it cannot be skipped. Even the session may be interrupted. In this case, the copy (cp) command of the dd command or the operating system cannot copy the file, and rman cannot back up the file, what if the data file is not backed up?

One of our customers encountered this problem a few days ago. In windows, two-node RAC used OCFS. Due to storage and hard disk problems, one data file had bad blocks, the file cannot be copied from the operating system. This problem was also encountered in the past few months, but the system is RAC in Linux (it is difficult to solve the problem of OCFS? Both use OCFS ). Due to storage problems, the hard disk is highlighted with a yellow light, and the fault still exists after disk replacement. The database must be backed up urgently, but the file cannot be copied.

In this case, write a script to insert data into another table? Then exp? At the scene, we found that the table where the bad block is located had more than GB. Is there a simpler way? When I got to the client, I used more than 20 minutes to write a simple program to copy the file that could not be copied using the operating system tool. The principle is to read data in blocks and write data into a new file. When a block cannot be read, write a bad block (seq_kcbh is set to 0xff, if flag_kcbh is set to 0 × 04, a value is written to checksum, and all others are 0) to the new file. In this way, the file is copied. Fortunately, there are only two bad blocks in the whole file. After testing, the file is fully available.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.