Understanding Oracle Database Bad Blocks


1. Physical bad blocks and logical bad blocks

There is a concept in the database called data block consistency. Oracle's data block consistency has two levels: physical consistency and logical consistency. If a data block is inconsistent at either level, it corresponds to the physical or logical bad blocks discussed here.

There is a checksum field in the header of each block. Every time a block is written back to disk, Oracle recalculates its checksum and stores it in this field before the final write. The next time the block is read into memory, Oracle recalculates the checksum and compares it with the field in the block header; if they differ, Oracle knows the block has an error and reports an ORA-1578 error. The checksum-based check lets Oracle catch all errors introduced by the underlying layers, including the disk, storage, and I/O subsystems, which is why it is called a physical consistency check.

In Oracle, whenever a data block is about to be modified, a consistency check is made on its contents (for example, checking whether records are locked by a transaction that does not exist, whether records and index entries correspond, and so on). If this consistency check fails, an ORA-600 internal error is thrown. When Oracle detects a logical inconsistency in a data block, it attempts to recover the block in the cache; if the block cannot be restored to a consistent state, Oracle marks it as software corrupt, and when a query later accesses the block, the familiar ORA-1578 error is also thrown.

The physical consistency check uses the checksum field and focuses mainly on detecting hardware failures, not on the correctness of the content; the logical consistency check focuses on the content itself, which is far more complex than a checksum comparison. In other words, logical consistency checks examine much more of the block's content than physical consistency checks do.

2. Causes of bad blocks

Data block corruption has a variety of causes: it can be due to physical (hardware) problems, human error, or Oracle bugs. Common human causes include abnormal shutdowns, power failures, and terminating the service so that processes exit abnormally, which damages data integrity and produces bad blocks. When block corruption occurs, the most typical symptom is an ORA-01578 error (for example during a full table scan), and sometimes an ORA-600 internal error.

3. Checking data block consistency

(1) Check data block consistency with initialization parameters

Physical consistency checking is enabled or disabled with the initialization parameter db_block_checksum=true/false. Enabling it adds roughly 1%~2% of load, and Oracle recommends keeping it on. In Oracle 9i and 10g it is enabled by default.

Setting the parameter db_block_checking=true enables logical consistency checking of data blocks. Enabling it adds roughly 1%~10% of load, and the impact is greater when DML activity is heavy. In Oracle 10g the default value is false, which means the database performs logical consistency checks only on the SYSTEM tablespace.
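For reference, here is a minimal way to inspect and adjust these two parameters from SQL*Plus (a sketch; the exact values and SCOPE depend on your version and whether an spfile is in use):

SQL> SHOW PARAMETER db_block_checksum
SQL> SHOW PARAMETER db_block_checking
SQL> ALTER SYSTEM SET db_block_checksum = TRUE SCOPE = BOTH;
SQL> ALTER SYSTEM SET db_block_checking = FALSE SCOPE = BOTH;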

(2) The DBV tool for physical/logical data block checks

Details: http://book.51cto.com/art/201108/287893.htm
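As a quick illustration (hypothetical file path and block size), DBV runs from the operating system command line against a single data file, and its summary includes a count of pages marked corrupt:

dbv file=/u01/oradata/orcl/users01.dbf blocksize=8192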

(3) The ANALYZE command for data block consistency checks

The ANALYZE command has many uses and can perform many functions; for details, refer to: http://blog.csdn.net/tianlesoftware/article/details/7055164

Checking object consistency with ANALYZE: ANALYZE TABLE tablename VALIDATE STRUCTURE CASCADE ONLINE (or OFFLINE);

This command performs both physical and logical consistency checks. In addition, it checks that table rows and index entries match and that the records of a partitioned table are in the correct partitions; any problems found are written to a trace file in the user_dump_dest directory.

Note the following when using this command (an example follows the list):

* When checking whether the records of a partitioned table are in the correct partitions, the ROWIDs of misplaced records can be recorded in a special table named INVALID_ROWS, which must be created in advance with the utlvalid.sql script. The syntax is: ANALYZE TABLE tablename VALIDATE STRUCTURE INTO invalid_rows;

* If an index is checked, this command verifies the consistency of the data blocks and index blocks, but it does not confirm that every record has a corresponding index entry or that every index entry has a corresponding record; if that check is required, add the CASCADE keyword.

* If the ONLINE clause is used, the check can run while DML operations are in progress, but statistics for the object are not collected; if OFFLINE is used, the table is locked during the check.
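For example (hypothetical schema, table, and partition names; run utlvalid.sql first to create INVALID_ROWS before the second statement):

ANALYZE TABLE scott.emp VALIDATE STRUCTURE CASCADE ONLINE;
ANALYZE TABLE scott.sales PARTITION (p2019) VALIDATE STRUCTURE INTO invalid_rows;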

(4) Checking with the RMAN tool

* Check data files:
BACKUP CHECK LOGICAL VALIDATE [DATABASE];

* Check individual data files:
BACKUP CHECK LOGICAL VALIDATE DATAFILE 1, 2;

* Check the entire database:
The results of BACKUP CHECK LOGICAL VALIDATE DATABASE are placed in the V$DATABASE_BLOCK_CORRUPTION dynamic view (example below).

* This command has some impact on system performance.
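For example (assuming an RMAN session connected to the target database, followed by a SQL*Plus session):

RMAN> BACKUP CHECK LOGICAL VALIDATE DATABASE;
SQL> SELECT file#, block#, blocks, corruption_type FROM v$database_block_corruption;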


(5) Performing the check with the DBMS_REPAIR.CHECK_OBJECT procedure
For more information, refer to: http://blog.itpub.net/8494287/viewspace-1357457/
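A minimal sketch of the approach, assuming a DBA session and a hypothetical table SCOTT.EMP; REPAIR_TABLE is the conventional repair table name created by DBMS_REPAIR.ADMIN_TABLES:

BEGIN
  DBMS_REPAIR.ADMIN_TABLES(
    table_name => 'REPAIR_TABLE',
    table_type => DBMS_REPAIR.REPAIR_TABLE,
    action     => DBMS_REPAIR.CREATE_ACTION);   -- create the repair table once
END;
/
SET SERVEROUTPUT ON
DECLARE
  corrupt_count BINARY_INTEGER;
BEGIN
  DBMS_REPAIR.CHECK_OBJECT(
    schema_name       => 'SCOTT',
    object_name       => 'EMP',
    repair_table_name => 'REPAIR_TABLE',
    corrupt_count     => corrupt_count);        -- number of corrupt blocks found
  DBMS_OUTPUT.PUT_LINE('corrupt blocks found: ' || corrupt_count);
END;
/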

(6) Exporting the entire database with the EXP tool to detect bad blocks


During execution, the export reports an ORA-01578 error that indicates the file number and the block within that file that is damaged, for example: ORA-01578: ORACLE data block corrupted (file # 4, block # 35). Using the exp tool to check for bad blocks also has limitations; some types of bad blocks cannot be detected: bad blocks above the high water mark (HWM) are not found, bad blocks in indexes are not found, and bad blocks in the data dictionary are not found.
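For example (hypothetical credentials and paths), a full export used purely as a sweep for bad blocks might look like:

exp system/<password> full=y file=/tmp/full_check.dmp log=/tmp/full_check.log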

4. Using bbed to simulate bad blocks

http://www.cnblogs.com/myrunning/p/4433150.html

5. Recovery Theory of bad blocks

For an inconsistent data block, if the database has a backup and is in archivelog mode, the block can be recovered with the RMAN block recovery feature. This is the simplest and most effective method: it can be done while the data file stays online, and no data is lost. For a database without a backup, block corruption may or may not cause data loss, depending on the object the bad block belongs to. If an index block is corrupted, no data is lost and the index can simply be rebuilt; data loss occurs when the bad block belongs to the data blocks of a table or partitioned table.

5.1. Recovery method without data loss

Using the Rman tool:

BLOCKRECOVER DATAFILE xx BLOCK yy;   -- repair a single bad block

BLOCKRECOVER CORRUPTION LIST;   -- repair all bad blocks
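For example (hypothetical file and block numbers), in an RMAN session connected to the target database:

RMAN> BLOCKRECOVER DATAFILE 6 BLOCK 35;
RMAN> BLOCKRECOVER CORRUPTION LIST;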

If the table containing the bad block has not been updated recently, the bbed COPY command can also be used to copy the block from a recent backup; the details are not shown here.

5.2. Recovery methods that may lose data

5.2.1 Confirm bad block type

When a bad block occurs, we first need to confirm what kind of block it is: a data block of an ordinary table, a block of a partitioned table, an index block, and so on. Different types of bad blocks call for different recovery methods.

<1> Determine the file number

Oracle uses two file numbers to identify data files: the absolute file number (Absolute file#, AFN) and the relative file number (Relative file#, RFN). The AFN numbers the data file across the whole database, while the RFN numbers it within its tablespace. Two files may have the same RFN but never the same AFN. The file number reported in ORA-01578 is the RFN; the file number reported in ORA-1110 is the AFN; the file number reported in ORA-600 is the AFN; and the file numbers seen in data dictionary views, and used to determine which segment a block belongs to, are AFNs.

SQL> SELECT tablespace_name, file_id "AFN", relative_fno "RFN"
  2  FROM dba_data_files;

TABLESPACE_NAME                       AFN        RFN
------------------------------ ---------- ----------
USERS                                   4          4
SYSAUX                                  3          3
UNDOTBS1                                2          2
SYSTEM                                  1          1
EXAMPLE                                 5          5
LIVAN_TBS                               6          6

6 rows selected.

SQL> SELECT tablespace_name, file_id "AFN", relative_fno "RFN"
  2  FROM dba_temp_files;

TABLESPACE_NAME                       AFN        RFN
------------------------------ ---------- ----------
TEMP                                    1          1


-- Use the AFN to look up the file name
SELECT name FROM v$datafile WHERE file# = &afn;


<2> Confirm the type of the corrupted data block

SET LINES 120
COL segment_name FOR a20
COL partition_name FOR a20
COL owner FOR a10
SELECT segment_name, partition_name, segment_type, owner, tablespace_name
FROM sys.dba_extents
WHERE file_id = &afn
AND &bad_block_id BETWEEN block_id AND block_id + blocks - 1;


5.2.2 Set the recovery strategy according to the object type

<1> User Table

If the damaged block belongs to a user table, data loss is unavoidable. What needs to be done is to save the data in the other blocks; the data in the damaged block is permanently lost. The recovery strategy at this point is to reduce the user's loss as much as possible. (If this data loss is unacceptable, other special means must be considered to try to recover the data in the damaged block.) We can proceed in two ways:
* Discard the data in the bad blocks, salvage the data in the other blocks, and rebuild the object;
* Mark the bad block so that it is skipped, then leave it in place; that is, tag the block so that all subsequent operations skip it.

When processing the bad blocks of a data table, also consider the subsequent handling of the related objects:

-- Indexes

SELECT owner, index_name, index_type
FROM dba_indexes
WHERE table_owner = '&owner' AND table_name = '&segment_name';

If records are lost because of data block corruption, the indexes on the table no longer match the data; they should be dropped and then rebuilt.

-- Primary key

Oracle implements a PRIMARY KEY constraint by creating a unique index on the primary key columns, so the primary key is handled the same way as an index: rebuild the unique index.

SELECT owner, constraint_name, constraint_type, table_name
FROM dba_constraints
WHERE owner = '&owner' AND table_name = '&segment_name'
AND constraint_type = 'P';

-- Foreign keys pointing to the primary key

Query the foreign keys that point to the primary key; they are handled the same way as the primary key, and the supporting index needs to be rebuilt.

SELECT owner, constraint_name, constraint_type, table_name
FROM dba_constraints
WHERE r_owner = '&owner' AND r_constraint_name = '&constraint_name';

<2> Partitioned table

A partitioned table is handled much like an ordinary table: confirm which partition the damaged block belongs to, then swap the damaged partition with an empty table of the same structure using an EXCHANGE PARTITION operation so that the application can keep running normally, and then treat the swapped-out table the same way as an ordinary table (a sketch follows).
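A minimal sketch of the EXCHANGE step, assuming hypothetical names: a partitioned table SCOTT.SALES with a damaged partition P_BAD and a pre-created empty table SCOTT.SALES_EXCH with the same column structure:

ALTER TABLE scott.sales EXCHANGE PARTITION p_bad WITH TABLE scott.sales_exch;
-- scott.sales_exch now holds the damaged partition's data and can be salvaged like an ordinary table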


<3> Index

If the damaged object is an index, it can usually be dropped and rebuilt directly.

For an ordinary table's index, rebuild it with the ALTER INDEX ... REBUILD ONLINE syntax; without the ONLINE keyword, the index is rebuilt from the old index (which contains the bad block). For a partitioned table's index, the ALTER INDEX xxx REBUILD PARTITION yyy statement can be used; because a partition rebuild does not read the old index, the ONLINE keyword is not needed.
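For example (hypothetical index and partition names):

ALTER INDEX scott.emp_name_idx REBUILD ONLINE;
ALTER INDEX scott.sales_local_idx REBUILD PARTITION p2019;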

For index bad blocks, note the following:

-- Confirm the table to which the index belongs

SELECT table_owner, table_name
FROM dba_indexes
WHERE owner = '&owner' AND index_name = '&segment_name';

-- If it is a partitioned index, check which partition the bad block belongs to

SELECT partition_name
FROM dba_extents
WHERE file_id = &afn
AND &bad_block_id BETWEEN block_id AND block_id + blocks - 1;

-- Verify whether the index is used by a constraint

SELECT owner, constraint_name, constraint_type, table_name
FROM dba_constraints
WHERE owner = '&table_owner'
AND constraint_name = '&index_name';

-- If the index is a primary key index, verify whether a foreign key points to it

SELECT owner, constraint_name, constraint_type, table_name
FROM dba_constraints
WHERE r_owner = '&owner' AND r_constraint_name = '&constraint_name';

5.2.3 Recovering data

Recovering data here means salvaging the data from the other, good blocks. In short, the damaged blocks are first marked as "software corrupt" so that Oracle skips them during a full table scan instead of raising an error; the data can then be salvaged with the CTAS (CREATE TABLE xx AS SELECT ...) method.

To make Oracle skip these blocks during a full table scan (provided they have first been marked "software corrupt"), either the DBMS_REPAIR package or the 10231 event can be used.

If an ORA-1578 error is encountered, the block has already been marked "software corrupt". If other errors such as ORA-600 are raised instead, the corrupted block has not yet been marked "software corrupt".

To check whether a bad block has been marked "software corrupt", run DBMS_REPAIR.CHECK_OBJECT and examine the repair table (REPAIR_TABLE). If the block has already been marked "software corrupt", the MARKED_CORRUPT column of REPAIR_TABLE is TRUE and the REPAIR_DESCRIPTION column is empty. If it has not, MARKED_CORRUPT is FALSE and REPAIR_DESCRIPTION shows "mark block software corrupt", meaning the block still needs to be marked as "software corrupt".
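For example, assuming the repair table was created with its default name REPAIR_TABLE:

SELECT object_name, block_id, marked_corrupt, repair_description
FROM repair_table;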

Once the blocks are marked "software corrupt", DBMS_REPAIR.SKIP_CORRUPT_BLOCKS or the 10231 event can be used to set a flag indicating that scans of the object should skip the "software corrupt" blocks directly, instead of terminating with an ORA-01578 error, so that all the good blocks can be scanned and the data salvaged. The DBA_TABLES.SKIP_CORRUPT column shows whether the table has this flag set (see the query below).
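For example (using the same substitution variables as the earlier queries):

SELECT owner, table_name, skip_corrupt
FROM dba_tables
WHERE owner = '&owner' AND table_name = '&segment_name';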

### The DBMS_REPAIR method

Set the flag: EXEC DBMS_REPAIR.SKIP_CORRUPT_BLOCKS('<schema>', '<tablename>');
After the flag is set, scanning the data no longer raises an error, but an entry is written to the alert log:
Table Scan: Segment file# 6 block# 4 skipping corrupt block file# 6 block# 5
Clear the flag: EXEC DBMS_REPAIR.SKIP_CORRUPT_BLOCKS('<schema>', '<tablename>', flags => DBMS_REPAIR.NOSKIP_FLAG);

### Using the 10231 event

Session-level setting: ALTER SESSION SET EVENTS '10231 trace name context forever, level 10';

Instance-level setting via initialization parameter: event = "10231 trace name context forever, level 10"
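Putting the pieces together, a minimal salvage sketch with hypothetical names, once the bad blocks are being skipped:

CREATE TABLE scott.emp_salvage AS SELECT * FROM scott.emp;
-- after verifying the salvaged rows, drop the damaged table, rename the new one,
-- and rebuild the indexes and constraints discussed above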

6. Bad block recovery example

http://www.cnblogs.com/myrunning/p/4525917.html

References:
Zhang Xiaoming, "Oracle RAC Cluster High Availability, Backup and Recovery"
http://blog.itpub.net/25472150/viewspace-688629/
http://blog.itpub.net/8494287/viewspace-1357457/
http://www.cnblogs.com/sumsen/archive/2013/01/20/2868740.html
