Oracle data block corruption and recovery explained

Last Update:2017-06-12 Source: Internet

Author: User

Tags dba system log


1. What is block corruption:

A corrupted data block is a block that is not in a recognized Oracle format, or whose contents are internally inconsistent.

Corruption is typically caused by a hardware failure or an operating system problem. The Oracle database classifies a corrupted block as either "logically corrupt" or "media corrupt."

Logical corruption is an Oracle internal error. When the database detects the inconsistency, it marks the logically corrupt block as corrupt.

With media corruption, the block is not in the correct format, and the block read from disk contains no meaningful information.

A media-corrupt block can be repaired by recovering the block, by dropping the database object that contains it, or both.

If the media corruption is caused by a hardware fault, the problem cannot be fully resolved until the hardware is repaired.

Whenever a block is read or written, the following consistency checks are performed:

--Block version

--The DBA (data block address) value in the cache, compared with the DBA value in the block buffer

--Block checksum (if enabled)

A corrupted block is classified as one of the following:

--Media corrupt

--Logically (or software) corrupt

2. Symptom of block corruption: ORA-01578

ORA-01578 error: "ORACLE data block corrupted (file # %s, block # %s)":

--Is generated when a corrupted block is found

--Always returns the relative file number and the block number

--Is returned to the session that issued the query that was running when the corruption was found

--Is recorded in the alert.log file

An ORA-01578 error is usually caused by a hardware problem.

If the ORA-01578 error always returns the same arguments (the same file number and block number), the most likely cause is a media-corrupt block.

If the arguments change each time, there may be a hardware problem. Check memory and page (swap) space, and check the I/O subsystem for a faulty controller.

Note: ORA-01578 returns the relative file number, but the accompanying ORA-01110 error displays the absolute file number.
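
The rule of thumb above can be sketched as a small script that scans alert log text. This is only an illustration: the function names `corrupt_blocks` and `likely_cause` are hypothetical, the regular expression assumes each ORA-01578 message sits on a single line in the form quoted in this section, and the result is a hint rather than a diagnosis.

```python
import re

# Assumption: the message keeps the "(file # N, block # M)" layout quoted above.
ORA_01578 = re.compile(
    r"ORA-01578:.*?\(file\s*#\s*(\d+),\s*block\s*#\s*(\d+)\)",
    re.IGNORECASE,
)


def corrupt_blocks(log_text):
    """Return every (relative file number, block number) pair reported by ORA-01578."""
    return [(int(f), int(b)) for f, b in ORA_01578.findall(log_text)]


def likely_cause(blocks):
    """Apply the rule of thumb from the text to a list of reported blocks."""
    if not blocks:
        return "no ORA-01578 reports found"
    if len(blocks) < 2:
        return "only one report: repeat the check to see whether the error persists"
    if len(set(blocks)) == 1:
        return "same file and block every time: media corruption is the most likely cause"
    return "file or block changes between reports: check memory, paging space and the I/O subsystem"


if __name__ == "__main__":
    sample = (
        "ORA-01578: ORACLE data block corrupted (file # 7, block # 3)\n"
        "ORA-01578: ORACLE data block corrupted (file # 7, block # 3)\n"
    )
    print(likely_cause(corrupt_blocks(sample)))
```

Remember that the file number captured here is the relative one; the absolute file number comes from the accompanying ORA-01110 line.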

3. How to handle corruption

--Check the alert log and the operating system log files.

--Use the available diagnostic tools to determine the type of corruption.

--Run the checks multiple times to determine whether the error persists.

--If necessary, recover data from the corrupted object.

--Resolve any hardware problems: memory boards, disk controllers, disks.

--If necessary, recover or restore data from the corrupted object.

Always try to determine whether the error persists. Run the ANALYZE command several times and, if possible, shut down and restart the database, then retry the operation that failed earlier. Check whether other corruptions exist: if one corrupted block is found, there may be others.

Hardware failures must be resolved immediately.

When you encounter a hardware problem, contact the vendor and have the machine checked and repaired before you continue working. A comprehensive hardware diagnostic should be run at this time.

Hardware failures can take many forms:

--Faulty I/O hardware or firmware

--Operating system I/O or caching problems

--Memory or paging problems

--Disk repair utilities

4. Verifying block integrity in real time: DB_BLOCK_CHECKING

Database block checking can be enabled by setting the DB_BLOCK_CHECKING initialization parameter to TRUE.

Whenever a data block or index block is changed, block checking verifies the internal consistency of that block.

DB_BLOCK_CHECKING is a dynamic parameter, so you can change it with the ALTER SYSTEM SET statement. Block checking is always enabled for the SYSTEM tablespace. Block checking typically adds 1% to 10% overhead, depending on the workload: the more updates or inserts you run, the higher the overhead. DB_BLOCK_CHECKING has the following four possible values:

--OFF: Block checking is not performed on any tablespace except SYSTEM.

--LOW: Basic block header checks are performed after the contents of a block change in memory (for example, after an UPDATE or INSERT statement, or after a read from disk).

--MEDIUM: All LOW checks are performed, plus semantic block checks on all blocks that do not belong to index-organized tables.

--FULL: All LOW and MEDIUM checks are performed, plus semantic checks on index blocks.

The DB_BLOCK_CHECKING initialization parameter:

--Controls the degree of self-consistency checking performed on each block as it is processed

--Can prevent memory and data corruption

--Can be set with the ALTER SESSION command or the ALTER SYSTEM DEFERRED command
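
As a minimal sketch of how a maintenance script might guard against typos before changing this parameter, the helper below validates a level and builds the statement text. The function name `block_checking_statement` is hypothetical; the accepted values are the four levels listed above plus TRUE (which this article uses to enable checking) and FALSE, on the assumption that Oracle still accepts them as synonyms for FULL and OFF. The script only produces text; it does not connect to a database.

```python
# Levels described in the text, plus the legacy TRUE/FALSE spellings (assumed synonyms).
VALID_LEVELS = {"OFF", "FALSE", "LOW", "MEDIUM", "FULL", "TRUE"}


def block_checking_statement(level):
    """Return an ALTER SYSTEM statement for the requested DB_BLOCK_CHECKING level."""
    normalized = level.strip().upper()
    if normalized not in VALID_LEVELS:
        raise ValueError(f"unsupported DB_BLOCK_CHECKING level: {level!r}")
    return f"ALTER SYSTEM SET DB_BLOCK_CHECKING = {normalized};"


if __name__ == "__main__":
    # Print the statement so it can be reviewed and pasted into SQL*Plus.
    print(block_checking_statement("medium"))
```

Given the 1% to 10% overhead mentioned above, it is worth measuring the effect of each level on a test system before raising it in production.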

5. Block media recovery

In most cases, the first time corruption is encountered the database marks the block as media corrupt and then writes it to disk.

No subsequent read of the block succeeds until the block is recovered.

Block recovery can be performed only on blocks that are marked corrupt or that fail a corruption check.

The RMAN RECOVER ... BLOCK command performs block media recovery. By default, RMAN first searches the flashback logs for good copies of the blocks, and then searches for the blocks in full backups or level 0 incremental backups. If RMAN finds good copies, it restores them and performs media recovery on the blocks.

Block media recovery can use only redo logs for media recovery, not incremental backups.

The V$DATABASE_BLOCK_CORRUPTION view shows blocks marked corrupt by database components such as RMAN commands, ANALYZE, DBVERIFY, and SQL queries. Rows are added to this view for the following types of corruption:

--Physical (media) corruption: the database does not recognize the block. The checksum is invalid, the block contains all zeros, or the header and footer of the block do not match. Physical corruption checking is enabled by default.

--Logical corruption: the block has a valid checksum, and its header and footer match, but the contents are inconsistent. Block media recovery cannot repair logical block corruption. Logical corruption checking is disabled by default; you can enable it by specifying the CHECK LOGICAL option of the BACKUP, RESTORE, RECOVER, and VALIDATE commands.
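
Because block media recovery cannot repair logical corruption, it can help to separate the rows of V$DATABASE_BLOCK_CORRUPTION by type before deciding what to do. The sketch below assumes the rows have already been fetched as tuples of the view's FILE#, BLOCK#, BLOCKS and CORRUPTION_TYPE columns; the function name `blocks_by_type` and the sample data are hypothetical.

```python
def blocks_by_type(rows):
    """Group individual block addresses by corruption type.

    rows: iterable of (file_no, first_block, block_count, corruption_type),
    where each row describes a run of block_count consecutive blocks.
    """
    grouped = {}
    for file_no, first_block, block_count, corruption_type in rows:
        addresses = grouped.setdefault(corruption_type.upper(), [])
        # Expand the run into one (file, block) pair per corrupt block.
        for block_no in range(first_block, first_block + block_count):
            addresses.append((file_no, block_no))
    return grouped


if __name__ == "__main__":
    sample_rows = [(7, 3, 2, "CHECKSUM"), (2, 235, 1, "LOGICAL")]
    for corruption_type, addresses in blocks_by_type(sample_rows).items():
        print(corruption_type, addresses)
```

Blocks that end up under LOGICAL are the ones that, according to the text above, need a different approach than RECOVER ... BLOCK.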

Block media recovery:

--Lowers the mean time to recover (MTTR)

--Increases availability during media recovery

--The data file remains online during recovery

--Only the blocks being recovered are inaccessible

--Is invoked with the RMAN RECOVER ... BLOCK command

--Restores blocks using flashback logs and full or level 0 backups

--Performs media recovery using redo logs

--The V$DATABASE_BLOCK_CORRUPTION view shows the blocks marked corrupt

6. Prerequisites for block media recovery

--The target database must be in ARCHIVELOG mode.

--The backup of the data file that contains the corrupt block must be a full backup or a level 0 backup.

--To use proxy copies, you must first restore them to a nondefault location.

--RMAN can use only archived redo logs for the recovery.

--To use flashback logs, Flashback Database must be enabled.

The following prerequisites apply to the RECOVER ... BLOCK command:

--The target database must run in ARCHIVELOG mode and must be either open or mounted with a current control file.

--The backup of the data file that contains the corrupt block must be a full backup or a level 0 backup, not a proxy copy. If only proxy copy backups exist, you can restore them to a nondefault location on disk. RMAN then treats them as data file copies and searches them for blocks during block media recovery.

--RMAN can use only archived redo logs for the recovery. RMAN cannot use level 1 incremental backups. Block media recovery cannot survive a missing or inaccessible archived redo log, although it can sometimes survive missing redo records.

--Flashback Database must be enabled on the target database for RMAN to search the flashback logs for good copies of corrupt blocks. If flashback logging is enabled and the logs contain older, uncorrupted versions of the corrupt blocks, RMAN can use these blocks, which may speed up the recovery.
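
The prerequisites above can be turned into a simple pre-flight check. The sketch below assumes the caller has already read LOG_MODE and FLASHBACK_ON from V$DATABASE and has separately established whether a full or level 0 backup of the affected data file exists; the function name `check_prerequisites` and its return shape are hypothetical.

```python
def check_prerequisites(log_mode, flashback_on, has_full_or_level0_backup):
    """Return (blocking, notes): problems that prevent block media recovery,
    and conditions that only limit which sources RMAN can search."""
    blocking = []
    notes = []
    if log_mode.strip().upper() != "ARCHIVELOG":
        blocking.append("database is not in ARCHIVELOG mode")
    if not has_full_or_level0_backup:
        blocking.append("no full or level 0 backup of the data file is available")
    if flashback_on.strip().upper() != "YES":
        # Not fatal: RMAN simply cannot look for good block copies in flashback logs.
        notes.append("Flashback Database is not enabled; flashback logs will not be searched")
    return blocking, notes


if __name__ == "__main__":
    blocking, notes = check_prerequisites("ARCHIVELOG", "NO", True)
    print("blocking:", blocking)
    print("notes:", notes)
```

The split between blocking problems and notes mirrors the text: ARCHIVELOG mode and a suitable backup are required, while flashback logs are an optional extra source of good blocks.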

7. The RECOVER ... BLOCK command

--Identifies the backups that contain the blocks to be recovered

--Reads the backups and accumulates the requested blocks into in-memory buffers

--If necessary, manages the block media recovery session by reading the archived logs from backup

Recover a single block:

RECOVER DATAFILE 6 BLOCK 3;

Recover multiple blocks in multiple data files:

RECOVER

DATAFILE 2 BLOCK 79

DATAFILE 6 BLOCK 183;

Recover all blocks logged in V$DATABASE_BLOCK_CORRUPTION:

RECOVER CORRUPTION LIST;

To recover individual blocks:

Before block recovery, you must identify the corrupt blocks. Block corruption is typically reported in the following locations:

--Results of the LIST FAILURE, VALIDATE, or BACKUP ... VALIDATE command

--The V$DATABASE_BLOCK_CORRUPTION view

--Error messages in standard output

--The alert log file and user trace files (identified in the V$DIAG_INFO view)

--Results of the SQL commands ANALYZE TABLE and ANALYZE INDEX

--Results of the DBVERIFY utility

For example, the following messages may be found in a user trace file:

ORA-01578: ORACLE data block corrupted (file # 7, block # 3)

ORA-01110: data file 7: '/ORACLE/ORADATA/ORCL/TOOLS01.DBF'

ORA-01578: ORACLE data block corrupted (file # 2, block # 235)

ORA-01110: data file 2: '/ORACLE/ORADATA/ORCL/UNDOTBS01.DBF'

After identifying the blocks, run the RECOVER ... BLOCK command at the RMAN prompt, specifying the file number and block number of each corrupt block:

RECOVER

DATAFILE 7 BLOCK 3

DATAFILE 2 BLOCK 235;
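
When many blocks are involved, the command can be generated instead of typed by hand. The following is a minimal sketch with a hypothetical helper, `recover_block_command`, that emits one DATAFILE ... BLOCK clause per block, in the same shape as the command above. It assumes the file numbers passed in are the absolute file numbers taken from the ORA-01110 lines, and it only builds text for review; it does not invoke RMAN.

```python
def recover_block_command(blocks):
    """Build an RMAN RECOVER ... BLOCK command from (file_no, block_no) pairs."""
    clauses = []
    seen = set()
    for file_no, block_no in blocks:
        if file_no < 1 or block_no < 1:
            raise ValueError(f"invalid block address: {(file_no, block_no)}")
        if (file_no, block_no) in seen:
            continue  # the same block reported twice needs only one clause
        seen.add((file_no, block_no))
        clauses.append(f"DATAFILE {file_no} BLOCK {block_no}")
    if not clauses:
        raise ValueError("no blocks to recover")
    return "RECOVER\n  " + "\n  ".join(clauses) + ";"


if __name__ == "__main__":
    # The two blocks from the trace file example in this section.
    print(recover_block_command([(7, 3), (2, 235)]))
```

For blocks that are already listed in V$DATABASE_BLOCK_CORRUPTION, RECOVER CORRUPTION LIST achieves the same result without building the clause list.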

8. Using event 10231

(For the case where a block is corrupt but there is no backup from which to recover it. Event 10231 makes full table scans skip corrupt blocks, so the rows stored in those blocks are not exported and are lost.)

Run the following command in SQL*Plus:
ALTER SYSTEM SET EVENTS '10231 trace name context forever, level 10';
Then export the table:
exp test/test file=t.dmp tables=t
Drop the table in the database:
DROP TABLE t;
Then import it again:
imp test/test file=t.dmp tables=t
Finally, turn event 10231 off:
ALTER SYSTEM SET EVENTS '10231 trace name context off';


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
