Oracle internal recovery principle (instance recovery)

Source: Internet
Author: User
When the Oracle kernel detects that an instance has died and left its thread-open flag set in the control file, it performs instance recovery automatically.

Instance recovery handles both crash failures and instance failures in a Parallel Server environment. The term can therefore refer either to crash recovery or to Parallel Server instance recovery (in which a surviving instance recovers one or more failed instances).

The goal of instance recovery is to restore the data block changes that were in the failed instance's buffer cache and to close the thread that was left open. Instance recovery uses only the online redo logs and the current online data files (restored backups are not used). It recovers one thread at a time, starting from the most recent thread checkpoint and continuing to the end of the thread.

5.1 Detecting the Need for Instance Recovery

When the Oracle kernel detects that an instance has died and left its thread-open flag set in the control file, it performs instance recovery automatically. This happens in two situations:

1. The first time the database is opened after a crash (crash recovery).

2. When some, but not all, instances of a Parallel Server fail.

In a Parallel Server environment, a surviving instance can detect that one or more other instances have failed in the following ways:

1. A foreground process of the surviving instance detects an "invalid block lock" when it reads a block from a data file into the buffer cache. This happens when another instance had read the block into its own buffer cache, held a lock on it to protect potentially dirty data, and then failed.

2. The foreground process of the surviving instance notifies its SMON process, which then looks for failed instances.

3. The surviving instance succeeds in acquiring the lock that guards another instance's thread-open flag. Because a live instance would still hold that lock, this shows that the other instance has died.

The SMON process of the surviving instance obtains a list of dead instances and a list of invalid block locks. Note: after instance recovery completes, the locks in these lists are cleaned up.
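The detection rule described above (a thread-open flag still set although the owning instance is gone) can be sketched in a few lines of Python. This is only an illustrative model: `ThreadRecord`, its fields, and `threads_needing_recovery` are hypothetical names invented for the example, not Oracle structures or APIs.

```python
from dataclasses import dataclass


@dataclass
class ThreadRecord:
    """Hypothetical stand-in for a redo thread entry in the control file."""
    thread_id: int
    open_flag: bool       # set while an instance has the thread open
    instance_alive: bool  # whether the owning instance is still running


def threads_needing_recovery(threads):
    """Return ids of threads left open by an instance that has died."""
    return [t.thread_id for t in threads if t.open_flag and not t.instance_alive]


control_file = [
    ThreadRecord(1, open_flag=True, instance_alive=True),    # healthy instance
    ThreadRecord(2, open_flag=True, instance_alive=False),   # crashed instance
    ThreadRecord(3, open_flag=False, instance_alive=False),  # cleanly closed
]
print(threads_needing_recovery(control_file))  # prints [2]
```

Only thread 2 qualifies: thread 1 is open but its instance is alive, and thread 3 was closed normally, so neither needs instance recovery in this model.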

5.2 Thread-at-a-Time Redo Application

Instance recovery processes one thread at a time, and therefore recovers one instance at a time. It applies all of a thread's redo (from the thread checkpoint to the end of the thread) to the data files before moving on to the next thread. The correctness of this algorithm rests on the fact that only one instance at a time can hold a modified copy of a given block in its buffer cache: between modifications by different instances, the block is written back to disk. Consequently, a block read from disk during instance recovery needs redo from at most one thread, namely the thread that contains the most recent modification of that block.

Instance recovery can always be completed using only the online logs of the thread being recovered. Crash recovery processes the thread with the lowest thread checkpoint SCN first and then recovers the remaining threads in ascending order of thread checkpoint SCN. This ensures that every recovered thread advances the database checkpoint.
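As a rough illustration of thread-at-a-time redo application, the following Python sketch sorts threads by checkpoint SCN and applies each thread's post-checkpoint redo in full before moving to the next thread. The classes `RedoRecord` and `RedoThread`, the dictionary of blocks, and the SCN values are all assumptions made up for this example; real redo records and data blocks are far more complex.

```python
from dataclasses import dataclass, field


@dataclass
class RedoRecord:
    scn: int
    block: str
    value: str


@dataclass
class RedoThread:
    thread_id: int
    checkpoint_scn: int
    online_log: list = field(default_factory=list)  # RedoRecords in SCN order


def recover_thread_at_a_time(threads, blocks):
    """Apply each thread's post-checkpoint redo in full before the next thread.

    Threads are visited in ascending order of thread checkpoint SCN.
    Returns the order in which the threads were recovered.
    """
    order = []
    for thread in sorted(threads, key=lambda t: t.checkpoint_scn):
        order.append(thread.thread_id)
        for rec in thread.online_log:
            if rec.scn > thread.checkpoint_scn:  # skip redo already checkpointed
                blocks[rec.block] = rec.value
    return order


# Blocks as they are on disk after the crash (hypothetical contents).
blocks = {"A": "a0", "B": "b0"}
threads = [
    RedoThread(1, 100, [RedoRecord(90, "A", "a-old"), RedoRecord(110, "A", "a1")]),
    RedoThread(2, 80, [RedoRecord(85, "B", "b1"), RedoRecord(120, "B", "b2")]),
]
print(recover_thread_at_a_time(threads, blocks))  # prints [2, 1]
print(blocks)  # prints {'A': 'a1', 'B': 'b2'}
```

The example relies on the same property as the text: each block needs redo from at most one thread, so the order in which whole threads are applied does not affect the final block contents.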

5.3 Current Online Data Files Only

Checkpoint counters are used to verify that the data files are the current online files rather than restored backups. If a data file has been restored from a backup, media recovery is required.

When a data file has been restored from a backup, media recovery is required even if the online redo logs contain all the redo needed. The reason is that crash recovery applies all of one thread's redo after the thread checkpoint before starting on the next thread. Crash recovery can use this thread-at-a-time algorithm because each block needs redo from at most one thread.

With a restored backup, however, there is no limit on how many threads' redo a block may need, so the thread-at-a-time algorithm does not work. Recovering a backup requires merging the redo of multiple threads: all redo after the data file checkpoint is applied, with the redo of all threads merged in ascending SCN order. This thread-merged redo algorithm is used only by media recovery (see section 6).
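The difference from thread-at-a-time application can be made concrete with a small Python sketch of thread-merged redo. It uses the standard library's `heapq.merge` to interleave per-thread redo streams in ascending SCN order. The function name `merged_redo`, the `(scn, change)` tuples, and the sample SCNs are hypothetical and serve only to illustrate the ordering.

```python
import heapq


def merged_redo(threads, file_checkpoint_scn):
    """Return redo from all threads after the file checkpoint, in SCN order.

    `threads` maps a thread id to a list of (scn, change) tuples that is
    already sorted by SCN, as a redo thread would be.
    """
    streams = [
        [(scn, tid, change) for scn, change in log if scn > file_checkpoint_scn]
        for tid, log in threads.items()
    ]
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    return list(heapq.merge(*streams))


# The same block "A" was changed by two instances after the backup was taken.
threads = {
    1: [(105, "A=a1"), (130, "A=a3")],
    2: [(95, "A=a0"), (120, "A=a2")],
}
for scn, tid, change in merged_redo(threads, file_checkpoint_scn=100):
    print(scn, tid, change)
# prints:
# 105 1 A=a1
# 120 2 A=a2
# 130 1 A=a3
```

Applying thread 1 in full and then thread 2 would leave block A at `a2`, which is wrong; merging by SCN ends with `a3`, the most recent change.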

Even if crash recovery used the thread-merged redo algorithm on a restored backup, it would still fail, even when the data file checkpoint matches the database checkpoint. The reason is that crash recovery would miss the redo between the database checkpoint and the highest thread checkpoint among all threads. Media recovery, by contrast, starts applying redo at the data file checkpoint. Furthermore, crash recovery could fail even if it started applying redo at the data file checkpoint, because it looks only at the online redo logs, and all threads may have already archived that redo and reused their online logs.

If the STARTUP RECOVER command is used, crash recovery fails because the data file needs media recovery. RECOVER DATABASE is then invoked automatically to perform media recovery before the database is opened.

5.4 Checkpoints

Instance recovery does not attempt to apply redo that precedes a data file's checkpoint. (The checkpoint SCN in the data file header is not what determines whether instance recovery is required.)

Instance recovery reads the redo log from the thread checkpoint to the end of the thread and finds the highest SCN allocated by the thread. This SCN is used to close the thread and advance the thread checkpoint. When instance recovery completes, the data file checkpoints and checkpoint counters are advanced as well.
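The closing step can be pictured with the following minimal Python sketch. The dictionary keys `checkpoint_scn` and `open`, and the function `close_recovered_thread`, are hypothetical names chosen for the illustration; the sketch only models finding the highest SCN and using it to close the thread.

```python
def close_recovered_thread(thread, redo_scns):
    """Scan redo past the thread checkpoint, then close the thread.

    `thread` is a dict with hypothetical keys "checkpoint_scn" and "open".
    `redo_scns` are the SCNs found in the thread's online redo.
    """
    post_checkpoint = [scn for scn in redo_scns if scn > thread["checkpoint_scn"]]
    # If no redo follows the checkpoint, the checkpoint SCN stays as it is.
    highest_scn = max(post_checkpoint, default=thread["checkpoint_scn"])
    thread["checkpoint_scn"] = highest_scn  # advance the thread checkpoint
    thread["open"] = False                  # clear the thread-open flag
    return highest_scn


thread = {"checkpoint_scn": 100, "open": True}
print(close_recovered_thread(thread, [90, 105, 140, 132]))  # prints 140
print(thread)  # prints {'checkpoint_scn': 140, 'open': False}
```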

5.5 Completion of Crash Recovery

When crash recovery completes, the online fuzzy bit, hot backup fuzzy bit, and media recovery fuzzy bit of all data files are cleared, and a special redo record marking the end of crash recovery is written to the redo log. Media recovery uses this record to determine when it can clear the online fuzzy and hot backup fuzzy bits of the data files it is recovering.
