Oracle Database error ora-00600 [3712] recovery case

Last Update:2017-01-13 Source: Internet

Author: User

Tags error code oracle database metalink

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

During routine maintenance of Oracle databases, we may often encounter errors that we have never seen before, or even inexplicably. In many cases, you cannot even search for related content through metalink, baidu, or google. This is not the case. Yesterday, my colleagues in the southern district of the company asked me to help restore a customer database. It is said that it is an archive database. If there is no backup, the database cannot be opened after the instance is restarted.
This is also the first time I have heard of this kind of thing. I have read the database of Oracle 11.2.0.3, and there are at least such things that are really incredible. First, let's look at what an error is.

See this error. I felt like I was familiar with each other, but I couldn't tell you what the mistake was. But from the error code, I can roughly determine what is related to the content. Here I expand, metalink has a detailed document description for Oracle ora-00600 errors, which classifies error numbers following error 600. You must understand this document. So now, even if I have never seen a ora-00600 error, I can still roughly determine at first glance what the problem is. Here we will list:
Ora-600 Base Functionality Description
2000 server/rcv Cache Op
2100 server/rcv Control File mgmt
2200 server/rcv Misc (SCN etc .)
2400 server/rcv Buffer Instance Hash Table
2600 server/rcv Redo file component
2800 server/rcv Db file
3000 server/rcv Redo Application
3200 server/cache Buffer manager
3400 server/rcv Archival & media recovery component
3600 server/rcv recovery component
3700 server/rcv Thread component
3800 server/rcv Compatibility segment

From the description, we can roughly judge that this error must be related to redo. Let's look back at the alert log information. We can see a key line of information: crash recovery due to error 600.
For the open process of the Oracle database, we know that we need to go through the nomount-mount-open process. If it is caused by an abnormal shutdown, such as forcing abort, when we open the database, oracle requires instance recovery; in fact, I can query v $ Log and find that the next_change # of current redo logfile is infinite.
First, I tried to manually recover the database. There was no problem, and then alter database open was still reported as error 3712. Here I found a problem where all scn changes and is updated to a consistent state. But why is it still an error?
We know that in fact, when Oracle is open, it is not only necessary to restore the instance. After the instance is restored, it is necessary to open the database smoothly. If we want to see whether such a scenario exists:
Assume that the currently recovered database scn has reached 100000, but after the instance is restored, the next scn to be updated is smaller than the current one (for example, 99999). What will happen? Obviously, this will report an error.
Many people may not understand or even understand why I think so. There are two main factors:
1. Based on the basic understanding of the database principles, I have a deep understanding of the oracle database open process.
2, carefully observe the above ORA-00600 errors.
OK. For this error, refer to [3371], [612688841], [3371], [612688840]. When we see this string of numbers, we should think or imagine what the meaning of this number is?
Based on our database understanding and experience, it is usually used to indicate the sequence, dba address, file number, scn, and so on.
I think anyone with a little common sense can see the SCN. Some people may say why is the scn represented here?
In this case, you do not know the basic structure of Oracle scn. The scn in Oracle consists of two parts: high and low. The general principle is as follows:
The minimum value of scn is 0 × 0000.00000000, and the maximum value is 0xffff. Ffffff. The high position is scn wrap, that is, 0x0000, and the low position is scn base, that is, the next eight digits. The correct SCN should be = scn warp * power (2, 32) + scn base
I think we can roughly judge that the 3371 here should be the scn wrap value, and the 612688841 next should be the scn base. Convert the scn and compare it with the latest scn in the file header. Here we can verify our judgment.
Here, we can find a problem. Isn't scn correct? Why not? Because there are two scn, which are:
3371 * power (612688841) + 3371 and 612688840 * power () +
Obviously, the two values are different in size. I think Oracle must make a judgment and find that the resulting scn is smaller than our current scn. The smaller scn is the problematic scn. For example, this scn comes from a control file.
Now, I know how to solve this problem perfectly. The answer is to recreate the control file.
The following are the basic steps for restoration. The procedure for recreating the control file is no longer described.

After a script for recreating the control file is generated, rebuild the control file and remember to create it in the noresetlogs mode (the rac environment needs to be modified).

Cluster_database = false); after the creation, recover the database directly, and then open the database smoothly!

Supplement:

1. Later I found that this is most likely a bug in Oracle 11.2.0.3:

Bug 16432211: ORA-00600 [KCRFNL_3], LGWR... Terminating the instance, ORA-00600 [3712]

The alert log and trace I found later are basically the same.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More