ASM fails to start: "terminating the instance due to error 481"

Source: Internet
Author: User

1. Symptom Description

After a RAC database was shut down unexpectedly, the first node started normally, but ASM and the CRS resources failed to start on the second node.

2. Cause Analysis

The ASM instance on the second node could not start. The ASM alert log contained the following:

MMNL started with pid=21, OS id=14028

lmon registered with NM - instance number 2 (internal mem no 1)

Tue Nov 18 14:48:50 2014

PMON (ospid: 13986): terminating the instance due to error 481

Tue Nov 18 14:48:50 2014

System state dump requested by (instance=2, osid=13986 (PMON)), summary=[abnormal instance termination].

System State dumped to trace file /oracle/product/grid/diag/asm/+ASM2/trace/+ASM2_diag_13996.trc

Tue Nov 18 14:48:50 2014

ORA-1092: opitsk aborting process

Dumping diagnostic data in directory=[cdmp_20141118144850], requested by (instance=2, osid=13986 (PMON)), summary=[abnormal instance termination].

Tue Nov 18 14:48:50 2014

ORA-1092: opitsk aborting process

Instance terminated by PMON, pid = 13986

On the Metalink website we found the document "ASM on Non First Node (Second or Other Node) Fails to Come up With: PMON (ospid: nnnn): terminating the instance due to error 481" [ID 1383737.1]. The error message in our ASM alert log, "PMON (ospid: nnnn): terminating the instance due to error 481", matches the title of that document. We then checked the related logs and configuration as the document describes and found that the ASM trace also matches:

*** 14:48:17.092

Reconfiguration completes [incarn = 42]

kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
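The two signatures above (error 481 in the alert log and the DLM attach failure in the trace) can be checked for mechanically. The following is a minimal sketch, not part of the referenced document: the function names are hypothetical, and each helper simply reads log text on standard input.

```shell
#!/bin/sh
# Hypothetical helpers: each reads log text on stdin and returns 0
# when the signature it looks for is present.

# Alert log signature: PMON terminated the instance with error 481.
has_error_481() {
    grep -q 'terminating the instance due to error 481'
}

# Trace signature: the instance could not attach to the DLM.
# Matching is case-insensitive because the prefix casing can vary.
has_dlm_attach_failure() {
    grep -qi 'can not attach to DLM'
}

# Example use (ALERT_LOG and TRACE_FILE are placeholders you set yourself):
#   has_error_481 < "$ALERT_LOG" && echo "error 481 found"
#   has_dlm_attach_failure < "$TRACE_FILE" && echo "DLM attach failure found"
```

If both checks succeed on the failing node, the symptoms are at least consistent with the case described here; the network checks that follow are still needed to confirm the cause.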

Analyzing the cluster logs turned up the following entry:

14:44:45.767

[/oracle/product/11.2.0/grid/bin/orarootagent.bin(12690)]CRS-5018:(:CLSN00037:) Removed unused HAIP route: 169.254.95.0 / 255.255.255.0 / 0.0.0.0 / usb0

Oracle determined that usb0 (the host management interface, which is enabled by default) was occupying the HAIP address range. HAIP is a new feature in Oracle 11g: an internal, highly available private IP address that Oracle uses in place of the two nodes' private IP addresses for internal communication.

Analyzing the host logs turned up the following entries:

Nov 18 14:02:11 XXXdb2 dhclient: DHCPREQUEST on usb0 to 255.255.255.255 port 67

Nov 18 14:02:12 XXXdb2 dhclient: DHCPACK from 169.254.95.118

Nov 18 14:02:12 XXXdb2 dhclient: bound to 169.254.95.120 -- renewal in 234 seconds.

The host network adapter usb0 dynamically obtains an IP address in the 169.254.x.x network segment.
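To spot this pattern in a host log without reading it line by line, the dhclient "bound to" entries can be filtered for 169.254.x.x addresses. This is only a sketch: the function name is hypothetical, and it assumes the syslog format shown in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: read syslog text on stdin and print every
# 169.254.x.x address that dhclient reports as bound.
link_local_leases() {
    awk '/dhclient/ && /bound to 169\.254\./ {
        # The address is the field right after the word "to".
        for (i = 1; i < NF; i++)
            if ($i == "to") { print $(i + 1); break }
    }'
}

# Example use (the syslog path is an assumption; it varies by distribution):
#   link_local_leases < /var/log/messages | sort -u
```

Any address printed here means a DHCP client on the host is holding a lease inside the range that HAIP also uses.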

IBM PC servers use usb0 for network management. Even when nothing is connected to the usb0 NIC, it keeps requesting an IP address through DHCP. If no DHCP server is found, it is assigned a 169.254.xxx.xxx address by default. That address conflicts with Oracle's HAIP addresses and causes the HAIP route information to be lost.

Comparing the log information with the document shows that the fault we observed matches the one the document describes.

3. Solution

Following document ID 1383737.1, run ifdown usb0 to disable the usb0 NIC on both nodes, then add the missing route at runtime on the node that lost it:

# route add -net 169.254.0.0 netmask 255.255.0.0 dev eth2

After adding the route, run the following commands:

# su - grid

$ $GRID_HOME/bin/crsctl start res ora.crsd -init

Once the crsd resource has started, all ASM and CRS resources start normally.
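Before restarting crsd, it can help to confirm that the 169.254.0.0/16 route is actually present on the interconnect device. The sketch below is an illustration rather than part of the documented procedure: the function name is hypothetical, and it assumes the routing table is printed in the format of the iproute2 "ip route show" command.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the routing table text on stdin
# contains the 169.254.0.0/16 route on the given device.
# $1: private interconnect device (eth2 in this article).
has_haip_route() {
    awk -v dev="$1" '
        $1 == "169.254.0.0/16" && $2 == "dev" && $3 == dev { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Example use on a live node (assumes the "ip" tool is installed):
#   ip route show | has_haip_route eth2 && echo "HAIP route present"
```

Reading the table from standard input keeps the check easy to try against saved output from either node.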

4. Fault Summary

On the IBM x3850 X5 PC server, DHCP is enabled on the usb0 interface by default, which causes the USB NIC to occupy the HAIP address range. For RAC database environments running on such machines in production, disable automatic DHCP address acquisition on usb0 and configure a static IP address for it.
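When auditing other hosts for the same exposure, one approach is to list every interface other than the private interconnect that holds a 169.254.x.x address. The following is a sketch under stated assumptions: the function name is hypothetical, the input is expected in the one-line-per-address format of "ip -o -4 addr show", and eth2 is the interconnect only because that is the device used in this article.

```shell
#!/bin/sh
# Hypothetical helper: print "interface address" for every interface,
# other than the expected interconnect, that holds a 169.254.x.x address.
# Reads "ip -o -4 addr show" style output on stdin.
# $1: name of the private interconnect NIC (eth2 in this article).
find_link_local_conflicts() {
    awk -v keep="$1" '
        $3 == "inet" && $4 ~ /^169\.254\./ && $2 != keep { print $2, $4 }
    '
}

# Example use on a live node (assumes the iproute2 "ip" tool is installed):
#   ip -o -4 addr show | find_link_local_conflicts eth2
```

An empty result means no other interface is holding a link-local address; any line printed names an interface worth checking for DHCP.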

5. References

ASM on Non First Node (Second or Other Node) Fails to Come up With: PMON (ospid: nnnn): terminating the instance due to error 481 [ID 1383737.1]

 

Author: LI Junjie (online name: Step-by-Step), who works on systematic performance optimization across six layers: system architecture, operating system, storage devices, database, middleware, and application.

Join the system performance optimization professional group to discuss performance optimization technologies. GROUP: 258187244
