ASM fails to start: PMON terminating the instance due to error 481
1. Symptom Description
After a RAC database was shut down unexpectedly, the first node restarted normally, but ASM and the CRS resources on the second node failed to start.
2. Cause Analysis
Because the ASM instance could not start, we checked the ASM alert log and found the following:
MMNL started with pid=21, OS id=14028
LMON registered with NM - instance number 2 (internal mem no 1)
Tue Nov 18 14:48:50 2014
PMON (ospid: 13986): terminating the instance due to error 481
Tue Nov 18 14:48:50 2014
System state dump requested by (instance=2, osid=13986 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/product/grid/diag/asm/+ASM2/trace/+ASM2_diag_13996.trc
Tue Nov 18 14:48:50 2014
ORA-1092: opitsk aborting process
Dumping diagnostic data in directory=[cdmp_20141118144850], requested by (instance=2, osid=13986 (PMON)), summary=[abnormal instance termination].
Tue Nov 18 14:48:50 2014
ORA-1092: opitsk aborting process
Instance terminated by PMON, pid = 13986
On Metalink (My Oracle Support) we found the document "ASM on Non First Node (Second or Other Node) Fails to Come up With: PMON (ospid: nnnn): terminating the instance due to error 481" [ID 1383737.1]. The error in its title, "PMON (ospid: 9946): terminating the instance due to error 481", matches the message in our ASM alert log. Following the document, we checked the related logs and configuration and found that the ASM trace also matches what the document describes:
*** 14:48:17.092
Reconfiguration completes [incarn=42]
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE])
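Both signatures, the PMON termination in the alert log and the DLM attach failure in the trace, can be searched for in one pass. The following is a minimal sketch, assuming a standard grep. The function name scan_asm_signatures is illustrative and not part of any Oracle tool; pass it whichever alert log and trace files apply on your system.

```shell
# Illustrative helper (not an Oracle utility): print, with line numbers,
# every line that matches either signature discussed above.
scan_asm_signatures() {
    # "$@": one or more alert log or trace files
    grep -nE 'terminating the instance due to error 481|Can not attach to DLM' "$@"
}
```

grep returns a non-zero status when nothing matches, so the function can also be used as a quick yes/no check in a script.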
Analysis of the cluster logs showed the following:
14:44:45.767
[/Oracle/product/11.2.0/grid/bin/orarootagent.bin(12690)] CRS-5018:(:CLSN00037:) Removed unused HAIP route: 169.254.95.0/255.255.255.0/0.0.0.0/usb0
Oracle Clusterware found that usb0 (the host's management network interface, enabled by default) was occupying the HAIP address range. HAIP is a new feature in Oracle 11g: an internal, highly available private IP address that replaces the private IP addresses of the two nodes for internal communication.
Analysis of the host logs showed the following:
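To see which interfaces Clusterware has reported in CRS-5018 messages, the interface name can be pulled from the end of each message. This is a rough sketch, assuming the message layout shown above (route, netmask, gateway, and interface separated by slashes); the function name haip_removed_ifaces is hypothetical.

```shell
# Illustrative helper: read Clusterware log text on stdin and print each
# interface named in a "Removed unused HAIP route" message, once.
haip_removed_ifaces() {
    awk '/Removed unused HAIP route/ {
        n = split($0, part, "/")      # the interface is the last "/" field
        gsub(/[ \t]/, "", part[n])    # tolerate spaces around the slashes
        print part[n]
    }' | sort -u
}
```

Feed it the Clusterware alert log on standard input. Any interface it prints other than the private interconnect deserves a closer look.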
Nov 18 14:02:11 XXXdb2 dhclient: DHCPREQUEST on usb0 to 255.255.255.255 port 67
Nov 18 14:02:12 XXXdb2 dhclient: DHCPACK from 169.254.95.118
Nov 18 14:02:12 XXXdb2 dhclient: bound to 169.254.95.120 -- renewal in 234 seconds.
The host network interface usb0 had dynamically obtained an IP address in the 169.254.x.x range.
The IBM PC server uses usb0 for its network management feature. Even when nothing is connected to the usb0 interface, it keeps requesting an IP address over DHCP. If no DHCP server is found, a 169.254.x.x address is assigned by default. That address conflicts with Oracle's HAIP address, and the route information is lost as a result.
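A quick way to spot this kind of conflict is to list every interface that currently holds an address in 169.254.0.0/16. The sketch below assumes the iproute2 `ip` command is available and parses the one-line-per-address output of `ip -o -4 addr show`; the function name link_local_ifaces is illustrative.

```shell
# Illustrative helper: read `ip -o -4 addr show` output on stdin and print
# "<interface> <address/prefix>" for every address in 169.254.0.0/16.
link_local_ifaces() {
    awk '$3 == "inet" && $4 ~ /^169\.254\./ { print $2, $4 }'
}
```

Run it as `ip -o -4 addr show | link_local_ifaces`. On a healthy node the private interconnect is expected to appear, because HAIP itself lives in this range; an entry for usb0 would indicate the problem described here.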
Comparing the log information with the document shows that the symptoms of this fault match those described there.
3. Solution
Following document ID 1383737.1, run ifdown usb0 on both nodes to disable the usb0 interface, then add the route back dynamically on the node where it is missing:
# route add -net 169.254.0.0 netmask 255.255.0.0 dev eth2
After adding the route, run the following commands:
# su - grid
$ $GRID_HOME/bin/crsctl start res ora.crsd -init
Once the crsd resource had started normally, all ASM and CRS resources started normally as well.
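To confirm that the 169.254.0.0/16 route is present and points at the private interconnect (eth2 in this environment), the routing table can be checked. This is a small sketch, assuming the iproute2 `ip` command; the function name haip_route_dev is hypothetical.

```shell
# Illustrative helper: read `ip route show` output on stdin and print the
# device that carries the 169.254.0.0/16 route, if such a route exists.
haip_route_dev() {
    awk '$1 == "169.254.0.0/16" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") print $(i + 1)
    }'
}
```

Run it as `ip route show | haip_route_dev`. Empty output means the route is still missing; a device other than the interconnect suggests another interface is holding the range.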
4. Fault Summary
On the IBM x3850 X5 PC server, DHCP is enabled on the USB network interface by default, which allows usb0 to take an address in the HAIP range. For RAC database environments running on such machines in production, disable automatic DHCP address acquisition on usb0 and configure a static IP address for it.
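Whether usb0 is still set to use DHCP can be checked from its interface configuration file. The sketch below assumes a Red Hat style ifcfg file with a BOOTPROTO key, typically /etc/sysconfig/network-scripts/ifcfg-usb0; the original does not name the file or the distribution, so treat both as assumptions. The function name uses_dhcp is illustrative.

```shell
# Illustrative helper: succeed (exit 0) if the given ifcfg-style file
# sets BOOTPROTO to dhcp, with or without double quotes.
uses_dhcp() {
    # $1: path to an interface configuration file (assumed ifcfg format)
    grep -Eiq '^[[:space:]]*BOOTPROTO="?dhcp"?[[:space:]]*$' "$1"
}
```

For example, `uses_dhcp /etc/sysconfig/network-scripts/ifcfg-usb0 && echo "usb0 still uses DHCP"` prints a warning when the setting has not been changed yet.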
5. References
ASM on Non First Node (Second or Other Node) Fails to Come up With: PMON (ospid: nnnn): terminating the instance due to error 481 [ID 1383737.1]
Author: LI Junjie (online name: Step-by-Step), who works on systematic performance optimization across six layers: system architecture, operating system, storage, database, middleware, and application.