"Translated from MOS article" When the NFS server goes down, the Oracle database freezes and no errors appear in the alert log

Source: Internet
Author: User



Translated from MOS article: When NFS Server Was Down, Oracle Server Freezes With No Errors in Alert Log File (Doc ID 1316251.1)

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.4 and later [Release: 10.2 and later]
IBM AIX on POWER Systems (64-bit)
Symptoms:
An Oracle instance on AIX has an NFS mount point that is used for backups. The mount point is mounted with the following options:

bg,hard,intr,rsize=32768,wsize=32768,sec=sys,noac,rw

When the NFS server goes down, the Oracle RDBMS freezes and no errors are written to the alert log. Once the NFS server is restored, the database resumes working without any problems.

Changes:
Nothing in the environment has changed; only the NAS connectivity (to the NFS server) was lost, so the remote directory became inaccessible.
Cause:

From the uploaded truss traces of sqlplus and df, we can see that the statx call hangs at /backup.

462940: statx("./../../../../backup", 0x0fffffffffff5980, 176, 021) (sleeping...)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000360, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000320, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000310, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000310, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000320, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) (sleeping...)
462940: statx("./../../../../backup", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../usr", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../lib", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../audit", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../dev", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../etc", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../u", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../lpp", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../mnt", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../proc", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../sbin", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../bin", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../oracle", 0x0fffffffffff5980, 176, 021) = 0

The problem is at the following call:

statx("./../../../../backup", 0x0fffffffffff5980, 176, 021) (sleeping...)
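The same hang can be observed without truss by running a stat call against the mount point from a worker thread and waiting for it with a timeout. The following is only a minimal sketch: the function name responds_within and the 5-second default are assumptions for illustration, not part of the original note, and a thread stuck on a hung hard mount stays blocked until the server comes back.

```python
import os
import threading


def responds_within(path, timeout=5.0):
    """Return True if os.stat(path) completes within `timeout` seconds.

    Hypothetical helper: a False result means the call is still blocked,
    which is what a hung NFS mount point looks like from user space.
    """
    done = threading.Event()

    def probe():
        try:
            os.stat(path)
        except OSError:
            pass  # an error is still an answer; only a hang counts as failure
        done.set()

    # Daemon thread, so a probe that never returns is not waited for at exit.
    threading.Thread(target=probe, daemon=True).start()
    return done.wait(timeout)


if __name__ == "__main__":
    # /backup is the mount point from the note; adjust for your system.
    print(responds_within("/backup", timeout=5.0))
```

A False result only says the call did not return in time; it does not distinguish a dead server from a very slow one.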

The Oracle code calls the UNIX call getcwd to get the current working directory. From that point on, control is entirely in the hands of the operating system.
From what we see, getcwd calls getwd, and getwd in turn calls statx. It processes directory entries by executing statx on the following paths, in this order:

., ./.., ./../.., ./../../.. (and so on until the root directory is reached)

Once the root directory (/) is reached, lstat calls statx for each entry in that directory, and the call on the unreachable NFS mount point /backup never returns. Oracle does not control this process at all, so nothing can be done on the Oracle side to prevent it (this all happens at the OS level).
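The directory walk described above can be imitated in a few lines of Python. This is an illustrative sketch of the classic "climb through .. and scan the parent" technique, not the actual AIX libc code; the function name legacy_getcwd is made up for this example. It shows where the scan touches every entry of each parent directory, including any mount point that sits there.

```python
import os


def legacy_getcwd(start="."):
    """Rebuild the absolute path of `start` by climbing through '..'.

    At each level every entry of the parent directory is lstat()ed until
    the one with the child's (device, inode) pair is found.
    """
    parts = []
    path = start
    child = os.lstat(path)
    while True:
        parent_path = os.path.join(path, "..")
        parent = os.lstat(parent_path)
        if (parent.st_dev, parent.st_ino) == (child.st_dev, child.st_ino):
            break  # '..' refers to the same directory: the root is reached
        for name in os.listdir(parent_path):
            try:
                # This is the call that would block on a hung NFS mount point.
                entry = os.lstat(os.path.join(parent_path, name))
            except OSError:
                continue  # entry vanished or is unreadable; keep scanning
            if (entry.st_dev, entry.st_ino) == (child.st_dev, child.st_ino):
                parts.append(name)
                break
        else:
            raise OSError("directory not found in its parent")
        path, child = parent_path, parent
    return "/" + "/".join(reversed(parts))


if __name__ == "__main__":
    print(legacy_getcwd("."))
```

Because the scan of / visits entries in directory order, whether the process blocks depends on whether the hung mount point is examined before the matching entry is found.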

Workaround:
For a similar issue, IBM suggested the following action plan to avoid this problem. IBM's answer:

Here is a solution to avoid the problem described by Oracle: do not put the NFS mounts directly under /, but one level lower, and then use symbolic links to them. Mount the NFS file system at /nfs/backup (/nfs is a directory we'll create; it can have any name) and create a symbolic link /backup pointing to /nfs/backup:

$ ln -s /nfs/backup /backup

This avoids the statx problem without having to make changes in the setup (because /backup is still there). Additionally, you can ask IBM about APARs IZ85027, IZ85029, IZ85032, IZ86102, IZ87374, and IZ90533. Check with IBM which one applies to your configuration.
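The workaround presumably helps because lstat on a symbolic link describes the link itself and does not visit its target, so a scan of / no longer touches the NFS file system. The sketch below recreates the layout under a scratch directory to show that behavior; the helper name make_layout and the scratch paths are stand-ins for illustration, not the real /nfs/backup and /backup.

```python
import os
import stat
import tempfile


def make_layout(base):
    """Create <base>/nfs/backup and a symlink <base>/backup pointing to it."""
    target = os.path.join(base, "nfs", "backup")  # stands in for the NFS mount
    link = os.path.join(base, "backup")           # stands in for /backup
    os.makedirs(target)
    os.symlink(target, link)                      # like: ln -s <target> <link>
    return target, link


base = tempfile.mkdtemp()
target, link = make_layout(base)

info = os.lstat(link)                # inspects the link, not the target
print(stat.S_ISLNK(info.st_mode))    # True
print(os.readlink(link) == target)   # True
```

Anything that follows the link, such as a backup job writing to /backup, still reaches the NFS server and can still block while it is down.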
