When the NFS server goes down, the Oracle database freezes and there are no errors in the alert log
Translated from the MOS article "When NFS Server Was Down, Oracle Server Freezes with No Errors in Alert Log File" (Doc ID 1316251.1)
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.4 and later [Release: 10.2 and later]
IBM AIX on POWER Systems (64-bit)
Symptoms:
The Oracle instance on AIX has an NFS mount point that is used for backups. The mount point is mounted with the following options:
bg,hard,intr,rsize=32768,wsize=32768,sec=sys,noac,rw
When the NFS server is down, the Oracle RDBMS freezes and no errors appear in the alert log. Once the NFS server is restored, the database resumes working without any problems.
Change:
The environment has not changed. Only the NAS connectivity (to the NFS server) was lost, so the remote directory became inaccessible.
Reason:
From the uploaded truss traces of sqlplus and df, we can see that the statx call hangs on /backup.
462940: statx("./../../../../backup", 0x0fffffffffff5980, 176, 021) (sleeping...)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010") Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000360, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000320, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000310, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000310, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010") Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000320, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) (sleeping...)
462940: statx("./../../../../backup", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../usr", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../lib", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../audit", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../dev", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../etc", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../u", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../lpp", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../mnt", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../proc", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../sbin", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../bin", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../oracle", 0x0fffffffffff5980, 176, 021) = 0
The problem is at the following call:
statx("./../../../../backup", 0x0fffffffffff5980, 176, 021) (sleeping...)
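A stat on a path whose NFS server is unreachable can block for as long as the server stays down, which is what the sleeping statx shows. One way to check such a path without hanging the caller is to run the stat in a child process and give up after a timeout. The following Python sketch illustrates the idea under stated assumptions: path_responds and the timeout value are hypothetical names chosen for this example and are not part of the MOS article or of any Oracle or IBM tool.

```python
import subprocess
import sys


def path_responds(path, timeout_s=5.0):
    """Return True if a stat of `path` finishes within `timeout_s` seconds.

    The stat runs in a child process so that a hung file system blocks the
    child, not the caller. A stat that fails quickly (for example because the
    path does not exist) still counts as a response.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        child.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a process blocked inside the kernel may not react
        # to the signal until the server returns, so we do not wait for it.
        child.kill()
        return False
    return True


if __name__ == "__main__":
    # /backup is the mount point from the article; adjust as needed.
    print(path_responds("/backup"))
```

A result of False for /backup while other directories under / answer immediately would be consistent with the trace above.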
The Oracle code calls the UNIX call getcwd to get the current working directory. After that, control passes entirely to the operating system.
From what we can see, getcwd calls getwd, and getwd in turn calls statx. statx is first executed on the chain of parent directories in the following order:
./  ./..  ./../..  ./../../..  (this goes on until the root directory is reached)
Once the root directory (/) is reached, lstat calls statx for each entry in that directory. Oracle does not control this process at all, so nothing can be done on the Oracle side to prevent it (it all happens at the OS level). Because /backup is one of those entries and its NFS server is unreachable, the statx on /backup sleeps until the server comes back.
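The directory walk described above can be sketched in Python. This is not the AIX libc implementation, only a minimal illustration of the classic technique, with the hypothetical name getcwd_by_walking: climb through ".." until the root, and at each level stat every entry of the parent to find the one that matches the directory just left. The per-entry stat is the step that would sleep on an unreachable NFS mount point.

```python
import os


def getcwd_by_walking(start="."):
    """Rebuild the absolute path of `start` by climbing through '..'."""
    names = []
    here = start
    here_st = os.lstat(here)
    while True:
        parent = os.path.join(here, "..")
        parent_st = os.lstat(parent)
        # At the root directory, '..' refers to the directory itself.
        if (parent_st.st_dev, parent_st.st_ino) == (here_st.st_dev, here_st.st_ino):
            break
        # Stat every entry of the parent until one matches `here`.
        for name in os.listdir(parent):
            try:
                entry_st = os.lstat(os.path.join(parent, name))
            except OSError:
                continue  # entry vanished or is not accessible
            if (entry_st.st_dev, entry_st.st_ino) == (here_st.st_dev, here_st.st_ino):
                names.append(name)
                break
        else:
            raise FileNotFoundError("no entry in %r matches %r" % (parent, here))
        here, here_st = parent, parent_st
    return "/" + "/".join(reversed(names))


if __name__ == "__main__":
    print(getcwd_by_walking("."))
```

Note that the entries of a directory are examined in whatever order the listing returns them, so a single unresponsive entry directly under / can stall the lookup for any process whose working directory is elsewhere.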
Workaround:
For a similar issue, IBM suggested the following action plan to avoid this problem. IBM's answer was:
Here is a solution to avoid the problem described by Oracle: do not put the NFS mounts directly under /, but one level lower, and then use symbolic links to them. For example, mount the NFS file system on /nfs/backup (/nfs is a directory we will create; it can have any name) and create a soft link from /backup to /nfs/backup:
$ ln -s /nfs/backup /backup
This avoids the statx problem without having to make changes in the setup (because /backup is still there).
Additionally, you can ask IBM about APARs IZ85027, IZ85029, IZ85032, IZ86102, IZ87374, and IZ90533. Check with IBM which one applies to your configuration.
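The layout IBM describes can be illustrated with a short Python sketch. It makes two assumptions that are not stated in the article: that the scan of / examines each entry without following symbolic links (as lstat does), and that the list of mount points is supplied by the caller. The function names and the sample mount list are hypothetical.

```python
import os
import stat
import tempfile


def mounted_directly_under_root(mount_points):
    """Return the mount points whose parent directory is '/'."""
    flagged = []
    for mount_point in mount_points:
        normalized = os.path.normpath(mount_point)
        if normalized != "/" and os.path.dirname(normalized) == "/":
            flagged.append(normalized)
    return flagged


def make_symlinked_mount_point(root):
    """Create <root>/nfs/backup and a link <root>/backup pointing to it."""
    target = os.path.join(root, "nfs", "backup")
    os.makedirs(target)
    link = os.path.join(root, "backup")
    os.symlink(target, link)
    return link


# Hypothetical mount list: only /backup sits directly under /.
print(mounted_directly_under_root(["/", "/backup", "/nfs/backup"]))  # ['/backup']

# In a scratch directory standing in for /, the link is examined as a link.
with tempfile.TemporaryDirectory() as scratch_root:
    link = make_symlinked_mount_point(scratch_root)
    print(stat.S_ISLNK(os.lstat(link).st_mode))  # True
```

Under those assumptions, a stat of the /backup entry inspects only the link itself, so the NFS file system is touched only by processes that actually resolve the link and access /nfs/backup.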