"Translated from MoS article" when NFS server goes down, the Oracle database freezes and there are no errors in the alert file

Last Update:2015-02-17 Source: Internet

Author: User

When the NFS server goes down, the Oracle database freezes and there are no errors in the alert log

Translated from the MOS article: When NFS Server Was Down, Oracle Server Freezes with No Errors in Alert Log File (Doc ID 1316251.1)

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.4 and later [Release: 10.2 and later]
IBM AIX on POWER Systems (64-bit)
Symptoms:
The Oracle instance on AIX has an NFS mount point that is used for backups. The mount point is mounted with the following options:

bg,hard,intr,rsize=32768,wsize=32768,sec=sys,noac,rw

When the NFS server goes down, the Oracle RDBMS freezes and no errors are written to the alert log. Once the NFS server is restored, the database resumes working without any problems.
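
To read the mount options at a glance, the option string can be split into plain flags and key=value settings. The following is a minimal Python sketch; the helper name parse_mount_options is hypothetical and is not part of any Oracle or AIX tool. The option that matters for this note is hard: with a hard mount the NFS client keeps retrying a request until the server answers, so a call that touches the mount can wait for as long as the server is down.

```python
def parse_mount_options(text):
    """Split a comma-separated mount option string into a dict.

    Flags such as "hard" map to True; "key=value" items keep the value as text.
    """
    options = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


opts = parse_mount_options("bg,hard,intr,rsize=32768,wsize=32768,sec=sys,noac,rw")
# A hard mount without a "soft" flag means requests are retried until the server replies.
print("hard mount:", opts.get("hard", False), "| soft mount:", opts.get("soft", False))
```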

Changes:
Nothing in the environment has changed; only NAS connectivity (to the NFS server) was lost, so the remote directory became inaccessible.
Cause:

From the uploaded truss traces of sqlplus and df, we can see that the statx call hangs on /backup.

462940: statx("./../../../../backup", 0x0fffffffffff5980, 176, 021) (sleeping...)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000360, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000320, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000310, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000310, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) Err#82 ERESTART
561338: Received signal #2, SIGINT [caught]
561338: sigprocmask(0, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: sigprocmask(1, 0x0fffffffffff3620, 0x0000000000000000) = 0
561338: ksetcontext_sigreturn(0x0fffffffffff37a0, 0x0000000000000000, 0x00000001100f04f0, 0x800000000000d032, 0x3000000000000000, 0x0000000000000320, 0x0000000000000000, 0x0000000000000000)
561338: kread(14, "ÿÿjø\0\0\0\0\0\0\010"...) (sleeping...)
462940: statx("./../../../../backup", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../usr", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../lib", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../audit", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../dev", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../etc", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../u", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../lpp", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../mnt", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../proc", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../sbin", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../bin", 0x0fffffffffff5980, 176, 021) = 0
462940: statx("./../../../../oracle", 0x0fffffffffff5980, 176, 021) = 0

The problem is at the following call:

statx("./../../../../backup", 0x0fffffffffff5980, 176, 021) (sleeping...)

The Oracle code calls the UNIX system call 'getcwd' to get the current working directory. From that point on, control is entirely in the hands of the operating system.
From what we can see, 'getcwd' calls 'getwd', and 'getwd' in turn calls 'statx'. Once 'statx' is invoked, it processes directory entries by running 'statx' on paths in the following order:

./   ./..   ./../..   ./../../..   (this continues until the root directory is reached)

Once the root directory (/) is reached, 'lstat' calls 'statx' for each entry in that directory. Oracle has no control over this process, so nothing can be done on the Oracle side to prevent it; it all happens at the OS level.
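
The walk described above can be imitated in a few lines of Python. This is a minimal sketch, not the AIX libc implementation: the function name getwd_by_walking and the probe counter are illustrative assumptions. It climbs through "..", and at each level calls lstat on the parent's entries until it finds the one that matches the directory it came from. An lstat on an entry that is a hard-mounted NFS file system whose server is unreachable is the step that would block.

```python
import os


def getwd_by_walking(start="."):
    """Rebuild the absolute path of `start` by climbing ".." and probing entries.

    Returns (path, probes), where probes counts the lstat calls made on
    directory entries along the way.
    """
    names = []
    probes = 0
    path = start
    here = os.stat(path)
    while True:
        parent = os.path.join(path, "..")
        up = os.stat(parent)
        if (up.st_dev, up.st_ino) == (here.st_dev, here.st_ino):
            break  # "." and ".." are the same object, so this is the root
        for name in os.listdir(parent):
            try:
                # On a dead hard-mounted NFS mount point, this call would hang.
                entry = os.lstat(os.path.join(parent, name))
            except OSError:
                continue  # entry vanished or is not accessible; skip it
            probes += 1
            if (entry.st_dev, entry.st_ino) == (here.st_dev, here.st_ino):
                names.append(name)
                break
        else:
            raise FileNotFoundError("no entry in %r matches the child" % parent)
        path, here = parent, up
    return "/" + "/".join(reversed(names)), probes


if __name__ == "__main__":
    print(getwd_by_walking("."))
```

The number of probes depends on how many entries each parent directory holds and on the order in which they are listed, which is why a mount point sitting directly under / can be touched by a process that never uses it.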

Workaround:
For a similar issue, IBM suggested the following action plan to avoid the problem. IBM's answer:

One way to avoid the problem described by Oracle: do not place the NFS mounts directly under /, but put them one level lower, and then use symbolic links to point to them. For example, mount the NFS file system at /nfs/backup (/nfs is a directory we will create; it can have any name) and create a soft link /backup -> /nfs/backup:

$ ln -s /nfs/backup /backup

This avoids the statx problem without having to change anything else in the setup (because /backup is still there).

Additionally, you can ask IBM about APARs IZ85027, IZ85029, IZ85032, IZ86102, IZ87374, and IZ90533. Check with IBM which one applies to your configuration.
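
A plausible reading of why the symbolic link helps is that lstat inspects the link itself and does not follow it to its target, so probing the entries of / no longer reaches the NFS server. The Python sketch below illustrates that behaviour on a local temporary directory; the function name and the directory layout are assumptions made for the demonstration, and a missing target stands in for the unreachable NFS mount.

```python
import os
import stat
import tempfile


def probe_link_without_following(root):
    """Create root/backup -> root/nfs/backup (target missing) and probe it.

    Returns (is_symlink, target_reachable).
    """
    link = os.path.join(root, "backup")
    os.symlink(os.path.join(root, "nfs", "backup"), link)
    info = os.lstat(link)  # looks at the link only, never at the target
    try:
        os.stat(link)  # follows the link, so it depends on the target
        target_reachable = True
    except FileNotFoundError:
        target_reachable = False
    return stat.S_ISLNK(info.st_mode), target_reachable


with tempfile.TemporaryDirectory() as root:
    print(probe_link_without_following(root))  # prints (True, False)
```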

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
