A RAC environment spawns a huge number of ONS processes, exhausting the user's process limit: switching users fails with "Resource temporarily unavailable"

Source: Internet
Author: User

Background (time, user, problem)

The customer had just deployed an 11.2.0.4 RAC on Linux 5.8. After some time in service, switching to the grid user failed with "resource temporarily unavailable":

# su - grid
su: cannot set user id: Resource temporarily unavailable

Switching to other users, such as oracle, still worked, the CRS cluster was running normally, and client connections and day-to-day use were not yet affected. The first time this happened, the customer rebooted the server, which cleared the problem temporarily, but it soon came back. The customer therefore needed a permanent fix, to remove the hidden risk before it could affect production applications.

Problem analysis

Step one: check the operating system resource limit configuration

In a case like this, the first suspect is the resource-limit parameters set for the grid user on the operating system during the installation. In a RAC installation, user resource limits are configured in two places, /etc/security/limits.conf and /etc/profile, so we checked both files first:

# cat /etc/security/limits.conf
grid soft nproc 16384
grid hard nproc 65536
grid soft nofile 2047
grid hard nofile 65536
oracle soft nproc 16384
oracle hard nproc 65536
oracle soft nofile 2047
oracle hard nofile 65536

# cat /etc/profile
if [ $USER = "oracle" ] || [ $USER = "grid" ]; then
    if [ $SHELL = "/bin/ksh" ]; then
        ulimit -p 16384
        ulimit -n 65536
    else
        ulimit -u 16384 -n 65536
    fi
    umask 022
fi

Here nproc limits the maximum number of processes a user can run. soft is the soft limit and hard is the hard limit: the soft limit is the value actually enforced, and a user may raise their own soft limit up to, but never beyond, the hard limit, so soft is normally set lower than hard. Each line of /etc/security/limits.conf has the form <domain> <type> <item> <value>; for example, grid soft nproc 16384 sets the soft nproc limit of the grid user to 16384.

Our grid soft nproc 16384 and grid hard nproc 65536 therefore mean that the grid user is held to 16,384 processes by default (the /etc/profile above also runs ulimit -u 16384 for grid's login shells) and can raise that to at most 65,536. Next we checked how many processes the grid user had:
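The values in limits.conf are applied by PAM (pam_limits) when a session starts, so they do not tell you for certain what an already running process is held to. One way to see the nproc limits a running process is actually subject to is to read the "Max processes" line of /proc/<pid>/limits, which Linux provides since kernel 2.6.24 (check that your distribution kernel has it). A minimal sketch; the helper name nproc_limits is hypothetical:

```shell
#!/usr/bin/env bash
# nproc_limits [PID]
# Print the soft and hard nproc limits (RLIMIT_NPROC) that apply to PID,
# taken from the "Max processes" line of /proc/<PID>/limits.
# Without an argument it reports on the current shell.
nproc_limits() {
    local pid=${1:-$$}
    # Line layout: "Max processes   <soft>   <hard>   processes"
    # Either value may also read "unlimited".
    awk '/^Max processes/ { print "soft=" $3 " hard=" $4 }' "/proc/$pid/limits"
}
```

For example, nproc_limits "$(pgrep -o -x ons)" would report the limits of the oldest process named ons (pgrep is part of procps).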

# ps -u grid | wc -l
156
# ps -aux | wc -l
659

The grid user has very few processes, far below the soft limit of 16384, so it should not be hitting "resource temporarily unavailable". I suspected the ps options I had used, so after looking up the ps command on Baidu I changed one option and ran it again:

# ps -eL | wc -l
17730

This time the anomaly is obvious: 17,730 entries on a single node. The jump does not come from ps -aux hiding processes. Both ps -aux (which procps runs as the BSD-style ps aux when no user named "x" exists) and ps -e list every process exactly once. The -L option instead prints one line per thread (lightweight process, LWP), so a single multithreaded process can produce thousands of lines. That is exactly the figure that matters here, because on Linux the nproc limit (RLIMIT_NPROC) counts every thread owned by the user, not just processes. Going through these entries, we found a very large number belonging to ONS. Counting them shows 16,530 ONS threads in total:

# ps -eL | grep ons | wc -l
16530
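Because the nproc limit is checked against threads, comparing a user's process count with its thread count exposes this kind of problem directly. A minimal sketch that reads the real UID and Threads fields of /proc/<pid>/status instead of depending on ps options; count_user_tasks is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# count_user_tasks USER|UID
# Print how many processes and how many threads have USER as their real UID.
# RLIMIT_NPROC (the nproc limit) is checked against the thread count.
count_user_tasks() {
    local uid f key val rest owner n
    local procs=0 threads=0
    case $1 in
        ''|*[!0-9]*) uid=$(id -u "$1") || return 1 ;;  # user name
        *)           uid=$1 ;;                          # numeric UID
    esac
    for f in /proc/[0-9]*/status; do
        owner="" n=""
        # A process may exit between the glob and the read; skip it quietly.
        while read -r key val rest; do
            case $key in
                Uid:)     owner=$val ;;  # first Uid field is the real UID
                Threads:) n=$val ;;      # number of threads in this process
            esac
        done 2>/dev/null < "$f" || continue
        if [ "$owner" = "$uid" ] && [ -n "$n" ]; then
            procs=$((procs + 1))
            threads=$((threads + n))
        fi
    done
    echo "processes=$procs threads=$threads"
}
```

Run as count_user_tasks grid, a large gap between the two numbers points at a heavily multithreaded process. On the node in this case one would expect roughly the 156 grid processes seen earlier but, assuming ONS runs as grid, more than 16,384 threads.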

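This pinned down the immediate cause. ONS runs under the grid owner here, so its 16,530 threads pushed grid past its soft nproc limit of 16384; from then on su could no longer switch to grid, while oracle, whose count was unaffected, still worked. Next we had to find out why ONS behaved this way.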

Step two: ONS process analysis

Oracle's documentation describes ONS (Oracle Notification Service) as "a publish and subscribe service for communicating information on all FAN events". It is mainly responsible for communication between RAC nodes and is an important service process. So why were there so many ONS threads? The MOS note "ONS has thousand processes/threads and still increasing" (Doc ID 1547703.1) gives the reason:

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.

Symptoms

The number of ONS processes/threads continuously increases.

oracle 9470 17663  7447 0 7599 07:11 ? 00:00:00 /orahome/app/grid/opmn/bin/ons -d
oracle 9470 17663  8920 0 7599 07:12 ? 00:00:00 /orahome/app/grid/opmn/bin/ons -d
oracle 9470 17663 10425 0 7599 07:13 ? 00:00:00 /orahome/app/grid/opmn/bin/ons -d
...
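A thread count that keeps climbing, as described in the note, is easy to confirm by sampling the ONS thread count a few times. A minimal sketch, assuming Linux /proc; threads_by_name is a hypothetical helper name, and it matches the process name exactly, so other commands whose names merely contain "ons" are not counted:

```shell
#!/usr/bin/env bash
# threads_by_name NAME
# Sum the thread counts of all processes whose name (the Name: field of
# /proc/<pid>/status, at most 15 characters) is exactly NAME.
threads_by_name() {
    local f key val name n total=0
    for f in /proc/[0-9]*/status; do
        name="" n=""
        # Skip processes that exit before their status file can be read.
        while read -r key val; do
            case $key in
                Name:)    name=$val ;;
                Threads:) n=$val ;;
            esac
        done 2>/dev/null < "$f" || continue
        if [ "$name" = "$1" ] && [ -n "$n" ]; then
            total=$((total + n))
        fi
    done
    echo "$total"
}

# Example: sample the ONS thread count three times, one minute apart.
# for i in 1 2 3; do date '+%T'; threads_by_name ons; sleep 60; done
```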

Output of the command "onsctl debug":

IP ADDRESS                              PORT    TIME     SEQUENCE FLAGS
--------------------------------------- ------- -------- -------- --------
127.0.0.1                               6200    511C7CCB 00000001 00000008
Listener:
TYPE     BIND ADDRESS                                 PORT   SOCKET
-------- -------------------------------------------- ------ ------
Local    127.0.0.1                                    6100   5
Remote   any                                          6200   6
Remote   any                                          6200   -
Connection Topology: (1)
IP                                      PORT       VERS TIME
--------------------------------------- ---------- ---- --------
127.0.0.1                               6200       4    511c7cdd
  ** 127.0.0.1 6200
  ** 127.0.0.1 6200

Server Connections:
ID       CONNECTION ADDRESS                           PORT   FLAGS  SENDQ REF WSAQ
-------- -------------------------------------------- ------ ------ ----- --- ----
6        127.0.0.1                                    6200   090026 00000 001
Client Connections:
ID       CONNECTION ADDRESS                           PORT   FLAGS  SENDQ REF SUB W
-------- -------------------------------------------- ------ ------ ----- --- --- -
1        Internal                                     0      01008a 00000 001 002
2        127.0.0.1                                    6100   01001a 00000 001 001
5        127.0.0.1                                    6100   01001a 00000 001 000
Request  127.0.0.1                                    6100   03201a 00000 001 000

Cause

Misconfigured /etc/hosts for the loopback interface
-------------------------------------------------------------------
127.0.0.1 EMSDB01 localhost.localdomain localhost
-------------------------------------------------------------------

Solution

Change the loopback interface entry to the following:
-------------------------------------------------------------------
127.0.0.1 localhost.localdomain localhost
-------------------------------------------------------------------

Resolution

Step one: check the /etc/hosts file

Checking /etc/hosts confirmed that the 127.0.0.1 line still contained the host name, which shows the installation had not been carried out carefully enough. We removed the host name from that line. The file before the change:

# cat /etc/hosts
127.0.0.1 rac01 localhost.localdomain localhost
192.168.4.23 rac01
192.168.4.24 rac02
192.168.4.27 rac01-vip
192.168.4.28 rac02-vip
192.168.4.30 scan-rac
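The rule from the note can be checked automatically on every node: flag any name other than localhost and localhost.localdomain on a 127.0.0.1 line. A minimal sketch; loopback_hostnames is a hypothetical helper name, and the list of allowed names follows the note's corrected line, so on systems whose default hosts file also carries names such as localhost4 the list would need extending:

```shell
#!/usr/bin/env bash
# loopback_hostnames [HOSTS_FILE]
# Print every name other than localhost / localhost.localdomain that appears
# on a 127.0.0.1 line. Exit status 1 means at least one such name was found.
loopback_hostnames() {
    local file=${1:-/etc/hosts}
    awk '
        { sub(/#.*/, "") }                  # drop comments before splitting
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++)
                if ($i != "localhost" && $i != "localhost.localdomain") {
                    print $i
                    found = 1
                }
        }
        END { exit found }                  # 0 = clean, 1 = misconfigured
    ' "$file"
}
```

Run against the file listed above, it would print rac01 and return 1; against the corrected line 127.0.0.1 localhost.localdomain localhost it prints nothing and returns 0.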

After correcting the file on both nodes, we restarted the nodes one at a time. The onsctl debug output then looked like this:

IP ADDRESS                              PORT    TIME     SEQUENCE FLAGS
--------------------------------------- ------- -------- -------- --------
127.0.0.1                               6200    511C7CCB 00000001 00000008
Listener:
TYPE     BIND ADDRESS                                 PORT   SOCKET
-------- -------------------------------------------- ------ ------
Local    127.0.0.1                                    6100   5
Remote   any                                          6200   6
Remote   any                                          6200   -
Connection Topology: (1)
IP                                      PORT       VERS TIME
--------------------------------------- ---------- ---- --------
127.0.0.1                               6200       4    511c7cdd
  192.168.4.23 6200
  192.168.4.24 6200

Server Connections:
ID       CONNECTION ADDRESS                           PORT   FLAGS  SENDQ REF WSAQ
-------- -------------------------------------------- ------ ------ ----- --- ----
6        127.0.0.1                                    6200   090026 00000 001
Client Connections:
ID       CONNECTION ADDRESS                           PORT   FLAGS  SENDQ REF SUB W
-------- -------------------------------------------- ------ ------ ----- --- --- -
1        Internal                                     0      01008a 00000 001 002
2        127.0.0.1                                    6100   01001a 00000 001 001
5        127.0.0.1                                    6100   01001a 00000 001 000
Request  127.0.0.1                                    6100   03201a 00000 001 000

Compared with the earlier output, the connection topology now lists the nodes' real IP addresses instead of the loopback address. Querying ONS again showed the count had dropped to 2, and the problem was resolved for good:

# ps -eL | grep ons | wc -l
2

Key takeaways

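1. When viewing processes with ps, mind the difference between options with and without a leading dash. To list every process, use ps aux (BSD syntax) or ps -e (UNIX syntax). ps -aux is ambiguous: POSIX reads it as "processes of user x plus those selected by -a", and procps only falls back to aux, with a warning, when no user named x exists. None of these show threads; add -L (for example ps -eL) to get one line per thread, which is what the nproc limit counts.

Parameter description:

-a  shows all processes on all terminals except session leaders.
a   shows processes of all users that have a terminal, not just your own (add x to include processes without a terminal).
-e  shows all processes.
e   shows the environment variables of each process after its command.
-L  shows threads, one line per lightweight process (LWP).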

2. When installing 11gR2 RAC, be sure to remove the host name from the 127.0.0.1 line in /etc/hosts; otherwise ONS can end up creating a huge number of threads.
