A large number of ONS processes in a RAC environment exhausts the user's process resources, and switching users fails with "Resource temporarily unavailable"

Source: Internet
Author: User

Background (time, user, problem)

The customer runs an Oracle 11.2.0.4 RAC environment on Linux 5.8. After the system had been in use for some time, switching to the grid user began to fail with "Resource temporarily unavailable", as shown below:

[root@rac01 bin]# su - grid
su: cannot set user id: Resource temporarily unavailable

However, switching to other users, such as oracle, still worked, the CRS cluster ran normally, and client connections and business use were not affected for the time being. When the problem first occurred, the customer worked around it by restarting the server, but it came back before long. The customer therefore needed a permanent fix, both to remove the hidden risk and to avoid any impact on production applications.

Step 1: Check the operating system resource limit Configuration

The first thing to suspect is the operating system resource limits configured for the grid user during installation. In a RAC installation, user resource limits are set in two places, /etc/security/limits.conf and /etc/profile, so we first checked both files:

[root@rac01 ~]# cat /etc/security/limits.conf
grid soft nproc 16384
grid hard nproc 65536
grid soft nofile 2047
grid hard nofile 65536
oracle soft nproc 16384
oracle hard nproc 65536
oracle soft nofile 2047
oracle hard nofile 65536
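When several entries and files are involved, it can help to read back what limits.conf actually says for one user and one item. The following is a minimal sketch, assuming Bash and awk; the function name limits_for is hypothetical, and it ignores @group and wildcard entries as well as files under /etc/security/limits.d/, all of which can also apply.

```shell
#!/usr/bin/env bash
# Print the soft and hard values of one limits.conf item for one user.
# Sketch only: matches literal user names; @group, "*" entries and
# /etc/security/limits.d/ files are NOT considered.
limits_for() {
    local user="$1" item="$2" file="${3:-/etc/security/limits.conf}"
    awk -v u="$user" -v i="$item" '
        /^[ \t]*#/ { next }                      # skip comment lines
        $1 == u && $3 == i {
            # "-" in the type column sets both soft and hard
            if ($2 == "soft" || $2 == "-") soft = $4
            if ($2 == "hard" || $2 == "-") hard = $4
        }
        END {
            printf "soft=%s hard=%s\n", (soft == "" ? "unset" : soft), (hard == "" ? "unset" : hard)
        }
    ' "$file"
}
```

Run against the file above, limits_for grid nproc should print soft=16384 hard=65536.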

[root@rac01 ~]# cat /etc/profile
if [ $USER = "oracle" ] || [ $USER = "grid" ]; then
    if [ $SHELL = "/bin/ksh" ]; then
        ulimit -p 16384
        ulimit -n 65536
    else
        ulimit -u 16384 -n 65536
    fi
    umask 022
fi
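Both files only describe what should be applied when a user logs in. Since su - grid itself fails here, a more direct check is to read the limits the kernel is enforcing for a process that is already running as grid, from /proc/<pid>/limits. A minimal sketch, assuming a Linux /proc filesystem; max_processes is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the soft and hard "Max processes" (nproc) values that the kernel
# is enforcing for a running process, read from /proc/<pid>/limits.
max_processes() {
    local pid="$1"
    # Line format: "Max processes   <soft>   <hard>   processes"
    awk '/^Max processes/ { print "soft=" $3, "hard=" $4 }' "/proc/$pid/limits"
}
```

For example, max_processes "$(pgrep -o -u grid ons)" inspects the oldest process named ons owned by grid (the process name ons is taken from the listings later in this article).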

Here, nproc controls the maximum number of processes a user can run. soft is the soft limit and hard is the hard limit: the soft limit is the value that is actually enforced, and an unprivileged user can raise it (for example with ulimit) up to, but never beyond, the hard limit, so soft is normally smaller than hard. Each line in /etc/security/limits.conf has the format <domain> <type> <item> <value>, for example grid soft nproc 16384.

Here, grid soft nproc 16384 and grid hard nproc 65536 mean that the grid user can run at most 16384 processes by default and may raise that limit up to 65536; it is the soft value, not a warning threshold, that stops new processes from being created. Next, we checked how many processes the grid user had, as shown below:

[root@rac01 ~]# ps -U grid | wc -l
156
[root@rac01 ~]# ps -aux | wc -l
659

The number of processes was small and well below the limit, so the "Resource temporarily unavailable" error should not have appeared. Suspecting that the ps options I had used were at fault, I looked up the ps command on Baidu and ran it with the following options:

[root@rac01 ~]# ps -eL | wc -l
17730

This time the count on a single node is 17730. The jump does not come from -e versus -aux: both select all processes (ps -aux may be interpreted as ps aux, with a warning, when no user named "x" exists). It comes from -L, which prints one line per thread (lightweight process, LWP) instead of one line per process. This is the number that matters, because on Linux the nproc limit is checked against the number of threads owned by the user's real UID, not just the number of processes. For reference:

-a: select all processes except session leaders and processes not associated with a terminal.

-e: select all processes (identical to -A).

-L: show threads, possibly with LWP and NLWP columns.

Next, we examined these unusual entries carefully and found that a large number of them belonged to ONS. Counting them gives 16530 ONS entries in total:

[root@rac01 ~]# ps -eL | grep ons | wc -l
16530
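The process-versus-thread difference above can also be measured without relying on ps options, by summing the Threads: field of /proc/<pid>/status over a user's processes. This is a sketch assuming a Linux /proc filesystem; count_tasks is a hypothetical helper name. It matches on the real UID, which is what the kernel uses for the nproc limit.

```shell
#!/usr/bin/env bash
# Count processes and threads whose real UID belongs to the given user,
# by reading /proc/<pid>/status directly.
count_tasks() {
    local uid
    uid=$(id -u "$1") || return 1
    # cat skips processes that exit during the scan (errors discarded).
    cat /proc/[0-9]*/status 2>/dev/null | awk -v uid="$uid" '
        /^Uid:/     { ruid = $2 }            # first Uid field = real UID
        /^Threads:/ { if (ruid == uid) { procs++; threads += $2 } }
        END { printf "processes=%d threads=%d\n", procs, threads }
    '
}
```

On a node in the state described here, count_tasks grid would be expected to report a thread total far above its process count; comparing that total with grid's soft nproc value (16384) shows how close the user is to the limit.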

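We had now found what was exhausting the grid user's limit. ONS runs from the Grid Infrastructure home as the grid user, and its 16530 threads alone exceed grid's soft nproc limit of 16384, which is consistent with su - grid failing while su - oracle still worked. Next, we needed to find out why ONS was creating so many threads.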
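When it is not yet known which program is creating the threads, grouping the thread count by process name narrows it down quickly. A minimal sketch, again assuming Linux /proc; top_thread_owners is a hypothetical name, and names come from the Name: field, which the kernel truncates to 15 characters.

```shell
#!/usr/bin/env bash
# Show which process names own the most threads, summing the "Threads:"
# field of /proc/<pid>/status per process name. Similar in spirit to
#   ps -eLo comm= | sort | uniq -c | sort -rn | head
# but grouped by process name rather than by per-thread name.
top_thread_owners() {
    local limit="${1:-5}"
    cat /proc/[0-9]*/status 2>/dev/null |
        awk '
            /^Name:/    { name = $2 }          # first word of the name only
            /^Threads:/ { count[name] += $2 }
            END { for (n in count) print count[n], n }
        ' |
        sort -rn | head -n "$limit"
}
```

top_thread_owners 5 prints the five process names with the most threads, largest first, as "count name" pairs.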

 

Step 2: ONS Process Analysis

Oracle's documentation describes ONS (Oracle Notification Services) as "a publish and subscribe service for communicating information about all FAN events". It is mainly responsible for communication between RAC nodes and is a very important service process. So why were there so many ONS threads? The My Oracle Support note "ONS has Thousand Processes/Threads and Still Increasing (Document ID 1547703.1)" gives the reason:

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.

SYMPTOMS

The number of ONS processes/threads continuously increases.

oracle  9470  17663   7447  0  7599 ?  00:00:00 /orahome/app/grid/opmn/bin/ons -d
oracle  9470  17663   8920  0  7599 ?  00:00:00 /orahome/app/grid/opmn/bin/ons -d
oracle  9470  17663  10425  0  7599 ?  00:00:00 /orahome/app/grid/opmn/bin/ons -d
...
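Assuming the columns above are those of ps -eLf (UID, PID, PPID, LWP, C, NLWP, ...), every line has the same PID (9470) and only the LWP column changes, so what keeps growing is the thread count of a single ons process (NLWP 7599). The per-process thread count can be printed directly; this is a sketch assuming Linux /proc, and threads_of is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print "PID THREADS" for every process whose name matches exactly,
# read from /proc/<pid>/status. Roughly what "ps -C ons -o pid=,nlwp="
# reports. Note that Name: is truncated to 15 characters by the kernel.
threads_of() {
    cat /proc/[0-9]*/status 2>/dev/null | awk -v want="$1" '
        /^Name:/    { name = $2 }
        /^Pid:/     { pid = $2 }
        /^Threads:/ { if (name == want) print pid, $2 }
    '
}
```

threads_of ons prints one "PID THREADS" line per ons process; a thread count that keeps rising between runs matches the symptom in this note.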

Output of the command "onsctl debug":

IP ADDRESS                 PORT  TIME      SEQUENCE  FLAGS
--------------------------------------------------------------------
127.0.0.1                  6200  511c7ccb  00000001  00000008
Listener:
  TYPE    BIND ADDRESS     PORT  SOCKET
----------------------------------------------------------
  Local   127.0.0.1        6100  5
  Remote  any              6200  6
  Remote  any              6200  -
Connection Topology: (1)
  IP                 PORT  VERS  TIME
---------------------------------------------------------
  127.0.0.1          6200  4     Export c7cdd =
    ** 127.0.0.1 6200
    ** 127.0.0.1 6200

Server connections:
  ID       CONNECTION ADDRESS  PORT  FLAGS   SENDQ  REF  WSAQ
----------------------------------------------------------------------
  6        127.0.0.1           6200  090026  00000  001
Client connections:
  ID       CONNECTION ADDRESS  PORT  FLAGS   SENDQ  REF  SUB  W
----------------------------------------------------------------------
  1        internal            0     01008a  00000  001  002
  2        127.0.0.1           6100  01001a  00000  001  001
  5        127.0.0.1           6100  01001a  00000  001  000
  Request  127.0.0.1           6100  03201a  00000  001  000

CAUSE

Misconfigured /etc/hosts for the loopback interface

-------------------------------------------------------------------
127.0.0.1 emsdb01 localhost.localdomain localhost
-------------------------------------------------------------------

SOLUTION

Change the loopback interface entry to the following:

-------------------------------------------------------------------
127.0.0.1 localhost.localdomain localhost
-------------------------------------------------------------------
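The check described in this CAUSE section can be scripted: does the 127.0.0.1 line list the node's own host name? A minimal sketch in awk; loopback_has_hostname is a hypothetical helper, matching is case-insensitive, and IPv6 (::1) entries are not examined.

```shell
#!/usr/bin/env bash
# Exit 0 (and print the offending line) if the 127.0.0.1 entry of a hosts
# file lists the given host name; exit 1 otherwise.
loopback_has_hostname() {
    local host="$1" file="${2:-/etc/hosts}"
    awk -v h="$host" '
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break                  # rest is a comment
                if (tolower($i) == tolower(h)) { print; found = 1; break }
            }
        }
        END { exit !found }
    ' "$file"
}
```

For example, loopback_has_hostname "$(hostname -s)" && echo "fix /etc/hosts on this node" flags a node whose loopback line still carries its host name.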

Step 3: Check the /etc/hosts file

Checking /etc/hosts showed that the host name had been left on the 127.0.0.1 line; evidently the installation had not been done carefully. The host name (rac01) on the first line below has to be removed:

[root@rac01 ~]# cat /etc/hosts
127.0.0.1     rac01 localhost.localdomain localhost
192.168.4.23  rac01
192.168.4.24  rac02
192.168.4.27  rac01-vip
192.168.4.28  rac02-vip
192.168.4.30  scan-rac
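Removing the host name can be done by hand in an editor. If several nodes need the same change, a sketch like the following prints a corrected copy for review instead of editing /etc/hosts in place; strip_loopback_hostname is a hypothetical helper, and note that it rewrites the whitespace of the 127.0.0.1 line to single spaces.

```shell
#!/usr/bin/env bash
# Print a copy of a hosts file with the given host name removed from the
# 127.0.0.1 line; every other line is passed through unchanged.
# Output goes to stdout so it can be reviewed before replacing /etc/hosts.
strip_loopback_hostname() {
    local host="$1" file="${2:-/etc/hosts}"
    awk -v h="$host" '
        $1 == "127.0.0.1" {
            out = $1
            for (i = 2; i <= NF; i++)
                if (tolower($i) != tolower(h)) out = out " " $i
            print out
            next
        }
        { print }
    ' "$file"
}
```

For example: strip_loopback_hostname rac01 > /tmp/hosts.new && diff /etc/hosts /tmp/hosts.new, then copy the file into place (keeping a backup) once the diff looks right, and repeat on every node.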

After correcting the file on both nodes, we restarted the nodes one at a time and ran onsctl debug again. The result was as follows:

IP ADDRESS                 PORT  TIME      SEQUENCE  FLAGS
--------------------------------------------------------------------
127.0.0.1                  6200  511c7ccb  00000001  00000008
Listener:
  TYPE    BIND ADDRESS     PORT  SOCKET
----------------------------------------------------------
  Local   127.0.0.1        6100  5
  Remote  any              6200  6
  Remote  any              6200  -
Connection Topology: (1)
  IP                 PORT  VERS  TIME
---------------------------------------------------------
  127.0.0.1          6200  4     Export c7cdd =
    192.168.4.23 6200
    192.168.4.24 6200

Server connections:
  ID       CONNECTION ADDRESS  PORT  FLAGS   SENDQ  REF  WSAQ
----------------------------------------------------------------------
  6        127.0.0.1           6200  090026  00000  001
Client connections:
  ID       CONNECTION ADDRESS  PORT  FLAGS   SENDQ  REF  SUB  W
----------------------------------------------------------------------
  1        internal            0     01008a  00000  001  002
  2        127.0.0.1           6100  01001a  00000  001  001
  5        127.0.0.1           6100  01001a  00000  001  000
  Request  127.0.0.1           6100  03201a  00000  001  000

The node IP addresses now appear correctly in the connection topology. Querying ONS again shows that its entries have dropped to 2, and the problem is solved:

[root@rac01 ~]# ps -eL | grep ons | wc -l
2
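Since the first occurrence was only noticed when su failed, a periodic check of thread usage against the soft nproc limit could catch a recurrence earlier. This is a sketch, not an Oracle-provided tool: nproc_check is a hypothetical helper, the 80% threshold is an arbitrary default, and it assumes all of the user's processes run with the same limits as the first one it finds.

```shell
#!/usr/bin/env bash
# Warn when the threads owned by a user approach the soft nproc limit.
# Assumption: the limit is read from the user's first process in /proc.
nproc_check() {
    local user="$1" pct="${2:-80}" uid result total pid soft
    uid=$(id -u "$user") || return 2
    # One pass over /proc: total threads for the real UID, plus one PID.
    result=$(cat /proc/[0-9]*/status 2>/dev/null | awk -v uid="$uid" '
        /^Pid:/ { pid = $2 }
        /^Uid:/ { ruid = $2 }
        /^Threads:/ && ruid == uid { total += $2; if (first == "") first = pid }
        END { print total + 0, first }')
    total=${result% *}
    pid=${result#* }
    if [ -z "$pid" ]; then
        echo "OK: $user has no processes"
        return 0
    fi
    # Soft "Max processes" value enforced for that process.
    soft=$(awk '/^Max processes/ { print $3 }' "/proc/$pid/limits" 2>/dev/null)
    case "$soft" in
        ''|unlimited)
            echo "OK: $user threads=$total soft_nproc=${soft:-unknown}"
            return 0 ;;
    esac
    if [ $((total * 100)) -ge $((soft * pct)) ]; then
        echo "WARN: $user threads=$total soft_nproc=$soft (>= ${pct}%)"
        return 1
    fi
    echo "OK: $user threads=$total soft_nproc=$soft"
}
```

For example, nproc_check grid 80 could be run from cron, with a non-zero exit status treated as an alert.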

Key knowledge points

1. With ps, pay attention to the difference between options with and without a leading dash: UNIX-style options (such as -e) and BSD-style options (such as aux) are different options, and ps -aux is not the same as ps aux. Both ps aux and ps -e list all processes, one line per process, so neither shows how many threads exist. To count threads, which is what the Linux nproc limit counts, add -L (for example ps -eL).

Parameter description:

-a  Select all processes except session leaders and processes not associated with a terminal.
a   (BSD style) List all processes with a terminal, including those of other users.
-e  Select all processes.
e   (BSD style) Show the environment after the command.
-L  Show threads, possibly with LWP and NLWP columns.

2. During an 11gR2 RAC installation, remember to remove the host name from the 127.0.0.1 line in /etc/hosts. Otherwise, ONS can create a large number of threads, as in this case.


