Oracle Case Study: Solving High CPU and Memory Usage Caused by OEM

Last Update: 2013-12-29 Source: Internet

Author: User

Tags: metalink


I. Introduction

Since our DBA left the company, I have been managing all of the Oracle database servers part-time. Today I logged on to a database server at one of our provincial sites and found that the login took about 30 seconds. I then checked the load and memory. Here is the load:

[Screenshot: system load] (http://www.bkjia.com/uploads/allimg/131229/195R32S0-0.jpg)

I had never seen a load this high; it had climbed past 1000. Memory usage was as follows:

[Screenshot: memory usage] (http://www.bkjia.com/uploads/allimg/131229/195R353T-1.jpg)

The swap space was completely used up and only 71 MB of physical memory was free, which is very dangerous:

[Screenshot: swap and free physical memory] (http://www.bkjia.com/uploads/allimg/131229/195R364B-2.jpg)
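The same two numbers can be read from the command line. A minimal sketch, assuming a Linux /proc/meminfo (values in kB); the helper name mem_summary is hypothetical. MemFree is the value that free reports in its "free" column, so it does not count memory used for buffers and page cache.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): print free RAM and free swap in MB.
# Reads /proc/meminfo-style text on stdin; the kernel reports these in kB.
mem_summary() {
    awk '/^MemFree:/  { mem = $2 }
         /^SwapFree:/ { swap = $2 }
         END { printf "MemFree: %d MB, SwapFree: %d MB\n", mem / 1024, swap / 1024 }'
}

# Usage on Linux:
#   mem_summary < /proc/meminfo
```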

I also found six zombie processes and a large number of perl processes. Let's look at the zombie processes first:

[Screenshot: zombie processes] (http://www.bkjia.com/uploads/allimg/131229/195R3I13-3.jpg)
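A zombie (defunct) process has already exited but has not yet been reaped by its parent, so ps shows it with a STAT value starting with Z. A minimal sketch for listing zombies together with their parent PIDs follows; list_zombies is a hypothetical helper that reads ps output on stdin, so it can also be tried on a saved snapshot. The parent PID matters because a zombie cannot be killed: it goes away only when its parent reaps it, or when the parent exits and init inherits and reaps it.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): print zombie processes from
# `ps -eo stat,pid,ppid,comm` output on stdin. A process is a zombie
# when its STAT field starts with "Z"; the header line is skipped.
list_zombies() {
    awk 'NR > 1 && $1 ~ /^Z/ { print "pid=" $2, "ppid=" $3, "cmd=" $4 }'
}

# Usage on a live system (procps ps):
#   ps -eo stat,pid,ppid,comm | list_zombies
```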

All of them were [sh] <defunct>. I had run into this before: it happened when a script started from cron did not redirect its output, including errors, to the null device, and the fix was to append >/dev/null 2>&1 to the cron entry. I checked the crontab to see whether my guess was right:

[Screenshot: crontab entries] (http://www.bkjia.com/uploads/allimg/131229/195R31018-4.jpg)
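This check can be scripted. The sketch below is a hypothetical helper, not part of cron: it reads crontab -l output on stdin, drops comments, blank lines and variable assignments such as MAILTO=, and prints the remaining job lines that do not contain a redirection of both stdout and stderr to /dev/null. It only recognizes the >/dev/null 2>&1 form (spaces allowed), so entries that send output to a log file are reported too and need a manual look.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not part of cron): print crontab job
# lines that do not send both stdout and stderr to /dev/null.
# Reads `crontab -l` output on stdin. The three filters drop, in order:
#   1. comment lines and blank lines
#   2. environment assignments such as MAILTO="" or PATH=...
#   3. jobs that already contain ">/dev/null 2>&1" (spaces allowed)
find_unredirected_cron_jobs() {
    grep -v -E '^[[:space:]]*(#|$)' |
        grep -v -E '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*[[:space:]]*=' |
        grep -v -E '>[[:space:]]*/dev/null[[:space:]]*2>&1'
}

# Usage:
#   crontab -l | find_unredirected_cron_jobs
# A redirected entry looks like this (schedule and path are placeholders):
#   */5 * * * * /path/to/script.sh >/dev/null 2>&1
```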

Sure enough, the entries did not redirect their output. After adding >/dev/null 2>&1 to them and restarting the cron service, the zombie problem was solved. Next, let's look at the perl processes:

[Screenshot: perl processes] (http://www.bkjia.com/uploads/allimg/131229/195R33J3-5.jpg)

There were 2726 perl processes, consuming a large amount of CPU and memory. Searching Metalink showed that this problem is caused by an OEM fault. Oracle's problem description and solution are as follows:

 
 
  
  Server Has 100% Of Cpu Because Of Dbresp.pl [ID 764140.1]

  Modified: 07-Feb-2012 Type: PROBLEM Status: MODERATED Priority: 3

  This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review.
  
  Applies to:

  Enterprise Manager Base Platform - Version: 10.2.0.1 and later [Release: 10.2 and later]
  Information in this document applies to any platform.
  ***Checked for relevance on 07-Feb-2012***

  Symptoms

  Server has 100% of CPU because of dbresp.pl. There are more than 50 processes from this script.

  emagent.trc shows:

  2009-01-21 10:19:50 Thread-4099931040 WARN engine: Missing Properties : [limitSwitch]
  2009-01-21 10:19:50 Thread-4099931040 ERROR engine: [oracle_database,orcl, alertLog] : nmeegd_GetMetricData failed : Missing Properties : [limitSwitch]
  2009-01-22 06:54:33 Thread-4105165728 ERROR fetchlets.oslinetok: Metric execution timed out in 600 seconds
  2009-01-22 06:54:33 Thread-4105165728 ERROR command: failed to kill process 4793 running perl: (errno=3: No such process)
  2009-01-22 06:54:33 Thread-4105165728 ERROR engine: [oracle_database,orlc, Response] : nmeegd_GetMetricData failed : Metric execution timed out in 600 seconds
  
  Cause

  The Response metric times out, and the Agent then starts another process to collect the Response metric. Killing the PID that was collecting the Response metric fails, so the number of processes running dbresp.pl keeps increasing.

  Before the Response metric starts timing out, there is another error:

  2009-01-21 10:19:50 Thread-4099931040 WARN engine: Missing Properties : [limitSwitch]
  2009-01-21 10:19:50 Thread-4099931040 ERROR engine: [oracle_database,orcl,alertLog] : nmeegd_GetMetricData failed : Missing Properties : [limitSwitch]
  
  Solution

  1. Stop DBConsole:

     emctl stop dbconsole

  2. Kill any running process:

     ps -ef | grep /opt/app/oracle/

     Kill any returned process.

  3. Follow the fix in:

     Note.361612.1 Ext/Mod Problem Performance Agent High CPU Consumption Gen

  4. Start DB Console:

     emctl start dbconsole
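Step 2 of the note comes down to finding the PIDs of the runaway processes. A minimal sketch of that search follows; pids_matching is a hypothetical helper that reads ps -ef output on stdin, skips the header, and ignores grep and awk lines, which would otherwise match their own search pattern. It assumes the usual ps -ef layout, where the PID is column 2 and the command starts in column 8. Piping its output to wc -l gives a process count, which is how a figure like "more than 50 processes" can be checked.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): print the PID (column 2) of every
# `ps -ef` line containing PATTERN as a literal substring. Reads `ps -ef`
# output on stdin, skips the header, and ignores lines whose command
# (column 8) is grep or awk, since those can match their own pattern.
pids_matching() {
    awk -v pat="$1" 'NR > 1 && index($0, pat) > 0 && $8 != "grep" && $8 != "awk" { print $2 }'
}

# Usage:
#   ps -ef | pids_matching dbresp.pl           # list the PIDs
#   ps -ef | pids_matching dbresp.pl | wc -l   # count them
```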

II. Following this solution, I first stopped OEM. Before that, here are the OS and database versions of my environment:

 
 
  
  oracleserver:~ # cat /etc/SuSE
  SuSE-release  SuSEconfig/
  oracleserver:~ # cat /etc/SuSE-release
  SUSE Linux Enterprise Server 10 (x86_64)
  VERSION = 10
  PATCHLEVEL = 3

The database version is:

  SQL> select * from v$version;

  BANNER
  ----------------------------------------------------------------
  Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
  PL/SQL Release 10.2.0.1.0 - Production
  CORE    10.2.0.1.0  Production
  TNS for Linux: Version 10.2.0.1.0 - Production
  NLSRTL Version 10.2.0.1.0 - Production

1. Switch to the oracle user and stop OEM:

 
 
  
  oracleserver:~ # su - oracle
  oracle@oracleserver:~> id
  uid=1000(oracle) gid=1000(oinstall) groups=1000(oinstall),1001(dba)
  oracle@oracleserver:~> emctl stop dbconsole
  TZ set to Asia/Shanghai
  Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
  Copyright (c) 1996, 2005 Oracle Corporation.  All rights reserved.
  http://oracleserver.site:1158/em/console/aboutApplication
  Stopping Oracle Enterprise Manager 10g Database Control ...
   ...  Stopped.

Note that while OEM is stopping, nothing is printed at first, and nothing shows up in the system log or the Oracle alert log either, but you still need to wait patiently: this step took me 30 minutes. If the command prints nothing for a long time, I recommend waiting longer rather than interrupting it with Ctrl+C.

2. Kill the perl processes

With OEM stopped, let's check memory and the perl processes:

[Screenshot: perl process count] (http://www.bkjia.com/uploads/allimg/131229/195R36342-6.jpg)

There are still 2726 perl processes, and memory usage has not changed:

[Screenshot: memory usage] (http://img1.51cto.com/attachment/201206/113326232.jpg)

Only 55 MB of memory is free. Kill the perl processes:

  kill -9 $(ps -ef | grep perl | grep -v grep | awk '{print $2}')

[Screenshot: killing the perl processes] (http://www.bkjia.com/uploads/allimg/131229/195R31059-8.jpg)
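kill -9 sends SIGKILL, which cannot be caught, so the processes get no chance to clean up, and grep perl matches every perl process on the server, not only the OEM ones. A gentler alternative, shown here as a minimal sketch and not as the procedure used in this article, sends SIGTERM first and escalates to SIGKILL only for PIDs still alive after a grace period. The function name terminate_pids and the TERMINATE_GRACE setting (grace period in seconds, default 5) are hypothetical.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): SIGTERM first, SIGKILL after a grace
# period. PIDs are passed as arguments. TERMINATE_GRACE is a hypothetical
# setting for the grace period in seconds (default 5).
terminate_pids() {
    [ "$#" -eq 0 ] && return 0                  # nothing to kill
    kill -TERM "$@" 2>/dev/null || true         # ask the processes to exit
    grace=${TERMINATE_GRACE:-5}
    alive="$*"
    i=0
    while [ "$i" -lt "$grace" ]; do
        alive=""
        for p in "$@"; do                       # kill -0 only tests existence
            kill -0 "$p" 2>/dev/null && alive="$alive $p"
        done
        [ -z "$alive" ] && return 0             # everything has exited
        sleep 1
        i=$((i + 1))
    done
    # $alive is unquoted on purpose so it splits into separate PIDs.
    if [ -n "$alive" ]; then
        kill -KILL $alive 2>/dev/null || true
    fi
    return 0
}
```

Review the PID list before passing it in, for example narrowing the match to the OEM script:

  terminate_pids $(ps -ef | grep dbresp.pl | grep -v grep | awk '{print $2}')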

Check the perl processes again:

[Screenshot: perl processes after the kill] (http://www.bkjia.com/uploads/allimg/131229/195R32P6-9.jpg)

The perl processes are gone. Now check the memory:

[Screenshot: memory after the kill] (http://www.bkjia.com/uploads/allimg/131229/195R363V-10.jpg)

6673 MB of memory is now free. Check the load again:

[Screenshot: load average] (http://www.bkjia.com/uploads/allimg/131229/195R32942-11.jpg)

The load is back to normal: the 1-minute load average is 3.15, the 5-minute average is 242.76, and the 15-minute average is 1236.57 (the longer averages still reflect the earlier overload and will keep falling). A load of 3 is no concern here, because the server has 16 CPU cores, so a load of 3 keeps only a fraction of them busy.

[Screenshot: CPU cores] (http://www.bkjia.com/uploads/allimg/131229/195R35401-12.jpg)
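The comparison with the core count can be made explicit. On Linux a load average counts tasks that are running, waiting for a CPU, or in uninterruptible sleep, so dividing it by the number of cores gives a rough saturation figure: values well below 1 roughly mean the CPUs have spare capacity. A minimal sketch of the division; load_per_core is a hypothetical helper.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): load average divided by the number
# of CPU cores, printed with two decimals.
# Arguments: LOAD (e.g. the 1-minute average) and CORES.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Live usage on Linux (the first three fields of /proc/loadavg are the
# 1-, 5- and 15-minute averages):
#   read l1 l5 l15 rest < /proc/loadavg
#   load_per_core "$l1" "$(getconf _NPROCESSORS_ONLN)"
```

On the 16-core server in this article, load_per_core 3.15 16 prints 0.20, while the 15-minute figure, load_per_core 1236.57 16, prints 77.29, which shows how overloaded the machine had been.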

Now the problem is solved. To re-enable OEM for monitoring Oracle, run emctl start dbconsole as the oracle user.

Tip: databases can fail in many ways. When dealing with a fault, I suggest first working out how it arose and only then looking for a fix. If you have a Metalink account, log in and search there for the cause and the solution. I don't recommend relying on Baidu or Google, because many of the answers they turn up are not necessarily accurate or suitable for your situation. If a problem occurs in your production database and you apply a fix found on Baidu without understanding the cause of the problem or the reasoning behind the fix, solving it is a matter of luck; and if the problem is not solved, or gets worse, you probably won't be in that job for much longer.

This article is from the "Yin-Technical Exchange" blog, please be sure to keep this source http://dl528888.blog.51cto.com/2382721/911535

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
