Oracle Case Study: Resolving High CPU and Memory Usage Caused by OEM

Source: Internet
Author: User
Tags metalink

I. Introduction

Since our DBA left the company, I have been managing all the Oracle database servers part-time. Today I logged on to the database server for one province and found that the login alone took about 30 seconds. I then checked the load and memory. The load looked like this:

[Image: http://www.bkjia.com/uploads/allimg/131229/195R32S0-0.jpg]

I had never seen a load this high; the most I had seen before was a little over 1000. Something was clearly wrong. Next, the memory:

[Image: http://www.bkjia.com/uploads/allimg/131229/195R353T-1.jpg]

Even the swap space was used up, and only 71 MB of physical memory was left, which is far too dangerous:

[Image: http://www.bkjia.com/uploads/allimg/131229/195R364B-2.jpg]
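The same figures can be read from /proc/meminfo instead of a screenshot. The following is a minimal sketch that assumes a Linux system; the helper name meminfo_mb is made up for illustration and is not part of the original procedure.

```shell
# Hypothetical helper: print one /proc/meminfo field (reported in kB) in whole MB.
# It reads meminfo-formatted text from stdin, so it also works on saved output.
meminfo_mb() {
    awk -v key="$1:" '$1 == key { printf "%d\n", $2 / 1024 }'
}

# Typical use on a live Linux system:
#   meminfo_mb MemFree  < /proc/meminfo
#   meminfo_mb SwapFree < /proc/meminfo
```

A SwapFree value of 0 together with a MemFree value in the tens of megabytes corresponds to the situation described here.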

There were six zombie processes and a large number of perl processes. Let's look at the zombie processes first:

[Image: http://www.bkjia.com/uploads/allimg/131229/195R3I13-3.jpg]
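For readers who want to reproduce this check, here is a small sketch that filters zombie processes out of ps output. It assumes the procps version of ps found on Linux; the helper name list_zombies is hypothetical.

```shell
# Hypothetical helper: from `ps -eo pid,ppid,stat,comm` output on stdin,
# print PID, parent PID and command name of every zombie (state starts with Z).
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Typical use on a live system:
#   ps -eo pid,ppid,stat,comm | list_zombies
```

The parent PID is worth printing because a zombie cannot be killed directly; it only disappears once its parent reaps it or exits.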

All of them turned out to be [sh] <defunct>. I had run into this before: it happens when a script started from cron does not redirect its output and errors to the null device. The fix is to append >/dev/null 2>&1 to the script's entry in cron. I checked cron to see whether it matched my guess:

[Image: http://www.bkjia.com/uploads/allimg/131229/195R31018-4.jpg]
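To make the redirection concrete, the sketch below shows what >/dev/null 2>&1 does to a command's output. The function names, the script path, and the schedule in the commented crontab line are placeholders, not taken from the server in this case.

```shell
# Hypothetical job that writes to both stdout and stderr.
noisy_job() {
    echo "progress message"
    echo "error message" >&2
}

# Run any command with stdout sent to /dev/null and stderr sent to the
# same place (2>&1), so nothing is left for cron to collect and mail.
run_quietly() {
    "$@" >/dev/null 2>&1
}

# The equivalent crontab entry (path and schedule are placeholders):
#   */5 * * * * /path/to/script.sh >/dev/null 2>&1
```

The order matters: 2>&1 must come after >/dev/null, otherwise stderr is pointed at the original stdout rather than at the null device.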

Sure enough, the output and errors were not redirected. After adding >/dev/null 2>&1 and restarting the cron service, that problem was solved. Next, the perl processes:

[Image: http://www.bkjia.com/uploads/allimg/131229/195R33J3-5.jpg]

There were 2726 perl processes, consuming a large amount of CPU and memory. A search on Metalink showed that the problem is caused by a fault in OEM. Oracle's problem description and solution are as follows:
 
 
  Server Has 100% Of Cpu Because Of Dbresp.pl [ID 764140.1]
  ________________________________________
  Modified: 07-Feb-2012   Type: PROBLEM   Status: MODERATED   Priority: 3

  In this Document
    Symptoms
    Cause
    Solution
    References
  ________________________________________
  This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review.

  Applies to:
  Enterprise Manager Base Platform - Version: 10.2.0.1 and later [Release: 10.2 and later ]
  Information in this document applies to any platform.
  ***Checked for relevance on 07-Feb-2012***

  Symptoms
  Server has 100% of CPU because of dbresp.pl . There are more than 50 process from this script

  emagent.trc shows:
  2009-01-21 10:19:50 Thread-4099931040 WARN engine: Missing Properties : [limitSwitch]
  2009-01-21 10:19:50 Thread-4099931040 ERROR engine: [oracle_database,orcl, alertLog] : nmeegd_GetMetricData failed : Missing Properties : [limitSwitch]
  2009-01-22 06:54:33 Thread-4105165728 ERROR fetchlets.oslinetok: Metric execution timed out in 600 seconds
  2009-01-22 06:54:33 Thread-4105165728 ERROR command: failed to kill process 4793 running perl: (errno=3: No such process)
  2009-01-22 06:54:33 Thread-4105165728 ERROR engine: [oracle_database,orlc, Response] : nmeegd_GetMetricData failed : Metric execution timed out in 600 seconds

  Cause
  The Response metric is making a timed out then the Agent starts other process to take the Response metric. The process to kill the PID taking the Response metric is failing increasing the process running dbresp.pl

  Before the Response metric starts to do the timed out there is other error:
  2009-01-21 10:19:50 Thread-4099931040 WARN engine: Missing Properties : [limitSwitch]
  2009-01-21 10:19:50 Thread-4099931040 ERROR engine: [oracle_database,orcl,alertLog] : nmeegd_GetMetricData failed : Missing Properties : [limitSwitch]

  Solution
  1. Stop DBConsole

     emctl stop dbconsole

  2. Kill any running process.

     ps -ef | grep /opt/app/oracle/

     Kill any returned process.

  3. Follow fix

     Note.361612.1 Ext/Mod Problem Performance Agent High CPU Consumption Gen

  4. Start DB Console

     emctl start dbconsole

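Step 2 of the note lists processes with ps and grep and then kills whatever is returned. Before killing anything it can help to preview exactly which PIDs would be affected. The following is a sketch under that assumption; the helper name pids_matching is made up for illustration and does not come from the note.

```shell
# Hypothetical helper: from `ps -ef` output on stdin, print the PID (field 2)
# of every line that contains the given fixed string.
# The string is passed through the environment rather than on awk's command
# line, so the awk process itself never shows up as a match in `ps -ef`.
pids_matching() {
    PATTERN="$1" awk 'index($0, ENVIRON["PATTERN"]) { print $2 }'
}

# Preview on a live system, using the path from the note:
#   ps -ef | pids_matching /opt/app/oracle/
```

Only after reviewing the list would the PIDs be handed to kill.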
II. Following this solution, I first shut down OEM. Before doing so, here are the operating system and database versions of my environment:

  oracleserver:~ # cat /etc/SuSE
  SuSE-release  SuSEconfig/
  oracleserver:~ # cat /etc/SuSE-release
  SUSE Linux Enterprise Server 10 (x86_64)
  VERSION = 10
  PATCHLEVEL = 3

The database version is:

  SQL> select * from v$version;

  BANNER
  ----------------------------------------------------------------
  Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
  PL/SQL Release 10.2.0.1.0 - Production
  CORE    10.2.0.1.0  Production
  TNS for Linux: Version 10.2.0.1.0 - Production
  NLSRTL Version 10.2.0.1.0 - Production

1. Switch to the oracle user and stop OEM

  oracleserver:~ # su - oracle
  oracle@oracleserver:~> id
  uid=1000(oracle) gid=1000(oinstall) groups=1000(oinstall),1001(dba)
  oracle@oracleserver:~> emctl stop dbconsole
  TZ set to Asia/Shanghai
  Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
  Copyright (c) 1996, 2005 Oracle Corporation.  All rights reserved.
  http://oracleserver.site:1158/em/console/aboutApplication
  Stopping Oracle Enterprise Manager 10g Database Control ...
   ...  Stopped.

Note that when OEM is being stopped there is no output at first, and nothing appears in the system log or the Oracle alert log either, so you have to wait patiently. This step took me 30 minutes. If you see no output after running the command, I recommend that you keep waiting rather than pressing Ctrl+C to abort it just because nothing is printed.

2. Kill the perl processes

OEM is now stopped. Let's check the perl processes and the memory again:

[Image: http://www.bkjia.com/uploads/allimg/131229/195R36342-6.jpg]

There were still 2726 perl processes, and the memory had not changed:

[Image: http://img1.51cto.com/attachment/201206/113326232.jpg]

Only 55 MB was free. Now kill the perl processes:

  kill -9 $(ps -ef | grep perl | grep -v grep | awk '{print $2}')

[Image: http://www.bkjia.com/uploads/allimg/131229/195R31059-8.jpg]

Check the perl processes again:

[Image: http://www.bkjia.com/uploads/allimg/131229/195R32P6-9.jpg]

The perl processes are gone. Now check the memory:

[Image: http://www.bkjia.com/uploads/allimg/131229/195R363V-10.jpg]

Free memory is now 6673 MB. Check the load again:

[Image: http://www.bkjia.com/uploads/allimg/131229/195R32942-11.jpg]

The load is back to normal. The 1-minute load average is 3.15, the 5-minute average is 242.76, and the 15-minute average is 1236.57; the last two are still high only because they average over the period before the perl processes were killed. A load of 3 may look high, but my server has 16 CPU cores, so a load of 3 is well below the core count:

[Image: http://www.bkjia.com/uploads/allimg/131229/195R35401-12.jpg]
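The comparison between load and core count is simple arithmetic. As a rough sketch (the helper name load_per_core is made up for illustration), it can be written as:

```shell
# Hypothetical helper: divide a load average by the number of CPU cores.
# A result well below 1.00 means the run queue is shorter than the core count.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# With the figures from this case: 1-minute load 3.15 on 16 cores.
load_per_core 3.15 16    # prints 0.20

# On a live Linux system (nproc is part of GNU coreutils):
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

By the same calculation, the 15-minute figure of 1236.57 corresponds to about 77 per core, which shows how overloaded the server was before the cleanup.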

The problem is now solved. If you want OEM to monitor Oracle again, run emctl start dbconsole as the oracle user.

Tip: Databases can fail in many ways. When troubleshooting, I suggest you first work out how the problem arose and then look for a solution. If you have a Metalink account, it is best to log in and search there for the cause and the fix. I do not recommend relying on Baidu or Google, because many of the answers found there are not necessarily accurate or applicable to your situation. If your production database has a problem and you apply a fix found on Baidu or Google without understanding the cause of the problem or the reasoning behind the fix, you are relying on luck. If it works, fine; if the problem is not solved or gets even worse, you may not be far from losing your job.

This article is from the "Yin-Technical Exchange" blog, please be sure to keep this source http://dl528888.blog.51cto.com/2382721/911535
