Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware Processes (Doc ID 459694.1)


Applies To:

Oracle Database - Enterprise Edition - Version 10.2.0.2 to 12.1.0.1 [Release 10.2 to 12.1]
Linux x86
HP-UX PA-RISC (64-bit)
IBM AIX on POWER Systems (64-bit)
Oracle Solaris on SPARC (64-bit)
HP-UX Itanium
Linux x86-64
Oracle Server Enterprise Edition - Version: 10.1 to 11.2

Purpose

Procwatcher is a tool to examine and monitor Oracle database and/or clusterware processes at an interval. The tool collects stack traces of these processes using Oracle tools like oradebug short_stack and/or OS debuggers like pstack, gdb, dbx, or ladebug, and collects SQL data if specified.

If there are any problems with the prw.sh script or if you have suggestions, please post a comment on this document with details and e-mail [email protected] with the word "procwatcher" in the subject line.

Scope

This tool is for Oracle representatives and DBAs looking to troubleshoot a problem further by monitoring processes. This tool can be used in conjunction with other tools or troubleshooting methods depending on the situation.

Details

# This script will find clusterware and/or Oracle background processes and collect
# stack traces for debugging. It will write a file called procname_pid_date_hour.out
# for each process. If you are debugging clusterware then run this script as root.
# If you are only debugging Oracle background processes then you can run as
# root or oracle.

To install the script, simply download it, put it in its own directory, unzip it, and give it execute permissions. Use the following link to download it:

DOWNLOAD Procwatcher

Alternatively, you can download Procwatcher and other recommended support tools from the following article:

RAC and DB Support Tools Bundle Note:1594347.1

Note: If you had a previous version installed, stop it prior to putting the new version in place.

If you are in a clustered environment, you can "deploy" Procwatcher with "prw.sh deploy" to register with the clusterware, propagate to all nodes, and start on all nodes. There is also a deinstall option to deregister from the clusterware and remove the Procwatcher directory. In a clustered environment, Procwatcher files will be written to GRID_HOME/log/procwatcher unless the PRWDIR parameter is set.
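
The manual install steps described above might look like the following on a typical host. This is a minimal sketch, not part of Procwatcher: the archive path and target directory are placeholders, and the `run` helper with its `DRY_RUN` switch is a hypothetical convenience so the commands can be previewed before they are executed.

```shell
#!/bin/sh
# Sketch of a manual Procwatcher install. The archive name and the target
# directory are placeholders; use whatever you downloaded and wherever you
# want the tool to live.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# install_prw <zip file> <target dir>
install_prw() {
    zip_file=$1
    target_dir=$2

    # Stop a previously installed copy before replacing it.
    if [ -x "$target_dir/prw.sh" ]; then
        run "$target_dir/prw.sh" stop
    fi

    run mkdir -p "$target_dir" || return 1
    run unzip -o "$zip_file" -d "$target_dir" || return 1
    run chmod u+x "$target_dir/prw.sh" || return 1
}
```

For example, `DRY_RUN=1 install_prw /tmp/prw.zip /home/oracle/procwatcher` only prints the commands. In a clustered environment the install would then be followed by `./prw.sh deploy` from the new directory.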

Requirements
    • Must have /bin and /usr/bin in your $PATH
    • Have your instance_name or db_name set in the oratab and/or set the $ORACLE_HOME env variable. (PRW searches the oratab for the SID it finds and if it can't find the SID in the oratab it will default to $ORACLE_HOME). Procwatcher cannot function properly if it cannot find an $ORACLE_HOME to use.
    • Run Procwatcher as the Oracle software owner if you are only troubleshooting homes/instances for that user. If you are troubleshooting clusterware processes (EXAMINE_CLUSTER=true) or are troubleshooting for multiple Oracle users, run as root.
    • If you are monitoring the clusterware you must have the relevant OS debugger installed on your platform; PRW looks for:

Linux - /usr/bin/gdb
HP-UX and HP Itanium - /opt/langtools/bin/gdb64 or /usr/ccs/bin/gdb64
Sun - /usr/bin/pstack
IBM AIX - /bin/procstack or /bin/dbx
HP Tru64 - /bin/ladebug

It will use pstack on any platform where it is available besides Linux (since pstack is a wrapper script for gdb there anyway).
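
The requirements above can be checked by hand before starting Procwatcher. The following is a rough sketch under stated assumptions, not code from prw.sh: the function names are hypothetical, the oratab is assumed to use the usual `SID:ORACLE_HOME:Y|N` layout, and the mapping from `uname -s` output to the platforms in the list is my own.

```shell
#!/bin/sh
# Sketch of pre-flight checks for the requirements listed above.
# Function names and the uname-to-platform mapping are assumptions.

# path_has_dir <dir>: succeed if <dir> is an entry in $PATH.
path_has_dir() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# home_for_sid <sid> [oratab file]: print the ORACLE_HOME listed for <sid>
# in the oratab, else fall back to $ORACLE_HOME. The oratab usually lives in
# /etc/oratab (or /var/opt/oracle/oratab on Solaris).
home_for_sid() {
    sid=$1
    oratab=${2:-/etc/oratab}
    home=""
    if [ -r "$oratab" ]; then
        home=$(awk -F: -v sid="$sid" '!/^#/ && $1 == sid { print $2; exit }' "$oratab")
    fi
    if [ -z "$home" ]; then
        home=${ORACLE_HOME:-}
    fi
    [ -n "$home" ] && echo "$home"
}

# debugger_candidates <uname -s output>: print the debugger paths listed
# above for that platform, one per line.
debugger_candidates() {
    case "$1" in
        Linux) echo /usr/bin/gdb ;;
        HP-UX) printf '%s\n' /opt/langtools/bin/gdb64 /usr/ccs/bin/gdb64 ;;
        SunOS) echo /usr/bin/pstack ;;
        AIX)   printf '%s\n' /bin/procstack /bin/dbx ;;
        OSF1)  echo /bin/ladebug ;;
        *)     return 1 ;;
    esac
}
```

A quick check on the local machine could then loop over `debugger_candidates "$(uname -s)"` and test each path with `[ -x "$path" ]`.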

Procwatcher Features
  • Procwatcher collects stack traces for all processes defined using either oradebug short_stack or an OS debugger at a predefined interval if contention is found.
  • PRW will generate wait chain, session wait, lock, and latch reports if problems are detected (look for pw_* reports in the PRW_DB_ subdirectory).
  • PRW will look for wait chains, wait events, lock, and latch contention and also dump stack traces of processes that are either waiting for non-idle wait events or waiting for or holding a lock or latch.
  • PRW will dump wait chain, session wait, lock, latch, current SQL, process memory, and session history information to specific process files (look for prw_* files in the PRW_DB_ subdirectory) for any processes or background processes when problems are detected.
  • You can define how aggressive PRW is about getting information by setting parameters like THROTTLE, IDLECPU, and INTERVAL. You can tune these parameters to either get the most information possible or to reduce PRW's CPU impact. See below for more information on what each of these parameters does.
  • If CPU usage gets too high on the machine (as defined by IDLECPU), PRW will sleep and wait for CPU utilization to go down.
  • Procwatcher gets the stack traces of all threads of a process (this is important for clusterware processes).
  • The housekeeper process runs on a 5 minute loop and cleans up files older than the specified number of days (default is 7).
  • If any SQL times out after the default number of seconds, it will be disabled. At a later time the SQL can be re-tested. If the SQL times out 3 times it will be disabled for the life of Procwatcher. Any GV$ view that times out will automatically revert to the corresponding V$ view. Note that the GV$ view timeout is much lower. The logic is: it is not worth using GV$ views if they are not fast...
  • If oradebug shortstack is enabled and it times out or fails, the housekeeper process will re-enable shortstack if the test passes.
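
To make the housekeeper's retention behavior concrete, here is a minimal sketch of a cleanup pass that removes files older than a given number of days. It is an illustration only: the function name is hypothetical and the use of `find -mtime` is my assumption, not necessarily how prw.sh implements it.

```shell
#!/bin/sh
# Sketch of a housekeeper-style cleanup pass. Not taken from prw.sh; the
# function name and the use of find -mtime are assumptions.

# purge_old_files <dir> [days]: delete regular files under <dir> whose last
# modification is more than <days> full days old (default 7).
purge_old_files() {
    dir=$1
    days=${2:-7}
    [ -d "$dir" ] || return 1
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}

# A 5 minute loop around it could look like this (left commented out):
#   while :; do purge_old_files "$PRWDIR" "$RETENTION"; sleep 300; done
```

Note that `-mtime +7` counts whole 24-hour periods, so a file is removed only once it is at least eight days old.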

Disclaimer, especially if you are monitoring clusterware with EXAMINE_CLUSTER=true (default is false) or if FALL_BACK_TO_OSDEBUGGER=true (default is false): Most OS debuggers will temporarily suspend a process when attaching and dumping a stack trace. Procwatcher minimizes the amount of time that takes as much as possible. Some debuggers can also be CPU intensive. The THROTTLE, IDLECPU, and INTERVAL parameters (see below) may need to be adjusted to suit your needs depending on how loaded the machine is and how fast it is. Note that some debuggers are faster and can get in and out of a process quicker than others. For example, pstack and oradebug short_stack are fast, ladebug is slower.
If you are on HP Itanium or HP-UX: Apply the fix for Bug:10158006 (or Bug:10287978 on 11.2.0.2) before monitoring the database with Procwatcher to fix a known short stack issue on HP. See Note:1271173.1 for more information.
If you are on Solaris 10: Apply the fix for Solaris BT 6994922 (see Bug:15677306) before monitoring the database with Procwatcher.

Procwatcher is Ideal for:
    • Session level hangs or severe contention in the database/instance. See Note:1352623.1
    • Severe performance issues. See Note:1352623.1
    • Instance evictions and/or DRM timeouts.
    • Clusterware or DB processes stuck or consuming high CPU (must set EXAMINE_CLUSTER=true and run as root for clusterware processes)
    • ORA-4031 and SGA memory management issues. (Set SGAMEMWATCH=diag or SGAMEMWATCH=avoid4031 (not the default).) See Note:1355030.1
    • ORA-4030 and DB process memory issues. (Set USE_SQL=true and process_memory=y).
    • RMAN slowness/contention during a backup. (Set USE_SQL=true and rmanclient=y).
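
As an illustration of the last two items, the settings could be combined as below when both process memory and RMAN sessions are of interest. This is only a fragment, assuming the parameters are plain shell assignments in the configuration section of prw.sh; the exact case of each name is an assumption, so check it against your copy of the script.

```shell
# Fragment of prw.sh settings for ORA-4030 and RMAN investigations.
# Names and values come from the list above; the exact case is an assumption.
USE_SQL=true        # SQL collection must be on for the two options below
process_memory=y    # process memory collection, for ORA-4030 issues
rmanclient=y        # RMAN-related collection, for slow or contended backups
```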

Procwatcher is not Ideal for...
    • Node evictions/reboots. In order to troubleshoot these you would have to enable Procwatcher for a process(es) that is capable of rebooting the machine. If the OS debugger suspends the process for too long *that* could cause a reboot of the machine. I would only use Procwatcher for a node eviction/reboot if the problem was reproducing on a test system and I didn't care if the node got rebooted. Even then, the INTERVAL would need to be set low and many options would have to be turned off to get the cycle time low enough (EXAMINE_BG=false, USE_SQL=false, probably removing additional processes from the CLUSTERPROCS list).
    • Non-severe database performance issues. AWR/ADDM/Statspack are better options for this...
    • Most installation or upgrade issues. We aren't getting data for this unless we are at a stage of the installation/upgrade where key processes are already started.

Procwatcher User Commands

To start Procwatcher:

./prw.sh start

Or if you want to start on all nodes in a clustered environment:

./prw.sh start all

To stop Procwatcher:

./prw.sh stop

Or if you want to stop on all nodes in a clustered environment:

./prw.sh stop all

To check the status of Procwatcher:

./prw.sh stat

To package up the Procwatcher files to upload to support:

./prw.sh pack

All user syntax available:

./prw.sh help
Usage: prw.sh
Verbs are:
deploy - Register Procwatcher in clusterware and propagate to all nodes
start [all] - Start Procwatcher on local node, if 'all' is specified, start on all nodes
stop [all] - Stop Procwatcher on local node, if 'all' is specified, stop on all nodes
stat - Check the current status of Procwatcher
pack - Package up Procwatcher files (on all nodes) to upload to support
param - Check current Procwatcher parameters
deinstall - Deregister Procwatcher from clusterware and remove
log [number] - See the last [number] lines of the Procwatcher log file
log [runtime] - See continuous Procwatcher log file info - use Ctrl-C to break
help - What you are looking at...
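
The verbs above can be chained when gathering data for a support request. The wrapper below is a small sketch, not part of Procwatcher: the `PRW` variable (path to the script) and the function name are placeholders, and it only calls the `stat`, `log`, and `pack` verbs listed in the help output.

```shell
#!/bin/sh
# Sketch of a wrapper around the documented prw.sh verbs. PRW (path to the
# script) and the function name are placeholders, not part of Procwatcher.

PRW=${PRW:-./prw.sh}

# collect_for_support [lines]: show status, the last [lines] lines of the
# Procwatcher log (default 50), then package the files for upload.
collect_for_support() {
    lines=${1:-50}
    "$PRW" stat || echo "warning: '$PRW stat' returned non-zero" >&2
    "$PRW" log "$lines"
    "$PRW" pack
}
```

Run it from the Procwatcher directory, or set `PRW` to the full path of prw.sh first.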

Procwatcher Parameters

######################### CONFIG SETTINGS #############################
# Set EXAMINE_CLUSTER variable if you want to examine clusterware processes (default is false - or set to true):
# Note that if this is set to true you must deploy/run Procwatcher as root unless using Oracle Restart
EXAMINE_CLUSTER=false
# Set EXAMINE_BG variable if you want to examine all BG processes (default is true - or set to false):
EXAMINE_BG=true
# Set permissions on Procwatcher files and directories (default: 777):
PRWPERM=777
# Set RETENTION variable to the number of days you want to keep historical Procwatcher data (default: 7)
RETENTION=7
# Warning e-mails are sent to which e-mail addresses?
# "mail" must work on the unix server
# Example: [email protected],[email protected]
WARNINGEMAIL=
######################### PERFORMANCE SETTINGS #########################
# Set INTERVAL to the number of seconds between runs (default):
# Probably should not set below if EXAMINE_CLUSTER=true
INTERVAL=60
# Set THROTTLE to the max # of stack trace sessions or SQLs to run at once (default 5 - minimum 2):
THROTTLE=5
# Set IDLECPU to the percentage of idle CPU remaining before PRW sleeps (default 3 - which means PRW will sleep if the machine is more than 97% busy - check vmstat every 5 seconds)
IDLECPU=3
# Set SIDLIST to the list of SIDs you want to examine (default is derived - format example: "RAC1|ASM1|SID3")
# If setting for multiple instances for the same DB, specify each SID - example: "ASM1|ASM2|ASM3"
# Default: If root is starting PRW, get all SIDs found running at the time PRW was started.
# If another user is starting PRW, get all SIDs found running owned by that user.
SIDLIST=
#######################################################################
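
The IDLECPU comment above implies a simple rule: sample vmstat, read the idle percentage, and sleep when it drops below the threshold. The sketch below shows one way to express that rule. It is not the logic from prw.sh; the function names and the vmstat parsing (locating the column by its "id" header) are assumptions for illustration.

```shell
#!/bin/sh
# Sketch of an IDLECPU-style check. Not the logic from prw.sh; the function
# names and the vmstat parsing are assumptions.

# vmstat_idle: read vmstat output on stdin and print the idle percentage of
# the last sample, locating the column by its "id" header.
vmstat_idle() {
    awk '
        { for (i = 1; i <= NF; i++) if ($i == "id") col = i }
        col && $col ~ /^[0-9]+$/ { idle = $col }
        END { if (idle == "") exit 1; print idle }
    '
}

# should_sleep <idle pct> <IDLECPU>: succeed when idle CPU is below the
# threshold, e.g. idle 2 with IDLECPU=3 means the machine is over 97% busy.
should_sleep() {
    [ "$1" -lt "$2" ]
}

# Possible use (left commented out):
#   idle=$(vmstat 5 2 | vmstat_idle) && should_sleep "$idle" "${IDLECPU:-3}" && sleep 5
```

Finding the column by header rather than by position keeps the parsing from depending on one platform's vmstat layout, although the header name itself is still an assumption about the local vmstat.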

Advanced Parameters

# Procwatcher log directory
# Default is $GRID_HOME/log/procwatcher if clusterware is running and this is not set
# Default is the directory where prw.sh is run if no clusterware and this is not set
# Example: PRWDIR=/home/oracle/procwatcher
PRWDIR=
# SQL Control
# Set USE_SQL variable if you want to use SQL to troubleshoot (default is true - or set to false):
USE_SQL=true
# Set to 'y' to enable SQL, 'n' to disable
sessionwait=y
lock=y
latchholder=y
gesenqueue=y
waitchains=y
rmanclient=n
process_memory=n
sqltext=y
ash=y
# SGA Memory Watch (default: off). Valid values are:
# off = No SGA memory diagnostics
# diag = Collect SGA memory diagnostics
# avoid4031 = Collect SGA memory diagnostics and flush the shared pool to avoid ORA-4031
# if memory fragmentation occurs
# Note that setting SGAMEMWATCH to 'diag' or 'avoid4031' will query x$ksmsp
# which may increase shared pool latch contention in some environments.
# Keep this in mind and test in a test environment
# with load before using this setting in production.
SGAMEMWATCH=off
# Levels for debugging before a flush if SGAMEMWATCH=avoid4031 (default: 0 for both)
HEAPDUMP_LEVEL=0
LIB_CACHE_DUMP_LEVEL=0
# Suspect Process Threshold (if # of suspect procs > <value> then collect BG process stacks)
# 1 = Get query and stack output if there is at least 1 suspect proc (default)
# 0 = Get all diags each cycle
SUSPECTPROCTHRESHOLD=1
# Warning Process Threshold (if # of suspect procs > <value> then issue a WARNING) default=10
WARNINGPROCTHRESHOLD=10
# Levels for debugging if WARNINGPROCTHRESHOLD is reached (default: 0 for both)
# If using this feature recommended values are (HANGANALYZE_LEVEL=3, SYSTEMSTATE_LEVEL=258)
# Flood control limits the dumps to a maximum of 3 per hour
HANGANALYZE_LEVEL=0
SYSTEMSTATE_LEVEL=0
# Cluster Process list for examination (separated by "|"):
# Default: "crsd.bin|evmd.bin|evmlogge|racgimon|racge|racgmain|racgons.b|ohasd.b|oraagent|oraroota|gipcd.b|mdnsd.b|gpnpd.b|gnsd.bi|diskmon|octssd.b|ons-d|tnslsnr"
# - The processes oprocd, cssdagent, and cssdmonitor are intentionally left off the list because of high reboot danger.
# - The ocssd.bin process is off the list due to moderate reboot danger. Only add this if your CSS misscount is the
# - default or higher, your machine is not highly loaded, and you are aware of the tradeoff.
CLUSTERPROCS="crsd.bin|evmd.bin|evmlogge|racgimon|racge|racgmain|racgons.b|ohasd.b|oraagent|oraroota|gipcd.b|mdnsd.b|gpnpd.b|gnsd.bi|diskmon|octssd.b|ons-d|tnslsnr"
# DB Process list for examination (separated by "|"):
# Default: "_dbw|_smon|_pmon|_lgwr|_lmd|_lms|_lck|_lmon|_ckpt|_arc|_rvwr|_gmon|_lmhb|_rms0"
# - To examine ALL Oracle DB and ASM processes on the machine, set BGPROCS="ora|asm" (not typically recommended)
BGPROCS="_dbw|_smon|_pmon|_lgwr|_lmd|_lms|_lck|_lmon|_ckpt|_arc|_rvwr|_gmon|_lmhb|_rms0"
# Set to 'y' to enable gv$ views, set to 'n' to disable gv$ views
# (makes queries a little faster in RAC but can't see other instances in reports)
# Default is derived based on if waitchains are used
USE_GV=
# Set to 'y' to get pmap data for clusterware processes.
# Only available on Linux and Solaris
USE_PMAP=n
# DB versions enabled, set to 'y' or 'n' (this will override the SIDLIST setting)
VERSION_10_1=y
VERSION_10_2=y
VERSION_11_1=y
VERSION_11_2=y
# Should we fall back to an OS debugger if oradebug short_stack fails?
# OS debuggers are less safe per bug 6859515 so default is false (or set to true)
FALL_BACK_TO_OSDEBUGGER=false
# Number of oradebug shortstacks to get on each pass
# Will automatically lower if stacks are taking too long
STACKCOUNT=3
# Point this to a custom .sql file for Procwatcher to capture every cycle.
# Don't use big or long running SQL. The .sql file must be executable.
# Only 1 SQL per file.
# Example: CUSTOMSQL1=/home/oracle/test.sql
CUSTOMSQL1=
CUSTOMSQL2=
CUSTOMSQL3=

References

Note:783456.1 - CRS Diagnostic Data Gathering: A Summary of Common Tools and Their Usage
Note:1352623.1 - How To Troubleshoot Database Contention With Procwatcher
Note:1355030.1 - How To Troubleshoot ORA-4031's and Shared Pool Issues With Procwatcher
Note:1271173.1 - Process Hangs After Issuing Oradebug Short_stack on HP Platforms
Note:1353073.1 - Exadata Diagnostic Collection Guide
Note:559339.1 - Diagnostic Tools Catalog
Note:1389167.1 - Get Proactive with Oracle Database
Note:1428210.1 - Troubleshooting Database Contention With V$WAIT_CHAINS
Note:396940.1 - Troubleshooting and Diagnosing ORA-4031 Error [Video]
Note:1477599.1 - Best Practices: Proactive Data Collection for Performance Issues
Note:430473.1 - ORA-4031 Common Analysis/Diagnostic Scripts [Video]
Note:1096952.1 - Master Note for Real Application Clusters (RAC) Oracle Clusterware and Oracle Grid Infrastructure
Note:452358.1 - How to Collect Diagnostics for Database Hanging Issues
Note:1594347.1 - RAC and DB Support Tools Bundle
