Introduction to Oracle Cluster Health Monitor (CHM)


Overview

Cluster Health Monitor (CHM) is an Oracle-provided tool that automatically collects operating system resource usage data (CPU, memory, swap, processes, I/O, network, and so on). CHM collects data once per second.

This data is very helpful for diagnosing node reboots, hangs, instance evictions, and performance problems in a cluster. Users can also use CHM to spot problems such as high system load or abnormal memory usage early, before they turn into more serious problems.

CHM is installed automatically with the following software:

Oracle Grid Infrastructure 11.2.0.2 and later for Linux (excluding Linux Itanium) and Solaris (SPARC and x86-64)

Oracle Grid Infrastructure 11.2.0.3 and later for AIX and Windows (excluding Windows Itanium)

In a cluster, the following commands show the status of the CHM resource (ora.crf):

$ crsctl stat res -t -init

# ./crsctl stat res ora.crf -init
NAME=ora.crf
TYPE=ora.crf.type
TARGET=ONLINE
STATE=ONLINE on TESTRAC2
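If you want a monitoring script to check this resource, the key=value lines above are easy to parse. The sketch below is a minimal example under that assumption: `crf_state` is a hypothetical helper name, and it reads the `crsctl` output from standard input so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: read "crsctl stat res ora.crf -init" output on stdin
# and print the first word after "STATE=" (for example ONLINE or OFFLINE).
# The key is matched case-insensitively.
crf_state() {
    awk -F= 'toupper($1) == "STATE" { split($2, w, " "); print w[1]; exit }'
}

# Usage on a cluster node with Grid_home/bin in PATH:
#   [ "$(crsctl stat res ora.crf -init | crf_state)" = ONLINE ] || echo "ora.crf is not online"
```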

CHM consists of two main services:

1). System Monitor Service (osysmond): This service runs on every node. osysmond sends each node's resource usage data to the Cluster Logger Service, which receives the data from all nodes and saves it in the CHM repository.

$ ps -ef | grep osysmond
root 7984 1 0 Jun05 ? 01:16:14 /u01/app/11.2.0/grid/bin/osysmond.bin

2). Cluster Logger Service (ologgerd): In a cluster, ologgerd runs on one master node and one standby node. If ologgerd on the current master node runs into a problem and cannot be started, it is started on the standby node instead.

Master node:
$ ps -ef | grep ologgerd
root 8257 1 0 Jun05 ? 00:38:26 /u01/app/11.2.0/grid/bin/ologgerd -M -d /u01/app/11.2.0/grid/crf/db/rac2

Standby node:
$ ps -ef | grep ologgerd
root 8353 1 0 Jun05 ? 00:18:47 /u01/app/11.2.0/grid/bin/ologgerd -m rac2 -r -d /u01/app/11.2.0/grid/crf/db/rac1
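In the sample `ps` output above, only the standby's ologgerd command line contains the `-r` option (together with `-m` followed by the master's node name). Assuming that convention holds on your version, a script can tell the local role from a single `ps` line. This is a sketch only; `ologgerd_role` is a hypothetical name, and `oclumon manage -get master` remains the direct way to ask which node is the master.

```shell
#!/bin/sh
# Hypothetical helper: classify the local ologgerd as master or standby from
# one "ps -ef" line. Assumption (based on the sample output): the standby's
# command line contains a standalone "-r" option and the master's does not.
ologgerd_role() {
    case " $1 " in
        *ologgerd*" -r "*) echo standby ;;
        *ologgerd*)        echo master ;;
        *)                 echo none ;;
    esac
}

# Usage on a cluster node; the [o] bracket stops grep from matching itself:
#   ologgerd_role "$(ps -ef | grep '[o]loggerd')"
```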

CHM repository: stores the collected data. By default it is located under the Grid Infrastructure home and needs 1 GB of disk space; each node consumes about 0.5 GB. You can use oclumon to change its location and the amount of space it is allowed to use (at most 3 days of data can be retained).

View current settings

The following commands show the current settings:
$ oclumon manage -get reppath
CHM Repository Path = /u01/app/11.2.0/grid/crf/db/rac2
Done

$ oclumon manage -get repsize
CHM Repository Size = 68082   <==== unit is seconds
Done
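Because repsize is reported in seconds, converting it makes the retention easier to read: 68082 seconds is just under 19 hours. The sketch below assumes the `CHM Repository Size = N` line format shown above; `repsize_hours` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "oclumon manage -get repsize" output on stdin and
# print the retention as hours and minutes (leftover seconds are dropped).
repsize_hours() {
    awk -F= '/Repository Size/ {
        # awk converts the text after "=" using its leading numeric prefix.
        n = $2 + 0
        printf "%dh%02dm\n", int(n / 3600), int((n % 3600) / 60)
        exit
    }'
}

# Usage: oclumon manage -get repsize | repsize_hours
```

For the sample value of 68082 seconds this prints `18h54m`.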

Modify settings

To modify the repository path:

$ oclumon manage -repos reploc /shared/oracle/chm

To modify the repository size (retention time in seconds):

$ oclumon manage -repos resize 68083   <== must be between 3600 (1 hour) and 259200 (3 days)
rac1 --> retention check successful
New retention is 68083 and would use 1073750609 bytes of disk space
CRS-9115-Cluster Health Monitor repository size change completed on all nodes.
Done
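Since resize only accepts values from 3600 to 259200 seconds, a wrapper script may want to check the value before calling oclumon. The sketch below takes the retention in whole hours, which is a design choice of this example (oclumon itself accepts any number of seconds in range); `repsize_seconds` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: convert a retention in whole hours to the seconds value
# for "oclumon manage -repos resize", refusing values outside the documented
# 3600..259200 second range (1 hour to 3 days).
repsize_seconds() {
    case "$1" in
        # Reject empty input, leading zeros (octal in $(( ))) and non-digits.
        ''|0*|*[!0-9]*) echo "expected a positive whole number of hours: $1" >&2; return 1 ;;
    esac
    secs=$(( $1 * 3600 ))
    if [ "$secs" -lt 3600 ] || [ "$secs" -gt 259200 ]; then
        echo "retention out of range: $secs seconds" >&2
        return 1
    fi
    echo "$secs"
}

# Usage (oclumon is only called if the value is valid):
#   secs=$(repsize_seconds 24) && oclumon manage -repos resize "$secs"
```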

Methods for obtaining CHM data

1. Using Grid_home/bin/diagcollection.pl:
1). First, determine which node is the master of the Cluster Logger Service:
$ oclumon manage -get master
Master = rac2

2). As root, run the following command on the master node (rac2):
# Grid_home/bin/diagcollection.pl -collect -chmos -incidenttime inc_time -incidentduration duration
inc_time is the start time of the data to collect, in the format MM/DD/YYYYHH24:MM:SS; duration is how much data to collect after the start time, in the format HH:MM.

For example, the following collects 5 minutes of data starting at 15:30:00 on June 15, 2012:
# diagcollection.pl -collect -crshome /u01/app/11.2.0/grid -chmoshome /u01/app/11.2.0/grid -chmos -incidenttime 06/15/201215:30:00 -incidentduration 00:05

3). When the command finishes, the CHM data is written to the file ChmosData_rac2_20120615_1537.tar.gz.
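Two details of these steps are easy to get wrong in a script: step 2 has to run on the master node, and inc_time runs the date and the 24-hour time together with no separator. The sketch below handles both. It assumes the `Master = <node>` output format shown in step 1 and GNU `date` (for `-d`); the function names `chm_is_master` and `chm_inctime` are hypothetical, and `Grid_home` in the usage comment stands for your Grid Infrastructure home.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the node named in "oclumon manage -get master"
# output (read on stdin, e.g. "Master = rac2") equals $1, ignoring case.
chm_is_master() {
    want=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    got=$(awk -F'= *' 'tolower($1) ~ /^master/ { print tolower($2); exit }')
    [ -n "$got" ] && [ "$got" = "$want" ]
}

# Hypothetical helper: format a time for -incidenttime as MM/DD/YYYYHH24:MM:SS,
# with no space between date and time. Requires GNU date (for -d).
chm_inctime() {
    date -d "$1" '+%m/%d/%Y%H:%M:%S'
}

# Usage as root, collecting 5 minutes of data starting 10 minutes ago:
#   if oclumon manage -get master | chm_is_master "$(hostname -s)"; then
#       Grid_home/bin/diagcollection.pl -collect -chmos \
#           -incidenttime "$(chm_inctime '10 minutes ago')" -incidentduration 00:05
#   fi
```

For instance, `chm_inctime '2012-06-15 15:30:00'` yields `06/15/201215:30:00`, the value used in the example above.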

2. Using oclumon:
$ oclumon dumpnodeview [[-allnodes] | [-n node1 node2] [-last "duration"] | [-s "time_stamp" -e "time_stamp"] [-v] [-warning]] [-h]

-s and -e specify the start and end times (format "YYYY-MM-DD HH24:MM:SS"); -last retrieves the most recent data for the given duration (format "HH24:MM:SS"). For example:
$ oclumon dumpnodeview -allnodes -v -s "2012-06-15 07:40:00" -e "2012-06-15 07:57:00" > /tmp/chm1.txt

$ oclumon dumpnodeview -n node1 node2 node3 -last "12:00:00" > /tmp/chm1.txt
$ oclumon dumpnodeview -allnodes -last "00:15:00" > /tmp/chm1.txt

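When investigating an incident it is common to look at a window around a known time. Assuming GNU `date`, the hypothetical helper below prints a start and end stamp in the "YYYY-MM-DD HH24:MM:SS" form used by -s and -e, a given number of minutes before and after that time, separated by `|`.

```shell
#!/bin/sh
# Hypothetical helper: print "start|end" timestamps for dumpnodeview -s/-e,
# spanning $2 minutes before and after the time given in $1.
# Requires GNU date (-d accepts both a time string and "@<epoch seconds>").
chm_window() {
    t=$(date -d "$1" +%s) || return 1
    start=$(date -d "@$(( t - $2 * 60 ))" '+%Y-%m-%d %H:%M:%S')
    end=$(date -d "@$(( t + $2 * 60 ))" '+%Y-%m-%d %H:%M:%S')
    echo "$start|$end"
}

# Usage: 10 minutes either side of 07:50
#   w=$(chm_window '2012-06-15 07:50:00' 10)
#   oclumon dumpnodeview -allnodes -v -s "${w%|*}" -e "${w#*|}" > /tmp/chm1.txt
```

Doing the arithmetic in epoch seconds keeps the example independent of how `date` parses relative expressions.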

Below is part of the content of /tmp/chm1.txt:
----------------------------------------
Node: rac1 Clock: '06-15-12 07.40.01' SerialNo:168880
----------------------------------------

SYSTEM:
#cpus: 1 cpu:17.96 cpuq:5 physmemfree:32240 physmemtotal:2065856 mcache:1064024 swapfree:3988376 swaptotal:4192956 ior:57 iow:59 ios:10 swpin:0 swpout:0 pgin:57 pgout:59 netr:65.767 netw:34.871 procs:183 rtprocs:10 #fds: 4902 #sysfdlimit: 6815744 #disks: 4 #nics: 3 nicerrors:0

TOP CONSUMERS:
topcpu: 'mrtg(32385) 64.70' topprivmem: 'ologgerd(8353) 84068' topshm: 'oracle(8760) 329452' topfd: 'ohasd.bin(6627) 720' topthread: 'crsd.bin(8235) 44'

PROCESSES:

name: 'mrtg' pid:32385 #procfdlimit: 65536 cpuusage:64.70 privmem:1160 shm:1584 #fd: 5 #threads: 1 priority:20 nice:0
name: 'oracle' pid:32381 #procfdlimit: 65536 cpuusage:0.29 privmem:1456 shm:12444 #fd: #threads: 1 priority:15 nice:0
...
name: 'oracle' pid:8756 #procfdlimit: 65536 cpuusage:0.0 privmem:2892 shm:24356 #fd: #threads: 1 priority:16 nice:0

----------------------------------------
Node: rac2 Clock: '06-15-12 07.40.02' SerialNo:168878
----------------------------------------

SYSTEM:
#cpus: 1 cpu:40.72 cpuq:8 physmemfree:34072 physmemtotal:2065856 mcache:1005636 swapfree:3991808 swaptotal:4192956 ior:54 iow:104 ios:11 swpin:0 swpout:0 pgin:54 pgout:104 netr:77.817 netw:33.008 procs:178 rtprocs:10 #fds: 4948 #sysfdlimit: 68157 #disks: 4 #nics: 4 nicerrors:0

TOP CONSUMERS:
topcpu: 'orarootagent.bi(8490) 1.59' topprivmem: 'ologgerd(8257) 83108' topshm: 'oracle(8873) 324868' topfd: 'ohasd.bin(6744) 720' topthread: 'crsd.bin(8362) 47'

PROCESSES:

name: 'oracle' pid:9040 #procfdlimit: 65536 cpuusage:0.19 privmem:6040 shm:121712 #fd: #threads: 1 priority:16 nice:0
...

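When a dump covers many samples, it helps to reduce it to a few numbers per node. The awk sketch below assumes the layout of the sample above: a `Node:` header line followed by a SYSTEM line containing `cpu:`, `cpuq:`, `physmemfree:` and `physmemtotal:`. It prints one line per sample with the percentage of physical memory free, which does not depend on the memory unit. The helper name `chm_system_summary` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize "oclumon dumpnodeview" output read on stdin,
# printing one line per sample: node name, cpu, cpuq and % physical memory free.
# Assumes fields are written as "name:value" or "name: value", as in the sample.
chm_system_summary() {
    awk '
    # Return the number following "key:" on the current line, or "" if absent.
    function field(key,    s) {
        if (match($0, "(^|[ \t])" key ": ?[0-9.]+")) {
            s = substr($0, RSTART, RLENGTH)
            sub(/^[^:]*: ?/, "", s)
            return s
        }
        return ""
    }
    # Sample header, e.g. "Node: rac1 Clock: ..." (the space is optional).
    /^Node:/ {
        s = $0
        sub(/^Node: */, "", s)
        split(s, a, " ")
        node = a[1]
    }
    # The SYSTEM data line is the one carrying the run-queue field "cpuq".
    /(^|[ \t])cpuq:/ {
        free = field("physmemfree") + 0
        total = field("physmemtotal") + 0
        pct = (total > 0) ? sprintf("%.1f", 100 * free / total) : "n/a"
        printf "%s cpu=%s cpuq=%s memfree%%=%s\n", node, field("cpu"), field("cpuq"), pct
    }
    '
}

# Usage: chm_system_summary < /tmp/chm1.txt
```

Applied to the two samples above, it reports `rac1 cpu=17.96 cpuq=5 memfree%=1.6` and `rac2 cpu=40.72 cpuq=8 memfree%=1.6`.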

For more details about CHM, refer to the official Oracle documentation:
http://docs.oracle.com/cd/E11882_01/rac.112/e16794/troubleshoot.htm#CWADD92242
Oracle Clusterware Administration and Deployment Guide
11g Release 2 (11.2)
Part Number E16794-17

or the My Oracle Support note:
Cluster Health Monitor (CHM) FAQ (Doc ID 1328466.1)

