Cluster Health Monitor (CHM) is a tool provided by Oracle that automatically collects operating system resource statistics (CPU, memory, swap, processes, I/O, and network). CHM collects data every second (every 5 seconds starting with 11.2.0.3).
This data is very helpful for diagnosing node reboots, hangs, instance evictions, and performance problems in a cluster. CHM also helps detect issues such as high system load and abnormal memory usage early, before they turn into more serious problems.
CHM is automatically installed with the following software:
11.2.0.2 and later versions: Oracle Grid Infrastructure for Linux (excluding Linux Itanium) and Solaris (SPARC 64 and x86-64).
11.2.0.3 and later versions: Oracle Grid Infrastructure for AIX and Windows (excluding Windows Itanium).
In a cluster, you can run the following command to view the status of the resource (ora.crf) that corresponds to CHM:
$ crsctl stat res ora.crf -init
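To check the state from a script, the STATE field can be pulled out of that output. The sketch below assumes the usual NAME=/TYPE=/TARGET=/STATE= key-value layout printed by crsctl stat res; chm_state is a hypothetical helper name, not part of Oracle's tooling.

```shell
# Hypothetical helper: read `crsctl stat res ora.crf -init` output on stdin
# and print the value of the STATE line (for example "ONLINE on <node>").
chm_state() {
    awk -F= '$1 == "STATE" { print $2; exit }'
}

# Usage on a cluster node with the Grid Infrastructure environment set:
#   crsctl stat res ora.crf -init | chm_state
```

If the helper prints nothing, the output contained no STATE line, which usually means the command failed or the resource does not exist on that node.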
CHM consists mainly of two services:
1). System Monitor Service (osysmond): this service runs on every node. osysmond sends each node's resource usage to the Cluster Logger Service, which receives the information from all nodes and saves it to the CHM repository.
2). Cluster Logger Service (ologgerd): within a cluster, ologgerd has a master node and a standby node. When ologgerd cannot be started on the current node, it is started on the standby node. (This master is a different concept from the DRM master.)
CHM Repository: stores the collected data. By default, it resides under the Grid Infrastructure home and requires 1 GB of disk space. Each node consumes additional space every day. You can use oclumon to change the repository's storage path and the space it is allowed to use (data can be kept for up to three days).
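As a rough illustration of adjusting the retention, the sketch below converts a retention period in hours into the seconds value that the 11.2 form of oclumon manage -repos resize takes. The 3600 to 259200 second range (one hour to three days) and the -get keys shown in the comments are assumptions based on the 11.2 syntax; later releases changed these options, so check oclumon manage -h on your system. chm_retention_seconds is a hypothetical helper name.

```shell
# Hypothetical helper: convert a retention period in hours to seconds and
# reject values outside the assumed 11.2 range of 3600..259200 seconds.
chm_retention_seconds() {
    hours="$1"
    secs=$((hours * 3600))
    if [ "$secs" -lt 3600 ] || [ "$secs" -gt 259200 ]; then
        echo "retention must be between 1 and 72 hours" >&2
        return 1
    fi
    echo "$secs"
}

# On a cluster node (11.2 syntax, assumed):
#   oclumon manage -get repsize reppath master    # current retention, path, master node
#   oclumon manage -repos resize "$(chm_retention_seconds 72)"
```

Working in hours and validating before calling oclumon keeps the three-day upper limit explicit instead of relying on the tool to reject the value.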
Stop and start CHM (it is best to run the commands as the grid user on each of the two nodes)
Stop:
Ora11grac1
Ora11grac2
Start:
Ora11grac1
Ora11grac2
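A minimal sketch of what stopping and starting CHM on one node typically looks like, assuming the standard crsctl stop res / crsctl start res commands against the ora.crf resource. The chm_ctl helper and the DRY_RUN variable are hypothetical and exist only for this illustration; GRID_HOME must point at your Grid Infrastructure home.

```shell
# Hypothetical helper: stop or start the ora.crf resource on the local node.
# With DRY_RUN=1 (the default here) it only prints the command it would run.
chm_ctl() {
    case "$1" in
        stop|start) ;;
        *) echo "usage: chm_ctl stop|start" >&2; return 2 ;;
    esac
    if [ -z "${GRID_HOME:-}" ]; then
        echo "GRID_HOME is not set" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$GRID_HOME/bin/crsctl $1 res ora.crf -init"
    else
        "$GRID_HOME/bin/crsctl" "$1" res ora.crf -init
    fi
}

# Run on each node in turn, for example on Ora11grac1 and then Ora11grac2:
#   DRY_RUN=0 chm_ctl stop
#   DRY_RUN=0 chm_ctl start
```

Defaulting to a dry run makes it easy to confirm the exact command on each node before actually stopping the resource.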
Note:
1. I/O decreases gradually after the service is stopped.
2. This change applies only to the current run; it does not persist after the DB or CRS service is restarted.
3. Whether to keep CHM enabled depends on the device I/O load in the production environment. In a test environment it can simply be disabled.
Disable and re-enable automatic startup of CHM (the first command disables it, the second enables it):
# <GRID_HOME>/bin/crsctl modify resource ora.crf -attr "AUTO_START=never" -init
# <GRID_HOME>/bin/crsctl modify resource ora.crf -attr "AUTO_START=always" -init
Reference: MOS document: Cluster Health Monitor (CHM) FAQ (Doc ID 1328466.1)