Monitoring Heartbeat with Nagios
Once heartbeat is set up, it needs to be monitored. This article shows how to do that with Nagios.
First, look at the following commands. They are installed along with heartbeat, and the monitoring script relies on them.
[root@usvr-210 libexec]# which cl_status
/usr/bin/cl_status
[root@usvr-210 libexec]# cl_status listnodes    # list nodes in the current heartbeat cluster
192.168.3.1
usvr-211
usvr-210
[root@usvr-210 libexec]# cl_status nodestatus usvr-211    # list node status
active
[root@usvr-210 libexec]# cl_status nodestatus 192.168.3.1    # list node status
ping
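The per-node lookups above can be combined into a single loop. What follows is a minimal sketch, not part of the original script: the function name print_node_states is made up for illustration, and it assumes cl_status lives at /usr/bin/cl_status as reported by which (set CL_ST to override the path).

```shell
#!/bin/bash
# Hypothetical helper (not from the original article): print "node status"
# for every node that heartbeat knows about.
# Assumption: cl_status is at /usr/bin/cl_status unless CL_ST says otherwise.
print_node_states() {
    local cl_st="${CL_ST:-/usr/bin/cl_status}"
    local node
    # listnodes prints one node name per line; query each node in turn.
    for node in $("$cl_st" listnodes); do
        printf '%s %s\n' "$node" "$("$cl_st" nodestatus "$node")"
    done
}
```

Calling print_node_states on a cluster node should print one line per node, such as "usvr-211 active", which makes it easy to see which statuses the monitoring script will have to treat as healthy.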
check_heartbeat.sh works by listing every node in the cluster and checking whether each node's status is normal. In this setup the normal statuses are active and ping. The result is:
CRITICAL when the number of active + ping nodes is 0
WARNING when the number of active + ping nodes is less than the total number of nodes
OK when the number of active + ping nodes equals the total number of nodes
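The three rules above can be sketched as a small function. This is an illustration only: the name classify_heartbeat and its two arguments (healthy node count, total node count) are assumptions introduced here, while the status lines and exit codes follow the standard Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```shell
#!/bin/bash
# Hypothetical helper: map (healthy, total) node counts to a Nagios state.
# Prints the status line and returns the matching plugin exit code
# (0 = OK, 1 = WARNING, 2 = CRITICAL).
classify_heartbeat() {
    local healthy="$1" total="$2"
    if [ "$healthy" -eq 0 ]; then
        # No node is active or in ping state.
        echo "Heartbeat CRITICAL: $healthy/$total"
        return 2
    elif [ "$healthy" -ne "$total" ]; then
        # Some, but not all, nodes are healthy.
        echo "Heartbeat WARNING: $healthy/$total"
        return 1
    fi
    echo "Heartbeat OK: $healthy/$total"
    return 0
}
```

For the three-node cluster used here, classify_heartbeat 2 3 reports a WARNING, which is what Nagios would show if one node dropped out.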
[root@usvr-210 libexec]# cat check_heartbeat.sh
#!/bin/bash
# Author: Emmanuel Bretelle
# Date: 12/03/2010
# Description: Retrieve Linux HA cluster status using cl_status
# Based on http://www.randombugs.com/linux/howto-monitor-linux-heartbeat-snmp.html
# Autor: Stanila Constantin Adrian
# Date: 20/03/2009
# Description: Check the number of active heartbeats
# http://www.randombugs.com

# Get program path
REVISION=1.3
PROGNAME=$(/bin/basename "$0")
PROGPATH=$(echo "$0" | /bin/sed -e 's,[\/][^\/][^\/]*$,,')
NODE_NAME=$(uname -n)
CL_ST='/usr/bin/cl_status'

# nagios error codes
#. $PROGPATH/utils.sh
OK=0
WARNING=1
CRITICAL=2
UNKNOWN=3

usage () {
    echo "\
Nagios plugin to heartbeat.

Usage:
  $PROGNAME [--help | -h]
  $PROGNAME [--version | -v]

Options:
  --help -h       Print this help information
  --version -v    Print version of plugin
"
}

# utils.sh is not sourced, so print_revision, support and STATE_OK are not
# available; plain echo and the codes defined above are used instead.
help () {
    echo "$PROGNAME $REVISION"
    echo; usage; echo
}

while test -n "$1"
do
    case "$1" in
        --help | -h)
            help
            exit $OK;;
        --version | -v)
            echo "$PROGNAME $REVISION"
            exit $OK;;
#       -H)
#           shift
#           HOST=$1;;
#       -C)
#           shift
#           COMMUNITY=$1;;
        *)
            echo "Heartbeat UNKNOWN: Wrong command usage"; exit $UNKNOWN;;
    esac
    shift
done

# Fail early if heartbeat itself is not running on this node.
$CL_ST hbstatus > /dev/null
res=$?
if [ $res -ne 0 ]
then
    echo "Heartbeat CRITICAL: heartbeat is not running on this node"
    exit $CRITICAL
fi

declare -i I=0    # total number of nodes
declare -i A=0    # number of healthy nodes
NODES=$($CL_ST listnodes)
for node in $NODES
do
    status=$($CL_ST nodestatus "$node")
    let I=$I+1
    # The original script counted only "active" nodes:
    #   if [ $status = "active" ]
    # The "ping" status is also normal, so the condition is changed to accept both.
    if [ "$status" = "active" ] || [ "$status" = "ping" ]
    then
        let A=$A+1
    fi
done

if [ $A -eq 0 ]
then
    echo "Heartbeat CRITICAL: $A/$I"
    exit $CRITICAL
elif [ $A -ne $I ]
then
    echo "Heartbeat WARNING: $A/$I"
    exit $WARNING
else
    echo "Heartbeat OK: $A/$I"
    exit $OK
fi
The Nagios clients here are the two LVS cluster nodes, usvr-210 and usvr-211. The Nagios server collects the check results from them through check_nrpe.
Nagios Client
1. Copy the script to the Nagios plugin directory and adjust its permissions and ownership.
cp check_heartbeat.sh /usr/local/nagios/libexec/
chmod a+x /usr/local/nagios/libexec/check_heartbeat.sh
chown nagios:nagios /usr/local/nagios/libexec/check_heartbeat.sh
2. Add the monitoring command to the NRPE configuration file on the Nagios client.
vim /usr/local/nagios/etc/nrpe.cfg
command[check_heartbeat]=/usr/local/nagios/libexec/check_heartbeat.sh
3. Reload the configuration file.
service xinetd reload
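Before defining the service on the server, it can help to run the check by hand from the Nagios server and confirm that NRPE returns the plugin output. The sketch below is not from the original article: the function names state_name and probe_heartbeat are hypothetical, and it assumes check_nrpe is installed under /usr/local/nagios/libexec on the server (set CHECK_NRPE to override). The -H and -c options are the standard check_nrpe host and command options.

```shell
#!/bin/bash
# Hypothetical helper: translate a Nagios plugin exit code into its state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Hypothetical helper: run the check_heartbeat NRPE command against one host
# and report the resulting state.
# Assumption: check_nrpe is at /usr/local/nagios/libexec/check_nrpe.
probe_heartbeat() {
    local host="$1"
    local check_nrpe="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"
    local rc=0
    # check_nrpe prints the remote plugin output and exits with its code.
    "$check_nrpe" -H "$host" -c check_heartbeat || rc=$?
    echo "state: $(state_name "$rc")"
    return "$rc"
}
```

Running probe_heartbeat usvr-210 and probe_heartbeat usvr-211 on the Nagios server exercises the same path that the service checks will use later.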
Nagios Server
1. Add related monitoring services
define service {
    use                     local-service
    service_description     heartbeat-lvs-master
    check_command           check_nrpe!check_heartbeat
    service_groups          heartbeat_services
    host_name               usvr-210
    check_interval          5
    notifications_enabled   1
    notification_interval   30
    contact_groups          admins
}

define service {
    use                     local-service
    service_description     heartbeat-lvs-slave
    check_command           check_nrpe!check_heartbeat
    service_groups          heartbeat_services
    host_name               usvr-211
    check_interval          5
    notifications_enabled   1
    notification_interval   30
    contact_groups          admins
}
2. Check and load the configuration file
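nagioscheck    # apparently a local alias; the stock equivalent is: /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg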
service nagios reload
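Once the checks have run, both services show up in the Nagios web interface with the plugin output, for example "Heartbeat OK: 3/3" when all three nodes are active or in ping state.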
That completes the heartbeat monitoring setup.