A shell script Nagios plug-in for monitoring the resources used by program processes

Source: Internet
Author: User

Normally it is enough to monitor whether a program's process exists. This time, however, we ran into a different problem: a program developed in-house deadlocked while its process was still running, and the impact was widespread. Worse, I did not know where the problem was, and it took colleagues from the test team to help track it down, which was embarrassing for me as the operations engineer.

To avoid a repeat, we analyzed the deadlocked process and found that it occupies 100% of the CPU when deadlocked, compared with less than 10% in normal operation. We therefore decided to write a Nagios plug-in that monitors the resources used by the program, including CPU and memory.

 

1. Shell script requirements:

The CPU and memory thresholds can be set. If the resource usage of a process exceeds a threshold, an alarm is triggered.

The script checks whether each process exists. If a process does not exist, an alarm is triggered.
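
The threshold requirement comes down to comparing decimal percentages such as 5.6 against a limit, which plain shell arithmetic cannot do. The script in section 3 uses bc for this; the sketch below shows the same idea with awk instead. The function name exceeds_threshold is hypothetical and does not appear in the original script.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script.
# Returns 0 (true) when the usage value is strictly greater than the threshold.
# awk is used because shell arithmetic cannot compare decimals such as 5.6.
exceeds_threshold() {
    awk -v u="$1" -v t="$2" 'BEGIN { exit !(u + 0 > t + 0) }'
}

# Example: 9.4% usage against a threshold of 5.
if exceeds_threshold 9.4 5; then
    echo "usage is above the threshold"
else
    echo "usage is within the threshold"
fi
```

A value equal to the threshold is treated as within the limit, which matches the strict comparison used in the script.
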

 

2. Script execution results:

1. If the input format is incorrect, the help information is output.

[root@center230 libexec]# sh component_resource.sh

Usage parament:

    component_resource.sh [--cpu] [--mem]

Example:

    component_resource.sh --cpu 50 --mem 50

 

2. If no threshold is exceeded, the resource usage is output and the exit value is 0.

[root@center230 libexec]# sh component_resource.sh --cpu 50 --mem 50

VueSERVER_cpu_use=5.6% VueCache_cpu_use=1.9% VueAgent_cpu_use=0.0% VueCenter_cpu_use=0.0% VueDaemon_cpu_use=0.0%;VueSERVER_mem_use=0.2% VueCache_mem_use=7.4% VueAgent_mem_use=0.5% VueCenter_mem_use=0.1% VueDaemon_mem_use=0.0%

[root@center230 libexec]# echo $?

0

 

3. If a threshold is exceeded, the resource usage is output and the exit value is 2.

[root@center230 libexec]# sh component_resource.sh --cpu 5 --mem 5

VueSERVER_cpu_use=9.4% VueCache_cpu_use=0.0% VueAgent_cpu_use=0.0% VueCenter_cpu_use=0.0% VueDaemon_cpu_use=0.0%;VueSERVER_mem_use=0.2% VueCache_mem_use=7.4% VueAgent_mem_use=0.5% VueCenter_mem_use=0.1% VueDaemon_mem_use=0.0%

[root@center230 libexec]# echo $?

2

 

4. If a process does not exist, the script outputs the names of the processes that are down, together with the resource usage of the processes that are still running. The exit value is 2.

[root@yckj scripts]# sh component_resource.sh --cpu 50 --mem 50

Current VueDaemon VueCenter VueAgent VueCache VueSERVER is down.

[root@yckj scripts]# echo $?

2
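
Case 4 depends on working out which of the expected processes are missing. As a rough sketch of that step, the helper below takes a listing of running process names on standard input and prints the expected names that are absent. The name missing_from_list is hypothetical, and the exact whole-line match is a design choice of this sketch, not the method used in the original script (which uses ps -ef with grep).

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script.
# Reads process names on stdin, one per line, and prints every name given
# as an argument that does not appear in that listing.
missing_from_list() {
    local running name missing=""
    running=$(cat)
    for name in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string
        if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
            missing="$missing $name"
        fi
    done
    echo "${missing# }"
}

# Demo with a fixed, made-up listing instead of live ps output.
sample=$'VueSERVER\nVueCache\nbash'
printf '%s\n' "$sample" | missing_from_list VueDaemon VueCache VueSERVER
```

On a live system the listing could come from a command such as ps -e -o comm=, for example: ps -e -o comm= | missing_from_list VueDaemon VueCenter. Matching the whole name avoids counting unrelated processes whose command line merely contains the name.
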

3. The shell script code:

[root@center230 libexec]# cat component_resource.sh
#!/bin/bash
# Arrays and [[ ]] below are bash features, so bash is used rather than plain sh.
#author:yangrong
#date:2014-05-20
#mail:10286460@qq.com

#pragrom_list=(VueDaemon VueCenter VueAgent VueCache VueSERVER VUEConnector Myswitch Slirpvde)
pragrom_list=(VueDaemon VueCenter VueAgent VueCache VueSERVER)

#### Obtain the cpu and mem thresholds ####
case $1 in
--cpu)
    cpu_crit=$2
    ;;
--mem)
    mem_crit=$2
    ;;
esac

case $3 in
--cpu)
    cpu_crit=$4
    ;;
--mem)
    mem_crit=$4
    ;;
esac

### Check the arguments: var=1 means invalid input, var=0 means normal.
### The input is invalid if the same option is given twice or the argument count is not 4.
if [[ "$1" == "$3" ]];then
    var=1
elif [ $# -ne 4 ];then
    var=1
else
    var=0
fi

### Print the help message
if [ $var -eq 1 ];then
    echo "Usage parament:"
    echo "    $0 [--cpu] [--mem]"
    echo ""
    echo "Example:"
    echo "    $0 --cpu 50 --mem 50"
    # 3 means UNKNOWN in the Nagios plug-in convention; a bare exit would return 0 (OK)
    exit 3
fi

### Collect the processes that do not exist in a variable
num=$(( ${#pragrom_list[@]}-1 ))

NotExist=""
for digit in `seq 0 $num`
do
    a=`ps -ef|grep -v grep|grep ${pragrom_list[$digit]}|wc -l`
    if [ $a -eq 0 ];then
        NotExist="$NotExist ${pragrom_list[$digit]}"
        # Remove the missing process so that it is skipped in the usage check below
        unset pragrom_list[$digit]
    fi
done
#echo "pragrom_list=${pragrom_list[@]}"

#### Compare the resources used by each process with the thresholds
cpu_use_all=""
mem_use_all=""
compare_cpu_temp=0
compare_mem_temp=0
for n in ${pragrom_list[@]}
do
    # In top's default layout, field 9 is %CPU and field 10 is %MEM
    cpu_use=`top -b -n1|grep $n|awk '{print $9}'`
    mem_use=`top -b -n1|grep $n|awk '{print $10}'`

    if [[ $cpu_use == "" ]];then
        cpu_use=0
    fi
    if [[ $mem_use == "" ]];then
        mem_use=0
    fi

    # bc prints 1 when the comparison is true and 0 otherwise
    compare_cpu=`echo "$cpu_use > $cpu_crit"|bc`
    compare_mem=`echo "$mem_use > $mem_crit"|bc`

    if [[ $compare_cpu == 1 ]];then
        compare_cpu_temp=1
    fi
    if [[ $compare_mem == 1 ]];then
        compare_mem_temp=1
    fi

    cpu_use_all="${n}_cpu_use=${cpu_use}% ${cpu_use_all}"
    mem_use_all="${n}_mem_use=${mem_use}% ${mem_use_all}"
done

### If the variable has a value, a process is down. The exit value is 2.
if [[ "$NotExist" != "" ]];then
    echo -e "Current ${NotExist} is down.$cpu_use_all;$mem_use_all"
    exit 2
### If the cpu comparison value is 1, a process exceeds the cpu threshold. The exit value is 2.
elif [[ "$compare_cpu_temp" == 1 ]];then
    echo -e "$cpu_use_all;$mem_use_all"
    exit 2
### If the mem comparison value is 1, a process exceeds the mem threshold. The exit value is 2.
elif [[ $compare_mem_temp == 1 ]];then
    echo -e "$cpu_use_all;$mem_use_all"
    exit 2
### Otherwise everything is normal: output the cpu and memory usage and exit with 0.
else
    echo -e "$cpu_use_all;$mem_use_all"
    exit 0
fi

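
The script above runs top twice for every monitored process. One possible variation, sketched here under the assumption that top uses its default column layout (%CPU in field 9, %MEM in field 10, COMMAND in field 12), is to take a single snapshot and extract both values from it. The function name usage_from_top and the sample listing are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script.
# Reads `top -b -n1` style output on stdin and prints "<cpu> <mem>" for the
# first process whose COMMAND column equals $1, or "0 0" if none matches.
# Assumes top's default layout: %CPU is field 9, %MEM field 10, COMMAND field 12.
usage_from_top() {
    awk -v name="$1" '
        $12 == name { print $9, $10; found = 1; exit }
        END { if (!found) print "0 0" }
    '
}

# Made-up sample snapshot so the sketch can run without the real processes.
sample='  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1234 root      20   0  512340  40212   5120 S   5.6  0.2   1:23.45 VueSERVER
 2345 root      20   0  912340 150212   6120 S   1.9  7.4   0:45.10 VueCache'

read -r cpu mem <<< "$(printf '%s\n' "$sample" | usage_from_top VueCache)"
echo "VueCache_cpu_use=${cpu}% VueCache_mem_use=${mem}%"
```

On a live system the snapshot could be captured once with top -b -n1 and then passed to the helper for each process name. Comparing the COMMAND field exactly also avoids the case where grep matches several lines and hands bc more than one value.
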
4. Postscript:

I have been writing more and more shell scripts recently, and sometimes I inevitably have to modify a script I wrote earlier, only to find that it takes a while to understand it again.

To make later maintenance easier, every function and every feature in a script should be commented, so that you or others can maintain it.
