Normally we only need to monitor whether a program's process is running. This time, however, we hit a problem: a program developed in-house still had a running process, but it had deadlocked. The impact was widespread. What's worse, I did not spot the problem myself; colleagues in testing had to find it for me. It was a real embarrassment for O&M...
To avoid a repeat, we analyzed the deadlocked process and found that it occupies 100% of the cpu when deadlocked, compared with less than 10% normally. So we decided to write a nagios plug-in to monitor the resources occupied by the programs, including cpu and memory.
1. Shell script requirements:
The cpu and mem thresholds can be set. If resource usage exceeds a threshold, an alarm is triggered.
The script checks whether each process exists. If one does not exist, an alarm is triggered.
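The two requirements above can be sketched as a pair of small helpers. This is only an illustration: the function names exceeds_threshold and process_exists are hypothetical, awk is used here for the floating-point comparison, and pgrep -x is swapped in for a ps | grep pipeline (note that pgrep -x matches the exact process name).

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not taken from the original script.

# Return 0 (true) when a usage percentage is strictly above the threshold.
# awk does the floating-point comparison, so values such as 9.4 work.
exceeds_threshold() {
    awk -v use="$1" -v crit="$2" 'BEGIN { exit !(use + 0 > crit + 0) }'
}

# Return 0 (true) when at least one process with exactly this name is running.
process_exists() {
    pgrep -x "$1" > /dev/null 2>&1
}

if exceeds_threshold 9.4 5; then
    echo "9.4 is over the 5% threshold"
fi
if ! exceeds_threshold 5.6 50; then
    echo "5.6 is within the 50% threshold"
fi
```

Keeping each check in its own function makes it easy to try the threshold logic by hand before wiring it into a plug-in.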
2. The shell script execution results are as follows:
1. If the arguments are malformed, the help information is printed.
[root@center230 libexec]# sh component_resource.sh
Usage parament:
component_resource.sh [--cpu] [--mem]
Example:
component_resource.sh --cpu 50 --mem 50
2. If no threshold is exceeded, the resource usage is printed and the exit value is 0.
[root@center230 libexec]# sh component_resource.sh --cpu 50 --mem 50
VueSERVER_cpu_use=5.6% VueCache_cpu_use=1.9% VueAgent_cpu_use=0.0% VueCenter_cpu_use=0.0% VueDaemon_cpu_use=0.0%; VueSERVER_mem_use=0.2% VueCache_mem_use=7.4% VueAgent_mem_use=0.5% VueCenter_mem_use=0.1% VueDaemon_mem_use=0.0%
[root@center230 libexec]# echo $?
0
3. If a threshold is exceeded, the resource usage is printed and the exit value is 2.
[root@center230 libexec]# sh component_resource.sh --cpu 5 --mem 5
VueSERVER_cpu_use=9.4% VueCache_cpu_use=0.0% VueAgent_cpu_use=0.0% VueCenter_cpu_use=0.0% VueDaemon_cpu_use=0.0%; VueSERVER_mem_use=0.2% VueCache_mem_use=7.4% VueAgent_mem_use=0.5% VueCenter_mem_use=0.1% VueDaemon_mem_use=0.0%
[root@center230 libexec]# echo $?
2
4. If a process does not exist, the names of the down processes are printed, followed by the resource usage of the processes that are still running. The exit value is 2.
[root@yckj scripts]# sh component_resource.sh --cpu 50 --mem 50
Current VueDaemon VueCenter VueAgent VueCache VueSERVER is down.
[root@yckj scripts]# echo $?
2
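The exit values shown above are what nagios uses to decide the service state: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. A minimal sketch of how the three conditions map onto those values follows; the helper names nagios_state and check_status are hypothetical and are used only for illustration.

```shell
#!/bin/bash
# Sketch only: nagios_state and check_status are hypothetical helper names.

# Map a plugin exit value to the state name nagios displays.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Decide the exit value from the three conditions described above:
#   $1 = names of processes that are down (empty if none)
#   $2 = 1 if any process is over the cpu threshold, else 0
#   $3 = 1 if any process is over the mem threshold, else 0
check_status() {
    if [ -n "$1" ] || [ "$2" -eq 1 ] || [ "$3" -eq 1 ]; then
        echo 2
    else
        echo 0
    fi
}

nagios_state "$(check_status "" 0 0)"          # all processes up, within thresholds
nagios_state "$(check_status "" 1 0)"          # cpu threshold exceeded
nagios_state "$(check_status "VueCache" 0 0)"  # a process is down
```

Because the script only ever returns 0 or 2, nagios will show these services as either OK or CRITICAL, never WARNING.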
3. The Shell script code is as follows:
[root@center230 libexec]# cat component_resource.sh
#!/bin/bash
# (bash rather than sh, because the script uses arrays and [[ ]])
#author:yangrong
#date:2014-05-20
#mail:10286460@qq.com
#pragrom_list=(VueDaemon VueCenter VueAgent VueCache VueSERVER VUEConnector Myswitch Slirpvde)
pragrom_list=(VueDaemon VueCenter VueAgent VueCache VueSERVER)
#### Obtain the cpu and mem thresholds (the options may come in either order) ####
case $1 in
    --cpu)
        cpu_crit=$2
        ;;
    --mem)
        mem_crit=$2
        ;;
esac
case $3 in
    --cpu)
        cpu_crit=$4
        ;;
    --mem)
        mem_crit=$4
        ;;
esac
### Check the arguments: var=1 if both options are the same or there are not exactly 4; var=0 is normal ###
if [[ $1 == $3 ]]; then
    var=1
elif [ $# -ne 4 ]; then
    var=1
else
    var=0
fi
### Print the usage message
if [ $var -eq 1 ]; then
    echo "Usage parament:"
    echo "$0 [--cpu] [--mem]"
    echo ""
    echo "Example:"
    echo "$0 --cpu 50 --mem 50"
    # 3 = UNKNOWN in nagios; a bare "exit" would return 0 (OK) here
    exit 3
fi
### Collect the processes that do not exist in a variable and drop them from the list
num=$(( ${#pragrom_list[@]} - 1 ))
NotExist=""
for digit in $(seq 0 $num)
do
    a=$(ps -ef | grep -v grep | grep "${pragrom_list[$digit]}" | wc -l)
    if [ "$a" -eq 0 ]; then
        NotExist="$NotExist ${pragrom_list[$digit]}"
        unset "pragrom_list[$digit]"
    fi
done
#echo "pragrom_list=${pragrom_list[@]}"
#### Compare the resources occupied by each process against the thresholds
cpu_use_all=""
mem_use_all=""
compare_cpu_temp=0
compare_mem_temp=0
for n in "${pragrom_list[@]}"
do
    # In top's batch output, column 9 is %CPU and column 10 is %MEM
    cpu_use=$(top -b -n1 | grep "$n" | awk '{print $9}')
    mem_use=$(top -b -n1 | grep "$n" | awk '{print $10}')
    if [[ $cpu_use == "" ]]; then
        cpu_use=0
    fi
    if [[ $mem_use == "" ]]; then
        mem_use=0
    fi
    # bc prints 1 when the comparison is true, 0 otherwise
    compare_cpu=$(echo "$cpu_use > $cpu_crit" | bc)
    compare_mem=$(echo "$mem_use > $mem_crit" | bc)
    if [[ $compare_cpu == 1 ]]; then
        compare_cpu_temp=1
    fi
    if [[ $compare_mem == 1 ]]; then
        compare_mem_temp=1
    fi
    cpu_use_all="${n}_cpu_use=${cpu_use}% ${cpu_use_all}"
    mem_use_all="${n}_mem_use=${mem_use}% ${mem_use_all}"
done
### If NotExist has a value, a process is down. The exit value is 2.
if [[ "$NotExist" != "" ]]; then
    echo -e "Current${NotExist} is down. $cpu_use_all;$mem_use_all"
    exit 2
### If the cpu comparison value is 1, a process exceeds the cpu threshold. The exit value is 2.
elif [[ "$compare_cpu_temp" == 1 ]]; then
    echo -e "$cpu_use_all;$mem_use_all"
    exit 2
### If the mem comparison value is 1, a process exceeds the mem threshold. The exit value is 2.
elif [[ $compare_mem_temp == 1 ]]; then
    echo -e "$cpu_use_all;$mem_use_all"
    exit 2
### Otherwise everything is normal: print the cpu and memory percentages and exit with 0.
else
    echo -e "$cpu_use_all;$mem_use_all"
    exit 0
fi
4. Postscript:
As I have been writing more and more shell scripts recently, I inevitably have to go back and change scripts I wrote earlier, and it takes a while before I understand them again.
To make later maintenance easier, every function and every section in a script should be commented, so that you or others can maintain it.
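As an illustration of that commenting habit, here is one way the threshold parsing could look as a documented function. It is only a sketch: parse_thresholds is a hypothetical name, and it uses a loop where the script uses two case statements.

```shell
#!/bin/bash
# Sketch only: parse_thresholds is a hypothetical name, not part of the original script.

# parse_thresholds: read "--cpu N --mem N" (in either order) from the arguments.
# Sets:    cpu_crit, mem_crit
# Returns: 0 when both thresholds were supplied, 1 otherwise
parse_thresholds() {
    cpu_crit=""
    mem_crit=""
    # Walk the arguments two at a time: an option name followed by its value.
    while [ "$#" -ge 2 ]; do
        case "$1" in
            --cpu) cpu_crit=$2 ;;
            --mem) mem_crit=$2 ;;
            *)     return 1 ;;     # unknown option
        esac
        shift 2
    done
    # Both values must be present and no stray argument may remain.
    [ "$#" -eq 0 ] && [ -n "$cpu_crit" ] && [ -n "$mem_crit" ]
}

if parse_thresholds --mem 40 --cpu 50; then
    echo "cpu_crit=$cpu_crit mem_crit=$mem_crit"
fi
```

The header comment states what the function reads, what it sets and what it returns, which is usually enough to make sense of it months later.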