1. Requirements and ideas
Requirements: Shell makes it easy to write all kinds of custom alarm tools, but they need unified, standardized management.
Idea: Define a script package that contains a main program, sub-scripts, a configuration file, a mail engine, and output logs.
Main program: The entry point of the whole package; every other script is launched from it.
Configuration file: The control center. It switches individual sub-scripts on or off and specifies the log file each one works with.
Sub-scripts: The actual monitoring scripts, one for each metric being monitored.
Mail engine: Implemented as a Python program that defines the mail server, the sender, and the sender's password.
Output log: The whole monitoring system should write its output to a log.
bin holds the main program
conf holds the configuration file
shares holds the monitoring scripts
mail holds the mail engine
log holds the logs
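The layout above can be created in one step. This is a minimal sketch: make_layout is a hypothetical helper, and the root directory is whatever you pass in (the demo uses a scratch directory so nothing on the system is touched).

```shell
#!/bin/bash
# Create the directory skeleton of the monitoring package.
# make_layout is a hypothetical helper; the root path is an assumption.
make_layout() {
    local root="$1"
    local d
    for d in bin conf shares mail log; do
        mkdir -p "$root/$d" || return 1
    done
}

# Demo: build the tree under a scratch directory and list it.
demo_root="$(mktemp -d)/mon"
make_layout "$demo_root"
ls "$demo_root"
```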
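2. main.sh
#!/bin/bash
# Switch that controls whether mail is sent.
# export makes the send variable visible to the child scripts.
export send=1
# Extract this host's IP address so the alert mail says which machine sent it.
export addr=`/sbin/ifconfig | grep -A1 "ens33: " | awk '/inet/ {print $2}'`
dir=`pwd`
# Only the last component of the directory path is needed.
last_dir=`echo $dir | awk -F'/' '{print $NF}'`
# Make sure the script is run from the bin directory. Relative paths are used
# below, so otherwise the monitoring scripts, mail script, and logs may not be found.
if [ "$last_dir" == "bin" ] || [ "$last_dir" == "bin/" ]; then
    conf_file="../conf/mon.conf"
else
    echo "you should cd bin dir"
    exit
fi
# Send normal output and error output to the log files.
exec 1>>../log/mon.log 2>>../log/err.log
echo "`date +"%F %T"` load average"
/bin/bash ../shares/load.sh
# Check the configuration file first to see whether 502 monitoring is enabled.
if grep -q 'to_mon_502=1' $conf_file; then
    # to_mon_502 is 1: export the access log path and run the 502.sh monitoring script.
    export log=`grep 'logfile=' $conf_file | awk -F '=' '{print $2}' | sed 's/ //g'`
    /bin/bash ../shares/502.sh
fi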
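main.sh refuses to run unless the current directory is bin. One possible alternative, sketched here and not part of the original script, is to derive the bin directory from the script's own path so that the relative paths work no matter where the script is started from. script_dir is a hypothetical helper name.

```shell
#!/bin/bash
# script_dir PATH: print the absolute directory that contains PATH.
# The body runs in a subshell, so the caller's working directory is untouched.
script_dir() (
    cd "$(dirname "$1")" && pwd
)

# Demo with a throwaway tree that mimics mon/bin/main.sh.
demo="$(mktemp -d)"
mkdir -p "$demo/mon/bin"
touch "$demo/mon/bin/main.sh"
bin_dir="$(script_dir "$demo/mon/bin/main.sh")"
basename "$bin_dir"

# Inside main.sh this would become:  cd "$(script_dir "$0")"
```

With this in place the directory-name check would no longer be needed, at the cost of one extra line at the top of the script.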
3. mon.conf
## to config the options if to monitor
## MySQL server address, port, user and password
to_mon_cdb=0   ## 0 or 1, default 0; 0 = do not monitor, 1 = monitor
db_ip=10.20.3.13
db_port=3315
db_user=username
db_pass=passwd
## httpd: 1 = monitor, 0 = do not monitor
to_mon_httpd=0
## php: 1 = monitor, 0 = do not monitor
to_mon_php_socket=0
## http_code_502: the path to the access log must be defined
to_mon_502=1
logfile=/data/log/xxx.xxx.com/access.log
## request_count: define the log path and the domain name
to_mon_request_count=0
req_log=/data/log/www.discuz.net/access.log
domainname=www.discuz.net
The log paths could instead be hard-coded in each sub-script that needs them, but with many machines the paths differ from host to host and editing every script is tedious, so it is easier to keep them together in the configuration file.
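Reading a setting out of the configuration file can be wrapped in a small function. This is only a sketch: conf_get is a hypothetical helper (main.sh itself uses grep and awk inline), and the sample file below copies just a few of the mon.conf settings.

```shell
#!/bin/bash
# conf_get KEY FILE: print the value assigned to KEY in a key=value file.
# Hypothetical helper, not part of the original scripts.
conf_get() {
    local key="$1" file="$2"
    # Keep lines that begin with KEY=, drop the key, strip an inline
    # "#" comment and all whitespace, and print only the first match.
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=//p" "$file" \
        | sed 's/#.*$//; s/[[:space:]]//g' \
        | head -n 1
}

# Demo against a throwaway file with a few mon.conf settings.
sample_conf="$(mktemp)"
cat > "$sample_conf" <<'EOF'
## http_code_502: the path to the access log must be defined
to_mon_502=1
logfile=/data/log/xxx.xxx.com/access.log
to_mon_cdb=0   ## 0 or 1, default 0
EOF

conf_get to_mon_502 "$sample_conf"
conf_get logfile "$sample_conf"
conf_get to_mon_cdb "$sample_conf"
```

Anchoring the match at the start of the line keeps commented-out settings and keys that merely contain the name (for example req_log versus log) from being picked up.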
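4. load.sh
#!/bin/bash
# Take the integer part of the 1-minute system load.
load=`uptime | awk -F 'average:' '{print $2}' | cut -d',' -f1 | sed 's/ //g' | cut -d. -f1`
# Alert if the load is above 10 and the main script has enabled mail.
if [ $load -gt 10 ] && [ $send -eq "1" ]
then
    echo "$addr `date +%T` load is $load" > ../log/load.tmp
    # ${addr}_load keeps the underscore literal inside double quotes;
    # the quoted `cat` passes the whole message as one argument.
    /bin/bash ../mail/mail.sh [email protected] "${addr}_load:$load" "`cat ../log/load.tmp`"
fi
# This line goes to the log file set up by the main script.
echo "`date +%T` load is $load"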
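To see what the parsing pipeline in load.sh does, it helps to run it on a fixed line of uptime output. The sketch below wraps the same pipeline in a function (load_int is a hypothetical name) and feeds it a made-up sample line instead of the live uptime output.

```shell
#!/bin/bash
# load_int: read one line of uptime output on stdin and print the
# integer part of the 1-minute load average.
load_int() {
    # Everything after "average:" is "1min, 5min, 15min"; keep the first
    # value, strip spaces, and drop the fractional part.
    awk -F 'average:' '{print $2}' | cut -d, -f1 | sed 's/ //g' | cut -d. -f1
}

# Made-up sample line in the usual Linux uptime format.
sample=' 10:15:01 up 12 days,  3:04,  2 users,  load average: 12.34, 8.01, 5.50'
echo "$sample" | load_int
```

Truncating to an integer is what makes the plain `[ $load -gt 10 ]` test possible, since the shell's `-gt` operator only compares integers.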
5. 502.sh
#!/bin/bash
# The minute that has just ended, as HH:MM.
d=`date -d "-1 min" +%H:%M`
# Count the 502 responses logged in the access log during that minute.
c_502=`grep ":$d:" $log | grep ' 502 ' | wc -l`
if [ $c_502 -gt 10 ] && [ $send == 1 ]; then
    echo "$addr $d 502 count is $c_502" > ../log/502.tmp
    /bin/bash ../mail/mail.sh ${addr}_502 $c_502 ../log/502.tmp
fi
echo "`date +%T` 502 $c_502"
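The counting step in 502.sh can be tried out on a small sample log. This is a sketch under assumptions: count_502 is a hypothetical function name, and the sample lines are invented entries in the common access-log layout, where the timestamp looks like [10/Oct/2023:13:55:01 +0800].

```shell
#!/bin/bash
# count_502 MINUTE LOGFILE: count the lines logged during MINUTE (HH:MM)
# that contain " 502 ".
count_502() {
    local minute="$1" logfile="$2"
    grep ":$minute:" "$logfile" | grep -c ' 502 '
}

# Invented sample entries; only the timestamp and status code matter here.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
1.2.3.4 - - [10/Oct/2023:13:55:01 +0800] "GET / HTTP/1.1" 502 166
1.2.3.4 - - [10/Oct/2023:13:55:40 +0800] "GET /a HTTP/1.1" 200 512
1.2.3.4 - - [10/Oct/2023:13:55:59 +0800] "GET /b HTTP/1.1" 502 166
1.2.3.4 - - [10/Oct/2023:13:56:02 +0800] "GET /c HTTP/1.1" 502 166
EOF

count_502 13:55 "$sample_log"
```

Matching on ":HH:MM:" with both colons ties the pattern to the hour and minute fields of the timestamp, and the spaces around 502 avoid counting a response size or URL that happens to contain those digits.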
6. disk.sh
#!/bin/bash
rm -f ../log/disk.tmp
# -F '[ %]+' lets awk split fields on spaces and % signs; the + after the
# brackets means the separator may repeat. grep -v Use drops the header row
# of df -h. This assumes the system language is English; to be safe, set the
# language to English before this line.
for r in `df -h | awk -F '[ %]+' '{print $5}' | grep -v Use`
do
    if [ $r -gt 90 ] && [ $send -eq "1" ]
    then
        echo "$addr `date +%T` disk usage is $r" >> ../log/disk.tmp
    fi
done
if [ -f ../log/disk.tmp ]
then
    df -h >> ../log/disk.tmp
    /bin/bash ../mail/mail.sh ${addr}_disk $r ../log/disk.tmp
    echo "`date +%T` disk usage is not ok"
else
    echo "`date +%T` disk usage is ok"
fi
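The header filter in disk.sh depends on the system language. One way around that, sketched here as an alternative and not taken from the original script, is to skip the header by row number and to request the POSIX output format with df -P. over_threshold is a hypothetical function name, and the sample table is made up.

```shell
#!/bin/bash
# over_threshold LIMIT: read "df -P"-style output on stdin and print the
# mount point and usage of every filesystem above LIMIT percent.
over_threshold() {
    local limit="$1"
    # NR > 1 skips the header row; $5 is the usage column, $6 the mount point.
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 > limit) print $6, use
    }'
}

# Made-up sample in the layout printed by df -P.
sample_df='Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/sda1         20511312 19485746   1025566      95% /
/dev/sdb1        103081248 41232499  61848749      40% /data
tmpfs              1940120        0   1940120       0% /run'

echo "$sample_df" | over_threshold 90

# Against the live system, forcing the C locale and POSIX output format:
# LC_ALL=C df -P | over_threshold 90
```

Printing the mount point next to the percentage also tells the reader of the alert which filesystem is full. The sketch assumes mount points without spaces.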
7. Alarm system engine
mail.sh performs alarm convergence: it does not send an email every time a check fails. A problem that clears up quickly does not need an email; mail is sent only when the problem has kept recurring long enough. mail.sh calls mail.py, which can be downloaded from https://coding.net/u/aminglinux/p/aminglinux-book/git/blob/master/d22z/mail.py
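log=$1    # the first argument names the state file
t_s=`date +%s`    # current timestamp
t_s2=`date -d "2 hours ago" +%s`    # timestamp of two hours ago
# If the state file does not exist yet, create it and seed it with t_s2.
if [ ! -f /tmp/$log ]
then
    echo $t_s2 > /tmp/$log
fi
# Read the last line into t_s2. On the first run this is the value written above.
t_s2=`tail -1 /tmp/$log | awk '{print $1}'`
# Append the current timestamp, so that the next run reads it as its t_s2.
echo $t_s >> /tmp/$log
# Seconds between the previous alert and this one.
v=$[$t_s-$t_s2]
echo $v
# More than an hour since the previous alert, meaning things had been running
# smoothly for over an hour: send mail and reset the counter file to 0.
if [ $v -gt 3600 ]
then
    # main.sh runs everything from bin, so the path is relative to bin.
    ../mail/mail.py "$1" "$2" "$3"
    echo "0" > /tmp/$log.txt
else
    # Less than an hour apart (the problem keeps recurring): count it,
    # but do not send mail yet.
    if [ ! -f /tmp/$log.txt ]
    then
        echo "0" > /tmp/$log.txt
    fi
    nu=`cat /tmp/$log.txt`    # read the counter
    nu2=$[$nu+1]    # add one
    echo $nu2 > /tmp/$log.txt    # write it back
    # More than 10 consecutive alerts: send mail and reset the counter.
    if [ $nu2 -gt 10 ]
    then
        ../mail/mail.py "$1" "trouble continue 10 min $2" "$3"
        echo "0" > /tmp/$log.txt
    fi
fi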
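The convergence rule is easier to follow when the decision is separated from the mailing and the clock is passed in as an argument. The sketch below is a simplified restatement, not the original script: should_send, the state directory, and the file names are all assumptions, and it overwrites the last timestamp where mail.sh appends it.

```shell
#!/bin/bash
# should_send NAME NOW STATE_DIR
# Returns 0 when an alert called NAME should be mailed at epoch second NOW,
# and 1 when it should be suppressed. STATE_DIR and the file names are
# assumptions made for this sketch; mail.sh keeps its state in /tmp.
should_send() {
    local name="$1" now="$2" dir="$3"
    local stamp="$dir/$name.last" count="$dir/$name.count"
    local last n

    # First alert ever: pretend the previous one was two hours ago.
    [ -f "$stamp" ] || echo $((now - 7200)) > "$stamp"
    last=$(cat "$stamp")
    echo "$now" > "$stamp"

    if [ $((now - last)) -gt 3600 ]; then
        # Quiet for over an hour: send and restart the counter.
        echo 0 > "$count"
        return 0
    fi

    # Alerts are arriving close together: count them and stay quiet.
    [ -f "$count" ] || echo 0 > "$count"
    n=$(( $(cat "$count") + 1 ))
    echo "$n" > "$count"
    if [ "$n" -gt 10 ]; then
        # More than 10 suppressed alerts in a row: send and reset.
        echo 0 > "$count"
        return 0
    fi
    return 1
}

# Simulate one alert per minute for 12 minutes, starting from a clean state.
state="$(mktemp -d)"
sent=""
for i in $(seq 0 11); do
    if should_send load $((1700000000 + i * 60)) "$state"; then
        sent="$sent $i"
    fi
done
echo "mail sent on runs:$sent"
```

In the simulation the first alert is mailed at once, the next ten are only counted, and the twelfth (run 11) is mailed again, which matches the "trouble continue 10 min" message in mail.sh when the check runs once a minute.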
Finally, add the cron job:
* * * * * cd /usr/local/sbin/mon/bin; /bin/bash main.sh