Recently the developers raised a new monitoring requirement: trigger an alarm when the number of errors in a log file increases (that is, when the number of occurrences of an error keyword in the log file increases).
I find this a tedious problem, because the requirement limits itself (they are digging their own hole). First, a log file cannot grow forever, so any change made to the log file during maintenance will trigger an alarm.
Second, detecting errors by an error keyword or an error code alone can be unreliable: the log may contain a number that is not an error code but happens to equal one, so the error keyword and the error code need to be checked together.
Third, a restart or hang of the process that writes the log, log rotation (logrotate), and similar events all change the stored count, and many of these cases trigger false alarms. So this problem really is tedious!
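The second and third points can be sketched in a few lines of shell. This is only an illustration under my own assumptions, not part of the scripts below: the function names, the ERROR keyword, and the state file are hypothetical. count_matches only counts lines that contain both the keyword and the code as a whole word, and new_matches resets its baseline when the count drops, which is what a rotated or truncated log looks like.

```shell
#!/bin/bash
# Hypothetical helpers: the names, the ERROR keyword used in the usage note,
# and the state file are assumptions made for illustration only.

# count_matches LOGFILE KEYWORD CODE
# Prints how many lines contain KEYWORD and also contain CODE as a whole
# word, so 13003 is not counted inside 130030 or on lines without KEYWORD.
count_matches() {
    local logfile="$1" keyword="$2" code="$3"
    # grep -c exits with 1 when the count is 0; "|| true" hides that status.
    grep -F -- "$keyword" "$logfile" | grep -c -w -F -- "$code" || true
}

# new_matches LOGFILE KEYWORD CODE STATEFILE
# Prints how many matching lines are new since the previous call.
# A lower count than last time is treated as rotation or truncation:
# the baseline is reset and the matches in the new file are reported.
new_matches() {
    local logfile="$1" keyword="$2" code="$3" statefile="$4"
    local previous=0 current=0
    if [ -f "$statefile" ]; then
        previous=$(cat "$statefile")
    fi
    if [ -f "$logfile" ]; then
        current=$(count_matches "$logfile" "$keyword" "$code")
    fi
    # Remember the current count for the next call.
    echo "$current" > "$statefile"
    if [ "$current" -lt "$previous" ]; then
        echo "$current"
    else
        echo $(( current - previous ))
    fi
}
```

For example, new_matches /data/tomcat/tomcat-cstest/logs/catalina.out ERROR 13003 /tmp/csconnerr.state would print the number of new matching lines on each call. A count alone cannot tell a rotation apart from a new file that already holds more matches than the old one, so this remains an approximation.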
This is not to say that counting errors is terribly complicated (although it really is not simple either). But if the goal is to detect whether something is wrong, there is no need to do it this way. For a program, anything that prevents correct execution is an exception; as long as exceptions are caught and handled correctly, you know exactly where the problem is and how to fix it. If this is not planned for in the early stages of design, all that is left later is tedious workarounds like this one.
Enough talk. There are two methods that are easy to implement; I am posting them here.
Method 1: Write two scripts, one that runs continuously and one that the monitoring software runs. The continuously running script can also be driven by crontab. Whatever the monitoring software runs, command line or script, and however complex its logic, it must always handle exiting, so that the check terminates.
Method 2: Use the diff or change function built into the monitoring software and let the monitoring software judge the trend (taking Zabbix as an example). Zabbix trigger expressions can easily compute the values the user wants from the collected data, which saves users from writing programs or scripts for complex problems such as text comparison, numerical calculation, and trend calculation.
Method 1: getdata.sh runs in the background and produces the data; checkdata.sh is run by Zabbix to query the data.
    #!/bin/bash
    # name: getdata.sh
    javalogfile=/data/tomcat/tomcat-cstest/logs/catalina.out
    pathtojavalogfile=$(dirname "$javalogfile")
    zabbixstatusfile=$pathtojavalogfile/.zabbixstatus.catalina.out
    errorkeyword=13003
    previoustime=$(grep "$errorkeyword" "$javalogfile" | wc -l)
    currenttime=$(grep "$errorkeyword" "$javalogfile" | wc -l)
    if [[ ! $previoustime -eq $currenttime ]]; then
        echo 0
        exit 1
    fi
    while [[ $previoustime -eq $currenttime ]]; do
        # Actually, it's like crontab: while + sleep = crontab
        sleep 2
        currenttime=$(grep "$errorkeyword" "$javalogfile" | wc -l)
        if [[ $currenttime -gt $previoustime ]]; then
            # The count increased: record 0 (new error found)
            previoustime=$currenttime
            echo 0 >> "$zabbixstatusfile"
        elif [[ $currenttime -le $previoustime ]]; then
            # No new error: record 1. Resetting the baseline keeps the loop
            # running when the count drops, e.g. after log rotation.
            previoustime=$currenttime
            echo 1 >> "$zabbixstatusfile"
        fi
    done

    #!/bin/bash
    # name: checkdata.sh
    javalogfile=/data/tomcat/tomcat-cstest/logs/catalina.out
    pathtojavalogfile=$(dirname "$javalogfile")
    zabbixstatusfile=$pathtojavalogfile/.zabbixstatus.catalina.out
    # A 0 in the status file means getdata.sh saw the error count increase.
    # -q keeps grep quiet so that only one value is printed for Zabbix.
    grep -q "0" "$zabbixstatusfile"
    if [[ $? -eq 0 ]]; then
        echo 0
        # Empty the status file so the same increase is reported only once
        true > "$zabbixstatusfile"
        exit 1
    else
        echo 1
        exit 0
    fi
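Because checkdata.sh is the script that Zabbix runs, it has to return promptly whatever happens. One way to enforce that from the outside is the timeout command from GNU coreutils; the sketch below is my own assumption, and both the function name run_with_deadline and the fallback value are hypothetical. If the command is killed for running too long, the fallback is printed and the function returns 124, the status that timeout itself reports.

```shell
#!/bin/bash
# run_with_deadline SECONDS FALLBACK COMMAND [ARGS...]
# Hypothetical wrapper (not part of the original scripts): runs COMMAND,
# but gives up after SECONDS so that the caller is never blocked.
run_with_deadline() {
    local limit="$1" fallback="$2"
    shift 2
    local output rc=0
    # timeout(1) kills the command after $limit seconds and exits with 124.
    output=$(timeout "$limit" "$@") || rc=$?
    if [ "$rc" -eq 124 ]; then
        # The command did not finish in time: print the fallback value.
        printf '%s\n' "$fallback"
    else
        # Otherwise pass the command's own output through unchanged.
        printf '%s\n' "$output"
    fi
    # The command's exit status (or 124 on timeout) is preserved.
    return "$rc"
}
```

For example, run_with_deadline 2 -1 ./checkdata.sh prints whatever checkdata.sh prints, or -1 if it has not finished within two seconds.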
Method 2: Both generating the data and querying the data are left to Zabbix.
    # single line for zabbix
    # itemname: cs connection error
    # templatename: template app javalogmonitor
    # applicationname: javaerrorcodetextfound
    # triggername: cs connection error is occur
    #
    # /etc/zabbix/zabbix_agentd.conf.d/userparameter_csconnerr.conf
    # /etc/zabbix/zabbix_agentd.conf.d/userparameter_cs.conf
    # {template app javalogmonitor:csprocess.cs.csconnerr[*].diff(0)}>0
    # For /bin/bash, such as centos
    # UserParameter=csprocess.cs.csconnerr[*],javalogfile=/data/tomcat/tomcat-cstest/logs/catalina.out;errorkeyword=13003;if [[ -f $javalogfile ]]; then echo $(grep "$errorkeyword" $javalogfile | wc -l); exit 0; else echo 0; exit 1; fi
    # for /bin/sh, such as Ubuntu
    UserParameter=csprocess.cs.csconnerr[*],javalogfile=/data/tomcat/tomcat-cstest/logs/catalina.out;errorkeyword=13003;if test -f $javalogfile ; then echo $(grep "$errorkeyword" $javalogfile | wc -l); exit 0; else echo 0; exit 1; fi
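The item key above is declared with [*], but the command never uses the key parameters. As a sketch only, the same logic could live in a small script that takes the log file and the keyword as arguments; the function name count_keyword and the script path mentioned afterwards are hypothetical. It keeps the behaviour of the one-liner: print the count and return 0 if the file exists, otherwise print 0 and return 1.

```shell
#!/bin/bash
# count_keyword LOGFILE KEYWORD
# Hypothetical helper mirroring the one-line item command above.
count_keyword() {
    local logfile="$1" keyword="$2"
    if [ -f "$logfile" ]; then
        # grep -c prints the number of matching lines; it exits with 1 when
        # that number is 0, so "|| true" keeps the success status.
        grep -c -F -- "$keyword" "$logfile" || true
        return 0
    fi
    # Missing log file: report 0 and signal the problem via the exit status.
    echo 0
    return 1
}
```

Saved as a script (say /usr/local/bin/count_keyword.sh, a made-up path, ending with count_keyword "$@"), it could be referenced as UserParameter=csprocess.cs.csconnerr[*],/usr/local/bin/count_keyword.sh "$1" "$2", since Zabbix replaces $1 and $2 with the key parameters for keys declared with [*].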
--end--
This article is from the "Communication, My Favorites" blog; please be sure to keep this source: http://dgd2010.blog.51cto.com/1539422/1678879
Zabbix monitoring: detecting the number of errors that occur in a program's log