Preface:
During the National Day (October 1) holiday we need to keep an eye on the servers and the applications running in production, so that problems are found and handled promptly. Because the company's exception monitoring system is still incomplete, I wrote a Python script that runs on the Linux servers and provides some simple monitoring.
Functions:
1. Disk usage alarm. When disk usage exceeds the threshold we define, an email is sent to our mailbox to warn that disk space is running low.
2. Log analysis and monitoring. The script scans the system logs for keywords and sends an alert, so that problems can be found and handled in time.
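The disk check can be tried out without touching a real server. The following is a minimal sketch, written for Python 3, that applies the same regular expression the script uses to a made-up `df` listing. The names `parse_df`, `over_threshold`, and `SAMPLE_DF` are hypothetical and introduced here only for illustration.

```python
import re

# Pattern applied to each line of `df` output: a device under /dev,
# a "NN%" usage column, then the mount point.
DF_LINE = re.compile(r'^/dev/.+\s+(?P<used>\d+)%\s+(?P<mount>.+)')


def parse_df(df_output):
    """Map each /dev-backed mount point to its usage percentage (int)."""
    usage = {}
    for line in df_output.splitlines():
        match = DF_LINE.search(line)
        if match is not None:
            usage[match.group('mount')] = int(match.group('used'))
    return usage


def over_threshold(usage, threshold):
    """Return the mount points whose usage exceeds the threshold, sorted."""
    return sorted(mount for mount, used in usage.items() if used > threshold)


# Hypothetical df output, for illustration only.
SAMPLE_DF = """Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda1       51475068 45000000   3837244  93% /
tmpfs            4015216        0   4015216   0% /dev/shm
/dev/sdb1      103081248 20000000  77838368  21% /data
"""

print(over_threshold(parse_df(SAMPLE_DF), 80))
```

With a threshold of 80, only the root file system in this sample would trigger a mail. Lines that do not start with /dev/ (such as tmpfs) are ignored by the pattern.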
Code: log_monitor_real_time.py
# coding: utf-8
import os
import re
import smtplib
import datetime
import shelve
from email.mime.text import MIMEText

# Disk usage alarm threshold (percent)
hd_usage_rate_threshold = 80
# Recipients
mailto_list = ["******@17guagua.com", "******@17guagua.com"]
# Mail server, user name, password and mailbox suffix
mail_host = "smtp.17guagua.com"
mail_user = "******@17guagua.com"
mail_pass = ""
mail_postfix = "17guagua.com"
# Persistent store for the log offsets
log_offset = shelve.open('log_offset')
# Today's date, used as the log file name suffix
# (format string as given in the source; adjust it to match your log file names)
log_path_suffix = datetime.date.today().strftime('%y-%m-%d')
# Shelve key that holds the date of the last run
cur_time = 'cur_time'
# Per application: log path, keywords to search for, and a third (unused) list
app_info = {}
app_info['event'] = ['/opt/log/guagua_web_event_extends/event-ext-' + log_path_suffix + '.log', ['failed', 'abnormal'], []]


# Analyze the part of the log written since the last run
def analysis_log(appName, appInfo):
    cur_time_val = get_shelve_value(cur_time)
    if cur_time_val == -1:
        set_shelve_value(cur_time, log_path_suffix)
    elif log_path_suffix != cur_time_val:
        # The date changed, so there is a new log file: start from its beginning
        set_shelve_value(appName, 0)
        set_shelve_value(cur_time, log_path_suffix)
    f1 = open(appInfo[0], "r")
    offset = get_shelve_value(appName)
    if offset != -1:
        f1.seek(offset)
    else:
        set_shelve_value(appName, 0)
    count = 0
    exceptionstr = ""
    for s in f1.readlines():
        searchkey = appInfo[1]
        if len(searchkey) > 0:
            for i in searchkey:
                li = re.findall(i, s)
                if len(li) > 0:
                    count = count + len(li)
                    exceptionstr = exceptionstr + s
        else:
            # No keywords configured: fall back to searching for 'Exception'
            li = re.findall('Exception', s)
            if len(li) > 0:
                count = count + len(li)
                exceptionstr = exceptionstr + s
    # Remember where this run stopped
    set_shelve_value(appName, f1.tell())
    f1.close()
    print(appName + " exception count is " + str(count))
    return [count, "---------------------------------" + appName + "-----------------------------\n" + exceptionstr]


# Shelve helpers
def set_shelve_value(key, value):
    log_offset[key] = value


def get_shelve_value(key):
    if key in log_offset:
        return log_offset[key]
    else:
        return -1


def del_shelve_value(key):
    if key in log_offset:
        del log_offset[key]


# Send mail
def send_mail(to_list, sub, content):
    # Assumes mail_user is the local part only; if it already contains the
    # domain, use mail_user alone as the address
    me = mail_user + "<" + mail_user + "@" + mail_postfix + ">"
    msg = MIMEText(content, 'html', 'utf-8')
    msg['Subject'] = sub
    msg['From'] = me
    msg['To'] = ";".join(to_list)
    try:
        s = smtplib.SMTP()
        s.connect(mail_host)
        s.login(mail_user, mail_pass)
        s.sendmail(me, to_list, msg.as_string())
        s.close()
        return True
    except Exception as e:
        print(str(e))
        return False


# Get the external (WAN) IP
def get_wan_ip():
    # The source pipes the result through one more "grep -v" whose pattern
    # did not survive; that last filter is left out here
    cmd_get_ip = "/sbin/ifconfig | grep 'inet addr' | awk -F: '{print $2}' | awk '{print $1}' | grep -v '^127'"
    get_ip_info = os.popen(cmd_get_ip).readline().strip()
    return get_ip_info


# Check disk usage: returns {mount point: used percent}
def check_hd_use():
    cmd_get_hd_use = '/bin/df'
    try:
        fp = os.popen(cmd_get_hd_use)
    except Exception:
        errorinfo = r'get_hd_use_error'
        print(errorinfo)
        # Return an empty dict so that the caller can still iterate over it
        return {}
    re_obj = re.compile(r'^/dev/.+\s+(?P<used>\d+)%\s+(?P<mount>.+)')
    hd_use = {}
    for line in fp:
        match = re_obj.search(line)
        if match is not None:
            hd_use[match.groupdict()['mount']] = match.groupdict()['used']
    fp.close()
    return hd_use


# Disk usage alarm
def hd_use_alarm():
    for v in check_hd_use().values():
        if int(v) > hd_usage_rate_threshold:
            if send_mail(mailto_list, 'System Disk Monitor', 'System IP: %s\nSystem Disk Use: %s' % (get_wan_ip(), check_hd_use())):
                print("sendmail success!!!!!")
        else:
            print("disk OK, no mail sent")


if __name__ == '__main__':
    hd_use_alarm()
    exceptioncount = 0
    exceptioncontents = ""
    for key in app_info:
        exceptioncontent = analysis_log(key, app_info[key])
        exceptioncount += exceptioncontent[0]
        exceptioncontents += exceptioncontent[1]
        exceptioncontents = exceptioncontents + "***********************************************\n"
    print(exceptioncount)
    if exceptioncount > 0:
        if send_mail(mailto_list, get_wan_ip() + ' log alert', exceptioncontents):
            print("sent successfully")
        else:
            print("send failed")
    # Flush the offsets to disk
    log_offset.close()
Description:
1. Set the disk usage alarm threshold.
2. Specify the recipients (one or more).
3. Logs are analyzed incrementally. For example, if the scheduled task runs every hour, each run analyzes only what was written to the file during the previous hour; adjust the interval as needed.
4. Specify the log file and its keywords (one or more).
5. Multiple system log paths can be specified.
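The incremental analysis in item 3 comes down to remembering a file offset between runs. The following is a minimal Python 3 sketch of that idea on its own, assuming plain keyword matching and a UTF-8 log. The function `scan_new_lines` and the temporary demo log are hypothetical and not part of the script, and the offset is simply passed in rather than stored with shelve.

```python
import os
import tempfile


def scan_new_lines(path, offset, keywords):
    """Scan a log starting at byte `offset`.

    Returns (match_count, matching_lines, new_offset).
    """
    count = 0
    hits = []
    # Binary mode keeps seek()/tell() in plain byte offsets.
    with open(path, 'rb') as f:
        f.seek(offset)
        for raw in f:
            line = raw.decode('utf-8', errors='replace')
            found = sum(line.count(keyword) for keyword in keywords)
            if found:
                count += found
                hits.append(line)
        new_offset = f.tell()
    return count, hits, new_offset


# Demo on a temporary file with made-up log content.
with tempfile.NamedTemporaryFile('wb', suffix='.log', delete=False) as tmp:
    tmp.write(b"start ok\nlogin failed\n")
    log_path = tmp.name

first = scan_new_lines(log_path, 0, ['failed', 'abnormal'])

# Simulate the application appending to the log before the next run.
with open(log_path, 'ab') as f:
    f.write(b"abnormal exit, retry failed\nall good\n")

second = scan_new_lines(log_path, first[2], ['failed', 'abnormal'])
os.remove(log_path)

print(first[0], second[0])
```

The second call starts where the first one stopped, so the "login failed" line is not counted twice. In a real scheduled run the returned offset would have to be persisted between invocations, which is the role the shelve file plays in the script.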
How to use:
Script type: Python
Script path: the home directory (cd ~)
The script needs to be paired with a Linux scheduled task (crontab -e), for example 0 */1 * * * python /root/log_monitor_real_time.py (runs once every hour)
Precautions:
1. For testing, you can run the script manually with python /root/log_monitor_real_time.py. Each run creates (or updates) the log_offset file.
2. To test again, delete this file first. It records the file offset, so if it is not deleted, reading starts from the position where the last run stopped instead of from the beginning of the log.
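As an alternative to deleting the whole file, a single saved offset can be set back to zero. The following is a minimal Python 3 sketch under that assumption. The helper `reset_offset` is hypothetical, and the demo works on a throwaway shelf in a temporary directory rather than on the script's real offset file.

```python
import os
import shelve
import shutil
import tempfile


def reset_offset(shelf_path, app_name):
    """Set one application's saved offset to 0.

    Returns the previous value, or -1 if none was stored.
    """
    with shelve.open(shelf_path) as db:
        previous = db.get(app_name, -1)
        db[app_name] = 0
    return previous


# Demo on a temporary shelf; 'event' and 2048 are made-up values.
demo_dir = tempfile.mkdtemp()
demo_shelf = os.path.join(demo_dir, 'log_offset')

with shelve.open(demo_shelf) as db:
    db['event'] = 2048

old = reset_offset(demo_shelf, 'event')

with shelve.open(demo_shelf) as db:
    current = db['event']

shutil.rmtree(demo_dir)
print(old, current)
```

Depending on the dbm backend available on the system, shelve may store its data under the given name plus an extension, or in several files, so a manual delete should remove all of them.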
If you want to see it in action, give it a try yourself.