Monitoring a Service Port and Sending Alarms with Python
Recently I noticed that the port of a Socket service in our company's test environment keeps going down for no apparent reason, even though the service itself is still reported as running. The process appears to hang.
Even though it is only a test environment, it could not be left unattended, so I wrote a simple monitoring script overnight. Because the server runs Windows, the wmi module is required. The logic is as follows:
1. Use the wmi module to get the stopped services in the system and build a dictionary from them.
2. Check whether the monitored service is in that dictionary. If the service is stopped, start it and send an alert email.
3. Try to connect to the local Socket service port. If an exception is caught, restart the service and send an alert email.
4. Repeat the preceding three steps every 10 seconds to keep the service in a normal state.
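Step 3 can be sketched in isolation. This is a minimal sketch, assuming Python 3; the helper name `port_is_open` is hypothetical and not part of the original script. It relies on the standard library's `socket.create_connection`, which connects with a timeout and lets the `with` block close the probe socket afterwards.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; not the author's function.
    """
    try:
        # create_connection connects and applies the timeout in one call;
        # the with-block closes the probe socket when it exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `port_is_open('127.0.0.1', 20000)` would correspond to the check the script makes against the local Socket service port.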
One problem showed up while running it: operating on Windows through the wmi module from Python is particularly slow. I do not know whether there is an alternative; if you know a better way, please share some advice.
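One possible alternative, offered as an assumption rather than a measured improvement: since the script already shells out to `sc` to start and stop the service, the state of a single service could also be read from `sc query` instead of enumerating every service through wmi. The sketch below assumes Python 3.7+ and English-language `sc` output; `parse_sc_state` and `query_service_state` are hypothetical helper names.

```python
import re
import subprocess

# Matches the state line of `sc query`, for example:
#         STATE              : 4  RUNNING
_STATE_RE = re.compile(r"STATE\s*:\s*(\d+)\s+(\w+)")


def parse_sc_state(output):
    """Extract the state name (e.g. 'RUNNING', 'STOPPED') from `sc query` text.

    Returns None when no STATE line is present, for instance when the
    service does not exist. Assumes English-language output.
    """
    match = _STATE_RE.search(output)
    return match.group(2) if match else None


def query_service_state(service_name):
    """Run `sc query <service_name>` (Windows only) and return the parsed state."""
    result = subprocess.run(
        ["sc", "query", service_name],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode output to str
    )
    return parse_sc_state(result.stdout)
```

With this approach, a result of `'STOPPED'` would play the role of the `'down'` status in the script. Whether it is actually faster than the wmi query would need to be measured on the target server.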
The source code is as follows:
#!/usr/bin/env python
import os
import wmi
import time
import socket
import base64
import smtplib
import logging
from email.mime.text import MIMEText


def GetSrv(designation):
    """Collect the captions of all stopped services and check whether
    the 'designation' service is among them.
    :return: 'down' if the service is stopped, otherwise True
    """
    c = wmi.WMI()
    ret = dict()
    for service in c.Win32_Service():
        state, caption = service.State, service.Caption
        if state == 'Stopped':
            t = ret.get(state, [])
            t.append(caption)
            ret[state] = t
    # If the 'designation' service is in 'Stopped', return the status 'down'.
    # Default to an empty list so the check does not fail when nothing is stopped.
    if designation in ret.get('Stopped', []):
        logging.error('Service [%s] is down, try to restart the service. \r\n' % designation)
        return 'down'
    return True


def Monitor(sname):
    """Send a socket request to port 20000 on the local machine.
    If an exception is caught, return the string 'ex'.
    :return: string 'ex' or True
    """
    s = socket.socket()
    s.settimeout(3)  # timeout
    host = ('127.0.0.1', 20000)
    try:
        # Try connection to the host
        s.connect(host)
    except socket.error as e:
        logging.warning('[%s] service connection failed: %s \r\n' % (sname, e))
        return 'ex'
    finally:
        s.close()  # always release the probe socket
    return True


def RestartSocket(rstname, conn, run):
    """First check whether the service is stopped; if so, start it directly.
    Then check whether it is a zombie; if so, restart the service.
    :return: flag or True
    """
    flag = False
    try:
        # 'run' is the return value of GetSrv(), 'conn' the return value of Monitor()
        if run == 'down':
            ret = os.system('sc start "%s"' % rstname)
            if ret != 0:
                raise Exception('[Errno %s]' % ret)
            flag = True
        elif conn == 'ex':
            retStop = os.system('sc stop "%s"' % rstname)
            retStart = os.system('sc start "%s"' % rstname)
            if retStart != 0:
                raise Exception('retStop [Status code %s] '
                                'retStart [Status code %s] ' % (retStop, retStart))
            flag = True
        else:
            logging.info('[%s] service running status to normal' % rstname)
            return True
    except Exception as e:
        logging.warning('[%s] service restart failed: %s \r\n' % (rstname, e))
    return flag


def SendMail(to_list, sub, contents):
    """Send alarm mail.
    :return: flag
    """
    mail_server = 'mail.stmp.com'  # SMTP server
    mail_user = 'YouAccount'  # Mail account
    mail_pass = base64.b64decode('Password')  # The password, Base64-encoded (not encrypted)
    mail_postfix = 'smtp.com'  # Domain name
    me = 'Monitor alarm<%s@%s>' % (mail_user, mail_postfix)
    message = MIMEText(contents, _subtype='html', _charset='utf-8')
    message['Subject'] = sub
    message['From'] = me
    message['To'] = ';'.join(to_list)
    flag = False  # Whether the mail was sent successfully
    try:
        s = smtplib.SMTP()
        s.connect(mail_server)
        s.login(mail_user, mail_pass)
        s.sendmail(me, to_list, message.as_string())
        s.close()
        flag = True
    except Exception as e:
        logging.warning('Send mail failed, exception: [%s]. \r\n' % e)
    return flag


def main(sname):
    """Takes the name of the service to monitor, runs the functions defined
    above in turn, and checks their return values. After the program starts,
    it tests three times, with a 10-second interval between tests.
    :return: retValue
    """
    retry = 3
    count = 0
    retValue = False  # Used to return the state of the socket
    while count < retry:
        ret = Monitor(sname)
        if ret != 'ex':  # If the socket connection is normal, return retValue
            retValue = ret
            return retValue
        isDown = GetSrv(sname)
        RestartSocket(rstname=sname, conn=ret, run=isDown)
        host = socket.gethostname()
        address = socket.gethostbyname(host)
        mailto_list = ['mail@smtp.com', ]  # Alarm contacts
        SendMail(mailto_list, 'Alarm', '