At big internet companies, projects typically run across several environments: test, pre-release, and production, so that testing stays isolated from the live site. In such a setup it is easy to accidentally publish a test link to production, and the usual safeguard is a testing tool that scans links for this risk. A couple of days ago we hit exactly this problem: a developer carelessly published a daily (test-environment) URL to production. Because there was no automated monitoring in place, the mistake was not caught in time. Since I have been looking at Python recently, I went home and used it to write a simple monitor.
The general idea: write a Python script that extracts every URL from a web page and checks whether any of them is a daily link, then put the script in crontab as a timed task that runs a check every 10 minutes. If an illegal link is found, the script sends an alert email to the people concerned. The script is around 100 lines and easy to follow; the code is pasted below.
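The 10-minute schedule described above can be expressed as a crontab entry like the following; the interpreter and script paths are assumptions, so adjust them to your machine:

```
*/10 * * * * /usr/bin/python /path/to/checkdailyurl.py
```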
I originally wanted to use BeautifulSoup, but installing a third-party library seemed like a hassle, so I went with the standard-library sgmllib instead, which needs no extra dependencies. The mail-sending function is not implemented here, since it depends on your own SMTP server.
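Since the mail step is left as a stub in the script, here is a minimal sketch of what it could look like using the standard-library smtplib; the host name and addresses are placeholders, not details from the original setup:

```python
import smtplib
from email.mime.text import MIMEText


def build_alert(sender, receivers, subject, body):
    # assemble a plain-text alert message
    msg = MIMEText(body, 'plain', 'utf-8')
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = ', '.join(receivers)
    return msg


def send_mail(smtp_host, sender, receivers, subject, body):
    # deliver the alert through your SMTP server (smtp_host is a placeholder)
    msg = build_alert(sender, receivers, subject, body)
    server = smtplib.SMTP(smtp_host)
    try:
        server.sendmail(sender, receivers, msg.as_string())
    finally:
        server.quit()
```

`build_alert` is split out from the sending step so the message construction can be tested without a live SMTP server.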
The code is as follows:
#!/usr/bin/env python
# coding: utf-8
import urllib2
import smtplib
import time
from sgmllib import SGMLParser
# from email.mime.text import MIMEText
# from bs4 import BeautifulSoup
# import re

# Marker that identifies a test-environment ("daily") link.  The exact
# string was lost in the original post; adjust it to your environment.
DAILY_MARKER = 'daily'


class URLParser(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        self.urls = []

    def do_a(self, attrs):
        '''parse tag <a>'''
        for name, value in attrs:
            if name == 'href':
                self.urls.append(value)

    def do_link(self, attrs):
        '''parse tag <link>'''
        for name, value in attrs:
            if name == 'href':
                self.urls.append(value)


def checkurl(url_to_check, is_detail):
    '''check whether the page at url_to_check contains illegal URLs'''
    parser = URLParser()
    page = urllib2.urlopen(url_to_check)
    content = page.read()
    # content = unicode(content, 'gb2312').encode('utf8')
    parser.feed(content)
    urls = parser.urls
    daily_urls = []
    detail_url = ''
    for url in urls:
        if DAILY_MARKER in url:
            daily_urls.append(url)
        if not detail_url and not is_detail and 'www.bc5u.com' in url:
            detail_url = url
    page.close()
    parser.close()
    if is_detail:
        return daily_urls
    else:
        return daily_urls, detail_url


def send_mail():
    '''send a reminder mail (depends on your SMTP server)'''
    pass


def log(content):
    '''record an execution log entry'''
    log_file = 'checkdailyurl.log'
    f = open(log_file, 'a')
    f.write(time.strftime('%Y-%m-%d %X', time.localtime()) + content + '\n')
    f.flush()
    f.close()


def main():
    '''entry point'''
    # check ju
    url = 'http://www.bc5u.com'
    daily_urls, detail_url = checkurl(url, False)
    if daily_urls:
        # found a daily link: send an alert mail
        send_mail()
        log('check: found daily URL')
    else:
        # no daily link found: nothing to do
        log('check: no daily URL found')

    # check judetail
    if detail_url:
        daily_urls = checkurl(detail_url, True)
        if daily_urls:
            # found a daily link: send an alert mail
            send_mail()
            log('check: found daily URL')
        else:
            # no daily link found: nothing to do
            log('check: no daily URL found')


if __name__ == '__main__':
    main()
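A side note: sgmllib was removed in Python 3. If you want to run the same idea on Python 3, the standard library's html.parser offers an equivalent event-driven interface; here is a sketch of the URL collector rewritten against it (the sample HTML is made up for illustration):

```python
from html.parser import HTMLParser


class URLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # collect href values from both <a> and <link> tags,
        # mirroring do_a / do_link in the sgmllib version
        if tag in ('a', 'link'):
            for name, value in attrs:
                if name == 'href':
                    self.urls.append(value)


parser = URLParser()
parser.feed('<a href="http://daily.example.com/x">t</a>'
            '<link rel="stylesheet" href="style.css">')
print(parser.urls)  # ['http://daily.example.com/x', 'style.css']
```

The rest of the script would also need urllib2 replaced with urllib.request, but the parsing logic carries over unchanged.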