Many Internet companies maintain separate test, pre-release, and production environments to isolate testing from the live site. Even so, it is almost inevitable that a test link will occasionally be released to production, and the usual safeguard is an automated tool that scans pages for such links. I ran into exactly this problem a couple of days ago: due to a developer's oversight, a daily (test-environment) URL was published to the live site, and because we had no automated monitoring in place, it went undetected for some time. Since I had been reading up on Python recently, I decided to write a simple monitor in Python once the incident was cleaned up.
The general idea: write a Python script that extracts all URLs from the page and checks whether any of them contain a daily link. Put the script in crontab as a scheduled task that runs the check every 10 minutes (a sample crontab entry follows), and if an illegal link is found, send an alert to the relevant people. The script is about 100 lines and easy to follow; the code is pasted below.
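For the scheduling part, a crontab entry along the following lines would run the check every 10 minutes. This is a minimal sketch; the interpreter path and the script path /path/to/checkdailyurl.py are placeholders to adjust for your own machine:

*/10 * * * * /usr/bin/python /path/to/checkdailyurl.py >> /dev/null 2>&1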
I originally wanted to use BeautifulSoup, but to avoid the hassle of installing a third-party library I stuck with the built-in sgmllib, so there are no dependencies to worry about (note that sgmllib is Python 2 only; it was removed in Python 3). The mail function is not implemented here; you can fill it in against your own SMTP server, and a minimal sketch is included after the listing.
The code is as follows:
#!/usr/bin/env python
# coding: utf-8
import urllib2
from sgmllib import SGMLParser
import smtplib
import time
# from email.mime.text import MIMEText
# from bs4 import BeautifulSoup
# import re


class UrlParser(SGMLParser):
    '''Collect the href values of all <a> and <link> tags on a page.'''
    urls = []

    def do_a(self, attrs):
        '''Parse <a> tags.'''
        for name, value in attrs:
            if name == 'href':
                self.urls.append(value)
            else:
                continue

    def do_link(self, attrs):
        '''Parse <link> tags.'''
        for name, value in attrs:
            if name == 'href':
                self.urls.append(value)
            else:
                continue


def checkUrl(checkurl, isDetail):
    '''Check whether the page source at checkurl contains an invalid (daily) url.'''
    parser = UrlParser()
    page = urllib2.urlopen(checkurl)
    content = page.read()
    # content = unicode(content, "gb2312").encode("utf8")
    parser.feed(content)
    urls = parser.urls
    dailyUrls = []
    detailUrl = ""
    for url in urls:
        if 'daily' in url:
            dailyUrls.append(url)
        if not detailUrl and not isDetail and 'www.bc5u.com' in url:
            detailUrl = url
    page.close()
    parser.close()
    if isDetail:
        return dailyUrls
    else:
        return dailyUrls, detailUrl


def sendMail():
    '''Send a reminder email (implement against your own SMTP server).'''
    pass


def log(content):
    '''Append a timestamped line to the execution log.'''
    logFile = 'checkdailyurl.log'
    f = open(logFile, 'a')
    f.write(time.strftime("%Y-%m-%d %X", time.localtime()) + ' ' + content + '\n')
    f.flush()
    f.close()


def main():
    '''Entry method.'''
    # check the main page
    url = "http://www.bc5u.com"  # urllib2 needs an explicit scheme
    dailyUrls, detailUrl = checkUrl(url, False)
    if dailyUrls:
        # found a daily link: send an alert email
        sendMail()
        log('check: find daily url')
    else:
        # no daily link found
        log('check: not find daily url')

    # check the detail page
    if detailUrl:  # guard against an empty detail url
        dailyUrls = checkUrl(detailUrl, True)
        if dailyUrls:
            # found a daily link: send an alert email
            log('check: find daily url')
            sendMail()
        else:
            # no daily link found
            log('check: not find daily url')


if __name__ == '__main__':
    main()
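As noted above, sendMail() is left as a stub. A minimal sketch using the standard smtplib and email.mime.text modules (the latter is already among the commented-out imports) might look like the following; the SMTP host, sender, and recipient addresses are placeholders to replace with your own:

import smtplib
from email.mime.text import MIMEText

def sendMail():
    '''Send a reminder email; host and addresses below are placeholders.'''
    msg = MIMEText('Found a daily url in production, please check.', 'plain', 'utf-8')
    msg['Subject'] = 'Alert: daily url detected'
    msg['From'] = 'monitor@example.com'        # placeholder sender
    msg['To'] = 'oncall@example.com'           # placeholder recipient
    server = smtplib.SMTP('smtp.example.com')  # placeholder SMTP server
    try:
        # server.login('user', 'password')     # uncomment if your server requires auth
        server.sendmail('monitor@example.com', ['oncall@example.com'], msg.as_string())
    finally:
        server.quit()

With that filled in, the cron job above gives a complete loop: fetch the page, extract every href, flag anything containing 'daily', log the result, and mail an alert when a bad link slips through.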