In web access quality monitoring, the HttpWatch tool is indispensable. HttpWatch records every detail of a web page load: for each element on the page it logs the DNS lookup and the network connection through to the time the first packet is sent, and so on (as shown in the figure). These detailed records give us a visual way to track down problems. Usually we only turn to it for analysis after a problem has occurred. However, using it to continuously track how a page is accessed and storing the records is also very worthwhile, because the stored data provides a baseline for analyzing problems later. Can HttpWatch meet this need? Yes, and it is easy to implement with Python. The following code reads the pages to monitor from an external file and prints some of their timing data; more powerful features can of course be built on the same basis.
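As a minimal sketch of the "record and store" idea, the function below writes timing records to CSV so that repeated checks can be compared later. It assumes the data has already been pulled out of HttpWatch into plain dicts with a "url" key plus one key per timing phase (seconds); that dict layout and the name write_timing_rows are hypothetical, chosen only for illustration. The phase names follow the Timings properties HttpWatch exposes (Blocked, CacheRead, Connect, DNSLookup, Network, Receive, Send, TTFB, Wait).

```python
import csv
import io
from datetime import datetime

# Phase names as listed for HttpWatch's Timings object. The record layout
# (a dict with "url" plus these keys) is a hypothetical choice for this sketch.
PHASES = ["Blocked", "CacheRead", "Connect", "DNSLookup", "Network",
          "Receive", "Send", "TTFB", "Wait"]
FIELDS = ["run_at", "url"] + PHASES


def write_timing_rows(out, records, run_at=None, write_header=True):
    """Write one CSV row per record to the text stream `out`.

    Each record is a dict with a "url" key and optional phase keys holding
    durations in seconds; phases missing from a record are left empty.
    Returns the number of data rows written.
    """
    if run_at is None:
        run_at = datetime.now().isoformat(timespec="seconds")
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    count = 0
    for rec in records:
        row = {"run_at": run_at, "url": rec["url"]}
        for phase in PHASES:
            row[phase] = rec.get(phase, "")
        writer.writerow(row)
        count += 1
    return count


# Usage with made-up numbers, written to an in-memory buffer
buf = io.StringIO()
write_timing_rows(buf, [{"url": "http://www.cites.com/", "DNSLookup": 0.012}],
                  run_at="2024-01-01T00:00:00")
print(buf.getvalue())
```

To build up a history on disk, one option is to open a file such as timings.csv (a hypothetical name) with open('timings.csv', 'a', newline='') and pass write_header=False on every run after the first.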
External file format (one URL per line):
http://www.cites.com/
http://www.cites2.com/page1.html
http://www.cites3.com/page2.html
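A slightly more defensive reader for this file could look like the sketch below. It strips surrounding whitespace (including the trailing newline and a Windows carriage return) and skips blank lines. Treating lines that start with "#" as comments is an extension assumed for this sketch; the original format has no comment syntax, and read_url_list is a hypothetical name.

```python
import io


def read_url_list(stream):
    """Return the URLs in a text stream, one per line.

    Surrounding whitespace (newline and carriage-return characters included)
    is stripped and blank lines are skipped. Lines starting with '#' are
    treated as comments, which is an assumption beyond the original format.
    """
    urls = []
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls


sample = io.StringIO("http://www.cites.com/\n\n# disabled\nhttp://www.cites2.com/page1.html\r\n")
print(read_url_list(sample))
# ['http://www.cites.com/', 'http://www.cites2.com/page1.html']
```

With a real file the call would be: with open("cite.txt") as f: urls = read_url_list(f).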
HttpWatch supports C# and Ruby out of the box. To call it from Python you need the win32com module, which is part of pywin32; pywin32 can be downloaded from this address (recent versions can also be installed with pip install pywin32):
http://sourceforge.net/projects/pywin32/files/pywin32/
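Before running the monitor, a quick pre-flight check can confirm that pywin32 is installed and that the HttpWatch COM server can be created. This is a minimal sketch: it assumes the ProgID HttpWatch.Controller (the one the program uses) and simply reports False on any failure, including on non-Windows systems where win32com cannot be imported. The function name httpwatch_available is hypothetical.

```python
def httpwatch_available(progid="HttpWatch.Controller"):
    """Return True if pywin32 is installed and the COM object `progid` can be created."""
    try:
        import win32com.client  # provided by pywin32, Windows only
    except ImportError:
        return False
    try:
        # Creating the controller does not open a browser by itself
        win32com.client.Dispatch(progid)
    except Exception:  # e.g. pywintypes.com_error if the ProgID is not registered
        return False
    return True


if __name__ == "__main__":
    print("HttpWatch automation available:", httpwatch_available())
```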
Here is the implementation:
# coding=utf-8
import win32com.client

# Define a function that reads the external file and returns the URLs to check as a list
def getcitetocheck(filepath):
    with open(filepath, 'r') as input_file:
        cites = input_file.readlines()
    return cites

def checkcite(cites):
    # Create an HttpWatch instance and open an IE process
    control = win32com.client.Dispatch('HttpWatch.Controller')
    plugin = control.IE.New()
    plugin.Log.EnableFilter(False)  # HttpWatch can filter out some entries; here filtering is turned off
    plugin.Record()  # start HttpWatch recording
    i = 1
    for domain in cites:
        # Lines read from the file end with a newline '\n', so strip it
        # (in testing the page also opened without stripping); skip blank lines
        url = domain.strip()
        if not url:
            continue
        plugin.Clear()  # clear the log so the export and printout cover only this page
        plugin.GotoURL(url)
        control.Wait(plugin, -1)  # wait until the page has loaded (-1 = no timeout)
        # The log can be exported to an XML file
        logfilename = 'D:\\log' + str(i) + '.xml'
        plugin.Log.ExportXML(logfilename)
        # The log contents can also be read directly
        print(plugin.Log.Entries.Count)
        # plugin.Log.Entries is a collection; each element is an entry for one URL element loaded by the page
        for s in plugin.Log.Entries:
            print(s.URL)
            print(s.Time)
            # s.Timings.Blocked returns a Timing object with three properties: Duration, Started and Valid
            # Duration is the time spent in that phase for this URL element, Started is when the phase started
            # Timings contains the Blocked, CacheRead, Connect, DNSLookup, Network, Receive, Send, TTFB and Wait objects
            print('Blocked: ' + str(s.Timings.Blocked.Duration))
            print('CacheRead: ' + str(s.Timings.CacheRead.Duration))
            print('Connect: ' + str(s.Timings.Connect.Duration))
            print('DNSLookup: ' + str(s.Timings.DNSLookup.Duration))
            print('Network: ' + str(s.Timings.Network.Duration))
            print('Receive: ' + str(s.Timings.Receive.Duration))
            print('Send: ' + str(s.Timings.Send.Duration))
            print('TTFB: ' + str(s.Timings.TTFB.Duration))
            print('Wait: ' + str(s.Timings.Wait.Duration))
        i = i + 1
    plugin.Stop()
    plugin.CloseBrowser()

###########
cite_file = "cite.txt"
cites = getcitetocheck(cite_file)
########
print(cites)
# Check every page four times
for i in [1, 2, 3, 4]:
    checkcite(cites)
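Because the main loop calls checkcite four times, each URL is measured repeatedly. One way to summarize those repeats is sketched below. It assumes the durations have been collected into (url, seconds) pairs instead of only being printed; that collection step is not part of the program above, and the name summarize is hypothetical. It groups the samples by URL and reports minimum, mean and maximum.

```python
from collections import defaultdict
from statistics import mean


def summarize(samples):
    """Group (url, seconds) pairs by URL and return {url: (min, mean, max)}."""
    by_url = defaultdict(list)
    for url, seconds in samples:
        by_url[url].append(seconds)
    return {url: (min(values), mean(values), max(values))
            for url, values in by_url.items()}


# Made-up durations, e.g. the DNSLookup time of one URL over four runs
samples = [("http://www.cites.com/", d) for d in (0.020, 0.010, 0.030, 0.020)]
print(summarize(samples))
```

The spread between minimum and maximum is often more telling than the mean alone, since a single slow run (for example a cold DNS cache on the first pass) can dominate the average.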
Automatically monitor Web pages with Python and HttpWatch