HttpWatch official page: http://www.httpwatch.com/rubywatir/
HttpWatch Ruby example: http://www.httpwatch.com/rubywatir/site_spider.zip (the copy on the official site may have been updated since)
After downloading the example, I added some comments (originally in Chinese) and commented out a few lines of code. The main changes are as follows:
1. Above url = gets.chomp!, add ($*[0].nil?) ? (url = url) : (url = $*[0]). The URL can now either be passed on the command line or be fixed in the script. Command-line usage: ruby <script name> <site URL>; see the comments in the script for details. Do not put http:// in front of the URL.
2. Comment out the two break statements. They cause no problem under Ruby 1.8.6, but produce errors under later versions such as Ruby 1.9.2.
3. Comment out plugin.Container.Quit(); so that IE is not closed. After the run, the tester needs to inspect the results.
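Changes 1 and 2 above can also be written as small helpers. This is only a sketch, not the code from the original example: pick_url, supported_httpwatch_version?, ensure_supported_version and DEFAULT_URL are hypothetical names, and the default URL is a placeholder. A top-level break is not inside any loop, which is presumably why newer interpreters reject it; calling exit is one way to stop the script instead of commenting the line out.

```ruby
# Placeholder default; the post's script fixes its own URL here instead.
DEFAULT_URL = "www.example.com"

# Hypothetical helper: use the first command-line argument if one was given,
# otherwise fall back to the URL fixed in the script.
def pick_url(args, default_url = DEFAULT_URL)
  candidate = args[0].to_s.strip
  candidate.empty? ? default_url : candidate
end

# Hypothetical helper: true when the major version number is 6 or higher.
def supported_httpwatch_version?(version)
  version.to_s.split(".").first.to_i >= 6
end

# Stands in for the top-level break: print a message and leave the script
# with a non-zero exit status when the version is too old.
def ensure_supported_version(version)
  return true if supported_httpwatch_version?(version)
  warn "This sample requires HttpWatch 6.0 or later (found #{version})."
  exit 1
end
```

In the script this would be used as url = pick_url(ARGV) and ensure_supported_version(control.Version).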
Known problem: if the test machine's network connection is slow, a page load may time out and the script exits with an error like this:
C:/Ruby192/lib/ruby/gems/1.9.1/gems/watir-classic-3.0.0/lib/watir-classic/ie-class.rb:374:in 'method_missing': (in OLE method 'navigate': ) (WIN32OLERuntimeError)
    OLE error code:800C000E in <Unknown>
      <No Description>
    HRESULT error code:0x80020009
        from C:/Ruby192/lib/ruby/gems/1.9.1/gems/watir-classic-3.0.0/lib/watir-classic/ie-class.rb:374:in 'goto'
        from C:/Documents and Settings/Administrator/desktop/site_spider/site_spider.rb:55:in '<main>'
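One possible way to keep a slow page load from ending the whole run is to retry the navigation a few times. The sketch below is an assumption, not part of the original example: with_retries is a hypothetical helper, and it rescues StandardError, which should also cover WIN32OLERuntimeError since that class derives from RuntimeError.

```ruby
# Hypothetical helper: run the block, retrying up to max_attempts times
# when it raises, and pausing `delay` seconds between attempts.
def with_retries(max_attempts = 3, delay = 2)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError => e
    # Give up and re-raise the last error once the attempts are used up.
    raise if attempt >= max_attempts
    warn "Attempt #{attempt} failed (#{e.message}); retrying in #{delay}s..."
    sleep delay
    retry
  end
end

# Intended use inside the spider loop (needs IE and Watir, so it is not run here):
#   with_retries { ie.goto(nextUrl) }
```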
site_spider.rb
# A Site Spider that uses HttpWatch, Ruby and Watir
#
# For more information about this example please refer to http://www.httpwatch.com/rubywatir/
#
MAX_NO_PAGES = 200      # Maximum number of pages to visit in one run

require 'win32ole'      # win32ole drives HttpWatch; versions earlier than HttpWatch 6.0 cannot be called
require 'rubygems'
require 'watir'
require './url_ops.rb'  # url_ops.rb must be in the same directory as this script
url = "www.gaopeng.com/?ADTAG=beijing_from_beijing"  # URL to test; it can also be read from the command line. Do not add http://

# Create HttpWatch
control = WIN32OLE.new('HttpWatch.Controller')
httpWatchVer = control.Version
if httpWatchVer[0...1] == "4" or httpWatchVer[0...1] == "5"
    puts "\nERROR: You are running HttpWatch #{httpWatchVer}. This sample requires HttpWatch 6.0 or later. Press Enter to exit..."; $stdout.flush
    gets
    # break   # fine under Ruby 1.8.6, but may fail under later versions such as Ruby 1.9.2, so it is commented out
end

# Get the domain name to spider
puts "Enter the domain name of the site to check (press enter for url):\n"; $stdout.flush
($*[0].nil?) ? (url = url) : (url = $*[0])   # read the URL from the command line first, if one was given
# url = gets.chomp!   # must stay commented out when the previous line is used
if url.empty?
    url = url   # no-op: the default URL is already set above
end
hostName = url.HostName
if hostName.empty?
    puts "\nPlease enter a valid domain name. Press Enter to exit..."; $stdout.flush
    gets
    # break   # fine under Ruby 1.8.6, but may fail under later versions such as Ruby 1.9.2, so it is commented out
end

# Start IE
ie = Watir::IE.new
ie.logger.level = Logger::ERROR

# Locate the IE window
plugin = control.IE.Attach(ie.ie)

# Start recording HTTP traffic
plugin.Clear()
plugin.Log.EnableFilter(false)
plugin.Record()

url = url.CanonicalUrl
urlsVisited = Array.new; urlsToVisit = Array.new(1, url)

# Visit the pages
while urlsToVisit.length > 0 && urlsVisited.length < MAX_NO_PAGES

    nextUrl = urlsToVisit.pop
    puts "Loading " + nextUrl + "..."; $stdout.flush

    ie.goto(nextUrl)             # get Watir to load the URL
    urlsVisited.push(nextUrl)    # store this URL in the list of visited URLs

    begin
        # Look at each link on the page and decide if it needs to be visited
        ie.links().each() do |link|

            linkUrl = link.href.CanonicalUrl
            # Skip the URL if it has already been seen, is a download, or is from a different domain
            if !url.IsSubDomain(linkUrl.HostName) ||
               linkUrl.Path.include?(".exe") || linkUrl.Path.include?(".zip") || linkUrl.Path.include?(".csv") ||
               linkUrl.Path.include?(".pdf") || linkUrl.Path.include?(".png") ||
               urlsToVisit.find { |aUrl| aUrl == linkUrl } != nil ||
               urlsVisited.find { |aUrl| aUrl == linkUrl } != nil
                # Don't add this URL to the list
                next
            end
            # Add this URL to the list
            urlsToVisit.push(linkUrl)
        end
    rescue
        # $!.to_s: an exception cannot be appended to a String directly in Ruby 1.9
        puts "Failed to find links in " + nextUrl + " " + $!.to_s; $stdout.flush
    end

end

if (urlsVisited.length == MAX_NO_PAGES)
    puts "\nThe spider has stopped because #{MAX_NO_PAGES} pages have been visited. (Change MAX_NO_PAGES if you want to increase this limit)"; $stdout.flush
end

# Stop recording HTTP data in HttpWatch
plugin.Stop()

puts "\nAnalyzing HTTP data.."; $stdout.flush

# Look at each HTTP request in the log to compile a list of URLs
# for each error
errorUrls = Hash.new
plugin.Log.Entries.each do |entry|
    if !entry.Error.empty? && entry.Error != "Aborted" || entry.StatusCode >= 400
        if !errorUrls.has_key?(entry.Result)
            errorUrls[entry.Result] = Array.new(1, entry.Url)
        else
            if errorUrls[entry.Result].find { |aUrl| aUrl == entry.Url } == nil
                errorUrls[entry.Result].push(entry.Url)
            end
        end
    end
end

# Display summary statistics for the whole log
summary = plugin.Log.Entries.Summary

printf "Total time to load page (secs):      %.3f\n", summary.Time
printf "Number of bytes received on network: %d\n", summary.BytesReceived

printf "HTTP compression saving (bytes):     %d\n", summary.CompressionSavedBytes
printf "Number of round trips:               %d\n", summary.RoundTrips
printf "Number of errors:                    %d\n", summary.Errors.Count

# Print out errors
summary.Errors.each do |error|
    numErrors = error.Occurrences
    description = error.Description
    puts "#{numErrors} URL(s) caused a #{description} error:"
    errorUrls[error.Result].each do |aUrl|
        puts "-> #{aUrl}"
    end

end

# Exit IE. Commented out here so that the tester can inspect the result after the run.
# plugin.Container.Quit();

puts "\r\nPress Enter to exit"; $stdout.flush
# gets
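The error-collection step near the end of the script can be tried without HttpWatch or IE. The sketch below is only an illustration: Entry is a hypothetical Struct standing in for an HttpWatch log entry, and group_error_urls is a made-up name. It applies the same condition as the script (a non-empty error other than "Aborted", or a status code of 400 or above) and lists each failing URL once under its result.

```ruby
# Hypothetical stand-in for an HttpWatch log entry (not the real COM object).
Entry = Struct.new(:url, :result, :error, :status_code)

# Group the URLs of failed requests by their result, listing each URL once.
def group_error_urls(entries)
  groups = Hash.new { |hash, key| hash[key] = [] }
  entries.each do |entry|
    failed = (!entry.error.empty? && entry.error != "Aborted") ||
             entry.status_code >= 400
    next unless failed
    urls = groups[entry.result]
    urls << entry.url unless urls.include?(entry.url)
  end
  groups
end

# Made-up sample data for trying the helper out.
sample = [
  Entry.new("http://a.example/x", "404", "", 404),
  Entry.new("http://a.example/x", "404", "", 404),
  Entry.new("http://a.example/y", "200", "", 200),
  Entry.new("http://a.example/w", "Aborted", "Aborted", 0)
]
group_error_urls(sample).each do |result, urls|
  puts "#{urls.length} URL(s) caused a #{result} error:"
  urls.each { |a_url| puts "-> #{a_url}" }
end
```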
url_ops.rb
# Helper functions used to parse URLs
class String
    def HostName
        matches = scan(/^(?:https?:\/\/)?([^\/]*)/)
        if matches.length > 0 && matches[0].length > 0
            return matches[0][0].downcase
        else
            return ""
        end
    end
    def IsSubDomain( hostName)
        thisHostName = self.HostName
        if thisHostName.slice(0..3) == "www."
            thisHostName = thisHostName.slice(4..-1)
        end
        if thisHostName == hostName ||
           (hostName.length > thisHostName.length &&
            hostName.slice( -thisHostName.length ..-1) == thisHostName)
            return true
        end
        return false
    end
    def Protocol
        matches = scan(/^(https?:\/\/)/)
        if matches.length > 0 && matches[0].length > 0
            return matches[0][0].downcase
        else
            return "http://"
        end
    end
    def Path
        if scan(/^(https?:\/\/)/).length > 0
            matches = scan(/^https?:\/\/[^\/]+\/([^#]+)$/)
        else
            matches = scan(/^[^\/]+\/([^#]+)$/)
        end
        if matches != nil && matches.length == 1 && matches[0].length == 1
            return matches[0][0].downcase
        else
            return ""
        end
    end
    def CanonicalUrl
        return self.Protocol + self.HostName + "/" + self.Path
    end
end
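To experiment with the host-name checks outside the spider, the same logic can be written as plain functions. This is a sketch only: host_name and sub_domain? are hypothetical names that mirror HostName and IsSubDomain above without reopening String. Like the original, the suffix comparison does not require a dot before the site name, so a host that merely ends with the same characters is also accepted.

```ruby
# Hypothetical stand-alone counterpart of String#HostName:
# drop an optional http:// or https:// prefix and keep everything up to the first "/".
def host_name(url)
  url[%r{\A(?:https?://)?([^/]*)}, 1].to_s.downcase
end

# Hypothetical stand-alone counterpart of String#IsSubDomain:
# true when `host` equals the site's host (ignoring a leading "www.")
# or is longer than it and ends with it.
def sub_domain?(site_url, host)
  site_host = host_name(site_url).sub(/\Awww\./, "")
  host == site_host ||
    (host.length > site_host.length && host.end_with?(site_host))
end

site = "www.gaopeng.com/?ADTAG=beijing_from_beijing"
puts host_name(site)
puts sub_domain?(site, "bj.gaopeng.com")
puts sub_domain?(site, "www.httpwatch.com")
```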
Place the two scripts in the same directory; url_ops.rb is unchanged from the original example. Then simply run site_spider.rb from cmd.