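HttpWatch official page: http://www.httpwatch.com/rubywatir/
Ruby + HttpWatch example: http://www.httpwatch.com/rubywatir/site_spider.zip (the version on the official site may since have been updated)
After downloading the example I added comments and trimmed some of the code. The main changes are: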
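1. Above the line url = gets.chomp! I added ($*[0].nil?)?(url = url):(url = $*[0]). The URL can now be passed on the command line or hard-coded in the script. Command-line usage: ruby <script name> <site name>; see the comments in the script for details. Note: do not put http:// in front of the URL.
2. Commented out the two break statements. They work on Ruby 1.8.6 but raise an error on newer versions such as Ruby 1.9.2, so they have to be commented out.
3. Commented out plugin.Container.Quit(); so IE is not closed. After the run finishes, the tester inspects the results in the browser.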
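Changes 1 and 2 can also be written a little more defensively. The sketch below is only an illustration and is not part of the HttpWatch sample: target_url and httpwatch_supported? are hypothetical helper names, DEFAULT_URL is a hypothetical constant holding the script's hard-coded URL, and the version check assumes control.Version returns a dotted string such as "6.0.20". At the top level of a script, exit or abort (rather than break) is the usual way to stop.

```ruby
# Hypothetical constant: the URL hard-coded in site_spider.rb.
DEFAULT_URL = "www.gaopeng.com/?ADTAG=beijing_from_beijing"

# First command-line argument if present and non-blank, else the default.
def target_url(argv, default = DEFAULT_URL)
  arg = argv.first
  (arg.nil? || arg.strip.empty?) ? default : arg.strip
end

# Assumes a dotted version string such as "6.0.20";
# true when the major version is 6 or later.
def httpwatch_supported?(version)
  version.to_s.split(".").first.to_i >= 6
end

url = target_url(ARGV)
puts "Checking #{url}"

# In site_spider.rb the version check could then stop the script with abort
# instead of break (abort prints the message to stderr and exits with status 1):
#   abort("This sample requires HttpWatch 6.0 or later.") unless httpwatch_supported?(control.Version)
```

Parsing the major version as an integer also rejects versions below 4, which the original first-character comparison lets through.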
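Runtime problem: if the test machine's network connection is slow, the script may time out and exit with an error like this: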
C:/Ruby192/lib/ruby/gems/1.9.1/gems/watir-classic-3.0.0/lib/watir-classic/ie-class.rb:374:in `method_missing': (in OLE method `navigate': ) (WIN32OLERuntimeError)
    OLE error code:800C000E in <Unknown>
      <No Description>
    HRESULT error code:0x80020009
      Exception occurred.
        from C:/Ruby192/lib/ruby/gems/1.9.1/gems/watir-classic-3.0.0/lib/watir-classic/ie-class.rb:374:in `goto'
        from C:/Documents and Settings/Administrator/Desktop/site_spider/site_spider.rb:55:in `<main>'
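One possible mitigation for this failure (a sketch, not verified against this particular OLE error) is to retry the navigation a few times before giving up. with_retries is a hypothetical helper; it uses positional default arguments rather than keyword arguments so it also runs on Ruby 1.9.2. WIN32OLERuntimeError derives from RuntimeError, so rescuing StandardError catches it.

```ruby
# Hypothetical helper: run the block up to `attempts` times, sleeping `delay`
# seconds after each failure; re-raise the last error if every attempt fails.
# Positional defaults (not keyword arguments) keep it compatible with Ruby 1.9.2.
def with_retries(attempts = 3, delay = 5)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue StandardError => e # WIN32OLERuntimeError is a StandardError subclass
    raise if tries >= attempts
    warn "Attempt #{tries} failed (#{e.class}: #{e.message}); retrying in #{delay}s..."
    sleep delay
    retry
  end
end

# Usage in site_spider.rb (replacing the plain ie.goto call):
#   with_retries { ie.goto(nextUrl) }
```

Whether retrying actually helps depends on why the navigation failed; if the page never loads, the last error is still raised after the final attempt.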
site_spider.rb
# A Site Spider that uses HttpWatch, Ruby And Watir
#
# For more information about this example please refer to http://www.httpwatch.com/rubywatir/
#
MAX_NO_PAGES = 200 # maximum number of pages visited in one run

require 'win32ole' # win32ole drives HttpWatch; versions below HttpWatch 6.0 cannot be driven this way
require 'rubygems'
require 'watir'
require './url_ops.rb' # url_ops.rb must be in the same directory as this script
url = "www.gaopeng.com/?ADTAG=beijing_from_beijing" # URL to test; can also be given on the command line. Do not prefix it with http://

# Create HttpWatch
control = WIN32OLE.new('HttpWatch.Controller')
httpWatchVer = control.Version
if httpWatchVer[0...1] == "4" or httpWatchVer[0...1] == "5"
  puts "\nERROR: You are running HttpWatch #{httpWatchVer}. This sample requires HttpWatch 6.0 or later. Press Enter to exit..."; $stdout.flush
  gets
  #break # fine on Ruby 1.8.6, but an error on newer versions such as 1.9.2, so it is commented out
  exit   # ends the script on any Ruby version
end

# Get the domain name to spider
puts "Enter the domain name of the site to check (press enter for url):\n"; $stdout.flush
($*[0].nil?)?(url = url):(url = $*[0]) # a URL passed on the command line takes priority
#url = gets.chomp! # must stay commented out while the line above is used
if url.empty?
  url = url
end
hostName = url.HostName
if hostName.empty?
  puts "\nPlease enter a valid domain name. Press Enter to exit..."; $stdout.flush
  gets
  #break # fine on Ruby 1.8.6, but an error on newer versions such as 1.9.2, so it is commented out
  exit   # ends the script on any Ruby version
end

# Start IE
ie = Watir::IE.new
ie.logger.level = Logger::ERROR

# Attach HttpWatch to the IE window
plugin = control.ie.Attach(ie.ie)

# Start recording HTTP traffic
plugin.Clear()
plugin.Log.EnableFilter(false)
plugin.Record()

url = url.CanonicalUrl
urlsVisited = Array.new; urlsToVisit = Array.new( 1, url )

# Start visiting pages
while urlsToVisit.length > 0 && urlsVisited.length < MAX_NO_PAGES

  nextUrl = urlsToVisit.pop
  puts "Loading " + nextUrl + "..."; $stdout.flush

  ie.goto(nextUrl) # get WATIR to load URL
  urlsVisited.push( nextUrl ) # store this URL in the list that has been visited

  begin
    # Look at each link on the page and decide if it needs to be visited
    ie.links().each() do |link|

      linkUrl = link.href.CanonicalUrl
      # Skip the URL if it is from a different domain, is a download, or has already been queued or visited
      if !url.IsSubDomain( linkUrl.HostName ) ||
         linkUrl.Path.include?( ".exe" ) || linkUrl.Path.include?( ".zip" ) || linkUrl.Path.include?( ".csv" ) ||
         linkUrl.Path.include?( ".pdf" ) || linkUrl.Path.include?( ".png" ) ||
         urlsToVisit.find{ |aUrl| aUrl == linkUrl } != nil ||
         urlsVisited.find{ |aUrl| aUrl == linkUrl } != nil
        # Don't add this URL to the list
        next
      end
      # Add this URL to the list
      urlsToVisit.push(linkUrl)
    end
  rescue
    # $! is an exception object, so convert it before concatenating
    puts "Failed to find links in " + nextUrl + " " + $!.to_s; $stdout.flush
  end

end

if ( urlsVisited.length == MAX_NO_PAGES )
  puts "\nThe spider has stopped because #{MAX_NO_PAGES} pages have been visited. (Change MAX_NO_PAGES if you want to increase this limit)"; $stdout.flush
end

# Stop Recording HTTP data in HttpWatch
plugin.Stop()

puts "\nAnalyzing HTTP data.."; $stdout.flush

# Look at each HTTP request in the log to compile list of URLs
# for each error
errorUrls = Hash.new
plugin.Log.Entries.each do |entry|
  if !entry.Error.empty? && entry.Error != "Aborted" || entry.StatusCode >= 400
    if !errorUrls.has_key?( entry.Result )
      errorUrls[entry.Result] = Array.new( 1, entry.Url )
    else
      if errorUrls[entry.Result].find{ |aUrl| aUrl == entry.Url } == nil
        errorUrls[entry.Result].push( entry.Url )
      end
    end
  end
end

# Display summary statistics for whole log
summary = plugin.Log.Entries.Summary

printf "Total time to load page (secs): %.3f\n", summary.Time
printf "Number of bytes received on network: %d\n", summary.BytesReceived

printf "HTTP compression saving (bytes): %d\n", summary.CompressionSavedBytes
printf "Number of round trips: %d\n", summary.RoundTrips
printf "Number of errors: %d\n", summary.Errors.Count

# Print out errors
summary.Errors.each do |error|
  numErrors = error.Occurrences
  description = error.Description
  puts "#{numErrors} URL(s) caused a #{description} error:"
  # "Aborted" entries were skipped above, so a result may have no URL list
  (errorUrls[error.Result] || []).each do |aUrl|
    puts "-> #{aUrl}"
  end
end

# Quit IE; commented out so the tester can inspect the results after the run
#plugin.Container.Quit();

puts "\r\nPress Enter to exit"; $stdout.flush
#gets
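The crawl loop above is a depth-first traversal: a to-visit stack, a visited list, and a filter that drops off-site links, downloads and duplicates. A minimal pure-Ruby sketch of the same logic follows, with Watir replaced by a hypothetical fetch_links callable (standing in for ie.goto plus ie.links) and the domain test replaced by a hypothetical same_site callable, so it runs without IE or HttpWatch. One deliberate difference: a Set records every URL already queued or visited, instead of scanning both arrays with find for each link.

```ruby
require 'set'

# Extensions treated as downloads and never followed (same list as site_spider.rb).
SKIP_EXTENSIONS = [".exe", ".zip", ".csv", ".pdf", ".png"]

# Depth-first crawl starting at start_url.
# fetch_links: hypothetical callable, URL -> array of hrefs on that page
#              (stands in for ie.goto + ie.links).
# same_site:   hypothetical callable, URL -> true if the link belongs to the site.
def crawl(start_url, max_pages, fetch_links, same_site)
  to_visit = [start_url]
  seen     = Set.new([start_url]) # every URL ever queued, visited or not
  visited  = []

  while !to_visit.empty? && visited.length < max_pages
    page = to_visit.pop # pop from the end: depth-first, as in the original
    visited << page
    fetch_links.call(page).each do |link|
      path = link.sub(%r{\Ahttps?://[^/]*}, "").downcase # part after the host
      next unless same_site.call(link)
      next if SKIP_EXTENSIONS.any? { |ext| path.include?(ext) }
      next if seen.include?(link)
      seen << link
      to_visit.push(link)
    end
  end
  visited
end

# Example with a hypothetical three-page site
graph = {
  "http://a.com/"  => ["http://a.com/x", "http://b.com/", "http://a.com/file.zip"],
  "http://a.com/x" => ["http://a.com/", "http://a.com/y"],
  "http://a.com/y" => []
}
p crawl("http://a.com/", 10,
        lambda { |u| graph.fetch(u, []) },
        lambda { |u| u.start_with?("http://a.com/") })
# → ["http://a.com/", "http://a.com/x", "http://a.com/y"]
```

With max_pages set to 2 the same call stops after the first two pages, which is the role MAX_NO_PAGES plays in the script.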
url_ops.rb
# Helper functions used to parse URLs
class String
  def HostName
    matches = scan(/^(?:https?:\/\/)?([^\/]*)/)
    if matches.length > 0 && matches[0].length > 0
      return matches[0][0].downcase
    else
      return ""
    end
  end
  def IsSubDomain( hostName )
    thisHostName = self.HostName
    if thisHostName.slice(0..3) == "www."
      thisHostName = thisHostName.slice(4..-1)
    end
    if thisHostName == hostName ||
       (hostName.length > thisHostName.length &&
        hostName.slice( -thisHostName.length..-1 ) == thisHostName)
      return true
    end
    return false
  end
  def Protocol
    matches = scan(/^(https?:\/\/)/)
    if matches.length > 0 && matches[0].length > 0
      return matches[0][0].downcase
    else
      return "http://"
    end
  end
  def Path
    if scan(/^(https?:\/\/)/).length > 0
      matches = scan(/^https?:\/\/[^\/]+\/([^#]+)$/)
    else
      matches = scan(/^[^\/]+\/([^#]+)$/)
    end
    if matches != nil && matches.length == 1 && matches[0].length == 1
      return matches[0][0].downcase
    else
      return ""
    end
  end
  def CanonicalUrl
    return self.Protocol + self.HostName + "/" + self.Path
  end
end
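url_ops.rb parses URLs with regular expressions, and IsSubDomain only checks that the link's host ends with the site's host, so with the default URL a host such as notgaopeng.com would also count as part of gaopeng.com. If that matters, a stricter check could look like the sketch below. host_of and same_site? are hypothetical names, host_of relies on Ruby's standard uri library instead of the regexes, and neither is used by the scripts as published.

```ruby
require 'uri'

# Hypothetical alternative to String#HostName: let the uri library find the host.
# A missing scheme is treated as http://; input URI cannot parse yields "".
def host_of(url)
  url = "http://" + url unless url =~ %r{\Ahttps?://}i
  URI.parse(url).host.to_s.downcase
rescue URI::InvalidURIError
  ""
end

# Hypothetical stricter alternative to String#IsSubDomain: the link host must equal
# the site host or end with "." + site host. A leading "www." on the site host is
# dropped first, as url_ops.rb does.
def same_site?(site_host, link_host)
  base = site_host.downcase.sub(/\Awww\./, "")
  host = link_host.downcase
  host == base || host.end_with?("." + base)
end

site = host_of("www.gaopeng.com/?ADTAG=beijing_from_beijing")
p site                                # → "www.gaopeng.com"
p same_site?(site, "m.gaopeng.com")   # → true
p same_site?(site, "notgaopeng.com")  # → false
```

Requiring a dot before the site host is what rules out look-alike domains; the trade-off is that URI.parse is stricter than the regexes, so some malformed hrefs that url_ops.rb would accept produce an empty host here.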
Put the two scripts in the same directory (url_ops.rb is unchanged from the original sample) and run site_spider.rb from cmd.