Ruby on HttpWatch script

Last updated: 2018-12-05  Source: Internet

Uploaded by: User


HttpWatch official page: http://www.httpwatch.com/rubywatir/

Ruby on HttpWatch example: http://www.httpwatch.com/rubywatir/site_spider.zip (the official site may have updated this example since)

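After downloading the example I added some comments and trimmed some of the code. The main changes are:

1. Added ($*[0].nil?)?(url = url):(url = $*[0]) above the line url = gets.chomp!, so the URL can now be passed on the command line or hard-coded in the script. Command-line usage: ruby <script name> <site name>; see the comments in the script for details. Do not prefix the URL with http://.

2. Commented out the two break statements. They cause no problem on Ruby 1.8.6 but raise an error on newer versions such as Ruby 1.9.2, so they have to be commented out.

3. Commented out plugin.Container.Quit(); so that IE is not closed, because testers need to inspect the results after the run finishes.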
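The command-line fallback from change 1 can be sketched in isolation as follows. This is a minimal illustration, not the code from site_spider.rb: pick_url and DEFAULT_URL are hypothetical names, and stripping a leading http:// is an assumption added here to enforce the "no http:// prefix" rule automatically.

```ruby
# Hypothetical helper (not in site_spider.rb): prefer the first command-line
# argument and fall back to a default URL hard-coded in the script.
DEFAULT_URL = "www.example.com"  # placeholder default, not from the original

def pick_url(args, default_url)
  candidate = args[0]
  url = (candidate.nil? || candidate.strip.empty?) ? default_url : candidate.strip
  # Assumption: drop a leading http:// or https:// so the caller need not
  # remember to omit it.
  url.sub(%r{\Ahttps?://}i, "")
end

puts pick_url(ARGV, DEFAULT_URL)
```

Run without arguments, this prints the default; run as ruby <script name> <site name>, it prints the site name given on the command line.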

Runtime issue: if the test machine has a slow network connection, a page load may time out and the script exits with an error like this:

C:/Ruby192/lib/ruby/gems/1.9.1/gems/watir-classic-3.0.0/lib/watir-classic/ie-class.rb:374:in `method_missing': (in OLE method `navigate': ) (WIN32OLERuntimeError)
    OLE error code:800C000E in <Unknown>
      <No Description>
    HRESULT error code:0x80020009
      Exception occurred.
        from C:/Ruby192/lib/ruby/gems/1.9.1/gems/watir-classic-3.0.0/lib/watir-classic/ie-class.rb:374:in `goto'
        from C:/Documents and Settings/Administrator/案頭/site_spider/site_spider.rb:55:in `<main>'
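One possible way to survive such a timeout is to retry the page load a few times before giving up. The sketch below is only an illustration under assumptions: with_retries is a hypothetical helper, the block stands in for ie.goto(nextUrl), and it rescues StandardError so that it runs without Watir (in the real script you would probably rescue WIN32OLERuntimeError, the class shown in the trace).

```ruby
# Hypothetical helper: run the block, retrying up to max_attempts times.
def with_retries(max_attempts, wait_seconds = 0)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError => e
    raise if attempt >= max_attempts   # out of attempts: re-raise the last error
    warn "Attempt #{attempt} failed (#{e.message}), retrying..."
    sleep wait_seconds
    retry
  end
end

# Stand-in for ie.goto(nextUrl): fails on the first two attempts, then succeeds.
calls = 0
result = with_retries(3) do |attempt|
  calls += 1
  raise "simulated navigation timeout" if attempt < 3
  "loaded on attempt #{attempt}"
end
puts result
```

In the spider this would wrap the ie.goto call, so that a single slow page does not end the whole run.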

site_spider.rb

# A Site Spider that uses HttpWatch, Ruby and Watir
#
# For more information about this example please refer to http://www.httpwatch.com/rubywatir/
#
MAX_NO_PAGES = 200    # maximum number of pages visited in one run

require 'win32ole'        # win32ole drives HttpWatch; versions below HttpWatch 6.0 cannot be driven this way
require 'rubygems'
require 'watir'
require './url_ops.rb'    # url_ops.rb must be in the same directory as this script
url = "www.gaopeng.com/?ADTAG=beijing_from_beijing"        # URL to test; can also be given on the command line. Do not prefix it with http://

# Create HttpWatch
control = WIN32OLE.new('HttpWatch.Controller')
httpWatchVer = control.Version
if httpWatchVer[0...1] == "4" or httpWatchVer[0...1] == "5"
    puts "\nERROR: You are running HttpWatch #{httpWatchVer}. This sample requires HttpWatch 6.0 or later. Press Enter to exit...";  $stdout.flush
    $stdin.gets    # read from $stdin: a bare gets would treat a command-line URL as a file name
    #break         # worked on Ruby 1.8.6 but is an error on newer versions such as Ruby 1.9.2, so it is commented out
    exit           # stop here instead of continuing with an unsupported version
end

# Get the domain name to spider
puts "Enter the domain name of the site to check (press enter for url):\n";  $stdout.flush
($*[0].nil?)?(url = url):(url = $*[0])  # take the site name from the command line when given; the command line has priority
#url = gets.chomp!   # must stay commented out when the line above is used
hostName = url.HostName
if hostName.empty?
    puts "\nPlease enter a valid domain name. Press Enter to exit...";  $stdout.flush
    $stdin.gets
    #break         # worked on Ruby 1.8.6 but is an error on newer versions such as Ruby 1.9.2, so it is commented out
    exit
end

# Start IE
ie = Watir::IE.new
ie.logger.level = Logger::ERROR

# Attach HttpWatch to the IE window
plugin = control.ie.Attach(ie.ie)

# Start recording HTTP traffic
plugin.Clear()
plugin.Log.EnableFilter(false)
plugin.Record()

url = url.CanonicalUrl
urlsVisited = Array.new;  urlsToVisit = Array.new( 1, url )

# Start visiting pages
while urlsToVisit.length > 0 && urlsVisited.length < MAX_NO_PAGES

    nextUrl = urlsToVisit.pop
    puts "Loading " + nextUrl + "...";   $stdout.flush

    ie.goto(nextUrl)              # get WATIR to load URL
    urlsVisited.push( nextUrl )   # store this URL in the list that has been visited

    begin
        # Look at each link on the page and decide if it needs to be visited
        ie.links().each() do |link|

            linkUrl = link.href.CanonicalUrl
            # Skip the URL if it is from a different domain, is a download,
            # or has already been queued or visited
            if !url.IsSubDomain( linkUrl.HostName ) ||
               linkUrl.Path.include?( ".exe" ) || linkUrl.Path.include?( ".zip" ) || linkUrl.Path.include?( ".csv" ) ||
               linkUrl.Path.include?( ".pdf" ) || linkUrl.Path.include?( ".png" ) ||
               urlsToVisit.find{ |aUrl| aUrl == linkUrl } != nil ||
               urlsVisited.find{ |aUrl| aUrl == linkUrl } != nil
                # Don't add this URL to the list
                next
            end
            # Add this URL to the list
            urlsToVisit.push(linkUrl)
        end
    rescue
        # $! is an exception object, so convert it before concatenating
        puts "Failed to find links in " + nextUrl + " " + $!.to_s;  $stdout.flush
    end

end

if ( urlsVisited.length == MAX_NO_PAGES )
    puts "\nThe spider has stopped because #{MAX_NO_PAGES} pages have been visited. (Change MAX_NO_PAGES if you want to increase this limit)";   $stdout.flush
end

# Stop Recording HTTP data in HttpWatch
plugin.Stop()

puts "\nAnalyzing HTTP data..";   $stdout.flush

# Look at each HTTP request in the log to compile list of URLs
# for each error
errorUrls = Hash.new
plugin.Log.Entries.each do |entry|
    if (!entry.Error.empty? && entry.Error != "Aborted") || entry.StatusCode >= 400
        if !errorUrls.has_key?( entry.Result )
            errorUrls[entry.Result] = Array.new( 1, entry.Url )
        else
            if errorUrls[entry.Result].find{ |aUrl| aUrl == entry.Url } == nil
                errorUrls[entry.Result].push( entry.Url )
            end
        end
    end
end

# Display summary statistics for whole log
summary = plugin.Log.Entries.Summary

printf "Total time to load page (secs):      %.3f\n", summary.Time
printf "Number of bytes received on network: %d\n", summary.BytesReceived

printf "HTTP compression saving (bytes):     %d\n", summary.CompressionSavedBytes
printf "Number of round trips:               %d\n", summary.RoundTrips
printf "Number of errors:                    %d\n", summary.Errors.Count

# Print out errors
summary.Errors.each do |error|
    numErrors = error.Occurrences
    description = error.Description
    puts "#{numErrors} URL(s) caused a #{description} error:"
    errorUrls[error.Result].each do |aUrl|
        puts "-> #{aUrl}"
    end

end

# Quit IE; commented out here because testers need to inspect the results after the run
#plugin.Container.Quit();

puts "\r\nPress Enter to exit";  $stdout.flush
#gets
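The error-collection step near the end of site_spider.rb groups failing URLs by their result and skips duplicates. The same idea can be tried without HttpWatch, as in the rough sketch below. Entry is a hypothetical stand-in for an HttpWatch log entry whose fields mirror the Error, StatusCode, Result and Url properties used in the script, and the sample values are made up for illustration.

```ruby
# Hypothetical stand-in for an HttpWatch log entry (not the real COM object).
Entry = Struct.new(:error, :status_code, :result, :url)

# Made-up sample data for illustration only.
entries = [
  Entry.new("",        200, "200",     "http://www.example.com/"),
  Entry.new("",        404, "404",     "http://www.example.com/missing"),
  Entry.new("",        404, "404",     "http://www.example.com/missing"),  # duplicate
  Entry.new("Aborted", 0,   "Aborted", "http://www.example.com/slow"),
  Entry.new("TIMEOUT", 0,   "TIMEOUT", "http://www.example.com/timeout")
]

# A Hash with a default block creates the URL list on first use.
error_urls = Hash.new { |hash, key| hash[key] = [] }

entries.each do |entry|
  # Same condition as the script: a real error (not "Aborted") or a status >= 400.
  failed = (!entry.error.empty? && entry.error != "Aborted") || entry.status_code >= 400
  next unless failed
  urls = error_urls[entry.result]
  urls << entry.url unless urls.include?(entry.url)
end

error_urls.each do |result, urls|
  puts "#{urls.length} URL(s) caused a #{result} error:"
  urls.each { |a_url| puts "-> #{a_url}" }
end
```

Using a default block removes the has_key? branch from the original loop; the grouping behaviour is otherwise the same.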

url_ops.rb

# Helper functions used to parse URLs
class String
  def HostName
    matches = scan(/^(?:https?:\/\/)?([^\/]*)/)
    if matches.length > 0 && matches[0].length > 0
      return matches[0][0].downcase
    else
      return ""
    end
  end

  def IsSubDomain( hostName )
    thisHostName = self.HostName
    if thisHostName.slice(0..3) == "www."
      thisHostName = thisHostName.slice(4..-1)
    end
    if thisHostName == hostName ||
       (hostName.length > thisHostName.length &&
        hostName.slice( -thisHostName.length..-1 ) == thisHostName)
      return true
    end
    return false
  end

  def Protocol
    matches = scan(/^(https?:\/\/)/)
    if matches.length > 0 && matches[0].length > 0
      return matches[0][0].downcase
    else
      return "http://"
    end
  end

  def Path
    if scan(/^(https?:\/\/)/).length > 0
      matches = scan(/^https?:\/\/[^\/]+\/([^#]+)$/)
    else
      matches = scan(/^[^\/]+\/([^#]+)$/)
    end
    if matches != nil && matches.length == 1 && matches[0].length == 1
      return matches[0][0].downcase
    else
      return ""
    end
  end

  def CanonicalUrl
    return self.Protocol + self.HostName + "/" + self.Path
  end
end
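For comparison, host extraction and the same-site check can also be written with Ruby's standard uri library instead of extending String. This is only a sketch under assumptions: host_of and same_site? are hypothetical names that are not part of url_ops.rb, and same_site? requires a dot boundary before the base domain, which is stricter than the plain suffix comparison in IsSubDomain.

```ruby
require 'uri'

# Hypothetical helper: return the lower-cased host of a URL, accepting URLs
# written without a scheme (as the spider script expects).
def host_of(url)
  with_scheme = (url =~ %r{\Ahttps?://}i) ? url : "http://#{url}"
  (URI.parse(with_scheme).host || "").downcase
end

# Hypothetical helper: true when other_host is the base host (ignoring a
# leading "www.") or one of its sub-domains.
def same_site?(base_host, other_host)
  base = base_host.sub(/\Awww\./, "")
  other_host == base || other_host.end_with?(".#{base}")
end

puts host_of("www.gaopeng.com/?ADTAG=beijing_from_beijing")  # → www.gaopeng.com
puts same_site?("www.example.com", "shop.example.com")       # → true
puts same_site?("www.example.com", "notexample.com")         # → false
```

Note that URI.parse raises URI::InvalidURIError for strings it cannot parse, so links scraped from real pages would need a rescue around host_of.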

Put both scripts in the same directory (url_ops.rb is unchanged from the official example) and run site_spider.rb from cmd.

