PHP: Checking Whether a URL Is Valid / Whether a Web Page Exists


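The simplest way to check in PHP whether a web page exists is fopen / file_get_contents and the like. There are plenty of ways to do it, but all of them pull back the entire HTML page, so checking a large number of URLs gets slow.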

The check can be made from the HTTP headers instead, so the whole page body does not have to be fetched (for details see: Hypertext Transfer Protocol -- HTTP/1.1).

Checking the HTTP header with fsockopen

A simple example (reposted from: PHP Server Side Scripting - Checking if page exists):

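<?php
// Send a HEAD request so that only the response headers come back.
if ($sock = fsockopen('something.net', 80)) {
    // The Host header is not required by HTTP/1.0, but most virtual hosts need it.
    fputs($sock, "HEAD /something.html HTTP/1.0\r\nHost: something.net\r\n\r\n");
    while (!feof($sock)) {
        echo fgets($sock);
    }
    fclose($sock);
}
?>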

It produces output like this:

HTTP/1.1 200 OK
Date: Mon, 06 Oct 2008 15:45:27 GMT
Server: Apache/2.2.9
X-Powered-By: PHP/5.2.6-4
Set-Cookie: PHPSESSID=4e037868a4619d6b4d8c52d0d5c59035; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Connection: close
Content-Type: text/html
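To act on a response like this rather than just echo it, the status line and headers have to be pulled apart. The following is a minimal sketch of one way to do that, assuming PHP 7.1 or later; the function name parse_head_response is hypothetical and not part of the referenced article.

```php
<?php
// Hypothetical helper: split a raw HEAD response into its status code and headers.
// Returns null when the first line is not an HTTP status line.
function parse_head_response(string $raw): ?array
{
    $lines = preg_split('/\r\n|\n|\r/', trim($raw));
    // The first line should look like "HTTP/1.1 200 OK".
    if (!preg_match('~^HTTP/\d+(?:\.\d+)?\s+(\d{3})~', array_shift($lines), $m)) {
        return null;
    }
    $headers = [];
    foreach ($lines as $line) {
        if ($line === '') {
            break; // a blank line ends the header block
        }
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // not a "Name: value" line
        }
        // Header names are case-insensitive, so store them in lower case.
        $name = strtolower(trim(substr($line, 0, $pos)));
        $headers[$name] = trim(substr($line, $pos + 1));
    }
    return ['status' => (int) $m[1], 'headers' => $headers];
}
```

With the output shown above, the result would carry a status of 200 and a content-type entry of text/html, which is enough to decide whether the page is there without downloading its body.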

This approach still runs into plenty of problems, 302 redirects for example, so the simpler route is to let curl take care of those chores for us.

PHP + Curl + Content-Type check

For checking whether a page exists with PHP + Curl, see: How To Check If Page Exists With CURL | W-Shadow.com

That program inspects the HTTP status code, such as 200 OK; codes from 200 up to, but not including, 400 are treated as normal.
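One detail worth keeping in mind: when curl follows redirects with CURLOPT_HEADER enabled, the returned text generally contains one header block per response, so the first status line may belong to a redirect rather than to the final page. The sketch below, assuming PHP 7.1 or later, picks the last status line and applies the 200 to 399 rule; both function names are hypothetical.

```php
<?php
// Hypothetical helper: return the status code of the LAST status line found
// in a block of raw response headers, or null when there is none.
function final_status_code(string $rawHeaders): ?int
{
    // The "m" flag lets ^ match at the start of every line.
    if (!preg_match_all('~^HTTP/\d+(?:\.\d+)?\s+(\d{3})~m', $rawHeaders, $m)) {
        return null;
    }
    return (int) end($m[1]);
}

// Hypothetical helper: the rule used in this article, 200 <= code < 400.
function status_means_exists(int $code): bool
{
    return $code >= 200 && $code < 400;
}
```

For a redirect chain that ends in 404, final_status_code() returns 404 and status_means_exists() rejects it, whereas looking only at the first status line would have accepted the 302.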

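That program is basically good enough, but user input comes in every shape imaginable, so extra checks are needed. Here are a few problematic "URLs" picked at random:

xxx@ooo.com # an e-mail address
http://xxx.ooo.com/abc.zip # an archive file
<script>alert('x')</script> # someone helpfully testing for XSS holes =.=|||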

Input like this has to be filtered out, so we additionally check that the value is a normal URL and that its Content-Type is one we want.

The program is therefore modified as follows (adapted from: How To Check If Page Exists With CURL):

<?php
function page_exists($url)
{
    $parts = parse_url($url);
    if (!$parts) {
        return false; /* the URL was seriously wrong */
    }
    if (isset($parts['user'])) {
        return false; /* user@gmail.com */
    }
    /* only absolute http(s) URLs; a bare e-mail address such as xxx@ooo.com
       is parsed as a path with no scheme or host, so it is rejected here */
    if (empty($parts['host']) || !isset($parts['scheme'])
        || !in_array(strtolower($parts['scheme']), array('http', 'https'), true)) {
        return false;
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    /* set the user agent - might help, doesn't hurt */
    //curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; wowTreebot/1.0; +http://wowtree.com)');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    /* try to follow redirects */
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    /* timeout after the specified number of seconds. assuming that this script runs
       on a server, 20 seconds should be plenty of time to verify a valid URL. */
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    /* don't download the page, just the header (much faster in this case) */
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    /* handle HTTPS links */
    if (strtolower($parts['scheme']) == 'https') {
        /* the value 1 used originally is no longer accepted by current libcurl;
           2 checks the host name in the certificate */
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
        /* note: this turns off certificate chain verification */
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    }
    $response = curl_exec($ch);
    /* ask curl for the status and type of the final response; matching the raw
       headers would pick up the first response in a redirect chain instead */
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);
    if ($response === false) {
        return false;
    }

    /* allow content-type list; strip parameters such as "; charset=utf-8" */
    $mime = strtolower(trim(explode(';', (string) $type)[0]));
    $content_type = false;
    switch ($mime) {
        case 'application/atom+xml':
        case 'application/rdf+xml':
        //case 'application/x-sh':
        case 'application/xhtml+xml':
        case 'application/xml':
        case 'application/xml-dtd':
        case 'application/xml-external-parsed-entity':
        //case 'application/pdf':
        //case 'application/x-shockwave-flash':
            $content_type = true;
            break;
    }
    if (!$content_type && (strpos($mime, 'text/') === 0 || strpos($mime, 'image/') === 0)) {
        $content_type = true;
    }
    if (!$content_type) {
        return false;
    }

    /* see if code indicates success */
    return (($code >= 200) && ($code < 400));
}

// Test & usage:
// var_dump(page_exists('http://tw.yahoo.com'));
?>
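The URL pre-check can also be isolated so that it runs, and can be tested, without any network access. The following is a minimal sketch assuming PHP 7 or later; the name is_checkable_url is hypothetical, and combining filter_var() with parse_url() is just one possible way to do it.

```php
<?php
// Hypothetical helper: accept only absolute http(s) URLs with no user-info
// part. No request is made; this only looks at the string.
function is_checkable_url(string $input): bool
{
    $input = trim($input);
    // Rejects strings that are not URLs at all, e.g. a bare e-mail
    // address or an HTML fragment.
    if (filter_var($input, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $parts = parse_url($input);
    if ($parts === false || isset($parts['user'])) {
        return false;
    }
    return isset($parts['scheme'], $parts['host'])
        && in_array(strtolower($parts['scheme']), ['http', 'https'], true);
}

// Usage with the sample inputs from this article:
// var_dump(is_checkable_url('xxx@ooo.com'));
// var_dump(is_checkable_url('http://xxx.ooo.com/abc.zip'));
```

Note that http://xxx.ooo.com/abc.zip passes this check, since it is a well-formed URL; it is the Content-Type allow list that filters out the archive.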


  Content-Type information

The Content-Type values used above can be looked up in the following files:

/etc/mime.types
/usr/share/doc/apache-common/examples/mime.types.gz
/usr/share/doc/apache2.2-common/examples/apache2/mime.types.gz # this is the recommended one to read
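These files use a simple line format: a MIME type followed by zero or more file extensions, with # starting a comment line. As a rough sketch (assuming PHP 7 or later and that format; the function name parse_mime_types is hypothetical), the contents can be turned into an extension-to-type map like this:

```php
<?php
// Hypothetical helper: build an extension => MIME type map from the text
// of a mime.types file.
function parse_mime_types(string $contents): array
{
    $map = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comments
        }
        $fields = preg_split('/\s+/', $line);
        $type = array_shift($fields); // first field is the MIME type
        foreach ($fields as $ext) {   // the rest are extensions, possibly none
            $map[strtolower($ext)] = $type;
        }
    }
    return $map;
}

// Usage, assuming the file exists on this machine:
// $map = parse_mime_types(file_get_contents('/etc/mime.types'));
```

A map like this makes it easier to see which types correspond to the kinds of files you want to allow or reject.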

 

 

****************************************************************************************************

1.  URL format:

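function checkUrl($weburl)
{
    // ereg() was removed in PHP 7, so preg_match() is used. The dot is escaped so it
    // only matches a literal ".", and an optional path is allowed after the host.
    // Returns true when $weburl looks like an http(s) URL.
    return preg_match('~^https?://[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*(/\S*)?$~', $weburl) === 1;
}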


 

2.  Checking whether an HTTP address is valid

function url_exists($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, 1); // do not download the body
    curl_setopt($ch, CURLOPT_FAILONERROR, 1); // treat HTTP status >= 400 as a failure
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $ok = curl_exec($ch) !== false;
    curl_close($ch);
    return $ok;
}

Or:

function img_exists($url)
{
    // Read at most one byte; compare with false so that a first byte of "0" still counts.
    return @file_get_contents($url, false, null, 0, 1) !== false;
}

Or:

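function url_exists($url)
{
    $head = @get_headers($url);
    if (!is_array($head)) {
        return false;
    }
    // get_headers() also returns an array for a 404, so check the status line
    // in $head[0], e.g. "HTTP/1.1 200 OK"; 2xx and 3xx count as existing.
    return preg_match('~^HTTP/\S+\s+[23]\d\d~', $head[0]) === 1;
}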

 

Example:

$url = 'http://www.sendnet.cn';
var_dump(url_exists($url)); // echo would print nothing for false


 

 

 

 

 



 
