Using PHP to check whether a URL is valid and the web page exists

Source: Internet
Author: User
Tags: php, server

The simplest way to check from PHP whether a web page exists is fopen() or file_get_contents(), and there are plenty of similar approaches. All of them, however, pull back the entire HTML page, so checking a large number of URLs this way is slow.

Checking the HTTP headers instead avoids downloading the whole page (see: Hypertext Transfer Protocol - HTTP/1.1).

Using fsockopen to check the HTTP header

A simple example follows (reprinted from: PHP Server Side Scripting - Checking if page exists):

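<?php
// Open a TCP connection to the web server on port 80.
if ($sock = fsockopen('something.net', 80))
{
   // HEAD asks for the headers only; name-based virtual hosts need the Host header.
   fputs($sock, "HEAD /something.html HTTP/1.0\r\nHost: something.net\r\n\r\n");

   while (!feof($sock)) {
       echo fgets($sock);
   }
   fclose($sock);
}
?>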

The server answers with headers like the following:

HTTP/1.1 200 OK
Date: Mon, ... Oct 2008 15:45:27 GMT
Server: Apache/2.2.9
X-Powered-By: PHP/5.2.6-4
Set-Cookie: PHPSESSID=4e037868a4619d6b4d8c52d0d5c59035; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Connection: close
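
A response like this is only useful once the status code and headers have been picked out of it. The following is a minimal sketch of that step, not part of the original example; the function name parse_response_head is hypothetical, and it assumes the raw text read from the socket has been collected into a string.

```php
<?php
// Hypothetical helper: split a raw HTTP response head into a status code
// and a map of lower-cased header names to values.
function parse_response_head(string $raw): ?array
{
    $lines = preg_split('/\r\n|\n/', trim($raw));
    // The status line looks like "HTTP/1.1 200 OK".
    if (!preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#i', $lines[0], $m)) {
        return null; // not an HTTP response
    }
    $headers = [];
    foreach (array_slice($lines, 1) as $line) {
        if ($line === '') {
            break; // a blank line ends the header block
        }
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // skip malformed lines
        }
        $name = strtolower(trim(substr($line, 0, $pos)));
        $headers[$name] = trim(substr($line, $pos + 1));
    }
    return ['status' => (int) $m[1], 'headers' => $headers];
}

$head = parse_response_head("HTTP/1.1 200 OK\r\nServer: Apache/2.2.9\r\nConnection: close\r\n\r\n");
echo $head['status'], ' ', $head['headers']['server'], "\n"; // prints "200 Apache/2.2.9"
```

If a header appears more than once, this sketch keeps only the last value, which is enough for a simple existence check.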

This approach still has many problems, 302 redirects among them. The simpler route is to let curl deal with these details.

PHP + curl + Content-Type check

For details on using PHP and curl to check whether a page exists, see: How to Check if Page Exists with CURL | W-shadow.com

That program checks the status code: anything from 200 up to (but not including) 400 counts as normal.
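
The linked program is not reproduced at this point, so here is a minimal sketch of the idea, assuming PHP's curl extension is installed. The function names status_is_ok and url_responds are hypothetical, and the timeout and redirect limits are illustrative values rather than recommendations.

```php
<?php
// Hypothetical helper: treat 200-399 as "the page exists".
function status_is_ok(int $code): bool
{
    return $code >= 200 && $code < 400;
}

// Hypothetical helper: send a HEAD request and test the final status code.
function url_responds(string $url): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow 301/302 redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);          // assumed limit
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    $ok = curl_exec($ch) !== false;
    // After redirects, this is the status of the last response received.
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released when $ch goes out of scope.
    return $ok && status_is_ok($code);
}

// Usage (needs network access):
// var_dump(url_responds('http://tw.yahoo.com'));
```

Reading the code with curl_getinfo() avoids parsing header text by hand, at the cost of not seeing the intermediate redirect responses.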

That program is basically enough, but user input can be strange, so further checks are needed. Here are a few problematic URLs grabbed at random:

xxx@ooo.com # Email
http://xxx.ooo.com/abc.zip # Compressed file
<script>alert('x')</script> # helps you check for XSS vulnerabilities =.=|||

Input like this should be rejected, so we check both that the URL is a normal web address and that the Content-Type is one we want.
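
The first of those two checks can be done before any network request is made. The sketch below is one possible pre-filter, assuming only http and https addresses are wanted; the function name looks_like_web_url is hypothetical. A link to a compressed file still passes this step, since only the Content-Type check can catch it.

```php
<?php
// Hypothetical pre-check: accept only absolute http(s) URLs without
// embedded credentials, before any network request is made.
function looks_like_web_url(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false; // e.g. "xxx@ooo.com" or "<script>..."
    }
    $parts = parse_url($url);
    if ($parts === false || isset($parts['user'])) {
        return false; // e.g. "http://user@gmail.com"
    }
    $scheme = strtolower($parts['scheme'] ?? '');
    return $scheme === 'http' || $scheme === 'https';
}

var_dump(looks_like_web_url('xxx@ooo.com'));                  // bool(false)
var_dump(looks_like_web_url("<script>alert('x')</script>"));  // bool(false)
var_dump(looks_like_web_url('http://xxx.ooo.com/abc.zip'));   // bool(true)
```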

The modified program follows (adapted from: How to Check if Page Exists with CURL):

<?php
function page_exists($url)
{
    $parts = parse_url($url);
    if (!$parts) {
        return false; /* the URL was seriously wrong */
    }
    if (isset($parts['user'])) {
        return false; /* user@gmail.com */
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);

    /* set the user agent - might help, doesn't hurt */
    //curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; wowTreebot/1.0; +http://wowtree.com)');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    /* try to follow redirects */
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    /* timeout after the specified number of seconds; assuming this script runs
       on a server, these limits should be plenty to verify a valid URL */
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);

    /* don't download the page, just the header (much faster in this case) */
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_HEADER, true);

    /* handle HTTPS links (note: certificate verification is switched off here) */
    if (isset($parts['scheme']) && $parts['scheme'] == 'https') {
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); /* the value 1 is no longer accepted */
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    }

    $response = curl_exec($ch);
    curl_close($ch);
    if ($response === false) {
        return false; /* the request itself failed */
    }

    /* allowed Content-Type list */
    $content_type = false;
    if (preg_match('/^Content-Type:\s*([^;\s]+)/mi', $response, $matches)) {
        switch ($matches[1]) {
            case 'application/atom+xml':
            case 'application/rdf+xml':
            //case 'application/x-sh':
            case 'application/xhtml+xml':
            case 'application/xml':
            case 'application/xml-dtd':
            case 'application/xml-external-parsed-entity':
            case 'application/pdf':
            //case 'application/x-shockwave-flash':
                $content_type = true;
                break;
        }
        if (!$content_type && (preg_match('/text\/.*/', $matches[1]) || preg_match('/image\/.*/', $matches[1]))) {
            $content_type = true;
        }
    }
    if (!$content_type) {
        return false;
    }

    /* get the status code from the HTTP headers (HTTP/1.x and HTTP/2 status lines) */
    if (preg_match('/HTTP\/\d+(?:\.\d+)?\s+(\d+)/', $response, $matches)) {
        $code = intval($matches[1]);
    } else {
        return false;
    }

    /* check if the code indicates success */
    return (($code >= 200) && ($code < 400));
}

// Test & usage:
// var_dump(page_exists('http://tw.yahoo.com'));
?>
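
Content-Type values often carry parameters, for example "text/html; charset=UTF-8", and their case is not guaranteed. As a sketch only, the allow-list decision can be pulled into a small helper that normalises the value first; the name content_type_allowed is hypothetical and the list simply mirrors the types enabled in the function above.

```php
<?php
// Hypothetical helper: decide whether a Content-Type header value is acceptable.
function content_type_allowed(string $value): bool
{
    // Drop parameters such as "; charset=UTF-8" and normalise case.
    $type = strtolower(trim(explode(';', $value, 2)[0]));
    $allowed = [
        'application/atom+xml',
        'application/rdf+xml',
        'application/xhtml+xml',
        'application/xml',
        'application/xml-dtd',
        'application/xml-external-parsed-entity',
        'application/pdf',
    ];
    if (in_array($type, $allowed, true)) {
        return true;
    }
    // Any text/* or image/* subtype is accepted as well.
    return preg_match('#^(text|image)/[^/\s]+$#', $type) === 1;
}

var_dump(content_type_allowed('text/html; charset=UTF-8')); // bool(true)
var_dump(content_type_allowed('application/zip'));          // bool(false)
```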


Content-type Information

The Content-Type values above can be looked up in the following files:

/etc/mime.types
/usr/share/doc/apache-common/examples/mime.types.gz
/usr/share/doc/apache2.2-common/examples/apache2/mime.types.gz # recommended: look at this one
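
These files list one MIME type per line, optionally followed by file extensions, with "#" starting a comment. A minimal sketch of reading that format is shown below; the function name parse_mime_types is hypothetical, and the sample text stands in for a real file so the snippet runs anywhere (on a real system the text could come from file_get_contents('/etc/mime.types'), assuming that file exists).

```php
<?php
// Hypothetical helper: parse mime.types-style text into [type => [extensions]].
function parse_mime_types(string $text): array
{
    $map = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        if ($line === '') {
            continue;
        }
        $fields = preg_split('/\s+/', $line);
        $type = array_shift($fields);
        $map[$type] = $fields; // may be empty: many types list no extension
    }
    return $map;
}

$sample = "# MIME type\tExtensions\napplication/pdf\tpdf\ntext/html\thtml htm\napplication/xml-dtd\n";
print_r(array_keys(parse_mime_types($sample)));
```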

****************************************************************************************************

1. Check the format of the URL:

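function checkurl($weburl)
{
    // ereg() was removed in PHP 7, so preg_match() is used instead.
    // As in the original, this returns true when $weburl is NOT a bare http(s)://host address.
    return !preg_match('/^https?:\/\/[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$/', $weburl);
}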


2. Determine if the HTTP address is valid

function url_exists($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, 1); // do not download the body
    curl_setopt($ch, CURLOPT_FAILONERROR, 1); // treat HTTP status >= 400 as a failure
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    return curl_exec($ch) !== false;
}

Or

function img_exists($url)
{
    // Read at most one byte; compare with false because a first byte of "0" is falsy.
    return @file_get_contents($url, false, null, 0, 1) !== false;
}

Or

function url_exists($url)
{
    $head = @get_headers($url);
    // Note: an array is returned for any HTTP response, including a 404.
    return is_array($head);
}
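
Because get_headers() returns an array for any HTTP response, the status line has to be inspected to tell a 200 from a 404. The sketch below is one way to do that, under the assumption that the last status line in the array belongs to the final response after redirects; both function names are hypothetical.

```php
<?php
// Hypothetical helper: return the status code of the last status line in a
// get_headers()-style array, or null if none is present.
function last_status_code(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#i', $line, $m)) {
            $code = (int) $m[1]; // later status lines overwrite earlier ones
        }
    }
    return $code;
}

// Hypothetical wrapper: true only for a final status in the 200-399 range.
function url_exists_strict(string $url): bool
{
    $headers = @get_headers($url);
    if (!is_array($headers)) {
        return false;
    }
    $code = last_status_code($headers);
    return $code !== null && $code >= 200 && $code < 400;
}

// Usage (needs network access):
// var_dump(url_exists_strict('http://www.sendnet.cn'));
```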

Example:

$url = 'http://www.sendnet.cn';
var_dump(url_exists($url));




 
