Using PHP to determine whether a URL is correct and the web page exists

Last Update:2018-07-24 Source: Internet

Author: User

Tags php server

The simplest way to check from PHP whether a web page exists is fopen() or file_get_contents(), and there are plenty of other ways to do it. All of them, however, pull back the entire HTML of the page, so checking a large number of URLs becomes slow.

Checking the HTTP headers instead means the whole page does not have to be fetched (see: Hypertext Transfer Protocol - HTTP/1.1).

Using fsockopen to check the HTTP header

A simple example follows (reprinted from: PHP Server Side Scripting - Checking if page exists)

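<?php
// Open a TCP connection to the web server on port 80 (10-second timeout)
if ($sock = fsockopen('something.net', 80, $errno, $errstr, 10))
{
   // HEAD asks for the headers only; most virtual hosts also need a Host header
   fputs($sock, "HEAD /something.html HTTP/1.0\r\nHost: something.net\r\n\r\n");

   while (!feof($sock)) {
       echo fgets($sock);
   }
   fclose($sock);
}
?>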

The server returns headers like the following:

HTTP/1.1 200 OK
Date: Mon, Oct 2008 15:45:27 GMT
Server: Apache/2.2.9
X-Powered-By: PHP/5.2.6-4
Set-Cookie: PHPSESSID=4e037868a4619d6b4d8c52d0d5c59035; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Connection: close
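
A response head like the one above can be split into a status code and individual header fields before any decision is made. The following is only a minimal sketch: parse_response_head() is a hypothetical helper name, not part of the original article, and it assumes the raw text read from the socket is already available as a string.

```php
<?php
// Hypothetical helper (not from the original article): split a raw HTTP
// response head into its status code and a lowercase-keyed header map.
function parse_response_head(string $raw): array
{
    // Normalise line endings, then keep only the head (up to the first blank line).
    $raw   = str_replace("\r\n", "\n", $raw);
    $head  = explode("\n\n", $raw, 2)[0];
    $lines = explode("\n", trim($head));

    // The status line looks like "HTTP/1.1 200 OK".
    $status = null;
    if (preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#i', array_shift($lines), $m)) {
        $status = (int) $m[1];
    }

    $headers = [];
    foreach ($lines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip malformed lines
        }
        list($name, $value) = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }

    return ['status' => $status, 'headers' => $headers];
}

$sample = "HTTP/1.1 200 OK\r\nServer: Apache/2.2.9\r\nConnection: close\r\n\r\n";
$parsed = parse_response_head($sample);
echo $parsed['status'], ' ', $parsed['headers']['server'], "\n";
```

Header names are lowercased because HTTP field names are case-insensitive, which makes later lookups such as $parsed['headers']['content-type'] predictable.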

However, this approach runs into many problems, such as 302 redirects. The simpler route is to let cURL deal with these troubles for us.

PHP + cURL + Content-Type check

Checking with PHP + cURL whether a page exists is described in detail in: How to Check if Page Exists with CURL | W-Shadow.com

That program checks the HTTP status code (anything from 200 up to, but not including, 400 counts as normal).
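
The status check can be sketched roughly as follows. This is not the W-Shadow program itself, only an illustration under assumptions: page_status() and status_is_ok() are made-up names, the timeouts and redirect limit are arbitrary, and the cURL extension must be installed for page_status() to be called.

```php
<?php
// Illustrative sketch; both function names are hypothetical.

// A code from 200 up to (but not including) 400 is treated as "page exists".
function status_is_ok(int $code): bool
{
    return $code >= 200 && $code < 400;
}

// Ask for the headers only and report the final HTTP status code.
// Returns 0 when no response was received at all.
function page_status(string $url): int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // send a HEAD request
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow 301/302 redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);          // assumed limit
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not print anything
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    curl_exec($ch);

    // After redirects this is the code of the last response.
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// Usage (needs network access):
// var_dump(status_is_ok(page_status('http://tw.yahoo.com')));
```

Reading the code through curl_getinfo() avoids parsing the status line by hand, which matters once redirects produce several header blocks in one response.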

Basically, the program above is enough, but user input can be strange, so further checks are needed. Here are a few problematic URLs picked at random:

xxx@ooo.com # an email address
http://xxx.ooo.com/abc.zip # a compressed file
<script>alert('x')</script> # someone helping you test for XSS vulnerabilities =.=|||

Input like this has to be filtered out, so we check both that the input is a normal web address and that the Content-Type is one we want.
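
As a rough illustration of the first of those two checks, input can be screened before any request is sent. The sketch below is an assumption rather than the article's own code: looks_like_web_url() is a hypothetical name, and it relies on PHP's built-in filter_var() and parse_url(). A compressed file such as abc.zip still passes this step, because it is a well-formed URL; only the Content-Type check can reject it.

```php
<?php
// Illustrative pre-filter; looks_like_web_url() is a hypothetical name.
function looks_like_web_url(string $input): bool
{
    $input = trim($input);

    // Rejects bare e-mail addresses, HTML fragments and other non-URLs.
    if (filter_var($input, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Only http and https make sense for a page check (rules out mailto:, ftp:, ...).
    $scheme = strtolower((string) parse_url($input, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return false;
    }

    // Reject URLs that carry user info, e.g. http://user@gmail.com
    return parse_url($input, PHP_URL_USER) === null;
}

$candidates = ['http://xxx.ooo.com/abc.zip', 'xxx@ooo.com', "<script>alert('x')</script>"];
foreach ($candidates as $candidate) {
    echo $candidate, ' => ', looks_like_web_url($candidate) ? 'check it' : 'reject', "\n";
}
```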

The program is modified as follows (modified from: How to Check If Page Exists with CURL):

<?php
function page_exists($url)
{
    $parts = parse_url($url);
    if (!$parts) { return false; } /* the URL was seriously wrong */
    if (isset($parts['user'])) { return false; } /* user@gmail.com */

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    /* set the user agent - might help, doesn't hurt */
    //curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; wowTreebot/1.0; +http://wowtree.com)');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    /* try to follow redirects */
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    /* time out after the specified number of seconds; on a server this should be plenty to verify a valid URL */
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    /* don't download the page, just the header (much faster in this case) */
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    /* handle HTTPS links */
    if (isset($parts['scheme']) && $parts['scheme'] == 'https') {
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); /* the value 1 is no longer accepted by cURL */
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); /* skips certificate verification */
    }
    $response = curl_exec($ch);
    curl_close($ch);
    if ($response === false) { return false; } /* the request itself failed */
    /* after redirects, keep only the headers of the final response */
    $blocks = preg_split('/\r?\n\r?\n/', trim($response));
    $response = end($blocks);

    /* allowed Content-Type list */
    $content_type = false;
    if (preg_match('/^Content-Type:\s*([^\s;]+)/im', $response, $matches)) {
        switch (strtolower($matches[1])) {
            case 'application/atom+xml':
            case 'application/rdf+xml':
            //case 'application/x-sh':
            case 'application/xhtml+xml':
            case 'application/xml':
            case 'application/xml-dtd':
            case 'application/xml-external-parsed-entity':
            case 'application/pdf':
            //case 'application/x-shockwave-flash':
                $content_type = true;
                break;
        }
        if (!$content_type && (preg_match('/^text\/.*/i', $matches[1]) || preg_match('/^image\/.*/i', $matches[1]))) {
            $content_type = true;
        }
    }
    if (!$content_type) { return false; }

    /* get the status code from the HTTP headers */
    if (!preg_match('/HTTP\/\d+(?:\.\d+)?\s+(\d+)/', $response, $matches)) { return false; }
    $code = intval($matches[1]);
    /* check if the code indicates success */
    return (($code >= 200) && ($code < 400));
}
// Test & usage:
// var_dump(page_exists('http://tw.yahoo.com'));
?>

Content-type Information

The Content-Type values above can be looked up in the following files:

/etc/mime.types
/usr/share/doc/apache-common/examples/mime.types.gz
/usr/share/doc/apache2.2-common/examples/apache2/mime.types.gz # this is the recommended one to look at
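
Once the wanted types have been picked from those files, the allow-list check can be isolated in a small helper that works on the raw header value. The sketch below is illustrative: content_type_allowed() is a hypothetical name, and the list simply mirrors the types enabled in page_exists().

```php
<?php
// Hypothetical helper: decide whether a Content-Type header value is wanted.
function content_type_allowed(string $headerValue): bool
{
    // Drop parameters and normalise case: "Text/HTML; charset=UTF-8" -> "text/html"
    $type = strtolower(trim(explode(';', $headerValue, 2)[0]));

    // Exact matches, mirroring the switch statement in page_exists().
    $exact = [
        'application/atom+xml',
        'application/rdf+xml',
        'application/xhtml+xml',
        'application/xml',
        'application/xml-dtd',
        'application/xml-external-parsed-entity',
        'application/pdf',
    ];
    if (in_array($type, $exact, true)) {
        return true;
    }

    // Every text/* and image/* type is accepted as well.
    return strpos($type, 'text/') === 0 || strpos($type, 'image/') === 0;
}

foreach (['text/html; charset=UTF-8', 'application/zip', 'image/png'] as $value) {
    echo $value, ' => ', content_type_allowed($value) ? 'allowed' : 'rejected', "\n";
}
```

Stripping the parameters first matters because servers commonly append a charset, which would otherwise defeat an exact string comparison.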

****************************************************************************************************

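1. Check the format of the URL:

function checkurl($weburl)
{
    // ereg() was removed in PHP 7, so preg_match() is used instead.
    // Returns true when $weburl looks like http(s)://host.name
    return preg_match('/^https?:\/\/[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$/', $weburl) === 1;
}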

2. Determine if the HTTP address is valid

function url_exists($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, 1); // do not download the body
    curl_setopt($ch, CURLOPT_FAILONERROR, 1); // treat HTTP codes >= 400 as failure
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $ok = (curl_exec($ch) !== false);
    curl_close($ch);
    return $ok;
}

function img_exists($url)
{
    // Read at most one byte; false means the file could not be opened
    return @file_get_contents($url, false, null, 0, 1) !== false;
}

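// Alternative to the cURL-based url_exists() above; define only one of the two
function url_exists($url)
{
    $head = @get_headers($url);
    // get_headers() also returns an array for a 404, so check the status line for 2xx/3xx
    return is_array($head) && preg_match('/^HTTP\/\S+\s+[23]\d\d/', $head[0]) === 1;
}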

Example:

$url = 'http://www.sendnet.cn';
var_dump(url_exists($url)); // true when the URL responds successfully

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.