In PHP, the simplest way to check whether a web page exists is fopen() or file_get_contents(). There are plenty of other ways as well, but all of them pull back the page's entire HTML, so checking a large number of sites this way gets slow.
Checking the HTTP header instead means you don't have to fetch the whole page (see: Hypertext Transfer Protocol -- HTTP/1.1).
Checking the HTTP header with fsockopen
A simple example follows (reprinted from: PHP Server Side Scripting - Checking if page exists)
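<?php
// The port must be given: fsockopen() has no usable default port for plain TCP.
if ($sock = fsockopen('something.net', 80))
{
    // Send a HEAD request so the server returns only the headers.
    fputs($sock, "HEAD /something.html HTTP/1.0\r\nHost: something.net\r\n\r\n");
    while (!feof($sock)) {
        echo fgets($sock);
    }
    fclose($sock);
}
?>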
The following information comes back:
HTTP/1.1 200 OK
Date: Mon, ... Oct 2008 15:45:27 GMT
Server: Apache/2.2.9
X-Powered-By: PHP/5.2.6-4
Set-Cookie: PHPSESSID=4e037868a4619d6b4d8c52d0d5c59035; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Connection: close
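The first line of that output is the status line, and its number is what tells you whether the page exists. As a minimal sketch (the helper name `status_code_from_headers` is made up for illustration and is not part of the reprinted script), it could be pulled out of the raw header text like this:

```php
<?php
/**
 * Hypothetical helper: extract the numeric status code from raw
 * HTTP response headers, or return null if no status line is found.
 */
function status_code_from_headers(string $rawHeaders): ?int
{
    // A status line looks like "HTTP/1.1 200 OK" (or "HTTP/2 200").
    if (preg_match('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/', $rawHeaders, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Sample text shaped like the fsockopen output above.
$sample = "HTTP/1.1 200 OK\r\nServer: Apache/2.2.9\r\nConnection: close\r\n\r\n";
var_dump(status_code_from_headers($sample)); // int(200)
```

Returning null rather than 0 keeps "no status line at all" distinct from any real status code.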
This approach runs into plenty of problems, though, such as 302 redirects. The simpler route is to let cURL deal with those troubles for us.
PHP + cURL + Content-Type check
For checking whether a page exists with PHP + cURL, see: How to Check if Page Exists with CURL | W-Shadow.com
That program decides by the status code: anything from 200 up to (but not including) 400 counts as normal.
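As a rough sketch of that idea, and not the W-Shadow code itself, the status code can also be read with curl_getinfo() instead of being parsed out of the header text. The function names `status_means_exists` and `fetch_status` are assumptions made for this example.

```php
<?php
/** Hypothetical helper: treat 200-399 as "the page exists". */
function status_means_exists(int $code): bool
{
    return $code >= 200 && $code < 400;
}

/**
 * Hypothetical helper: issue a HEAD request, follow redirects, and
 * return the final status code, or null if the request failed.
 * Requires the cURL extension.
 */
function fetch_status(string $url): ?int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD: headers only
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow 301/302
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo output
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    $ok = curl_exec($ch);
    if ($ok === false) {
        return null;
    }
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// Usage (needs network access):
// $code = fetch_status('http://tw.yahoo.com');
// var_dump($code !== null && status_means_exists($code));
```

Because redirects are followed, the code reported here belongs to the last response in the chain, so a 302 that leads to a 404 is counted as missing.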
That program is basically enough, but users type in all sorts of strange things, so further checks are needed. Here are a few problematic inputs picked at random:
xxx@ooo.com # an email address
http://xxx.ooo.com/abc.zip # a compressed file
<script>alert('x')</script> # someone kindly testing you for XSS vulnerabilities =.=|||
Input like this has to be filtered out, so we should check that the input is a normal web address and that its Content-Type is one we want.
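A first filter for input like this can run before any request is made. The following is only a sketch under a few assumptions: the function name `looks_like_web_url` is invented for this example, and it accepts just `http` and `https` URLs that have a host and no user part.

```php
<?php
/**
 * Hypothetical pre-check: does the input look like a plain web URL?
 * Assumption: only http/https URLs with a host and no user part pass.
 */
function looks_like_web_url(string $url): bool
{
    $parts = parse_url(trim($url));
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // no scheme or host, e.g. a bare email or a script tag
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false; // ftp:, javascript:, mailto: and so on
    }
    if (isset($parts['user'])) {
        return false; // e.g. http://user@gmail.com
    }
    return true;
}

var_dump(looks_like_web_url('xxx@ooo.com'));                 // bool(false)
var_dump(looks_like_web_url('http://xxx.ooo.com/abc.zip'));  // bool(true)
var_dump(looks_like_web_url("<script>alert('x')</script>")); // bool(false)
```

The .zip URL passes this stage on purpose; it is the Content-Type check that is meant to reject it.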
The modified program follows (modified from: How to Check If Page Exists with CURL):
<?php
function page_exists($url)
{
    $parts = parse_url($url);
    if (!$parts) {
        return false; /* the URL was seriously wrong */
    }
    if (isset($parts['user'])) {
        return false; /* has a user part, e.g. http://user@gmail.com */
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);

    /* set the user agent - might help, doesn't hurt */
    // curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; wowtreebot/1.0; +http://wowtree.com)');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    /* try to follow redirects */
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    /* timeout after the specified number of seconds. assuming this script runs
       on a server, 20 seconds should be plenty to verify a valid URL. */
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);

    /* don't download the page, just the header (much faster in this case) */
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_HEADER, true);

    /* handle HTTPS links: certificate checks stay switched off here;
       VERIFYHOST is 0 because current libcurl no longer accepts the value 1 */
    if (isset($parts['scheme']) && $parts['scheme'] == 'https') {
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    }

    $response = curl_exec($ch);
    curl_close($ch);
    if ($response === false) {
        return false; /* the request itself failed */
    }

    /* with redirects followed there is one header block per hop; keep the last */
    $blocks = preg_split('/\r?\n\r?\n/', trim($response));
    $response = end($blocks);

    /* allowed Content-Type list */
    $content_type = false;
    if (preg_match('/^Content-Type:\s*([^;\s]+)/mi', $response, $matches)) {
        switch (strtolower($matches[1])) {
            case 'application/atom+xml':
            case 'application/rdf+xml':
            //case 'application/x-sh':
            case 'application/xhtml+xml':
            case 'application/xml':
            case 'application/xml-dtd':
            case 'application/xml-external-parsed-entity':
            case 'application/pdf':
            //case 'application/x-shockwave-flash':
                $content_type = true;
                break;
        }
        if (!$content_type && (preg_match('/text\/.*/', $matches[1]) || preg_match('/image\/.*/', $matches[1]))) {
            $content_type = true;
        }
    }
    if (!$content_type) {
        return false;
    }

    /* get the status code from HTTP headers (HTTP/1.x and HTTP/2 status lines) */
    if (preg_match('/HTTP\/\d+(?:\.\d+)?\s+(\d+)/', $response, $matches)) {
        $code = intval($matches[1]);
    } else {
        return false;
    }

    /* check if the code indicates success */
    return (($code >= 200) && ($code < 400));
}

// Test & usage:
// var_dump(page_exists('http://tw.yahoo.com'));
?>
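To make the Content-Type step easier to follow in isolation, here is a minimal sketch of just that check. The function name `content_type_allowed` and the sample header values are invented for illustration; the allow list mirrors the one in the listing above.

```php
<?php
/**
 * Hypothetical helper: decide whether a Content-Type header value
 * (e.g. "text/html; charset=UTF-8") is one we accept.
 */
function content_type_allowed(string $headerValue): bool
{
    // Drop parameters such as "; charset=UTF-8" and normalise case.
    $type = strtolower(trim(explode(';', $headerValue, 2)[0]));

    $allowed = [
        'application/atom+xml',
        'application/rdf+xml',
        'application/xhtml+xml',
        'application/xml',
        'application/xml-dtd',
        'application/xml-external-parsed-entity',
        'application/pdf',
    ];
    if (in_array($type, $allowed, true)) {
        return true;
    }
    // Any text/* or image/* type is accepted as well.
    return (bool) preg_match('/^(text|image)\/.+/', $type);
}

var_dump(content_type_allowed('text/html; charset=UTF-8')); // bool(true)
var_dump(content_type_allowed('application/zip'));          // bool(false)
```

Keeping the list in an array rather than a switch makes it a one-line change to allow or drop a type.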
Content-type Information
The Content-Type values above can be looked up in the following files:
/etc/mime.types
/usr/share/doc/apache-common/examples/mime.types.gz
/usr/share/doc/apache2.2-common/examples/apache2/mime.types.gz # this is the one recommended to look at
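Those files list one media type per line, followed by its file extensions, with "#" marking comments. As a small sketch of how the type names could be collected from such text (the function name `mime_types_from_text` and the sample string are made up for this example):

```php
<?php
/**
 * Hypothetical helper: collect the media types listed in text that
 * follows the mime.types layout ("type/subtype ext1 ext2 ...").
 * Blank lines and lines starting with "#" are skipped.
 */
function mime_types_from_text(string $text): array
{
    $types = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        // The first whitespace-separated field is the media type.
        $fields = preg_split('/\s+/', $line);
        $types[] = strtolower($fields[0]);
    }
    return $types;
}

// Made-up sample in the same layout as /etc/mime.types.
$sample = "# comment line\napplication/pdf\t\tpdf\ntext/html\t\thtml htm\n";
print_r(mime_types_from_text($sample));
```

On a machine that has the file, the input could come from file_get_contents('/etc/mime.types'); the .gz copies would need decompressing first.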
****************************************************************************************************
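1. Check the format of the URL:
function checkurl($weburl)
{
    // ereg() was removed in PHP 7, so preg_match() is used instead.
    // Note the negation: returns true when $weburl does NOT look like http(s)://host.
    return !preg_match('/^https?:\/\/[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$/', $weburl);
}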
2. Check whether the HTTP address is valid
function url_exists($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, 1); // do not download the body
    curl_setopt($ch, CURLOPT_FAILONERROR, 1); // treat HTTP status >= 400 as a failure
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    return curl_exec($ch) !== false;
}
Or
function img_exists($url)
{
    // Read at most one byte; compare with false so that a body of "0" still counts.
    return file_get_contents($url, false, null, 0, 1) !== false;
}
Or
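function url_exists($url)
{
    // get_headers() returns an array of header lines, or false on failure.
    // Note: an array also comes back for error pages such as 404.
    $head = @get_headers($url);
    return is_array($head);
}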
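Since get_headers() hands back an array even when the server answers 404, a stricter variant could look at the status line as well. The sketch below assumes a helper name, `headers_indicate_exists`, that is not in the original; it takes the array that get_headers() returns and uses the last status line, because redirects add one status line per hop.

```php
<?php
/**
 * Hypothetical helper: given header lines as returned by get_headers(),
 * report whether the final response had a status code in 200-399.
 */
function headers_indicate_exists($headers): bool
{
    if (!is_array($headers)) {
        return false; // get_headers() returned false
    }
    $code = null;
    foreach ($headers as $line) {
        // Each redirect hop contributes its own "HTTP/x.y NNN" line;
        // later matches overwrite earlier ones, so the last one wins.
        if (is_string($line) && preg_match('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code !== null && $code >= 200 && $code < 400;
}

// Usage (needs network access):
// var_dump(headers_indicate_exists(@get_headers('http://www.sendnet.cn')));
```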
Example:
$url = 'http://www.sendnet.cn';
var_dump(url_exists($url)); // echo would print "1" for true and nothing for false