If you look a little more closely, you will notice one thing: the programs below that fetch Baidu's index count, snapshot date, or hot keywords all rely on one function, file_get_contents(), which is commonly used in PHP for scraping web pages.
The code is as follows:
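<?php
/* Fetch the number of pages Baidu has indexed for a site */
function baidu($s) {
    $baidu = "http://www.baidu.com/s?wd=site%3A" . $s;
    $site = file_get_contents($baidu);
    if ($site === false) {
        return 0; // the request failed
    }
    $site = iconv("gb2312", "UTF-8", $site);
    // ereg() was removed in PHP 7; preg_match() replaces it here.
    // The literal strings must match the wording on Baidu's results page.
    if (!preg_match("/Find related pages(.*),/", $site, $count)) {
        return 0; // pattern not found
    }
    $count = str_replace("Find related pages", "", $count);
    $count = str_replace("article,", "", $count);
    $count = str_replace("About", "", $count);
    $count = str_replace(",", "", $count);
    return $count[0];
}
echo baidu("www.hzhuti.com"); // Number of www.hzhuti.com pages indexed by Baidu
?>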
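The chain of str_replace() calls above can be condensed. As a minimal sketch (the helper names buildSiteQuery() and parseCount() are hypothetical and not part of the original script), the query URL can be built with http_build_query() and the count reduced to a number by dropping every non-digit character from the captured text:

```php
<?php
// Hypothetical helpers for illustration; not part of the original script.

/** Build the Baidu "site:" query URL for a domain. */
function buildSiteQuery(string $domain): string
{
    // http_build_query() URL-encodes the value, so ":" becomes "%3A".
    return 'http://www.baidu.com/s?' . http_build_query(['wd' => 'site:' . $domain]);
}

/** Reduce a captured fragment such as "About 1,230" to an integer. */
function parseCount(string $captured): int
{
    $digits = preg_replace('/\D/', '', $captured); // keep digits only
    return $digits === '' ? 0 : (int) $digits;
}

echo buildSiteQuery('www.hzhuti.com'), "\n"; // http://www.baidu.com/s?wd=site%3Awww.hzhuti.com
echo parseCount('About 1,230'), "\n";        // 1230
```

Stripping non-digits this way does not depend on the exact wording around the number, so it keeps working if only the surrounding text on the results page changes.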
Get Baidu's hot keywords
The code is as follows:
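<?php
/**
 * @user Little Jay
 * @return array Hot keyword data from Baidu (returned as an array)
 */
function getBaiduHotKeyword() {
    $keyArray = array();
    $templateRss = file_get_contents('http://top.baidu.com/rss_xml.php?p=top10');
    // NOTE: the original pattern was stripped from the source text. This one is
    // a reconstruction, assumed from the table/tbody/tr access further down.
    if (preg_match('/<table.*<\/table>/is', $templateRss, $_description)) {
        $templateRss = $_description[0];
        // Escape bare ampersands so the fragment parses as XML.
        $templateRss = str_replace("&", "&amp;", $templateRss);
    }
    // An XML declaration naming the feed's encoding appears to have been
    // stripped from this prefix in the source.
    $templateRss = " " . $templateRss;
    $xml = simplexml_load_string($templateRss);
    if ($xml === false) {
        return $keyArray; // the feed could not be parsed
    }
    foreach ($xml->tbody->tr as $temp) {
        if (!empty($temp->td->a)) {
            $keyArray[] = trim($temp->td->a);
        }
    }
    return $keyArray;
}
print_r(getBaiduHotKeyword());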
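The parsing step can be tried without contacting Baidu. The following is a minimal sketch that walks the same table/tbody/tr/td/a structure on a made-up sample string; the helper name extractKeywords() and the sample markup are assumptions for illustration, not the real feed:

```php
<?php
// Hypothetical helper and sample markup, for illustration only.
function extractKeywords(string $tableXml): array
{
    $keys = [];
    $xml = simplexml_load_string($tableXml);
    if ($xml === false) {
        return $keys; // not well-formed XML
    }
    foreach ($xml->tbody->tr as $row) {
        // Only rows whose first cell contains a link carry a keyword.
        if (isset($row->td->a)) {
            $keys[] = trim((string) $row->td->a);
        }
    }
    return $keys;
}

$sample = '<table><tbody>'
    . '<tr><td><a href="#"> alpha </a></td></tr>'
    . '<tr><td>no link here</td></tr>'
    . '<tr><td><a href="#">beta</a></td></tr>'
    . '</tbody></table>';

print_r(extractKeywords($sample)); // the two linked keywords: alpha, beta
```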
The following code was found on the Internet and slightly modified; save it to a PHP file.
Baidu index count and Baidu snapshot date
The code is as follows:
<?php
$domain = "http://www.hzhuti.com/nokia/5230/"; /* Domain name to query */
$site_url = 'http://www.baidu.com/s?wd=site%3A';
$all = $site_url . $domain; /* URL listing all indexed pages for the domain */
$today = $all . '&lm=1'; /* URL listing the pages indexed today */
$utf_pattern = "/Find the relevant result number(.*)/"; /* Must match the wording on Baidu's results page */
$kz_pattern = "/(.*)/"; /* String containing the snapshot date; the markup around the group was lost in the source */
$times = "/\d{4}-\d{1,2}-\d{1,2}/"; /* Regular expression that matches the snapshot date, such as 2011-8-4 */
$s0 = @file_get_contents($all); /* Load the site: results page for the domain into $s0 */
$s1 = @file_get_contents($today);
preg_match($utf_pattern, $s0, $all_num); /* Match "Find the relevant result number ..." */
preg_match($utf_pattern, $s1, $today_num);
preg_match($kz_pattern, $s0, $temp);
preg_match($times, isset($temp[0]) ? $temp[0] : "", $screenshot);
/* Fall back to defaults when nothing matched */
if (empty($all_num[1])) $all_num[1] = 0;
if (empty($today_num[1])) $today_num[1] = 0;
if (empty($screenshot[0])) $screenshot[0] = "no snapshot";
?>
The rest of the file is an HTML page titled "Test" that prints the results in a table with four columns:
Date | Baidu indexed | Baidu indexed today | Baidu snapshot date
The index counts are rendered as links that open in a new window (target="_blank").
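The snapshot-date pattern is easy to get wrong, because "\d" without its backslash matches a literal letter "d" rather than a digit. A minimal sketch that can be run offline follows; the helper name extractSnapshotDate() and the sample strings are made up for illustration:

```php
<?php
// Hypothetical helper for illustration; the sample strings are made up.
function extractSnapshotDate(string $text): string
{
    // Four-digit year, then one- or two-digit month and day, e.g. 2011-8-4.
    if (preg_match('/\d{4}-\d{1,2}-\d{1,2}/', $text, $m)) {
        return $m[0];
    }
    return 'no snapshot';
}

echo extractSnapshotDate('www.hzhuti.com/ 2011-8-4 - snapshot'), "\n"; // 2011-8-4
echo extractSnapshotDate('no date in this fragment'), "\n";            // no snapshot
```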
The methods above are not robust: if the server does not allow file_get_contents() to open remote URLs, none of them will work. It is therefore worth using cURL as well, which also makes it easier to imitate a real user's browser.
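A cURL-based fetch with a fallback might look like the sketch below. The function names chooseFetcher() and fetchPage(), the 10-second timeout, and the User-Agent string are all assumptions chosen for illustration; adjust them to your environment.

```php
<?php
// A sketch only: function names, timeout, and User-Agent are assumptions.

/** Report which fetch method this PHP installation can use. */
function chooseFetcher(): string
{
    if (function_exists('curl_init')) {
        return 'curl';
    }
    if (ini_get('allow_url_fopen')) {
        return 'file_get_contents';
    }
    return 'none';
}

/** Fetch a URL; returns the body as a string, or false on failure. */
function fetchPage(string $url, int $timeout = 10)
{
    switch (chooseFetcher()) {
        case 'curl':
            $ch = curl_init($url);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true,      // return the body instead of printing it
                CURLOPT_FOLLOWLOCATION => true,      // follow redirects
                CURLOPT_TIMEOUT        => $timeout,  // give up after $timeout seconds
                CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
            ]);
            return curl_exec($ch);
        case 'file_get_contents':
            return @file_get_contents($url);
        default:
            return false; // no way to fetch remote pages
    }
}

// Usage (performs a network request):
// $html = fetchPage('http://www.baidu.com/s?wd=site%3Awww.hzhuti.com');
```

Setting CURLOPT_USERAGENT is what lets the request resemble one from a browser; the earlier functions could call fetchPage() wherever they call file_get_contents().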