PHP Scraper ("Thief") Program Example Code

Source: Internet
Author: User
Tags: curl, regular expressions

The file_get_contents function is the key to the data-capturing code below, so let's first look at its syntax.

string file_get_contents ( string $filename [, bool $use_include_path = FALSE [, resource $context [, int $offset = -1 [, int $maxlen ]]]] )

This works like file(), except that file_get_contents() returns the file in a single string, starting at the position given by $offset and reading up to $maxlen bytes. On failure, file_get_contents() returns FALSE.

file_get_contents() is the preferred way to read the contents of a file into a string. It will use memory-mapping techniques, if supported by the operating system, to enhance performance.
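For instance, the $offset and $maxlen parameters let you read just a slice of a file. A minimal self-contained sketch, using a temporary file so it runs anywhere:

```php
<?php
// Sketch: reading a slice of a file with the $offset and $maxlen parameters.
// A small temporary file is created so the example needs nothing external.
$tmp = tempnam(sys_get_temp_dir(), 'fgc');
file_put_contents($tmp, 'Hello, file_get_contents!');

// Read 4 bytes starting at byte offset 7.
$slice = file_get_contents($tmp, false, null, 7, 4);
echo $slice, "\n"; // prints "file"

unlink($tmp);
```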

Example

The code is as follows:

<?php
$homepage = file_get_contents('http://www.111cn.net/');
echo $homepage;
?>

So $homepage now holds the collected page content. That's enough background; let's begin.

Example

The code is as follows:

<?php

function fetch_urlpage_contents($url) {
    $c = file_get_contents($url);
    return $c;
}

// Get the matching content
function fetch_match_contents($begin, $end, $c)
{
    $begin = change_match_string($begin);
    $end = change_match_string($end);
    // i: ignore case, U: ungreedy. The original used eregi(), which was
    // removed in PHP 7, so preg_match() is used here instead.
    $p = "#{$begin}(.*){$end}#iU";
    if (preg_match($p, $c, $rs))
    {
        return $rs[1];
    }
    else { return ""; }
}

// Escape regular expression special characters
function change_match_string($str) {
    // Note: this is only a simple escape
    $old = array("/", "$");
    $new = array("\\/", "\\$");
    $str = str_replace($old, $new, $str);
    return $str;
}

// Collect the web page
function pick($url, $ft, $th)
{
    $c = fetch_urlpage_contents($url);
    foreach ($ft as $key => $value)
    {
        $rs[$key] = fetch_match_contents($value["begin"], $value["end"], $c);
        if (is_array($th[$key]))
        {
            foreach ($th[$key] as $old => $new)
            {
                $rs[$key] = str_replace($old, $new, $rs[$key]);
            }
        }
    }
    return $rs;
}

$url = "http://www.111cn.net";          // the address to collect
$ft["title"]["begin"] = "<title>";      // start of the intercepted section
$ft["title"]["end"] = "</title>";       // end of the intercepted section
$th["title"]["Zhongshan"] = "Guangdong"; // replacement within the intercepted part

$ft["body"]["begin"] = "<body>";        // start of the intercepted section
$ft["body"]["end"] = "</body>";         // end of the intercepted section
$th["body"]["Zhongshan"] = "Guangdong"; // replacement within the intercepted part

$rs = pick($url, $ft, $th); // start collecting

echo $rs["title"];
echo $rs["body"]; // output
?>
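The interception-and-replacement idea above can be tried offline on a literal string, without fetching any page. A minimal sketch:

```php
<?php
// Offline sketch of the begin/end interception plus replacement, run on a
// literal string instead of a fetched page, so it needs no network access.
$html = '<head><title>Zhongshan News</title></head>';

// Intercept everything between <title> and </title> (case-insensitive, ungreedy).
if (preg_match('#<title>(.*)</title>#iU', $html, $m)) {
    // Apply the $th-style replacement: swap "Zhongshan" for "Guangdong".
    echo str_replace('Zhongshan', 'Guangdong', $m[1]), "\n"; // prints "Guangdong News"
}
```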

The following code is modified from the version above; it is designed to extract all hyperlinks, e-mail addresses, or other repeated content from a web page.

The code is as follows:

<?php

function fetch_urlpage_contents($url) {
    $c = file_get_contents($url);
    return $c;
}

// Get all matching content
function fetch_match_contents($begin, $end, $c)
{
    $begin = change_match_string($begin);
    $end = change_match_string($end);
    $p = "#{$begin}(.*){$end}#iU"; // i: ignore case, U: ungreedy
    if (preg_match_all($p, $c, $rs))
    {
        return $rs;
    }
    else { return ""; }
}

// Escape regular expression special characters
function change_match_string($str) {
    // Note: this is only a simple escape
    $old = array("/", "$", "?");
    $new = array("\\/", "\\$", "\\?");
    $str = str_replace($old, $new, $str);
    return $str;
}

// Collect the web page
function pick($url, $ft, $th)
{
    $c = fetch_urlpage_contents($url);
    foreach ($ft as $key => $value)
    {
        $rs[$key] = fetch_match_contents($value["begin"], $value["end"], $c);
        if (isset($th[$key]) && is_array($th[$key]))
        {
            foreach ($th[$key] as $old => $new)
            {
                $rs[$key] = str_replace($old, $new, $rs[$key]);
            }
        }
    }
    return $rs;
}

$url = "http://www.111cn.net"; // the address to collect
$ft["a"]["begin"] = '<a ';     // start of the intercepted section
$ft["a"]["end"] = '>';         // end of the intercepted section

$th = array(); // no replacements this time

$rs = pick($url, $ft, $th); // start collecting

print_r($rs["a"]);

?>
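Once the `<a ...>` fragments are collected, a second pass can pull out just the href values. A sketch using a hypothetical helper, extract_hrefs(), which is not part of the original article, run here on a literal sample string:

```php
<?php
// Hypothetical helper (not from the original article): extract the quoted
// value of every href attribute from a chunk of HTML, case-insensitively.
function extract_hrefs($html) {
    if (preg_match_all('#<a\s[^>]*href=["\']([^"\']+)["\']#i', $html, $m)) {
        return $m[1]; // the first capture group of every match
    }
    return array();
}

$sample = '<a href="http://www.111cn.net/">home</a> <a class="x" href="/about">about</a>';
print_r(extract_hrefs($sample));
// Array ( [0] => http://www.111cn.net/ [1] => /about )
```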

A small tip: sites easily detect and block collection done with file_get_contents. With curl we can imitate a real user visiting the site, which works much better than the approach above: file_get_contents() is slightly less efficient and frequently fails, while curl() is efficient and supports parallel transfers, though it requires the curl extension to be enabled. The steps to enable the curl extension on Windows are:

1. Copy the three files php_curl.dll, libeay32.dll, and ssleay32.dll from the PHP folder into System32;

2. In php.ini (in the c:\windows directory), remove the semicolon in front of extension=php_curl.dll;

3. Restart Apache or IIS.
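After restarting, you can confirm from code that the extension actually loaded before calling any curl functions. A minimal check:

```php
<?php
// Sketch: verify the curl extension is available before using it.
if (function_exists('curl_init')) {
    echo "curl extension is enabled\n";
} else {
    echo "curl extension is missing - enable php_curl in php.ini\n";
}
```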

A simple page-crawling function with forged Referer and User-Agent:

The code is as follows:

<?php
function getsources($url, $user_agent = '', $referer_url = '') // crawl a specified page
{
    // $url: the page address to crawl
    // $user_agent: the User-Agent string to send, e.g. "Baiduspider" or "Googlebot"
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt($ch, CURLOPT_REFERER, $referer_url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $my_sources = curl_exec($ch);
    curl_close($ch);
    return $my_sources;
}
$url = "http://www.111cn.net"; // the page to get content from
$user_agent = "Baiduspider+(+http://www.baidu.com/search/spider.htm)";
$referer_url = 'http://www.111cn.net/';
echo getsources($url, $user_agent, $referer_url);
?>
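The parallel transfers mentioned in the tip above come from curl's "multi" interface, which drives several handles at once. A self-contained sketch, fetching two local file:// URLs (temporary files) instead of real sites so it runs without network access:

```php
<?php
// Sketch of curl's "multi" interface: several transfers driven in parallel.
// Two local file:// URLs stand in for real sites, so no network is needed.
$tmp1 = tempnam(sys_get_temp_dir(), 'cm');
$tmp2 = tempnam(sys_get_temp_dir(), 'cm');
file_put_contents($tmp1, 'alpha');
file_put_contents($tmp2, 'beta');

$mh = curl_multi_init();
$handles = array();
foreach (array($tmp1, $tmp2) as $path) {
    $ch = curl_init('file://' . $path);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers until every handle has finished.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

$results = array();
foreach ($handles as $ch) {
    $results[] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
unlink($tmp1);
unlink($tmp2);

print_r($results);
```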
