The file_get_contents function is the key to the data collection below, so let's first look at its syntax:
string file_get_contents ( string $filename [, bool $use_include_path = false [, resource $context [, int $offset = -1 [, int $maxlen ]]]] )
Like file(), file_get_contents() reads a file, except that it returns the content as a single string rather than an array of lines. It reads up to $maxlen bytes starting at the position given by $offset. On failure, file_get_contents() returns FALSE.
The file_get_contents() function is the preferred way to read the contents of a file into a string. If the operating system supports it, memory-mapping techniques are used to enhance performance.
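Before moving on, the $offset and $maxlen parameters are easy to demonstrate against a local file. A minimal sketch (the temporary file and its contents are made up for illustration):

```php
<?php
// Write a known string to a temporary file, then read a slice of it.
$tmp = tempnam(sys_get_temp_dir(), 'fgc');
file_put_contents($tmp, "Hello, 111cn.net!");

// Read 5 bytes starting at byte offset 7.
$slice = file_get_contents($tmp, false, null, 7, 5);
echo $slice . "\n";   // prints "111cn"

// On failure file_get_contents() returns false, so always check:
$missing = @file_get_contents($tmp . '.does-not-exist');
var_dump($missing);   // bool(false)

unlink($tmp);
?>
```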
Example
The code is as follows:

<?php
$homepage = file_get_contents('http://www.111cn.net/');
echo $homepage;
?>
Now $homepage holds the content of the page we collected. That's enough introduction; let's begin.
Example
The code is as follows:

<?php
// Fetch the raw page source
function fetch_urlpage_contents($url) {
    $c = file_get_contents($url);
    return $c;
}

// Get the content matched between $begin and $end
function fetch_match_contents($begin, $end, $c) {
    $begin = change_match_string($begin);
    $end   = change_match_string($end);
    // i = case-insensitive, s = let "." match newlines as well
    $p = "#{$begin}(.*){$end}#is";
    if (preg_match($p, $c, $rs)) {
        return $rs[1];
    }
    return "";
}

// Escape regular-expression metacharacters
// (note: this is only a simple escape)
function change_match_string($str) {
    $old = array("/", "$");
    $new = array('\/', '\$');
    $str = str_replace($old, $new, $str);
    return $str;
}

// Collect the web page
function pick($url, $ft, $th) {
    $c = fetch_urlpage_contents($url);
    foreach ($ft as $key => $value) {
        $rs[$key] = fetch_match_contents($value["begin"], $value["end"], $c);
        if (isset($th[$key]) && is_array($th[$key])) {
            foreach ($th[$key] as $old => $new) {
                $rs[$key] = str_replace($old, $new, $rs[$key]);
            }
        }
    }
    return $rs;
}

$url = "http://www.111cn.net";            // the address to collect
$ft["title"]["begin"] = "<title>";        // start marker of the capture
$ft["title"]["end"]   = "</title>";       // end marker of the capture
$th["title"]["Zhongshan"] = "Guangdong";  // replacement within the captured part
$ft["body"]["begin"] = "<body>";          // start marker of the capture
$ft["body"]["end"]   = "</body>";         // end marker of the capture
$th["body"]["Zhongshan"] = "Guangdong";   // replacement within the captured part
$rs = pick($url, $ft, $th);               // start collecting
echo $rs["title"];
echo $rs["body"];
?>
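The capture logic above can be exercised without any network access. A minimal sketch, re-declaring the two helpers so the snippet stands alone (the sample markup is made up for illustration):

```php
<?php
// Escape regular-expression metacharacters (same simple escape as above)
function change_match_string($str) {
    $old = array("/", "$");
    $new = array('\/', '\$');
    return str_replace($old, $new, $str);
}

// Capture the content between $begin and $end
function fetch_match_contents($begin, $end, $c) {
    $begin = change_match_string($begin);
    $end   = change_match_string($end);
    $p = "#{$begin}(.*){$end}#is";
    if (preg_match($p, $c, $rs)) {
        return $rs[1];
    }
    return "";
}

// Sample in-memory HTML string instead of a live page
$html = "<html><head><title>Zhongshan news</title></head></html>";
$title = fetch_match_contents("<title>", "</title>", $html);
echo str_replace("Zhongshan", "Guangdong", $title); // prints "Guangdong news"
?>
```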
The following code is modified from the previous example; it is designed to extract all hyperlinks, e-mail addresses, or other specific content from a web page.
The code is as follows:

<?php
// Fetch the raw page source
function fetch_urlpage_contents($url) {
    $c = file_get_contents($url);
    return $c;
}

// Get all content matched between $begin and $end
function fetch_match_contents($begin, $end, $c) {
    $begin = change_match_string($begin);
    $end   = change_match_string($end);
    // i = case-insensitive, U = ungreedy (forbids greedy matching)
    $p = "#{$begin}(.*){$end}#iU";
    if (preg_match_all($p, $c, $rs)) {
        return $rs;
    }
    return "";
}

// Escape regular-expression metacharacters
// (note: this is only a simple escape)
function change_match_string($str) {
    $old = array("/", "$", "?");
    $new = array('\/', '\$', '\?');
    $str = str_replace($old, $new, $str);
    return $str;
}

// Collect the web page
function pick($url, $ft, $th) {
    $c = fetch_urlpage_contents($url);
    foreach ($ft as $key => $value) {
        $rs[$key] = fetch_match_contents($value["begin"], $value["end"], $c);
        if (isset($th[$key]) && is_array($th[$key])) {
            foreach ($th[$key] as $old => $new) {
                $rs[$key] = str_replace($old, $new, $rs[$key]);
            }
        }
    }
    return $rs;
}

$url = "http://www.111cn.net";  // the address to collect
$th = array();                  // no replacements this time
$ft["a"]["begin"] = '<a ';      // start marker of the capture
$ft["a"]["end"]   = '>';        // end marker of the capture
$rs = pick($url, $ft, $th);     // start collecting
print_r($rs["a"]);
?>
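The non-greedy multi-match idea is again easiest to see against an in-memory string. A minimal sketch (the HTML fragment is made up for illustration): pull every `<a ...>` tag out of a fragment without fetching anything.

```php
<?php
// Sample markup with two links
$html = '<p><a href="http://www.111cn.net">home</a> and '
      . '<a href="/about">about</a></p>';

// #...#iU : i = case-insensitive, U = ungreedy, so each match stops
// at the first ">" rather than swallowing the rest of the line.
preg_match_all('#<a (.*)>#iU', $html, $rs);

print_r($rs[1]);
// Array ( [0] => href="http://www.111cn.net" [1] => href="/about" )
?>
```

Without the U modifier, (.*) would run from the first `<a ` to the last `>` on the line, merging both links into one bogus match.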
A small hint: requests made with file_get_contents are very easy for the target site to detect and block, so we can use curl to imitate a real user visiting the site, which works much better than the approach above. file_get_contents() is also slightly less efficient and fails fairly often, whereas curl is very efficient and supports concurrent transfers, but it requires the curl extension to be enabled. The steps to enable the curl extension are:
1. Copy the three files php_curl.dll, libeay32.dll, and ssleay32.dll from the PHP folder into the system32 directory;
2. In php.ini (in the C:\Windows directory), remove the semicolon in front of extension=php_curl.dll;
3. Restart Apache or IIS.
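After restarting, it is worth confirming that the extension really loaded before calling any curl_* function. A minimal check:

```php
<?php
// extension_loaded() returns true only if the curl extension is active.
if (extension_loaded('curl')) {
    echo "curl is enabled\n";
} else {
    echo "curl is NOT enabled - check php.ini\n";
}
?>
```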
A simple page-fetching function that can also forge the Referer and User-Agent headers:
The code is as follows:

<?php
// Fetch a specified page
// $Url         : the page address to fetch
// $User_agent  : the User-Agent to send, e.g. "Baiduspider" or "Googlebot"
// $Referer_url : the Referer to send
function getsources($Url, $User_agent = '', $Referer_url = '') {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $Url);
    curl_setopt($ch, CURLOPT_USERAGENT, $User_agent);
    curl_setopt($ch, CURLOPT_REFERER, $Referer_url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $MySources = curl_exec($ch);
    curl_close($ch);
    return $MySources;
}

$Url = "http://www.111cn.net"; // the address whose content we want to fetch
$User_agent = "Baiduspider+(+http://www.baidu.com/search/spider.htm)";
$Referer_url = 'http://www.111cn.net/';
echo getsources($Url, $User_agent, $Referer_url);
?>
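The "supports multithreading" remark earlier refers to curl's multi interface, which drives several transfers concurrently within one process. A minimal sketch, assuming the curl extension is enabled (the multi_get helper name is my own, not part of curl):

```php
<?php
// Fetch several URLs concurrently with curl's multi interface.
function multi_get(array $urls) {
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }
    // Drive all transfers until every handle has finished.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);
    // Collect the bodies and release the handles.
    $results = array();
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}

// Usage (network access required):
// $pages = multi_get(array('http://www.111cn.net/', 'http://www.111cn.net/phper/'));
?>
```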