Methods and techniques for scraping external sites with cURL in PHP

Source: Internet
Author: User
Tags curl

Reasons to Choose Curl

On cURL versus file_get_contents, here is a plain, easy-to-follow comparison:
file_get_contents is essentially a convenience wrapper around built-in file functions such as file_exists, fopen, fread, and fclose. It is aimed mainly at local files, and support for network files was added for the same convenience.
cURL is a library dedicated to network interaction. It provides many options for handling different environments, so it is generally more stable than file_get_contents for network requests.
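
To make the comparison concrete, here is a minimal sketch that fetches the same resource both ways. The function names fetch_simple and fetch_curl, the timeout values, and the user-agent string are illustrative assumptions, not part of the original article.

```php
<?php
// Simple approach: a single call with few options to tune.
function fetch_simple($url)
{
    return @file_get_contents($url);
}

// cURL approach: each aspect of the transfer is configurable.
function fetch_curl($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 5,    // seconds allowed for connecting (illustrative)
        CURLOPT_TIMEOUT        => 15,   // seconds allowed for the whole transfer (illustrative)
        CURLOPT_USERAGENT      => 'example-collector/1.0', // hypothetical user-agent string
    ]);
    $body = curl_exec($ch); // string on success, false on failure
    return $body;           // the handle is released when $ch goes out of scope
}
```

Both functions return the response body as a string, or false on failure. The cURL version is longer, but every option in the array is a setting that the one-line version does not expose as directly.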

How to use

1. Enable cURL support

A default PHP installation does not have cURL support enabled. Edit php.ini, find the line extension=php_curl.dll (the .dll name applies to Windows builds), remove the leading semicolon, and restart the web server.
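
After restarting, it can help to confirm that the extension really loaded. The snippet below is a small sketch of such a check; the helper name curl_is_available is an assumption made for illustration.

```php
<?php
// Report whether the cURL extension is usable in this PHP build.
function curl_is_available()
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_available()) {
    $info = curl_version(); // array describing the linked libcurl
    echo 'cURL is enabled, libcurl version ' . $info['version'] . PHP_EOL;
} else {
    // php_ini_loaded_file() shows which php.ini PHP actually read
    echo 'cURL is not enabled; check ' . var_export(php_ini_loaded_file(), true) . PHP_EOL;
}
```

Running php -m on the command line and looking for curl in the module list gives the same answer for the CLI build, which may read a different php.ini than the web server.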

2. Fetch data with cURL

The code is as follows:

// Initialize a cURL handle
$curl = curl_init();
// Set the URL to fetch
curl_setopt($curl, CURLOPT_URL, 'http://www.cmx8.cn');
// Include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 1);
// Return the result as a string instead of printing it to the screen
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Run cURL and request the page
$data = curl_exec($curl);
// Close the cURL handle
curl_close($curl);
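
The fetch above assumes the request succeeds. curl_exec returns false on failure, so a collector usually wants to check for that. The following sketch shows one way to do it; the function name fetch_page and the shape of the returned array are assumptions for illustration.

```php
<?php
// Fetch a URL and report failures instead of silently returning false.
function fetch_page($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);

    return [
        'ok'     => $body !== false,
        'body'   => $body === false ? '' : $body,
        'error'  => curl_error($ch), // empty string when nothing went wrong
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // 0 for non-HTTP transfers
    ];
}
```

A caller could then test $result['ok'] and, for HTTP URLs, compare $result['status'] with 200 before passing $result['body'] to the matching step.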

3. Extract the key data with a regular expression

The code is as follows:

// $data is the value returned by curl_exec, that is, the collected page content
preg_match_all('/<li class="item">(.*?)<\/li>/', $data, $out, PREG_SET_ORDER);
foreach ($out as $key => $value) {
    // $value is an array: element 0 holds the whole match, element 1 the captured group
    echo 'Whole match: ' . $value[0] . '<br />';
    echo 'Captured group: ' . $value[1] . '<br />';
}
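
To see what PREG_SET_ORDER produces without making a network request, the sketch below runs the same pattern against a small hand-written string. The sample markup and the helper name extract_items are assumptions for illustration.

```php
<?php
// Hypothetical sample markup standing in for the $data returned by curl_exec.
$data = '<ul><li class="item">Apple</li><li class="item">Banana</li></ul>';

// Collect the text inside every <li class="item"> element.
function extract_items($html)
{
    $items = [];
    // The s modifier lets "." match newlines, so items spanning lines still match.
    if (preg_match_all('/<li class="item">(.*?)<\/li>/s', $html, $out, PREG_SET_ORDER)) {
        foreach ($out as $match) {
            $items[] = $match[1]; // index 1 is the captured group
        }
    }
    return $items;
}

print_r(extract_items($data));
```

For this sample the function returns an array containing Apple and Banana. A regular expression like this depends on the exact markup, so it needs adjusting whenever the target page changes its HTML.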

Tips

1. Timeout settings

curl_setopt($ch, option, value) accepts several timeout options, including:

CURLOPT_TIMEOUT: the maximum number of seconds a cURL function is allowed to execute.
CURLOPT_TIMEOUT_MS: the maximum number of milliseconds a cURL function is allowed to execute. (Added in cURL 7.16.2; available from PHP 5.2.3.)
CURLOPT_CONNECTTIMEOUT: the number of seconds to wait while trying to connect. If set to 0, wait indefinitely.
CURLOPT_CONNECTTIMEOUT_MS: the number of milliseconds to wait while trying to connect. If set to 0, wait indefinitely. (Added in cURL 7.16.2; available from PHP 5.2.3.)
CURLOPT_DNS_CACHE_TIMEOUT: the number of seconds to keep DNS entries in memory. The default is 120 seconds.

The code is as follows:

curl_setopt($ch, CURLOPT_TIMEOUT, 60); // a timeout in whole seconds needs only this option
curl_setopt($ch, CURLOPT_NOSIGNAL, 1); // must be set when using a millisecond timeout
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 200); // timeout in milliseconds; added in cURL 7.16.2, available from PHP 5.2.3
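
The timeout options can also be applied together and the result checked for a timeout. The following is a minimal sketch under that assumption; the function names timeout_options and fetch_with_timeout are hypothetical, and the millisecond values a caller passes are up to the application.

```php
<?php
// Build a set of millisecond timeout options.
function timeout_options($connectMs, $totalMs)
{
    return [
        CURLOPT_RETURNTRANSFER    => true,
        CURLOPT_NOSIGNAL          => true,       // set alongside millisecond timeouts
        CURLOPT_CONNECTTIMEOUT_MS => $connectMs, // limit for establishing the connection
        CURLOPT_TIMEOUT_MS        => $totalMs,   // limit for the whole transfer
    ];
}

// Return the body, or null on failure; $timedOut tells whether a timeout caused it.
function fetch_with_timeout($url, $connectMs, $totalMs, &$timedOut = false)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, timeout_options($connectMs, $totalMs));
    $body = curl_exec($ch);
    $timedOut = curl_errno($ch) === CURLE_OPERATION_TIMEDOUT;
    return $body === false ? null : $body;
}
```

A call such as fetch_with_timeout('http://www.cmx8.cn', 500, 2000, $timedOut) would give up on connecting after half a second and on the whole transfer after two seconds, leaving $timedOut set to true if either limit was hit.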

2. Submit data by POST and keep cookies

The following excerpt is an example to learn from. The code is as follows:

// Simulated login to a Discuz forum with cURL, suitable for Discuz 7.0

!extension_loaded('curl') && die('The curl extension is not loaded.');

$discuz_url = 'http://www.lxvoip.com'; // Forum address
$login_url = $discuz_url . '/logging.php?action=login'; // Login page address
$get_url = $discuz_url . '/my.php?item=threads'; // My posts

$post_fields = array();
// The following two items do not need to be modified
$post_fields['loginfield'] = 'username';
$post_fields['loginsubmit'] = 'true';
// User name and password; both must be filled in
$post_fields['username'] = 'lxvoip';
$post_fields['password'] = '88888888';
// Security question
$post_fields['questionid'] = 0;
$post_fields['answer'] = '';
// @todo Verification code
$post_fields['seccodeverify'] = '';

// Get the form's formhash
$ch = curl_init($login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec($ch);
curl_close($ch);
preg_match('/<input\s*type="hidden"\s*name="formhash"\s*value="(.*?)"\s*\/>/i', $contents, $matches);
if (!empty($matches)) {
    $formhash = $matches[1];
} else {
    die('Formhash not found.');
}
// Send the formhash along with the login fields
$post_fields['formhash'] = $formhash;

// POST the data and save the cookies
// $cookie_file = dirname(__FILE__) . '/cookie.txt';
$cookie_file = tempnam(sys_get_temp_dir(), 'cookie');
$ch = curl_init($login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
curl_exec($ch);
curl_close($ch);

// Use the cookies saved above to fetch a page that requires login
$ch = curl_init($get_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
$contents = curl_exec($ch);
curl_close($ch);

var_dump($contents);
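
The POST-then-reuse-cookies pattern in the example can be wrapped in a small helper so that every request reads and writes the same cookie file. This is only a sketch: the function names login_options and post_form are hypothetical, and sending the fields as a URL-encoded string is a design choice that differs from the example, which passes the array directly.

```php
<?php
// Options for a form POST that reads and writes a single cookie file.
function login_options(array $fields, $cookieFile)
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        // A string body is sent as application/x-www-form-urlencoded;
        // passing the array itself makes cURL send multipart/form-data.
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_COOKIEFILE     => $cookieFile, // load cookies from earlier requests
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle is released
    ];
}

// Send the POST and return the response body (false on failure).
function post_form($url, array $fields, $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, login_options($fields, $cookieFile));
    $body = curl_exec($ch);
    unset($ch); // releasing the handle flushes the cookie jar to disk
    return $body;
}
```

With the variables from the example, post_form($login_url, $post_fields, $cookie_file) would perform the login step, and a later request that sets CURLOPT_COOKIEFILE to the same file would be sent with the session cookies.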
