cURL gives PHP a set of functions and options for collecting data from other websites.
Why choose cURL
The difference between cURL and file_get_contents is easy to understand:
file_get_contents is essentially a convenience function that bundles several built-in file operations, such as file_exists, fopen, fread and fclose. It was designed mainly for reading local files, and support for network URLs was added for the same convenience.
cURL, by contrast, is a library built specifically for network communication. It offers many options (timeouts, headers, cookies, POST data and more) for handling different situations, so for network requests it is generally more robust and easier to control than file_get_contents.
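To make the comparison concrete, here is a minimal sketch that reads the same resource both ways. The helper names read_with_file_get_contents and read_with_curl are hypothetical. The demo reads a temporary local file through a file:// URL so that it runs without network access; this assumes libcurl was built with file:// support, which is the default.

```php
<?php
// Sketch: read the same resource with file_get_contents() and with cURL.
// read_with_file_get_contents() and read_with_curl() are hypothetical helper names.

// file_get_contents(): returns the content, or false on failure.
function read_with_file_get_contents(string $url, int $timeout = 10)
{
    // The 'http' context options only affect http:// and https:// URLs.
    $context = stream_context_create(['http' => ['timeout' => $timeout]]);
    return @file_get_contents($url, false, $context); // @ hides the warning; check for false instead
}

// cURL: returns the body together with an error number and message.
function read_with_curl(string $url, int $timeout = 10): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,    // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT => $timeout,
    ]);
    $body = curl_exec($ch);                // false on failure
    $result = ['body' => $body, 'errno' => curl_errno($ch), 'error' => curl_error($ch)];
    curl_close($ch);
    return $result;
}

// Demo with a local temporary file so it runs without network access.
$path = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($path, 'hello');
var_dump(read_with_file_get_contents($path));
var_dump(read_with_curl('file://' . $path)['errno']);
unlink($path);
```

The cURL version tells you why a request failed (a missing file, for example, gives a non-zero curl_errno() and a message from curl_error()), while file_get_contents only returns false.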
Usage
1. Enable cURL support
After PHP is installed, the cURL extension is often not enabled by default (this is typical of Windows builds). Open php.ini, find the line ;extension=php_curl.dll (;extension=curl on PHP 7.2 and later), remove the leading semicolon, and restart the web server.
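A quick way to confirm the change took effect is to check from PHP itself (on the command line, php -m or php --ri curl gives the same information). This is a small sketch; curl_status is a hypothetical helper name.

```php
<?php
// Sketch: report whether the cURL extension is available in this PHP build.
// curl_status() is a hypothetical helper name.
function curl_status(): string
{
    if (!extension_loaded('curl')) {
        return 'cURL is not loaded: enable the extension in php.ini and restart the server.';
    }
    $info = curl_version(); // array with keys such as 'version', 'ssl_version', 'protocols'
    return 'cURL ' . $info['version'] . ' is loaded.';
}

echo curl_status(), PHP_EOL;
```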
2. Use cURL to fetch data
The code is as follows:
// Initialize a cURL handle
$curl = curl_init();
// Set the URL to fetch
curl_setopt($curl, CURLOPT_URL, 'http://www.cmx8.cn');
// Include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 1);
// 1 = return the result of curl_exec() as a string; 0 = print it directly
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Run the request
$data = curl_exec($curl);
// curl_exec() returns false on failure; curl_error() explains why
if ($data === false) {
    echo 'cURL error: ' . curl_error($curl);
}
// Close the handle and free its resources
curl_close($curl);
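Because CURLOPT_HEADER is set to 1 above, $data contains the response headers followed by the body. A minimal sketch, assuming a hypothetical helper called split_response, uses CURLINFO_HEADER_SIZE to separate the two; the demo parses a hand-written response so it runs offline. If redirects are followed, headers from all header blocks are merged and later values overwrite earlier ones.

```php
<?php
// Sketch: separate headers from body when CURLOPT_HEADER is 1.
// split_response() is a hypothetical helper name.
function split_response(string $raw, int $headerSize): array
{
    $headerText = substr($raw, 0, $headerSize);
    $body = (string) substr($raw, $headerSize); // substr() may return false on PHP < 8
    $headers = [];
    foreach (explode("\r\n", trim($headerText)) as $line) {
        // Skip status lines such as "HTTP/1.1 200 OK" and blank separators.
        if (strpos($line, ':') === false) {
            continue;
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return [$headers, $body];
}

// With the capture code in step 2 you would call it before curl_close():
// $headerSize = curl_getinfo($curl, CURLINFO_HEADER_SIZE);
// [$headers, $body] = split_response($data, $headerSize);

// Offline demo with a hand-written response:
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nX-Demo: 1\r\n\r\n<ul><li>a</li></ul>";
$headerSize = strpos($raw, "\r\n\r\n") + 4;
[$headers, $body] = split_response($raw, $headerSize);
echo $headers['content-type'], PHP_EOL; // text/html
echo $body, PHP_EOL;                    // <ul><li>a</li></ul>
```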
3. Find the key data with a regular expression
The code is as follows:
// $data is the value returned by curl_exec(), i.e. the collected page content
preg_match_all('/<li>(.*?)<\/li>/', $data, $out, PREG_SET_ORDER);
foreach ($out as $key => $value) {
    // $value is an array: $value[0] is the whole matched string, $value[1] the captured part
    echo 'The entire matched string: ' . $value[0] . '<br />';
    echo 'The captured part: ' . $value[1] . '<br />';
}
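The pattern above only matches lowercase, attribute-free <li> tags whose content sits on one line. A slightly more tolerant sketch follows; extract_list_items is a hypothetical name, the i modifier makes tag names case-insensitive, and the s modifier lets an item span several lines. Regular expressions remain fragile for HTML, so for complex pages a real parser such as PHP's DOMDocument may be the safer choice.

```php
<?php
// Sketch: extract the text of every <li> element from an HTML string.
// extract_list_items() is a hypothetical helper name.
function extract_list_items(string $html): array
{
    $items = [];
    // <li(?:\s[^>]*)?> accepts attributes but not other tags such as <link>;
    // i = case-insensitive, s = "." also matches newlines.
    if (preg_match_all('/<li(?:\s[^>]*)?>(.*?)<\/li>/is', $html, $out, PREG_SET_ORDER)) {
        foreach ($out as $match) {
            // $match[0] is the whole element, $match[1] the captured content.
            $items[] = trim($match[1]);
        }
    }
    return $items;
}

$html = "<ul>\n<li>First</li>\n<LI class=\"x\">Second\nline</LI>\n</ul>";
print_r(extract_list_items($html));
```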
Tips
1. Timeout settings
You can set several timeout options with curl_setopt($ch, option, value), including:
CURLOPT_TIMEOUT: the maximum number of seconds the whole cURL operation may take.
CURLOPT_TIMEOUT_MS: the same limit in milliseconds (added in cURL 7.16.2, available since PHP 5.2.3).
CURLOPT_CONNECTTIMEOUT: the number of seconds to wait while trying to connect; 0 means wait indefinitely.
CURLOPT_CONNECTTIMEOUT_MS: the same connection limit in milliseconds; 0 means wait indefinitely (added in cURL 7.16.2, available since PHP 5.2.3).
CURLOPT_DNS_CACHE_TIMEOUT: how many seconds DNS lookups are cached in memory; the default is 120 seconds.
The code is as follows:
curl_setopt($ch, CURLOPT_TIMEOUT, 60); // timeout in seconds
curl_setopt($ch, CURLOPT_NOSIGNAL, 1); // needed for sub-second millisecond timeouts on many systems
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 200); // timeout in milliseconds (cURL 7.16.2+, PHP 5.2.3+); the value set last wins, so this replaces the 60 s above
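The options can also be grouped and applied in one call with curl_setopt_array(). This sketch assumes a hypothetical helper called timeout_options; it only configures a handle and does not send a request.

```php
<?php
// Sketch: collect timeout-related options so they can be applied in one call.
// timeout_options() is a hypothetical helper name.
function timeout_options(int $connectMs, int $totalMs, int $dnsCacheSeconds = 120): array
{
    return [
        // Needed for sub-second millisecond timeouts when libcurl resolves
        // host names with the system's (signal-based) resolver.
        CURLOPT_NOSIGNAL => 1,
        CURLOPT_CONNECTTIMEOUT_MS => $connectMs,       // limit for establishing the connection
        CURLOPT_TIMEOUT_MS => $totalMs,                // limit for the whole transfer
        CURLOPT_DNS_CACHE_TIMEOUT => $dnsCacheSeconds, // seconds to cache DNS results
        CURLOPT_RETURNTRANSFER => true,
    ];
}

$ch = curl_init('http://www.cmx8.cn');
var_dump(curl_setopt_array($ch, timeout_options(500, 2000))); // bool(true)

// No request is sent here. After a real curl_exec(), a timeout shows up as
// error 28 (libcurl's CURLE_OPERATION_TIMEDOUT):
// if (curl_exec($ch) === false && curl_errno($ch) === 28) { ... }
curl_close($ch);
```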
2. Submit data via POST and keep cookies
The code is as follows:
// The following example is for learning purposes only:
// simulate logging in to a Discuz! forum with cURL (written for Discuz! 7.0)
extension_loaded('curl') or die('The curl extension is not loaded.');
$discuz_url = 'http://www.lxvoip.com'; // forum address
$login_url = $discuz_url . '/logging.php?action=login'; // login page address
$get_url = $discuz_url . '/my.php?item=threads'; // "my threads" page
$post_fields = array();
// The following two fields do not need to be changed
$post_fields['loginfield'] = 'username';
$post_fields['loginsubmit'] = 'true';
// Username and password (required)
$post_fields['username'] = 'lxvoip';
$post_fields['password'] = '000000';
// Security question
$post_fields['questionid'] = 0;
$post_fields['answer'] = '';
// @todo verification code (captcha)
$post_fields['seccodeverify'] = '';
// Fetch the login form to obtain the FORMHASH value
$ch = curl_init($login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec($ch);
curl_close($ch);
// Assumes the form carries the hash in a hidden field such as
// <input type="hidden" name="formhash" value="..." />
preg_match('/<input[^>]*name="formhash"[^>]*value="([^"]+)"/i', (string) $contents, $matches);
if (!empty($matches)) {
    $formhash = $matches[1];
} else {
    die('The formhash was not found.');
}
// The login form must submit the formhash together with the other fields
$post_fields['formhash'] = $formhash;
// POST the data and store the cookies the server sends back
$cookie_file = dirname(__FILE__) . '/cookie.txt';
// $cookie_file = tempnam(sys_get_temp_dir(), 'cookie');
$ch = curl_init($login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file); // cookies are written here when the handle is released
curl_exec($ch);
curl_close($ch);
// Use the saved cookies to fetch a page that requires login
$ch = curl_init($get_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // 1 so that curl_exec() returns the page for var_dump()
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
$contents = curl_exec($ch);
curl_close($ch);
var_dump($contents);
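One detail worth knowing about the POST step: when CURLOPT_POSTFIELDS receives an array, cURL sends the data as multipart/form-data; when it receives a string built with http_build_query(), it sends application/x-www-form-urlencoded, which is what an ordinary HTML login form submits. The sketch below, using a hypothetical helper called make_post_handle, prepares a url-encoded POST that both reads and writes one cookie file; it does not run the request.

```php
<?php
// Sketch: prepare a url-encoded POST that shares one cookie file across requests.
// make_post_handle() is a hypothetical helper name; no request is sent here.
function make_post_handle(string $url, array $fields, string $cookieFile)
{
    $ch = curl_init($url);
    $ok = curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST => true,
        // A string body is sent as application/x-www-form-urlencoded;
        // passing $fields directly would send multipart/form-data instead.
        CURLOPT_POSTFIELDS => http_build_query($fields),
        CURLOPT_COOKIEFILE => $cookieFile, // send cookies already stored in the file
        CURLOPT_COOKIEJAR => $cookieFile,  // save received cookies when the handle is released
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT => 30,
    ]);
    return $ok ? $ch : false;
}

$fields = ['username' => 'lxvoip', 'loginsubmit' => 'true', 'answer' => ''];
echo http_build_query($fields), PHP_EOL; // username=lxvoip&loginsubmit=true&answer=

$cookieFile = tempnam(sys_get_temp_dir(), 'cookie');
$ch = make_post_handle('http://www.lxvoip.com/logging.php?action=login', $fields, $cookieFile);
// $response = curl_exec($ch); // would send the POST; skipped in this sketch
```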