Curl: how to collect data from external sites, with tips (PHP tutorial)

Source: Internet
Author: User
Why choose curl

The difference between curl and file_get_contents can be summarized simply:
file_get_contents is essentially a convenience wrapper around a series of built-in file functions such as file_exists, fopen, fread, and fclose. It was designed for quick use and mainly targets local files; support for network resources was added for the same convenience.
curl is a library dedicated to network interaction. It offers a large set of options for handling different environments, so it is generally more robust than file_get_contents for network requests.

Usage

1. Enable curl support

After PHP is installed, curl may not be enabled by default (this is typically the case on Windows). Open php.ini, find the line ;extension=php_curl.dll, remove the leading semicolon, and restart the web server.
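To confirm that the extension actually loaded after the restart, a quick check along these lines may help. This is a minimal sketch; curl_is_available is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: report whether the curl extension is usable.
function curl_is_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_available()) {
    // curl_version() returns an array; 'version' holds the libcurl version string
    $info = curl_version();
    echo 'curl is enabled, libcurl ', $info['version'], "\n";
} else {
    echo "curl is not enabled; check the extension line in php.ini\n";
}
```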

2. Use curl to fetch data

The code is as follows:


// Initialize a cURL handle
$curl = curl_init();
// Set the URL to fetch
curl_setopt($curl, CURLOPT_URL, 'http://www.cmx8.cn');
// Include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 1);
// Return the result as a string instead of printing it to the screen
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Execute the request
$data = curl_exec($curl);
// Close the cURL handle
curl_close($curl);
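Because CURLOPT_HEADER is set to 1, the string returned by curl_exec contains the response headers followed by the body. One way to separate them is to cut the string at the header size that curl_getinfo($curl, CURLINFO_HEADER_SIZE) reports before the handle is closed. The sketch below uses a canned response so it runs without network access; split_response is a hypothetical helper name, and the header size is computed by hand as a stand-in for the value curl would report.

```php
<?php
// Hypothetical helper: split a raw response (headers + body) at the given header size.
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Canned response standing in for the $data returned by curl_exec.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<ul><li>one</li></ul>";
// Stand-in for curl_getinfo($curl, CURLINFO_HEADER_SIZE)
$headerSize = strpos($raw, "\r\n\r\n") + 4;

$parts = split_response($raw, $headerSize);
echo $parts['body'], "\n";
```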

3. Find key data with regular expressions

The code is as follows:


// $data is the value returned by curl_exec, that is, the fetched content.
preg_match_all("/<li>(.*?)<\/li>/", $data, $out, PREG_SET_ORDER);
foreach ($out as $key => $value) {
    // $value is an array: index 0 holds the whole match, index 1 holds the captured group.
    echo 'The entire match: ' . $value[0] . '<br />';
    echo 'The captured group: ' . $value[1] . '<br />';
}
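The same matching step can be wrapped in a small function and tried on a sample string before pointing it at real pages. This is a sketch under a few assumptions: extract_list_items is a hypothetical helper name, the sample HTML is made up, and the pattern is loosened to accept attributes, upper-case tags, and items that span several lines.

```php
<?php
// Hypothetical helper: return the text of every <li> element in $html.
function extract_list_items(string $html): array
{
    // 'i' matches tags case-insensitively; 's' lets . match newlines inside an item
    preg_match_all('/<li\b[^>]*>(.*?)<\/li>/is', $html, $matches, PREG_SET_ORDER);
    $items = [];
    foreach ($matches as $match) {
        // $match[1] is the captured group; drop nested tags and outer whitespace
        $items[] = trim(strip_tags($match[1]));
    }
    return $items;
}

// Made-up sample standing in for the fetched page.
$sample = "<ul>\n<li>First <b>item</b></li>\n<LI class=\"x\">Second\nitem</LI>\n</ul>";
$items = extract_list_items($sample);
print_r($items);
```

Regular expressions work for simple, predictable markup like this; for irregular pages a real HTML parser is usually the safer choice.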

Tips

1. Timeout settings

Use curl_setopt($ch, $option, $value) to configure timeouts. The relevant options are:

CURLOPT_TIMEOUT sets the maximum number of seconds a cURL call may run.
CURLOPT_TIMEOUT_MS sets the maximum number of milliseconds a cURL call may run. (Added in cURL 7.16.2; available since PHP 5.2.3.)
CURLOPT_CONNECTTIMEOUT sets the number of seconds to wait while trying to connect. If it is set to 0, cURL waits indefinitely.
CURLOPT_CONNECTTIMEOUT_MS sets the number of milliseconds to wait while trying to connect. If it is set to 0, cURL waits indefinitely. (Added in cURL 7.16.2; available since PHP 5.2.3.)
CURLOPT_DNS_CACHE_TIMEOUT sets the number of seconds DNS entries are kept in memory. The default is 120 seconds.

The code is as follows:


curl_setopt($ch, CURLOPT_TIMEOUT, 60); // for a timeout in whole seconds, this option alone is enough
curl_setopt($ch, CURLOPT_NOSIGNAL, 1); // must be set when using a millisecond timeout
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 200); // timeout in milliseconds; added in cURL 7.16.2, available since PHP 5.2.3
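The timeout options can also be collected in one array and applied with curl_setopt_array. The sketch below is one possible arrangement, not a required pattern: timeout_options is a hypothetical helper name, and the 200 ms and 100 ms values are arbitrary. It adds CURLOPT_NOSIGNAL only when a limit is below one second, following the note on millisecond timeouts.

```php
<?php
// Hypothetical helper: build timeout options for curl_setopt_array().
// $totalMs limits the whole request; $connectMs limits the connection phase.
function timeout_options(int $totalMs, int $connectMs): array
{
    $options = [
        CURLOPT_CONNECTTIMEOUT_MS => $connectMs,
        CURLOPT_TIMEOUT_MS        => $totalMs,
    ];
    if ($totalMs < 1000 || $connectMs < 1000) {
        // Sub-second limits: set CURLOPT_NOSIGNAL as well
        $options[CURLOPT_NOSIGNAL] = 1;
    }
    return $options;
}

// The CURLOPT_* constants exist only when the curl extension is loaded.
if (extension_loaded('curl')) {
    $options = timeout_options(200, 100);
    // Apply to an existing handle with: curl_setopt_array($ch, $options);
    var_dump(isset($options[CURLOPT_NOSIGNAL]));
}
```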

2. Submit data via POST and keep cookies

The code is as follows:


// The following example is for learning purposes.
// It uses curl to simulate a login to a Discuz forum (suitable for DZ 7.0).

!extension_loaded('curl') && die('The curl extension is not loaded.');

$discuz_url = 'http://www.lxvoip.com'; // forum address
$login_url = $discuz_url . '/logging.php?action=login'; // login page address
$get_url = $discuz_url . '/my.php?item=threads'; // "my posts" page

$post_fields = array();
// The following two items do not need to be modified
$post_fields['loginfield'] = 'username';
$post_fields['loginsubmit'] = 'true';
// Username and password (required)
$post_fields['username'] = 'lxvoip';
$post_fields['password'] = '000000';
// Security question
$post_fields['questionid'] = 0;
$post_fields['answer'] = '';
// @todo verification code
$post_fields['seccodeverify'] = '';

// Fetch the login form to obtain its FORMHASH
$ch = curl_init($login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec($ch);
curl_close($ch);
// Assumes the form contains <input type="hidden" name="formhash" value="..." />
preg_match('/<input\s*type="hidden"\s*name="formhash"\s*value="(.*?)"\s*\/>/i', $contents, $matches);
if (!empty($matches)) {
    $formhash = $matches[1];
    // Send the hash along with the login form
    $post_fields['formhash'] = $formhash;
} else {
    die('formhash not found.');
}

// POST the data and store the returned cookies
$cookie_file = dirname(__FILE__) . '/cookie.txt';
// $cookie_file = tempnam('/tmp', 'cookie');
$ch = curl_init($login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
curl_exec($ch);
curl_close($ch);

// Use the cookies saved above to fetch a page that requires login
$ch = curl_init($get_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
// Return the page as a string so that $contents holds it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
$contents = curl_exec($ch);
curl_close($ch);

var_dump($contents);
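Two steps of the login example can be tried in isolation without contacting a forum: pulling the hash out of the login page, and encoding the POST fields. The sketch below makes several assumptions: extract_formhash is a hypothetical helper name, the HTML is a made-up stand-in for the login page, and the hash value is invented. It also relies on the fact that a string passed to CURLOPT_POSTFIELDS is sent as application/x-www-form-urlencoded, while an array is sent as multipart/form-data.

```php
<?php
// Hypothetical helper: pull the formhash value out of a login page.
// Assumes a hidden input of the form <input type="hidden" name="formhash" value="...">.
function extract_formhash(string $html): ?string
{
    $pattern = '/<input[^>]*name="formhash"[^>]*value="([^"]*)"/i';
    return preg_match($pattern, $html, $m) ? $m[1] : null;
}

// Made-up HTML standing in for the response from the login page.
$html = '<form><input type="hidden" name="formhash" value="a1b2c3d4" /></form>';
$formhash = extract_formhash($html);

$fields = [
    'username' => 'lxvoip',
    'password' => '000000',
    'formhash' => $formhash,
];
// Encode the fields as a URL-encoded string for CURLOPT_POSTFIELDS.
$body = http_build_query($fields);
echo $body, "\n";
```

Some servers accept only URL-encoded form bodies, so passing the encoded string rather than the array can be the safer default for login forms.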
