Sample code for collecting off-site content with PHP cURL

Source: Internet
Author: User

    // Initialize a cURL handle
    $curl = curl_init();
    // Set the URL to fetch
    curl_setopt($curl, CURLOPT_URL, 'http://bbs.it-home.org');
    // Include the response headers in the output
    curl_setopt($curl, CURLOPT_HEADER, 1);
    // Return the result as a string instead of printing it to the screen
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    // Run cURL to request the web page
    $data = curl_exec($curl);
    // Close the cURL handle
    curl_close($curl);
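
Because CURLOPT_HEADER is enabled in the listing above, the string returned by curl_exec() holds the response headers followed by the body. The following is a minimal sketch of separating the two. It assumes the header size is read with curl_getinfo($curl, CURLINFO_HEADER_SIZE) before the handle is closed; the helper name split_response and the sample response are hypothetical and only serve the illustration.

```php
<?php
/**
 * Split a raw response (headers + body) at the given header size.
 * $headerSize is the value curl_getinfo($curl, CURLINFO_HEADER_SIZE) reports.
 */
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Hypothetical raw response standing in for the return value of curl_exec().
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>";

// In real use, take this number from curl_getinfo(); here it is computed
// from the blank line that ends the header block.
$headerSize = strpos($raw, "\r\n\r\n") + 4;

$parts = split_response($raw, $headerSize);
echo $parts['body'], "\n";
```

If only the body is needed, setting CURLOPT_HEADER to 0 avoids the split altogether.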

3. Find key data with regular expression matching

    // $data is the value returned by curl_exec(), that is, the collected target content.
    preg_match_all("/<li>(.*?)<\/li>/", $data, $out, PREG_SET_ORDER);
    foreach ($out as $key => $value) {
        // Each $value is an array: $value[0] holds the entire matched text,
        // and $value[1] holds the text captured by the first group.
        echo 'The entire matched text: ' . $value[0] . "\n";
        echo 'Captured group only: ' . $value[1] . "\n";
    }
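
To make the shape of $out concrete, here is a small sketch that runs the same kind of match against a hypothetical HTML fragment, assuming the pattern targets list items. The sample markup is invented for the illustration and is not taken from the target site.

```php
<?php
// Hypothetical fragment standing in for the collected $data.
$data = '<ul><li>first</li><li>second</li></ul>';

// PREG_SET_ORDER groups the results per match:
// $out[n][0] is the full match, $out[n][1] is the first capture group.
$count = preg_match_all('/<li>(.*?)<\/li>/', $data, $out, PREG_SET_ORDER);

foreach ($out as $value) {
    echo 'full match: ', $value[0], ' | captured: ', $value[1], "\n";
}
```

Without PREG_SET_ORDER the default PREG_PATTERN_ORDER is used, where $out[0] lists all full matches and $out[1] lists all captures.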

PHP cURL tips

1. Timeout settings

Timeouts are set with curl_setopt($ch, $option, $value). The relevant options are:

CURLOPT_TIMEOUT: the maximum number of seconds a cURL function is allowed to run.

CURLOPT_TIMEOUT_MS: the maximum number of milliseconds a cURL function is allowed to run. Added in cURL 7.16.2; available from PHP 5.2.3.

CURLOPT_CONNECTTIMEOUT: the number of seconds to wait while trying to connect. If set to 0, the wait is unlimited.

CURLOPT_CONNECTTIMEOUT_MS: the number of milliseconds to wait while trying to connect. If set to 0, the wait is unlimited. Added in cURL 7.16.2; available from PHP 5.2.3.

CURLOPT_DNS_CACHE_TIMEOUT: the number of seconds to keep DNS entries in memory. The default is 120 seconds.
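
As a rough sketch of how these options fit together, the block below collects them in one array and applies them with curl_setopt_array(). The helper name timeout_options and the chosen limits (2 seconds to connect, 5 seconds overall) are assumptions made for the example, not recommendations from the original article.

```php
<?php
/**
 * Build timeout options for curl_setopt_array().
 * The millisecond variants are used so that sub-second limits are possible.
 */
function timeout_options(int $connectMs, int $totalMs): array
{
    return [
        CURLOPT_CONNECTTIMEOUT_MS => $connectMs, // limit for the connection phase
        CURLOPT_TIMEOUT_MS        => $totalMs,   // limit for the whole transfer
        CURLOPT_DNS_CACHE_TIMEOUT => 120,        // seconds to keep DNS entries
        CURLOPT_RETURNTRANSFER    => true,       // return the response as a string
    ];
}

// No request is sent here; the handle is only configured.
$ch = curl_init('http://bbs.it-home.org');
$applied = curl_setopt_array($ch, timeout_options(2000, 5000));

// A later curl_exec($ch) returns false once a limit is exceeded;
// curl_errno($ch) then reports 28 (operation timed out).
```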

2. Submit data through POST and keep cookies

    // cURL simulates logging in to a Discuz forum; suitable for DZ7.0

    !extension_loaded('curl') && die('The curl extension is not loaded.');

    $discuz_url = 'http://bbs.it-home.org'; // forum address

    $login_url = $discuz_url . '/logging.php?action=login'; // login page address
    $get_url = $discuz_url . '/my.php?item=threads'; // "my posts" page

    $post_fields = array();

    // The following two items do not need to be modified
    $post_fields['loginfield'] = 'username';
    $post_fields['loginsubmit'] = 'true';
    // Username and password, required
    $post_fields['username'] = 'jbxu';
    $post_fields['password'] = '000000';
    // Security question
    $post_fields['questionid'] = 0;
    $post_fields['answer'] = '';
    // @todo verification code
    $post_fields['seccodeverify'] = '';

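    // Fetch the login page and extract the FORMHASH value

    $ch = curl_init($login_url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $contents = curl_exec($ch);
    curl_close($ch);
    // Assumption: the hash sits in a hidden input named "formhash";
    // adjust the pattern to the actual markup of the login form.
    preg_match('/<input\s+type="hidden"\s+name="formhash"\s+value="(.*?)"/i', $contents, $matches);
    if (!empty($matches)) {
        $formhash = $matches[1];
        // Send the hash back with the login form (assumed field name: formhash)
        $post_fields['formhash'] = $formhash;
    } else {
        die('The formhash was not found.');
    }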

    // POST the data to obtain the cookie

    $cookie_file = dirname(__FILE__) . '/cookie.txt';
    // $cookie_file = tempnam('/tmp', 'cookie');
    $ch = curl_init($login_url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
    // Save the cookies received from the server to $cookie_file when the handle is closed
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
    curl_exec($ch);
    curl_close($ch);

    // Fetch a page that requires login, using the cookie obtained above

    $ch = curl_init($get_url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Return the page as a string so that $contents holds it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
    $contents = curl_exec($ch);
    curl_close($ch);

    var_dump($contents);
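
Two steps of the login flow can be tried without touching the network: pulling the form hash out of the login page and encoding the POST fields. The sketch below assumes the hash is carried in a hidden input named formhash with the name attribute placed before value. The helper extract_formhash and the sample markup are hypothetical, and a real Discuz page may differ.

```php
<?php
/**
 * Pull the value of a hidden "formhash" input out of a login page.
 * Expects the name attribute to appear before value; returns null
 * when no such input is present.
 */
function extract_formhash(string $html): ?string
{
    $pattern = '/<input[^>]*name="formhash"[^>]*value="([^"]*)"/i';
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

// Hypothetical login form markup standing in for the fetched page.
$html = '<form><input type="hidden" name="formhash" value="a1b2c3d4" /></form>';

$post_fields = [
    'loginfield' => 'username',
    'username'   => 'jbxu',
    'password'   => '000000',
    'formhash'   => extract_formhash($html),
];

// A string body is sent as application/x-www-form-urlencoded;
// passing the array itself makes cURL send multipart/form-data.
$body = http_build_query($post_fields);
echo $body, "\n";
```

Passing $body instead of the array to CURLOPT_POSTFIELDS is worth trying when a server does not accept the multipart encoding.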
