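PHP's cURL extension is efficient at fetching web pages and can run several transfers concurrently (through the curl_multi_* functions), while file_get_contents() is generally less efficient for this kind of work. The cURL extension must be enabled before any of the curl_* functions can be used.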
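Before relying on cURL, it can help to confirm that the extension is actually loaded. The following is a minimal sketch; curl_available() is a hypothetical helper name introduced only for this illustration.

```php
<?php
// curl_available() is a hypothetical helper, not part of PHP itself.
// It reports whether the cURL extension is loaded in this PHP runtime.
function curl_available()
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (!curl_available()) {
    // The examples in this article cannot run without the extension.
    echo "The cURL extension is not enabled; enable it in php.ini first.\n";
}
```

If the message appears, the usual fix is to enable the extension in php.ini and restart the web server or PHP process.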
Code practice
First, let's look at the logon code:
// Simulate login
function login_post($url, $cookie, $post)
{
    $curl = curl_init(); // initialize a cURL handle
    curl_setopt($curl, CURLOPT_URL, $url); // address the login form is submitted to
    curl_setopt($curl, CURLOPT_HEADER, 0); // do not include response headers in the output
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 0); // 0: print the response directly instead of returning it
    curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie); // save the cookies to the specified file
    curl_setopt($curl, CURLOPT_POST, 1); // submit with the POST method
    curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($post)); // data to be submitted
    curl_exec($curl); // execute the request
    curl_close($curl); // close the cURL handle and release system resources
}
The login_post() function first calls curl_init() to create a cURL handle, then uses curl_setopt() to set the relevant options: the URL to submit to, the file in which cookies are saved, the POST data (username, password, and other fields), whether the response is returned or printed, and so on. curl_exec() then executes the request, and curl_close() releases the resource. Note that PHP's http_build_query() converts an array into a URL-encoded query string.
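To see what http_build_query() produces, here is a small sketch. The field values are hypothetical and chosen only to show how special characters are encoded.

```php
<?php
// Hypothetical form fields, used only to illustrate the encoding.
$fields = array(
    'email' => 'user@example.com',
    'pwd'   => 'my secret',
);

// http_build_query() URL-encodes each value and joins the pairs with "&".
$query = http_build_query($fields);

echo $query, "\n"; // prints: email=user%40example.com&pwd=my+secret
```

The resulting string is in the same form a browser sends for an ordinary HTML form submission, which is why it can be passed to CURLOPT_POSTFIELDS.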
Next, once the login has succeeded, we need to fetch a page that is available only to logged-in users:
// Fetch data after a successful login
function get_content($url, $cookie)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response as a string
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); // read the saved cookies
    $rs = curl_exec($ch); // execute the request and capture the page content
    curl_close($ch);
    return $rs;
}
The get_content() function follows the same pattern: it initializes cURL, sets the relevant options, executes the request, and releases the resource. Here CURLOPT_RETURNTRANSFER is set to 1 so that curl_exec() returns the response as a string instead of printing it, and CURLOPT_COOKIEFILE reads the cookies saved during login. The function then returns the page content.
Our ultimate goal is to obtain information after a simulated login, that is, information that is normally available only after logging in. As an example, we log in to the mobile edition of OSChina (Open Source China) and fetch a page that requires a successful login.
// Set the POST data
$post = array(
    'email'      => 'oschina account',
    'pwd'        => 'oschina password',
    'goto_page'  => '/my',
    'error_page' => '/login',
    'save_login' => '1',
    'submit'     => 'Login now'
);

// Login address
$url = 'http://www.111cn.net';
// Set the cookie storage path
$cookie = dirname(__FILE__) . '/cookie_oschina.txt';
// Address of the page to fetch after login
$url2 = 'http://m.oschina.net/my';

// Simulate login
login_post($url, $cookie, $post);
// Fetch the page that requires login
$content = get_content($url2, $cookie);
// Delete the cookie file
@unlink($cookie);

// Match the page information
$preg = "/<td class='portrait'>(.*)<\/td>/i";
preg_match_all($preg, $content, $arr);
$str = $arr[1][0];
// Output the content
echo $str;
After running the code above, the output is the logged-in user's profile picture.
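The matching step can be tried offline, without logging in anywhere. The sketch below runs a pattern of the same shape against a hypothetical HTML fragment; the fragment and the image path are assumptions made for illustration, not the real page markup.

```php
<?php
// Hypothetical HTML fragment standing in for the fetched page.
$content = "<table><tr><td class='portrait'><img src='/avatar.png'></td></tr></table>";

// The slash in </td> is escaped because "/" is also the pattern delimiter.
$preg = "/<td class='portrait'>(.*)<\/td>/i";

$count = preg_match_all($preg, $content, $arr);

if ($count > 0) {
    // $arr[1] holds the text captured by the first group, one entry per match.
    echo $arr[1][0], "\n"; // prints: <img src='/avatar.png'>
} else {
    echo "No match; the page layout may have changed.\n";
}
```

Checking the match count before reading $arr[1][0] avoids an undefined-index notice when the login failed or the page markup differs from what the pattern expects.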
Usage summary
1. Initialize cURL with curl_init();
2. Use curl_setopt() to set the target URL and other options;
3. Execute the request with curl_exec();
4. Close the cURL handle with curl_close() after execution;
5. Output or return the data.
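The five steps above can be sketched as a single function. This is only an illustration: fetch_page() and its $timeout parameter are hypothetical names, and the error check is an addition not present in the earlier examples.

```php
<?php
// fetch_page() is a hypothetical helper that walks through the five steps.
// It returns the response body as a string, or false if the request failed.
function fetch_page($url, $timeout = 10)
{
    $ch = curl_init();                      // 1. initialize cURL

    curl_setopt_array($ch, array(           // 2. set the URL and other options
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_TIMEOUT        => $timeout, // give up after this many seconds
    ));

    $body = curl_exec($ch);                 // 3. execute the request

    // curl_exec() returns false on failure; keep the reason for the log.
    $error = ($body === false) ? curl_error($ch) : '';

    curl_close($ch);                        // 4. close the handle (has no effect since PHP 8.0)

    if ($error !== '') {
        error_log('cURL request failed: ' . $error);
    }

    return $body;                           // 5. hand the data back to the caller
}
```

A caller would compare the result with false before using it, for example `$html = fetch_page($url2); if ($html !== false) { echo $html; }`, where $url2 is the page address from the example above.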