PHP cURL: simulating a login and fetching data, a detailed example
PHP's cURL extension is an efficient way to fetch web pages, and it can run several transfers concurrently through its curl_multi_* functions, whereas file_get_contents() is generally less efficient. Of course, the curl extension must be enabled before any of the curl functions can be used.
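To make the two points above concrete, here is a minimal sketch that checks whether the curl extension is loaded and then fetches several pages concurrently with the curl_multi_* functions. The helper name fetch_all() and the 10-second timeout are assumptions made for illustration; they do not come from the original article.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently with the curl_multi_* API.
function fetch_all(array $urls): array
{
    if ($urls === []) {
        return []; // nothing to fetch
    }
    if (!extension_loaded('curl')) {
        throw new RuntimeException('The curl extension is not enabled.');
    }

    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed timeout, in seconds
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none of them is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // wait for activity on any handle
        }
    } while ($running > 0 && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch); // body of this transfer
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

Calling fetch_all(array('a' => $urlA, 'b' => $urlB)) would return the page bodies under the same keys. The transfers are multiplexed inside one PHP process, so this is concurrency rather than true multithreading.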
Code practice
First, let's look at the logon code:
// Simulate login
function login_post($url, $cookie, $post) {
    $curl = curl_init(); // initialize a cURL handle
    curl_setopt($curl, CURLOPT_URL, $url); // URL the login form is submitted to
    curl_setopt($curl, CURLOPT_HEADER, 0); // do not include the response headers in the output
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 0); // 0: print the response directly instead of returning it
    curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie); // file in which the cookies are saved
    curl_setopt($curl, CURLOPT_POST, 1); // submit as a POST request
    curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($post)); // data to submit
    curl_exec($curl); // execute the request
    curl_close($curl); // close the cURL handle and release system resources
}
The login_post() function first creates a handle with curl_init(), then uses curl_setopt() to set the relevant options: the URL to submit to, the file in which cookies are saved, the POST data (username, password, and other fields), whether the response is returned or printed, and so on. curl_exec() then executes the request, and finally curl_close() releases the resource. Note that PHP's http_build_query() converts an array into a URL-encoded query string.
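As a small illustration of what http_build_query() produces, the sketch below encodes a two-field array. The field values are placeholders invented for this example, and the '&' separator is passed explicitly so the result does not depend on the arg_separator.output setting.

```php
<?php
// Placeholder form fields; the values are made up for illustration.
$post = array('email' => 'user@example.com', 'pwd' => 'p@ss word');

// Keys and values are URL-encoded and joined with "&".
$body = http_build_query($post, '', '&');

echo $body, "\n"; // prints: email=user%40example.com&pwd=p%40ss+word
```

This is the string that login_post() hands to CURLOPT_POSTFIELDS, so the request body is sent as application/x-www-form-urlencoded.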
Next, once the login has succeeded, we need to fetch a page that is only available after logging in.
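// Fetch a page using the saved cookies
function get_content($url, $cookie) {
    $ch = curl_init(); // initialize a cURL handle
    curl_setopt($ch, CURLOPT_URL, $url); // URL of the page to fetch
    curl_setopt($ch, CURLOPT_HEADER, 0); // do not include the response headers in the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response as a string
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); // read the cookies saved at login
    $rs = curl_exec($ch); // execute the request and capture the page content
    curl_close($ch); // close the cURL handle
    return $rs;
}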
The get_content() function follows the same pattern: it initializes curl, sets the relevant options, executes the request, and releases the resource. Here CURLOPT_RETURNTRANSFER is set to 1 so that curl_exec() returns the page as a string instead of printing it, and CURLOPT_COOKIEFILE makes curl send the cookies that were saved during login. The function finally returns the page content.
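get_content() returns whatever curl_exec() gives back, which is false when the transfer fails. The following is a minimal sketch of a variant that reports failures. The name get_content_checked(), the 10-second timeout, and the choice to treat any status of 400 or above as an error are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical variant of get_content() that throws instead of returning false.
function get_content_checked($url, $cookie)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);  // send the cookies saved at login
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed timeout, in seconds

    $body = curl_exec($ch);
    if ($body === false) {
        // Transport-level failure: DNS error, refused connection, timeout, ...
        $message = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $message);
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // HTTP status of the response
    curl_close($ch);
    if ($status >= 400) {
        throw new RuntimeException('Unexpected HTTP status ' . $status);
    }

    return $body;
}
```

A caller can wrap the call in try/catch and tell a network failure apart from an empty page, which a plain false return value does not allow.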
Our ultimate goal is to obtain information after a simulated login, that is, useful information that is normally available only after logging in successfully. Next we take logging in to the mobile edition of OSChina (Open Source China) as an example to see how to capture information after a successful login.
// Set the post data
$post = array(
    'email' => 'ossina',
    'pwd' => 'ossina',
    'goto_page' => '/my',
    'error_page' => '/login',
    'save_login' => '1',
    'submit' => 'login now'
);
// Login address
$url = "http://m.oschina.net/action/user/login";
// Path of the file in which the cookies are saved
$cookie = dirname(__FILE__) . '/cookie_oschina.txt';
// Address of the page to fetch after login
$url2 = "http://m.oschina.net/my";
// Simulate the login
login_post($url, $cookie, $post);
// Fetch the page that requires login
$content = get_content($url2, $cookie);
// Delete the cookie file
@unlink($cookie);
// Match the page information
$preg = "/<td class='portrait'>(.*)<\/td>/i";
preg_match_all($preg, $content, $arr);
$str = $arr[1][0];
// Output the content
echo $str;
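The matching step can be tried without a network connection. The sketch below runs a similar pattern against a made-up HTML fragment (an assumption standing in for the real page) and adds two hedged refinements: a non-greedy (.*?) so the match stops at the first closing td tag, and a check that a match exists before $arr[1][0] is read.

```php
<?php
// Made-up HTML standing in for the fetched page.
$content = "<tr><td class='portrait'><img src='a.png'></td><td>other</td></tr>";

// Non-greedy (.*?) stops at the first </td> instead of the last one.
$preg = "/<td class='portrait'>(.*?)<\/td>/i";

if (preg_match_all($preg, $content, $arr) > 0) {
    $str = $arr[1][0]; // first captured group of the first match
} else {
    $str = ''; // nothing matched, for example because the login failed
}

echo $str, "\n"; // prints: <img src='a.png'>
```

With the greedy (.*) on this sample, the capture would run to the last closing td tag on the line and include the neighbouring cell.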
Usage Summary
1. Initialize curl with curl_init();
2. Use curl_setopt() to set the target URL and other options;
3. Execute the request with curl_exec();
4. Close curl with curl_close() after execution;
5. Output the data.
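The five steps above can be sketched in one short function. The name fetch_page() and the timeout value are assumptions for illustration; curl_setopt_array() is used here only as a compact alternative to repeated curl_setopt() calls.

```php
<?php
// Hypothetical helper that mirrors the five steps of the summary.
function fetch_page($url)
{
    $ch = curl_init();                    // 1. initialize curl
    curl_setopt_array($ch, array(         // 2. set the target URL and other options
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_TIMEOUT        => 10,     // assumed timeout, in seconds
    ));
    $body = curl_exec($ch);               // 3. execute the request
    curl_close($ch);                      // 4. close the handle
    return $body;                         // 5. hand the data back (false on failure)
}
```

A caller would then output the result, for example with echo fetch_page($url);, after checking that it is not false.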