What is PHP curl for?
I use PHP curl mainly for crawling data. Of course, other methods such as fsockopen and file_get_contents can crawl too, but they are only convenient for pages that can be accessed directly. Pages behind access control, or pages that are only available after logging in, are much harder to fetch with them.
Here are six classic, commonly used PHP curl examples:
1. Crawling resources over the HTTPS protocol
<?php
/**
 * PHP curl example:
 * download an HTTPS resource from the network
 */
// Initialize
$curlobj = curl_init();
// Set the URL to access
curl_setopt($curlobj, CURLOPT_URL, "https://ajax.aspnetcdn.com/ajax/jquery.validate/1.12.0/jquery.validate.js");
// Return the result instead of printing it directly after execution
curl_setopt($curlobj, CURLOPT_RETURNTRANSFER, true);
// Set HTTPS support
date_default_timezone_set('PRC'); // When using cookies, the time zone must be set first
curl_setopt($curlobj, CURLOPT_SSL_VERIFYPEER, 0); // Skip verification of the server certificate
$output = curl_exec($curlobj); // Execute and get the content
curl_close($curlobj); // Close curl
// Create a file
$myfile = fopen('testfile.html', 'w');
// Save the fetched content to the file
fwrite($myfile, $output);
// Close the file resource
fclose($myfile);
?>
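The example above switches certificate verification off. As a hedged alternative, the sketch below collects the HTTPS options in one array and keeps verification on by default. The helper name `httpsOptions` and its `$verifyPeer` parameter are made up for illustration, and the curl extension is assumed to be loaded.

```php
<?php
// Hypothetical helper (not part of the original example): build the
// option array for an HTTPS download so it can be applied in one call.
function httpsOptions(string $url, bool $verifyPeer = true): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,                // return the body instead of printing it
        CURLOPT_SSL_VERIFYPEER => $verifyPeer,         // check the server certificate
        CURLOPT_SSL_VERIFYHOST => $verifyPeer ? 2 : 0, // check the host name in the certificate
    ];
}

$options = httpsOptions("https://ajax.aspnetcdn.com/ajax/jquery.validate/1.12.0/jquery.validate.js");

$ch = curl_init();
curl_setopt_array($ch, $options); // nothing is sent until curl_exec($ch) is called
curl_close($ch);
```

Passing `false` as the second argument gives the same behaviour as the example above, which can be useful against a test server with a self-signed certificate.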
2. Crawling a file with no access control
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost/mytest/phpinfo.php");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // If this line is commented out, the result is output directly
$result = curl_exec($ch);
curl_close($ch);
?>
3. Crawling through a proxy
Why crawl through a proxy? Take Google as an example: if you fetch Google's data very frequently within a short time, you will soon be unable to fetch anything, because Google restricts your IP address. At that point you can switch to a different proxy and crawl again.
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://blog.51yip.com");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, true);
curl_setopt($ch, CURLOPT_PROXY, '125.21.23.6:8080');
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password'); // If the proxy needs a password, add this
$result = curl_exec($ch);
curl_close($ch);
?>
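Switching proxies after a failed attempt could look roughly like the sketch below. It is only an illustration: the helper names `proxyForAttempt`, `proxyOptions` and `fetchViaProxies` are hypothetical, and the second proxy address is a placeholder from the documentation address range, not a real proxy.

```php
<?php
// Hypothetical helper: pick a proxy for the given attempt, cycling through the list.
function proxyForAttempt(array $proxies, int $attempt): string
{
    return $proxies[$attempt % count($proxies)];
}

// Hypothetical helper: option array for one request through one proxy.
function proxyOptions(string $url, string $proxy, ?string $userPwd = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_PROXY          => $proxy,
    ];
    if ($userPwd !== null) {
        $options[CURLOPT_PROXYUSERPWD] = $userPwd; // only set when the proxy needs a password
    }
    return $options;
}

// Try the request up to $maxAttempts times, using the next proxy each time.
// Returns the response body, or false if every attempt failed.
function fetchViaProxies(string $url, array $proxies, int $maxAttempts = 3)
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $ch = curl_init();
        curl_setopt_array($ch, proxyOptions($url, proxyForAttempt($proxies, $attempt)));
        $result = curl_exec($ch);
        curl_close($ch);
        if ($result !== false) {
            return $result;
        }
    }
    return false;
}

// The first address comes from the example above; the second is a placeholder.
$proxies = ['125.21.23.6:8080', '192.0.2.10:3128'];
```

A call such as `fetchViaProxies("http://blog.51yip.com", $proxies)` would then move on to the next proxy whenever `curl_exec` returns false. A block by the target site may instead show up as an HTTP error page, which this sketch does not detect.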
4. Fetching data after a POST
Submitting data deserves a separate mention: when using curl there is very often some data interaction, so this case is quite important.
<?php
$ch = curl_init();
/**
 * Note that the data to submit cannot be a two-dimensional (or deeper) array.
 * For example, array('name' => serialize(array('tank', 'zhang')), 'sex' => 1, 'birth' => '20101010') works,
 * but array('name' => array('tank', 'zhang'), 'sex' => 1, 'birth' => '20101010') raises an error.
 */
$data = array('name' => 'test', 'sex' => 1, 'birth' => '20101010');
curl_setopt($ch, CURLOPT_URL, 'http://localhost/mytest/curl/upload.php');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_exec($ch);
?>
In the upload.php file, call print_r($_POST);. Using curl then captures what upload.php outputs:
Array ( [name] => test [sex] => 1 [birth] => 20101010 )
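Besides serialize, one way around the nested-array limitation is to encode the data yourself with http_build_query and pass the resulting string to CURLOPT_POSTFIELDS. The sketch below only shows the encoding step and a local round trip with parse_str, so it runs without a server. Note that a string body is sent as application/x-www-form-urlencoded, whereas an array is sent as multipart/form-data.

```php
<?php
// Nested data that cannot be passed to CURLOPT_POSTFIELDS as an array.
$data = array(
    'name'  => array('tank', 'zhang'),
    'sex'   => 1,
    'birth' => '20101010',
);

// Flatten it into a URL-encoded query string.
$body = http_build_query($data);
echo $body, "\n";
// name%5B0%5D=tank&name%5B1%5D=zhang&sex=1&birth=20101010

// Decoding it locally shows what the receiving script would see:
// 'name' comes back as an array, scalar values come back as strings.
parse_str($body, $decoded);
```

With this string, `curl_setopt($ch, CURLOPT_POSTFIELDS, $body);` replaces the array version, and in upload.php `$_POST['name']` arrives as an array.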
5. Crawling pages that have access control
I previously wrote an article on three ways to control page access with PHP; have a look if you are interested.
If you use the methods mentioned above on such a page, the following error is reported:
You are not authorized to view this page
You do not have permission to view this directory or page using the credentials that you supplied because your Web browser is sending a WWW-Authenticate header field that the Web server is not configured to accept.
In this case, we use CURLOPT_USERPWD to authenticate.
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://club-china");
/* CURLOPT_USERPWD is mainly used to get past page access control,
 * for example the page protection that is usually generated with htpasswd. */
curl_setopt($ch, CURLOPT_USERPWD, '231144:2091xtajmd=');
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_REFERER, "http://club-china");
curl_setopt($ch, CURLOPT_HEADER, 0);
$result = curl_exec($ch);
curl_close($ch);
?>
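With curl's default authentication method (HTTP Basic), CURLOPT_USERPWD makes curl send an Authorization header containing the base64-encoded "user:password" pair. The sketch below builds that header by hand to show what goes over the wire. The function name `basicAuthHeader` and the credentials are placeholders for illustration.

```php
<?php
// Hypothetical helper: build the header that HTTP Basic authentication sends.
function basicAuthHeader(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

$header = basicAuthHeader('user', 'password');
echo $header, "\n";
// Authorization: Basic dXNlcjpwYXNzd29yZA==
```

Because base64 is an encoding and not encryption, the credentials are readable by anyone who can see the request unless it is sent over HTTPS.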
6. Simulating a login to Sina
The data we want to crawl may only be available after logging in; in that case we use curl to simulate the login.
<?php
function checklogin($user, $password)
{
    if (empty($user) || empty($password)) {
        return 0;
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_REFERER, "http://mail.sina.com.cn/index.html");
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, USERAGENT);
    curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIEJAR); // File the cookies are written to on curl_close
    curl_setopt($ch, CURLOPT_TIMEOUT, TIMEOUT);
    curl_setopt($ch, CURLOPT_URL, "http://mail.sina.com.cn/cgi-bin/login.cgi");
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, "&logintype=uid&u=" . urlencode($user) . "&psw=" . $password);
    $contents = curl_exec($ch);
    curl_close($ch);
    // A successful login redirects to /cgi/index.php?check_time=...
    if (!preg_match("/Location: (.*)\\/cgi\\/index\\.php\\?check_time=(.*)\n/", $contents, $matches)) {
        return 0;
    } else {
        return 1;
    }
}
define("USERAGENT", $_SERVER['HTTP_USER_AGENT']);
define("COOKIEJAR", tempnam("/tmp", "cookie"));
define("TIMEOUT", 500);
echo checklogin("zhangying215", "xtaj227");
?>
Open the cookie file under /tmp to see:
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.
mail.sina.com.cn FALSE / FALSE 0 sinamail-webface-sessid 65223c4bd8900284ed463d2a3e1ac182
#HttpOnly_.sina.com.cn TRUE / FALSE 0 SUE es%3d8d96db0820c6c79922ad57d422f575e8%26ev%3dv0%26es2%3Dcddfb8400dc5ca95902367ddcd7f57dd
.sina.com.cn TRUE / FALSE 0 SUP cv%3d1%26bt%3d1286900433%26et%3d1286986833%26lt%3d1%26uid%3d1445632344%26user%3d%25e5%25bc%25a0%25e6%2598%25a02001%26ag%3d2%26name%3dzhangying20015%2540sina.com%26nick%3d%25e5%25bc%25a0%25e6%2598%25a02001%26sex%3d1%26ps%3d0%26email%3dzhangying20015%2540sina.com%26dob%3d1982-07-18
#HttpOnly_.sina.com.cn TRUE / FALSE 0 SID BIHCALLOMXMX-QZXZGROLCSQX%2F0B%2F0CMR.NYQ%2F0B%2FCMGGALMARLMCHRCGLSMRMXMFXAL_CBZ%2F_AFUGCMMGIrbyhm0bc%40fr5cizigg5i
#HttpOnly_.sina.com.cn TRUE / FALSE 0 sprial bfb4102951fd5892a3fd5b42d442cd26
#HttpOnly_.sina.com.cn TRUE / FALSE 0 Sina_user%d5%c5%d2001
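The cookie file written through CURLOPT_COOKIEJAR uses the Netscape format: one cookie per line with seven tab-separated fields (domain, subdomain flag, path, secure flag, expiry, name, value), and HttpOnly cookies carry a `#HttpOnly_` prefix. The sketch below reads such a file into an array. The function name `parseCookieJar` and the sample cookie values are made up for illustration.

```php
<?php
// Hypothetical helper: parse the contents of a Netscape-format cookie file.
function parseCookieJar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\r?\n/', $contents) as $line) {
        $httpOnly = false;
        if (strpos($line, '#HttpOnly_') === 0) {
            // HttpOnly cookies look like comments but are real entries.
            $httpOnly = true;
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and ordinary comments
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a well-formed cookie line
        }
        list($domain, $subdomains, $path, $secure, $expires, $name, $value) = $fields;
        $cookies[] = [
            'domain'     => $domain,
            'subdomains' => $subdomains === 'TRUE',
            'path'       => $path,
            'secure'     => $secure === 'TRUE',
            'expires'    => (int) $expires, // 0 means a session cookie
            'name'       => $name,
            'value'      => $value,
            'httpOnly'   => $httpOnly,
        ];
    }
    return $cookies;
}

// Made-up sample in the same layout as a libcurl cookie file.
$sample = "# Netscape HTTP Cookie File\n"
        . "example.com\tFALSE\t/\tFALSE\t0\tSESSID\tabc123\n"
        . "#HttpOnly_.example.com\tTRUE\t/\tFALSE\t0\tSID\txyz\n";

$cookies = parseCookieJar($sample);
```

For a real file, `parseCookieJar(file_get_contents(COOKIEJAR))` would return one entry per stored cookie. To send the cookies back on a later request, pointing CURLOPT_COOKIEFILE at the same file is usually enough, and no manual parsing is needed.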