How curl is used in PHP

This article explains in detail how to use cURL, a powerful tool for data collection in PHP, and its advantages over the file_get_contents function for fetching remote data. Interested readers may find it a useful reference.

Anyone who has done data collection (scraping) will be familiar with cURL. PHP's file_get_contents function can also fetch data from a remote link, but it offers too little control, and in the more complex collection scenarios it falls short. This article therefore introduces how to use cURL for collection.

First, here is a basic way to fetch data from a remote link with cURL.

<?php
$url = "http://git.oschina.net/yunluo/API/raw/master/notice.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
$notice = curl_exec($ch);
echo $notice;
?>

This code uses cURL directly to display the file's content. The problem is that cURL is a PHP extension, and some hosts disable it for security reasons; it may also be switched off in a local PHP debugging environment. In those cases the code raises an error, so it is not advisable as written, and I rewrote it.

<?php
if (function_exists('curl_init')) {
  $url = "http://git.oschina.net/yunluo/API/raw/master/notice.txt";
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
  $dxycontent = curl_exec($ch);
  echo $dxycontent;
} else {
  echo 'Oops! It seems your server has not enabled the curl extension, so notifications from Yunluo cannot be received. Please contact your host to enable it; ignore this message when debugging locally.';
}
?>

The modified version first checks whether the server has the curl extension enabled. If it is enabled, the file is displayed directly; if not, a hint message is displayed instead.
This fixes the problem but raises another: I only want to show a piece of text, nothing elaborate, so why write so much code?
After some informal testing, I found that file_get_contents is not slower than cURL at fetching remote file content, and in some cases it may even be much faster, so I rewrote the code again:

<?php
echo file_get_contents("http://git.oschina.net/yunluo/API/raw/master/notice.txt");
?>
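
The two approaches above can be combined, so that a script uses cURL when the extension is loaded and falls back to file_get_contents otherwise. The following is only a minimal sketch, not the author's code: the function name fetch_remote, the $preferCurl flag, and the 10-second default timeout are assumptions chosen for illustration.

```php
<?php
/**
 * Fetch a URL, preferring cURL and falling back to file_get_contents.
 * Returns the body as a string, or false on failure.
 * (fetch_remote and $preferCurl are hypothetical names for this sketch.)
 */
function fetch_remote(string $url, bool $preferCurl = true, int $timeout = 10)
{
    if ($preferCurl && function_exists('curl_init')) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        // return the body instead of printing it
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // give up if no connection is made in time
        $body = curl_exec($ch);
        curl_close($ch);
        return $body;
    }

    // Fallback: a stream context lets file_get_contents use a timeout as well.
    $context = stream_context_create(array('http' => array('timeout' => $timeout)));
    return @file_get_contents($url, false, $context);
}
```

A call such as fetch_remote($url) then behaves the same on hosts with and without the curl extension, and the caller only has to check for a false return value.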

Tools
Firefox + Firebug
"To do a good job, one must first sharpen one's tools." Before analyzing the cases, let's learn how to use Firebug to get the information we need.
Press F12 to open Firebug, which shows the interface in Figure 1:

1. The arrow icon is the element selection tool. Click it to highlight the icon; as the mouse then moves over the page, the corresponding content is selected in the HTML panel. Clicking on the page selects that element and the icon's highlight is cancelled, as shown in Figure 2:
Figure 2: Viewing elements in Firebug

2. Console
Output from JavaScript's console.log family of functions is printed here.
3. HTML
The HTML content. Note that what you see here is not necessarily what your collector has to parse. When analyzing content for collection, always rely on the page source (Ctrl+U). Use this panel only to locate an element quickly in the structure, then pick a distinctive reference string and find the corresponding position in the source.
For example, a tag may appear in the HTML panel as <p id="demo" class="demo">demo</p>, while the page source contains <p class="demo" id="demo">demo</p>. A regular expression written against the panel's version will then match nothing during collection.
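
The attribute-order pitfall can be reproduced in a few lines. The sketch below uses the two tag variants from the example; the strict pattern and the helper name extract_demo_text are illustrative assumptions, and the tolerant pattern is just one possible way to avoid depending on attribute order.

```php
<?php
// The tag as displayed in the HTML panel, and as it appears in the page source.
$asShownInPanel = '<p id="demo" class="demo">demo</p>';
$asInPageSource = '<p class="demo" id="demo">demo</p>';

// A pattern copied literally from the panel view: it depends on attribute order.
$strictPattern = '#<p id="demo" class="demo">(.*?)</p>#';

/**
 * Extract the text of the <p> whose id is "demo", regardless of attribute order.
 * (extract_demo_text is a hypothetical helper for this sketch.)
 */
function extract_demo_text(string $html): ?string
{
    // The lookahead only requires id="demo" to appear somewhere inside the opening tag.
    $pattern = '#<p\b(?=[^>]*\bid="demo")[^>]*>(.*?)</p>#s';
    return preg_match($pattern, $html, $m) ? $m[1] : null;
}

var_dump(preg_match($strictPattern, $asInPageSource)); // the strict pattern misses the source version
var_dump(extract_demo_text($asInPageSource));          // the tolerant pattern still finds the text
```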
4. CSS
The contents of the CSS files.
5. Scripts
The contents of the JavaScript files.
6. DOM
The DOM node content.
7. Network
The data for every request. This is the panel we focus on and analyze for collection; it shows each request's parameters, request headers, cookie data and so on. When submitting a page causes a refresh, enable "Persist" so that the request content stays in the panel after the refresh, as shown in Figure 3:


In addition, Firefox has a Tamper Data extension that can also capture requests; install it if needed.
8. Cookies
The cookie data.

Figure 1 also shows many toggleable options. The one we care about is "Persist": when it is selected, the data in the content area below is retained even if submitting a form refreshes the page, which is especially important when analyzing submitted data.

Summary
When analyzing the requests to collect, we mainly look at the request data in the "Network" panel. If necessary, use "Persist" to keep the request data across a page refresh, and use "Clear" before a request to empty the content area.

Case analysis
I. Simple collection
Simple collection here means fetching a single page with a GET request. It is simple enough that even the file_get_contents function can return the page's content.

Code snippet using file_get_contents

<?php
$url = 'http://demo.zjmainstay.cn/php/curl/simple.html';
$content = file_get_contents($url);
echo $content;

Code snippet using cURL

<?php
$url = 'http://demo.zjmainstay.cn/php/curl/simple.html';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the data instead of outputting it directly
$content = curl_exec($ch);                   // execute and store the result
curl_close($ch);

echo $content;

II. Collection with parameters
Here the page request needs some parameters passed in, either as a GET request or as a POST request. This can still be done with file_get_contents by attaching the parameters, but we will not demonstrate that here.

Code snippet for a cURL GET request
For this kind of request we can use a search engine as a demonstration. For example, searching Baidu for "PHP cURL" and pressing Enter yields a link similar to http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&ch=&tn=baidu&bar=&wd=php%20curl. The link may differ depending on the browser and the entry page, so it does not matter if yours is not identical. By entering several keywords and observing the links, we can determine that wd is the dynamic parameter we need to pass in, while the other parameters can stay the same, which gives the following collection code.

<?php
$keyword = 'PHP cURL';
$url = 'http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&ch=&tn=baidu&bar=&wd=' . urlencode($keyword);

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the data instead of outputting it directly
$content = curl_exec($ch);                   // execute and store the result
curl_close($ch);

echo $content;

Some parameters are not required and can be removed. For example, the link above can be reduced to http://www.baidu.com/s?ie=utf-8&wd=PHP%20cURL. The ie=utf-8 parameter may affect the encoding of the result, so we keep it for now. With this simple code we can collect Baidu's search results.
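
When a link carries more than one parameter, building the query string with http_build_query avoids concatenating and encoding each value by hand. This is a small sketch under assumptions: the helper name build_search_url is hypothetical, and only the two parameters of the shortened link are kept.

```php
<?php
/**
 * Build the shortened Baidu search link for a keyword.
 * (build_search_url is a hypothetical helper for this sketch.)
 */
function build_search_url(string $keyword): string
{
    $params = array(
        'ie' => 'utf-8',  // keep the encoding parameter, as in the shortened link
        'wd' => $keyword, // the dynamic search keyword
    );
    // PHP_QUERY_RFC3986 encodes spaces as %20; the default encoding would use "+".
    return 'http://www.baidu.com/s?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

echo build_search_url('PHP cURL'), "\n";
```

For the keyword "PHP cURL" this prints http://www.baidu.com/s?ie=utf-8&wd=PHP%20cURL, which can be passed to curl_init() as in the snippet above.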

Code snippet for a cURL POST request
POST requests are also common; for example, some search forms are submitted with POST, so we need to send the parameters with a POST request. PHP cURL has the corresponding options CURLOPT_POST and CURLOPT_POSTFIELDS. CURLOPT_POST specifies whether the current request is sent as POST, and CURLOPT_POSTFIELDS sets the submitted parameters, either as a parameter string or as a parameter array, for example:

curl_setopt($ch, CURLOPT_POSTFIELDS, 'ie=utf-8&wd=php%20curl');
or
curl_setopt($ch, CURLOPT_POSTFIELDS, array(
  'ie' => 'utf-8',
  'wd' => 'php curl', // array values are sent as-is, so do not URL-encode them
));
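
The string form has to be URL-encoded by the caller, whereas the values of the array form are transmitted literally. A minimal sketch of producing the string form from an array follows; the helper name encode_form_body is an assumption made for illustration.

```php
<?php
/**
 * Turn an array of fields into a URL-encoded POST body.
 * (encode_form_body is a hypothetical helper for this sketch.)
 */
function encode_form_body(array $fields): string
{
    // http_build_query URL-encodes each key and value and joins them with "&".
    return http_build_query($fields, '', '&');
}

$fields   = array('ie' => 'utf-8', 'wd' => 'php curl');
$asString = encode_form_body($fields);
echo $asString, "\n";

// The receiving side decodes the body back to the original values.
parse_str($asString, $decoded);
```

Passing $asString to CURLOPT_POSTFIELDS sends the fields as a URL-encoded body, while passing $fields itself sends the same unencoded values as multipart form data.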

Below is a POST search demo I built. The backend reuses the Baidu keyword search from before: the client submits a keyword to my server, my server uses the keyword to request a Baidu search, and then returns the results to the client.
As shown in Figure 4, we use Firebug to analyze the request data and obtain the request link and request parameters we need to submit:

Here is our code:

<?php
$keyword = 'PHP cURL';
// Parameter method one: a URL-encoded string
// $post = 'wd=' . urlencode($keyword);

// Parameter method two: an array (values are sent as-is, so they are not URL-encoded)
$post = array(
  'wd' => $keyword,
);
$url = 'http://demo.zjmainstay.cn/php/curl/search.php';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the data instead of outputting it directly
curl_setopt($ch, CURLOPT_POST, 1);            // send the request as POST
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);  // POST data; $post can be an array or a concatenated parameter string
$content = curl_exec($ch);                    // execute and store the result
curl_close($ch);

var_dump($content);

III. Collection requiring a Referer
Some programs check the source URL and deny access if the Referer is not their own site. In that case we need to add the CURLOPT_REFERER option to simulate the referring page so that the collection can proceed normally.

<?php
$keyword = 'PHP cURL';
// Parameter method one: a URL-encoded string
// $post = 'wd=' . urlencode($keyword);

// Parameter method two: an array (values are sent as-is, so they are not URL-encoded)
$post = array(
  'wd' => $keyword,
);
$url   = 'http://demo.zjmainstay.cn/php/curl/search_refer.php';
$refer = 'http://demo.zjmainstay.cn/'; // referring address

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the data instead of outputting it directly
curl_setopt($ch, CURLOPT_REFERER, $refer);    // simulate the Referer
curl_setopt($ch, CURLOPT_POST, 1);            // send the request as POST
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);  // POST data; $post can be an array or a concatenated parameter string
$content = curl_exec($ch);                    // execute and store the result
curl_close($ch);

var_dump($content);

The source code of search_refer.php is as follows; it performs a simple Referer check to block requests:

<?php
if (empty($_POST['wd'])) {
  exit('Deny empty params.');
}

// Referer check (a missing Referer is rejected as well)
if (empty($_SERVER['HTTP_REFERER']) || stripos($_SERVER['HTTP_REFERER'], $_SERVER['HTTP_HOST']) === false) {
  exit('Deny');
}

$keyword = addslashes(trim(strip_tags($_POST['wd'])));
$url = 'http://www.baidu.com/s?ie=utf-8&wd=' . urlencode($keyword);

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the data instead of outputting it directly
$content = curl_exec($ch);                   // execute and store the result
curl_close($ch);

echo $content;
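
The stripos check above accepts any Referer that merely contains the host name somewhere, for instance in a query string. A slightly stricter variant compares only the host part of the Referer. This is a sketch, not the demo site's actual code: the function name is_same_site_referer is hypothetical, and evil.example is a placeholder domain.

```php
<?php
/**
 * Return true only if the Referer's host equals the given host.
 * (is_same_site_referer is a hypothetical helper for this sketch.)
 */
function is_same_site_referer(?string $referer, string $host): bool
{
    if ($referer === null || $referer === '') {
        return false; // no Referer was sent at all
    }
    $host        = preg_replace('/:\d+$/', '', $host); // HTTP_HOST may carry a port
    $refererHost = parse_url($referer, PHP_URL_HOST);  // null or false if there is no usable host
    return is_string($refererHost) && strcasecmp($refererHost, $host) === 0;
}

// Possible use inside search_refer.php:
// if (!is_same_site_referer($_SERVER['HTTP_REFERER'] ?? null, $_SERVER['HTTP_HOST'])) {
//     exit('Deny');
// }
```

Either check only raises the bar slightly, since the client chooses the Referer value, which is exactly what CURLOPT_REFERER relies on.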

IV. Collection requiring cookie support
For applications that simulate a login, submitting parameters and simulating the Referer is not enough; we also need to save or submit the corresponding cookies. PHP cURL provides options for this as well:
CURLOPT_COOKIE: submit cookie parameters directly as a string
CURLOPT_COOKIEFILE: submit cookie parameters from a file
CURLOPT_COOKIEJAR: save the cookie data received after the request
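
For the string method, CURLOPT_COOKIE expects pairs in the form "name=value" separated by "; ". The sketch below builds such a string from an array; the helper name build_cookie_string and the cookie names and values are made up for illustration, and the values are assumed to be already in the form the server issued them.

```php
<?php
/**
 * Join cookie name/value pairs into the string format used by CURLOPT_COOKIE.
 * (build_cookie_string is a hypothetical helper for this sketch.)
 */
function build_cookie_string(array $cookies): string
{
    $pairs = array();
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value; // values are passed through unchanged
    }
    return implode('; ', $pairs);
}

// Possible use with a cURL handle:
// curl_setopt($ch, CURLOPT_COOKIE, build_cookie_string(array('PHPSESSID' => 'abc123', 'lang' => 'en')));
```

The string method is convenient when the cookie values are copied from the browser's Cookies panel, while the file methods suit cookies that cURL itself received from an earlier request.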

The following example simulates a login to PHP100:

<?php
header("Content-Type: text/html; charset=UTF-8");
$cookie_file = tempnam('./temp', 'cookie');
$login_url   = "http://bbs.php100.com/login.php";
$post_fields = "cktime=36000&step=2&pwuser=username&pwpwd=password";

// Submit the login form request
$ch = curl_init($login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file); // store the cookie data received after submission
curl_exec($ch);
curl_close($ch);

// After a successful login, fetch the BBS home page
$url = "http://bbs.php100.com/index.php";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file); // send the cookie data obtained after submission
$contents = curl_exec($ch);
curl_close($ch);

// Convert the encoding for display
echo iconv('GBK', 'UTF-8', $contents);

V. Collecting compressed pages (gzip)
Those who have never dealt with compressed pages tend to get stuck here: the collected content comes back garbled, and neither iconv nor the powerful mb_convert_encoding can restore the data. Without knowing the cause, you try everything and still find no solution. That is how it once was for me.
Figure 5 shows what the garbled output looks like:


Fortunately, with some effort I eventually found the answer: the CURLOPT_ENCODING option.
For example, I ran into the gzip compression problem when collecting Sohu News. Here is an example:

<?php
$url = 'http://news.sohu.com/';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the data instead of outputting it directly
curl_setopt($ch, CURLOPT_ENCODING, "gzip");   // specify gzip compression
$content = curl_exec($ch);                    // execute and store the result
curl_close($ch);
echo $content;

The manual says: the supported encodings are "identity", "deflate" and "gzip". If an empty string "" is set, a request header containing all supported encoding types is sent.
The last sentence means that curl_setopt($ch, CURLOPT_ENCODING, ""); also works, but the option cannot be left out.
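
To see why a character-set conversion cannot repair such a response, it helps to look at what gzip data is: a compressed byte stream, not text in some other encoding. The sketch below simulates a compressed body locally with PHP's zlib functions gzencode and gzdecode instead of contacting a server; the sample HTML string is made up.

```php
<?php
$original   = '<html><body>news</body></html>';
$compressed = gzencode($original); // roughly what a server sends with "Content-Encoding: gzip"

// gzip data starts with the magic bytes 0x1f 0x8b rather than readable markup,
// which is why it shows up as garbage when printed directly.
$looksLikeGzip = (substr($compressed, 0, 2) === "\x1f\x8b");

// Decompressing, not converting the character set, restores the page.
$restored = gzdecode($compressed);

var_dump($looksLikeGzip, $restored === $original);
```

With CURLOPT_ENCODING set, cURL performs this decompression itself, so curl_exec() returns the readable page.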

VI. Collecting SSL links
Some request links use https, and collecting them with cURL may fail. In that case, use var_dump(curl_error($ch)); to print the error message, then look for a solution based on the error. A common SSL error is "SSL certificate problem: unable to get local issuer certificate". Here we can use the options CURLOPT_SSL_VERIFYPEER and CURLOPT_SSL_VERIFYHOST to disable verification of the SSL certificate. I tried disabling verification with CURLOPT_SSL_VERIFYPEER alone; it is best to set both options together.
The following is a code example:

<?php
$searchStr = 'rc376981638hk';
$post = 'accion=localizauno&numero=' . $searchStr . '&ecorreo=&numeros=';
$url  = 'https://aplicacionesweb.correos.es/localizadorenvios/track.asp';
$ch = curl_init($url);                            // initialize cURL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);      // return the data instead of outputting it directly
curl_setopt($ch, CURLOPT_POST, 1);                // send the request as POST
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);      // POST data; $post can be an array or a concatenated parameter string
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);  // use when an SSL error occurs
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);  // use when an SSL error occurs
$contents = curl_exec($ch);                       // execute and store the result
// var_dump(curl_error($ch));                     // use when the request fails (prints the error message)
curl_close($ch);
echo $contents;
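
Disabling verification makes the error disappear, but cURL then no longer checks which server it is talking to. The "unable to get local issuer certificate" message usually means cURL has no CA bundle to check against, so an alternative is to keep verification on and point cURL at a bundle with CURLOPT_CAINFO. The following is a sketch under assumptions: the helper name verified_ssl_options is hypothetical, and the bundle path is a placeholder for a file you supply yourself.

```php
<?php
/**
 * cURL options that keep certificate verification enabled.
 * (verified_ssl_options is a hypothetical helper; the CA bundle path is supplied by the caller.)
 */
function verified_ssl_options(string $caBundlePath): array
{
    return array(
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_SSL_VERIFYPEER => true,          // verify the certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,             // verify that the certificate matches the host name
        CURLOPT_CAINFO         => $caBundlePath, // file containing the trusted CA certificates
    );
}

// Possible use (the path is a placeholder):
// $ch = curl_init('https://aplicacionesweb.correos.es/localizadorenvios/track.asp');
// curl_setopt_array($ch, verified_ssl_options('/path/to/cacert.pem'));
// $contents = curl_exec($ch);
// if ($contents === false) {
//     var_dump(curl_error($ch));
// }
```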

VII. Collection through a proxy
As we all know, some sites are blocked by the firewall at home, so to fetch data from them we need a proxy server located abroad. Likewise, when collecting large amounts of data we need to keep switching IP addresses, which also calls for proxies.
PHP cURL has several options for using a proxy: CURLOPT_PROXY, CURLOPT_PROXYPORT and CURLOPT_PROXYUSERPWD, plus a few others not listed here.
CURLOPT_PROXY: specifies the proxy IP
CURLOPT_PROXYPORT: specifies the proxy port
CURLOPT_PROXYUSERPWD: specifies the account and password for a proxy that requires authentication, as a string in the format "[username]:[password]"

As for obtaining proxies, that is up to you. Here is a list found through an online search: cURL high-anonymity proxies

The following is an example of collection through a proxy:

<?php
$url = 'http://demo.zjmainstay.cn/php/curl/dump_ip.php?t=' . time();

echo "Local IP: " . file_get_contents($url) . "\nFake IP: ";

$ip   = '183.224.1.116';
$port = 'n'; // proxy port; replace with your proxy's actual port number

// Forged request headers; not needed with a high-anonymity proxy
$header = array(
  'X-Forwarded-For: ' . $ip,
  'Client-IP: ' . $ip,
);

$ch = curl_init($url);                           // initialize cURL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_PROXY, $ip);
curl_setopt($ch, CURLOPT_PROXYPORT, $port);
$content = curl_exec($ch);                       // execute and store the result
curl_close($ch);

echo $content;
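
For a proxy that requires authentication, CURLOPT_PROXYUSERPWD is set in addition to the address and port. A small sketch that collects the three options in one place follows; the helper name proxy_options is hypothetical, and the address, port and credentials in the usage comment are placeholders rather than a working proxy.

```php
<?php
/**
 * Build the proxy-related cURL options, with optional authentication.
 * (proxy_options is a hypothetical helper for this sketch.)
 */
function proxy_options(string $host, int $port, ?string $user = null, ?string $password = null): array
{
    $options = array(
        CURLOPT_PROXY     => $host,
        CURLOPT_PROXYPORT => $port,
    );
    if ($user !== null) {
        // The credentials use the "[username]:[password]" format.
        $options[CURLOPT_PROXYUSERPWD] = $user . ':' . $password;
    }
    return $options;
}

// Possible use (all values are placeholders):
// $ch = curl_init($url);
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// curl_setopt_array($ch, proxy_options('192.0.2.10', 8080, 'user', 'secret'));
// $content = curl_exec($ch);
```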

VIII. Multi-threaded collection
For large collection jobs, the concurrent requests offered by PHP cURL's multi interface are essential for efficiency. The examples in the manual did not seem to work well: when I first tested a few of them, execution hung and never completed. A few days ago I tested again and found that Example #1 under the curl_multi_info_read function does run; its results are stored in $res but never printed, and the request to Yahoo is slow and can stall, while the first two links return normally.
Since those examples were not ideal, I searched and found a very capable project, CurlMulti, a wrapper around PHP's cURL multi functions that provides good collection support.
I will not go into the use of CurlMulti here. Its official site provides demos, and for technical difficulties you can join its QQ group, where the author @ares and other experienced collectors help answer questions.
Here is a simple PHP cURL multi example:

<?php
$urls = array(
  "http://demo.zjmainstay.cn/php/curl/curl_multi_1.php",
  "http://demo.zjmainstay.cn/php/curl/curl_multi_2.php",
);
$mh = curl_multi_init();
foreach ($urls as $i => $url) {
  $conn[$i] = curl_init($url);
  curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1); // do not output the result directly
  curl_multi_add_handle($mh, $conn[$i]);
}
$active = null;
$res = array();
do {
  $status = curl_multi_exec($mh, $active);
  // Process every transfer that has finished so far
  while (false !== ($info = curl_multi_info_read($mh))) {
    $res[] = array(
      'content' => curl_multi_getcontent($info['handle']),
      'info'    => $info,
    );
    curl_multi_remove_handle($mh, $info['handle']);
    curl_close($info['handle']);
  }
  if ($active) {
    curl_multi_select($mh); // wait for activity instead of spinning
  }
} while ($status === CURLM_CALL_MULTI_PERFORM || $active);
curl_multi_close($mh);
var_dump($res);

IX. 302 redirects (301 redirects)
In some applications, such as a simulated login, a 302 redirect can cause the cookie to be lost and the simulated login to fail, as shown by the requests in Figure 6:

In this case, you can use:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

For CURLOPT_FOLLOWLOCATION, the manual says:

When enabled, any "Location:" header that the server sends back is followed recursively; use CURLOPT_MAXREDIRS to limit the number of redirects followed.
My own understanding, put simply, is that cURL keeps following the subsequent redirects, and the cookies are carried along in the headers.
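
Cookies set by a redirect response are only sent along with the follow-up request when cURL's cookie handling is switched on, which setting CURLOPT_COOKIEFILE does. The sketch below combines redirect following with a cookie file; the helper name redirect_options and the default limit of 5 redirects are assumptions for illustration.

```php
<?php
/**
 * cURL options for following redirects while keeping cookies.
 * (redirect_options is a hypothetical helper for this sketch.)
 */
function redirect_options(string $cookieFile, int $maxRedirects = 5): array
{
    return array(
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_FOLLOWLOCATION => true,          // follow "Location:" headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // stop after this many redirects
        CURLOPT_COOKIEFILE     => $cookieFile,   // enables cookie handling and loads saved cookies
        CURLOPT_COOKIEJAR      => $cookieFile,   // saves received cookies when the handle is closed
    );
}

// Possible use with the login example:
// $ch = curl_init($login_url);
// curl_setopt_array($ch, redirect_options($cookie_file));
// curl_setopt($ch, CURLOPT_POST, 1);
// curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
// $contents = curl_exec($ch);
```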

X. Simulating a file upload
In the PHP manual's entry for the curl_setopt function, CURLOPT_POSTFIELDS is described as follows:

The full data to send in an HTTP "POST" operation. To send a file, prefix the file name with @ and use the full path. This parameter can be passed either as a urlencoded string such as 'para1=val1&para2=val2&...' or as an array with the field name as key and the field data as value. If value is an array, the Content-Type header will be set to multipart/form-data.

For uploading a file, this passage contains two pieces of information:

1. To upload a file, the POST data must be passed as an array so that the Content-Type header is set to multipart/form-data.
2. To upload a file, prefix the file name with @ and use the full path.
Note that the @ prefix is deprecated as of PHP 5.5.0 and was removed in PHP 7.0.0, where the CURLFile class must be used instead. On older PHP versions, a simulated file upload can be implemented as follows:

// Upload test.jpg from drive D. The file must exist; otherwise cURL fails without any message
$data = array('name' => 'Foo', 'file' => '@d:/test.jpg');
$ch = curl_init('http://localhost/upload.php');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_exec($ch);
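
On PHP 5.5 and later, the same upload can be expressed with the CURLFile class instead of the @ prefix. The sketch below mirrors the fields of the example above; the helper name build_upload_fields and the MIME type image/jpeg are assumptions made for illustration.

```php
<?php
/**
 * Build the POST fields for a file upload using CURLFile.
 * (build_upload_fields is a hypothetical helper for this sketch.)
 */
function build_upload_fields(string $path, string $mimeType, string $postName): array
{
    return array(
        'name' => 'Foo',
        // CURLFile takes the local path, the MIME type, and the file name reported to the server.
        'file' => new CURLFile($path, $mimeType, $postName),
    );
}

// Possible use:
// $ch = curl_init('http://localhost/upload.php');
// curl_setopt($ch, CURLOPT_POST, 1);
// curl_setopt($ch, CURLOPT_POSTFIELDS, build_upload_fields('d:/test.jpg', 'image/jpeg', 'test.jpg'));
// curl_exec($ch);
```

Because the fields are still passed as an array, the request is sent as multipart/form-data just as with the @ prefix.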

When testing locally, print $_POST and $_FILES in the upload.php file to verify that the upload succeeded:
<?php
print_r($_POST);
print_r($_FILES);

The output looks similar to:

Array
(
    [name] => Foo
)
Array
(
    [file] => Array
        (
            [name] => test.jpg
            [type] => application/octet-stream
            [tmp_name] => D:\xampp\tmp\php2EA0.tmp
            [error] => 0
            [size] => 139999
        )

)

One more note on assigning CURLOPT_POSTFIELDS:
If an array is passed to CURLOPT_POSTFIELDS, cURL encodes the data as multipart/form-data; if a URL-encoded string is passed, the data is encoded as application/x-www-form-urlencoded.

That is:

curl_setopt($ch, CURLOPT_POSTFIELDS, 'param1=val1&param2=val2&...');                       // application/x-www-form-urlencoded
curl_setopt($ch, CURLOPT_POSTFIELDS, array('param1' => 'val1', 'param2' => 'val2', ...));  // multipart/form-data

That concludes this introduction to using cURL, a powerful tool for collection. I hope it helps with your learning.
