Simulating POST Requests in PHP, Part 5: Basic cURL Usage and Multithreading Optimization
Today we introduce the heavy weapon for simulating POST requests in PHP, the cURL function library, together with a way to optimize its multithreaded (batch) use.
cURL is a well-worn topic. Still, much of the material online is vague on the key points and simply repeats the manual, which made it painful for me to get started. After going through a number of resources and combining them with my own notes, I wrote this post. I hope it helps developers who are using cURL for the first time.
Basic steps to use Curl
Let's start by introducing Curl:
cURL transfers data by sending HTTP headers that mimic a browser. It supports FTP, FTPS, HTTP, HTTPS, DICT, FILE and other protocols. It also offers HTTPS certificate authentication, HTTP POST, HTTP PUT, FTP upload, HTTP upload, proxy servers, cookies, and username/password authentication. It is a powerful tool for crawling web pages, posting data, and more.
Using the cURL functions involves four main steps:
1. Initialize a cURL handle.
2. Set the cURL options. This is the core of cURL, and everything else is an extension of this step.
3. Execute the request and get the result.
4. Close the handle and free its resources.
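<?php
$ch = curl_init();                                  // 1. initialize
curl_setopt($ch, CURLOPT_URL, "http://localhost");  // 2. set options
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        //    return the result instead of printing it
$output = curl_exec($ch);                           // 3. execute and get the result
curl_close($ch);                                    // 4. close and free resources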
In addition, $info = curl_getinfo($ch); returns an array of information about the last transfer on the handle.
The keys of the $info array include:
- "url": the URL that was fetched
- "content_type": the Content-Type of the response
- "http_code": the HTTP status code
- "filetime": the modification time of the remote document (-1 if unknown; only retrieved when CURLOPT_FILETIME is enabled)
- "total_time": total time taken, in seconds
- "size_upload": number of bytes uploaded
- "size_download": number of bytes downloaded
- "speed_download": average download speed, in bytes per second
- "speed_upload": average upload speed, in bytes per second
- "download_content_length": the content length of the download, as reported by the server
- "upload_content_length": the specified size of the upload
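The helper below ties the four steps and curl_getinfo() together. It is a minimal sketch that fetches a URL and returns the body along with a few of the fields listed above. The function name fetchWithInfo() is hypothetical and chosen for illustration. Any URL that cURL supports can be passed, including a local file:// URL, which is handy for trying it out without a network.

```php
<?php
// Fetch a URL and return the body plus a few curl_getinfo() fields.
// fetchWithInfo() is a hypothetical helper name used only for illustration.
function fetchWithInfo(string $url): array
{
    $ch = curl_init($url);                           // 1. initialize with the target URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // 2. return the body instead of printing it
    $body = curl_exec($ch);                          // 3. execute; false on failure
    $info = curl_getinfo($ch);                       // array describing the transfer
    curl_close($ch);                                 // 4. release the handle

    return [
        'body'          => $body,
        'url'           => $info['url'],
        'http_code'     => $info['http_code'],       // not an HTTP status for file:// transfers
        'size_download' => $info['size_download'],   // bytes received
        'total_time'    => $info['total_time'],      // seconds
    ];
}
```

For example, fetchWithInfo('http://localhost') returns the page in 'body', and its status code is in 'http_code'.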
Common settings for Curl
This section describes the options set in step 2 in detail. Choose among them according to your needs when using cURL.
Basic settings:
curl_setopt($ch, CURLOPT_URL, $string); // set the URL to request
curl_setopt($ch, CURLOPT_PORT, $port); // set the port to connect to; usually left unset, so the protocol's default port (80 for HTTP) is used
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the result as a string instead of printing it; usually set so the fetched content can be processed later rather than output directly
POST settings:
curl_setopt($ch, CURLOPT_POST, 1); // send the request as a POST
curl_setopt($ch, CURLOPT_POSTFIELDS, $string); // set the data to send
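The two POST options are usually combined with http_build_query(). The minimal sketch below assumes a hypothetical postOptions() helper and a placeholder URL, http://localhost/login.php. It builds the option array and applies it with curl_setopt_array(), which sets several options in one call.

```php
<?php
// Build the options for a form-encoded POST request.
// postOptions() and the URL used below are hypothetical placeholders.
function postOptions(string $url, array $fields): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,                     // send as POST
        // A string body is sent as application/x-www-form-urlencoded;
        // passing the array itself would send multipart/form-data instead.
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_RETURNTRANSFER => true,                     // capture the response
    ];
}

$ch = curl_init();
$ok = curl_setopt_array($ch, postOptions('http://localhost/login.php', ['user' => 'tom', 'page' => 2]));
// curl_exec($ch) would now send "user=tom&page=2" as the request body.
curl_close($ch);
```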
Identity and authentication settings:
curl_setopt($ch, CURLOPT_COOKIE, $string); // set the cookies sent with the request
curl_setopt($ch, CURLOPT_USERAGENT, $string); // set the browser (User-Agent) string that cURL presents
curl_setopt($ch, CURLOPT_REFERER, $string); // set the Referer header, which helps get past anti-hotlinking checks
curl_setopt($ch, CURLOPT_USERPWD, $string); // username and password for the connection, in the format "[username]:[password]"
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow server redirects
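The options above can also be applied in a single curl_setopt_array() call. This sketch uses a hypothetical applyBrowserIdentity() helper, and every value passed to it (cookie, user agent, referer, credentials) is a placeholder. CURLOPT_MAXREDIRS is an extra option that is not in the list above. It is added so that following redirects cannot loop indefinitely.

```php
<?php
// Apply cookie, user agent, referer, credentials and redirect handling at once.
// applyBrowserIdentity() is hypothetical; $id holds placeholder values.
function applyBrowserIdentity($ch, array $id): bool
{
    return curl_setopt_array($ch, [
        CURLOPT_COOKIE         => $id['cookie'],                        // e.g. "sid=abc; lang=en"
        CURLOPT_USERAGENT      => $id['user_agent'],
        CURLOPT_REFERER        => $id['referer'],
        CURLOPT_USERPWD        => $id['user'] . ':' . $id['password'],  // "[username]:[password]"
        CURLOPT_FOLLOWLOCATION => true,                                 // follow Location: redirects
        CURLOPT_MAXREDIRS      => 5,                                    // ...but at most 5 of them
        CURLOPT_RETURNTRANSFER => true,
    ]);
}
```

The function returns false if any option could not be set. That makes it easy to check before calling curl_exec().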
Additional settings:
curl_setopt($ch, CURLOPT_NOBODY, 1); // do not fetch the body; this greatly speeds things up when you only need information such as the response headers
curl_setopt($ch, CURLOPT_TIMEOUT, $int); // maximum number of seconds the whole request may take; with a small value, cURL abandons pages that take too long
curl_setopt($ch, CURLOPT_HEADER, 1); // include the response headers in the output
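When a timeout is set, curl_exec() returns false if the request fails or runs out of time. curl_errno() and curl_error() then report the reason. The minimal sketch below uses a hypothetical fetchOrError() helper. CURLOPT_CONNECTTIMEOUT, which limits only the connection phase, is an extra option that is not in the list above.

```php
<?php
// Run a request with timeouts and report failures explicitly.
// fetchOrError() is a hypothetical helper name.
function fetchOrError(string $url, int $timeoutSeconds = 10): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,  // limit for establishing the connection
        CURLOPT_TIMEOUT        => $timeoutSeconds,  // limit for the whole transfer
    ]);
    $body  = curl_exec($ch);   // false when the transfer fails or times out
    $errno = curl_errno($ch);  // 0 on success; 28 (CURLE_OPERATION_TIMEDOUT) on timeout
    $error = curl_error($ch);  // human-readable message, '' on success
    curl_close($ch);

    return ['ok' => $body !== false, 'body' => $body, 'errno' => $errno, 'error' => $error];
}
```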
Basic use of the Curl batch function
Of course, cURL can do far more than this, and the manual lists many more options. cURL's most powerful feature, however, is batch processing.
Batch processing with cURL is easy to follow. The general steps are:
1. $mh = curl_multi_init(); // initialize a batch handle
2. curl_multi_add_handle($mh, $ch); // add a configured $ch handle to the batch handle
3. curl_multi_exec($mh, $running); // run the batch handle; the number of transfers still running is written to $running
4. Keep calling curl_multi_exec() in a loop while $running is greater than 0.
5. Loop over the individual handles and fetch each one's response with curl_multi_getcontent().
6. Remove each handle from $mh with curl_multi_remove_handle().
7. Close the $mh batch handle with curl_multi_close().
The code is as follows:
<?php
$chArr = [];
for ($i = 0; $i < 50; $i++) {
    $chArr[$i] = curl_init("http://www.baidu.com");
    curl_setopt($chArr[$i], CURLOPT_RETURNTRANSFER, 1);
}

$mh = curl_multi_init();                       // 1
foreach ($chArr as $k => $ch) {
    curl_multi_add_handle($mh, $ch);           // 2
}

$running = null;
do {
    curl_multi_exec($mh, $running);            // 3
} while ($running > 0);                        // 4

$result = [];
foreach ($chArr as $k => $ch) {
    $result[$k] = curl_multi_getcontent($ch);  // 5
    curl_multi_remove_handle($mh, $ch);        // 6
}
curl_multi_close($mh);                         // 7
Excessive CPU usage during cURL batch processing
However, a serious problem appears when running a large number of handles: CPU usage climbs to nearly 100% and the system almost freezes. The cause is that while the batch has not finished ($running > 0), the loop calls curl_multi_exec($mh, $running) over and over without ever pausing. An experiment demonstrates this:
We add echo "a"; next to the curl_multi_exec($mh, $running) call inside the loop, request Baidu 50 times, and look at the output.
Judging from the size of the scroll bar (already at its minimum), "a" was printed about 500 times. We have found the culprit that consumes the CPU.
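Printing "a" floods the screen. The same experiment can be reproduced by counting how many times the unoptimized loop calls curl_multi_exec(). This sketch assumes a hypothetical countBusyLoopCalls() helper. Apart from the counter, it runs the same steps as the batch code above.

```php
<?php
// Run the original busy loop (steps 3 and 4) and count curl_multi_exec() calls
// instead of echoing "a". countBusyLoopCalls() is a hypothetical helper name.
function countBusyLoopCalls(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $handles[$key] = curl_init($url);
        curl_setopt($handles[$key], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $handles[$key]);
    }

    $calls = 0;
    $running = null;
    do {
        curl_multi_exec($mh, $running);  // returns immediately whether or not data arrived
        $calls++;
    } while ($running > 0);              // no waiting in between: this is the busy loop

    $bodies = [];
    foreach ($handles as $key => $ch) {
        $bodies[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return ['calls' => $calls, 'bodies' => $bodies];
}
```

Against remote URLs such as http://www.baidu.com, the count should grow with the time the slowest response takes, because the loop never pauses. Against local file:// URLs it stays small, because the transfers finish almost immediately.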
Reducing CPU usage in cURL batch processing
The fix is to use curl_multi_select() from the cURL library. Its prototype is:
int curl_multi_select(resource $mh [, float $timeout = 1.0])
It blocks until there is activity on any of the connections in the cURL batch handle. On success, it returns the number of descriptors in the descriptor set, which may be 0 if the timeout expired with no activity. On failure, it returns -1 (a failure of the underlying select system call).
We use curl_multi_select() to block the program while there is nothing to read.
We optimize steps 3 and 4 of the batch process and use the multi interface to run the requests concurrently. Despite the common name "multithreading", curl_multi does not create threads: it drives all transfers from a single PHP thread using non-blocking I/O.
Many readers are puzzled by the code in the manual (I was too at first), so here is the code with an explanation.
$running = null;
// First loop: run the $ch handles in the $mh batch and write the number of
// transfers still running to $running. While curl_multi_exec() returns
// CURLM_CALL_MULTI_PERFORM, data is still being written or read, so call it
// again. Once it returns CURLM_OK, leave this loop for the one below.
do {
    $mrc = curl_multi_exec($mh, $running);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

// Second loop: $running is non-zero (some $ch handles are still pending) and
// $mrc == CURLM_OK (the last round of reads and writes has completed).
while ($running && $mrc == CURLM_OK) {
    // Block until a connection in $mh has activity; a result other than -1
    // means the program has left the blocking state without a select failure.
    if (curl_multi_select($mh) != -1) {
        // Continue processing the $ch handles that need it.
        do {
            $mrc = curl_multi_exec($mh, $running);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
The benefit is that after the pending reads and writes of the $ch handles have completed ($mrc == CURLM_OK), the program blocks in curl_multi_select($mh) until a connection has activity. It no longer calls curl_multi_exec() over and over while the batch is still running, which wasted CPU.
Results of the cURL batch optimization
The complete code is as follows:
<?php
$chArr = [];
for ($i = 0; $i < 50; $i++) {
    $chArr[$i] = curl_init("http://www.baidu.com");
    curl_setopt($chArr[$i], CURLOPT_RETURNTRANSFER, 1);
}
$mh = curl_multi_init();
foreach ($chArr as $k => $ch) {
    curl_multi_add_handle($mh, $ch);
}

$running = null;
do {
    $mrc = curl_multi_exec($mh, $running);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($running && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $running);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}

$result = [];
foreach ($chArr as $k => $ch) {
    $result[$k] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
}
curl_multi_close($mh);
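Two details of the loop above are worth knowing on newer setups. First, libcurl has not returned CURLM_CALL_MULTI_PERFORM since version 7.20.0, so the inner do/while loops run only once there. Second, on some PHP/libcurl builds curl_multi_select() could return -1 immediately. The while loop would then never call curl_multi_exec() again and would spin forever. Below is a more defensive sketch, modeled on the loop in the current PHP manual. The helper name fetchAll() and the 1 ms back-off are assumptions of this sketch, not part of the original code.

```php
<?php
// Fetch several URLs concurrently with curl_multi, waiting in
// curl_multi_select() instead of spinning. fetchAll() is a hypothetical helper.
function fetchAll(array $urls, float $selectTimeout = 1.0): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    do {
        $status = curl_multi_exec($mh, $running);  // move every transfer forward
        if ($running && $status === CURLM_OK) {
            // Sleep until a socket is ready or the timeout expires.
            if (curl_multi_select($mh, $selectTimeout) === -1) {
                usleep(1000);  // select failed: back off 1 ms instead of spinning
            }
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);  // body of each finished transfer
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

Usage: $pages = fetchAll(['baidu' => 'http://www.baidu.com']); returns the bodies keyed the same way as the input array.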
Once again we add echo "a"; before the $mrc = curl_multi_exec($mh, $running); line.
This time "a" is printed only a little more than 50 times, and CPU usage is much lower than before the optimization.
Although the cURL functions are very powerful, we may still need other functions to send POST requests. Those functions also help us understand what cURL does at a lower level, which is why this series devotes large sections to them.
That concludes this series. I learned a lot while writing this post. If you found it helpful, please recommend it or follow me, and I will keep sharing my notes. If you have any questions, leave a comment below. Thank you for reading.