Simulating POST requests in PHP, part 5: using cURL and optimizing its batch processing
Today we introduce the cURL function library, the heavy weapon PHP offers for simulating POST requests, together with a way to optimize its concurrent batch processing (often loosely called "multithreading").
The cURL functions are a well-worn topic. However, many documents on the Internet are vague about the key parts and simply copy long lists from the manual, which made getting started painful for me. After reviewing various materials, I wrote this post from my own notes, hoping it helps developers who are new to cURL.
Procedure
First, an introduction to cURL:
cURL simulates a browser's data transfer based on HTTP header information. It supports FTP, FTPS, HTTP, HTTPS, DICT, FILE, and other protocols, and it offers HTTPS certificate verification, HTTP POST, HTTP PUT, FTP upload, HTTP form upload, proxy servers, cookies, username/password authentication, and more. cURL is a powerful tool for crawling websites, fetching pages, and POSTing data.
Using the cURL functions breaks down into four steps:
1. Initialize cURL.
2. Set the cURL options. This is the core of cURL; all of its extended functionality depends on this step.
3. Execute cURL and obtain the result.
4. Close the connection and release resources.
    $ch = curl_init();                                  // 1
    curl_setopt($ch, CURLOPT_URL, "http://localhost");  // 2
    $output = curl_exec($ch);                           // 3
    curl_close($ch);                                    // 4
In addition, you can call curl_getinfo($ch), before closing the handle, to obtain information about the transfer. The result is an array.
If the result is stored in $info, the array includes the following keys:
- "url" // network address of the resource
- "content_type" // value of the Content-Type response header
- "http_code" // HTTP status code
- "filetime" // remote modification time of the document
- "total_time" // total time consumed
- "size_upload" // size of the uploaded data
- "size_download" // size of the downloaded data
- "speed_download" // download speed
- "speed_upload" // upload speed
- "download_content_length" // length of the downloaded content
- "upload_content_length" // length of the uploaded content
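The four steps and curl_getinfo() can be combined into one small helper. This is a minimal sketch rather than a definitive implementation: fetch_with_info() is a hypothetical name chosen for illustration, and only a few of the keys listed above are returned.

```php
<?php
// Fetch a URL and return the body plus a few transfer statistics.
// fetch_with_info() is a hypothetical helper name, not part of the cURL extension.
function fetch_with_info(string $url): array
{
    $ch = curl_init();                               // 1. initialize
    curl_setopt($ch, CURLOPT_URL, $url);             // 2. set options
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  //    return the body instead of printing it
    $body  = curl_exec($ch);                         // 3. execute (false on failure)
    $info  = curl_getinfo($ch);                      //    read the statistics before closing
    $error = curl_error($ch);
    curl_close($ch);                                 // 4. release the handle

    return [
        'ok'         => $body !== false,
        'error'      => $error,
        'http_code'  => $info['http_code'],
        'total_time' => $info['total_time'],
        'body'       => $body === false ? '' : $body,
    ];
}
```

Calling fetch_with_info("http://localhost") returns an array whose 'http_code' entry holds the status code of the response, or whose 'error' entry holds the cURL error message if the request failed.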
Common cURL settings
The following lists the options most commonly set in step 2. Set whichever ones you need when using the cURL functions.
Basic settings:
curl_setopt($ch, CURLOPT_URL, $string); // the URL that cURL will request.
curl_setopt($ch, CURLOPT_PORT, $port); // the connection port. By default the protocol's standard port is used (80 for HTTP).
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the result from curl_exec() as a string instead of printing it. This is usually set so that the fetched data can be processed afterwards.
POST data settings:
curl_setopt($ch, CURLOPT_POST, 1); // send the request with the POST method.
curl_setopt($ch, CURLOPT_POSTFIELDS, $string); // the data to send in the POST body.
Authentication and identity settings:
curl_setopt($ch, CURLOPT_COOKIE, $string); // the cookie data sent with the request.
curl_setopt($ch, CURLOPT_USERAGENT, $string); // the browser (User-Agent) that cURL pretends to be.
curl_setopt($ch, CURLOPT_REFERER, $string); // the Referer header, which helps get past hotlink (anti-leech) protection.
curl_setopt($ch, CURLOPT_USERPWD, $string); // the username and password required for the connection. Format: "[username]:[password]"
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects sent by the server.
Enhancement settings:
curl_setopt($ch, CURLOPT_NOBODY, 1); // do not fetch the body. When only header information is needed, this option speeds up the request considerably.
curl_setopt($ch, CURLOPT_TIMEOUT, $int); // the maximum number of seconds the transfer may take (timeout). With a small value, cURL gives up on pages that take too long.
curl_setopt($ch, CURLOPT_HEADER, 1); // include the response headers in the output.
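As an illustration of how these options fit together for a POST request, here is a minimal sketch. The helper name make_post_handle(), the User-Agent string, and the default timeout are assumptions made for this example and do not come from the original notes.

```php
<?php
// Build a configured cURL handle for a form-encoded POST request.
// make_post_handle() and its parameters are hypothetical names used for illustration.
function make_post_handle(string $url, array $fields, int $timeout = 10)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body from curl_exec()
    curl_setopt($ch, CURLOPT_POST, true);             // use the POST method
    // http_build_query() encodes the array as application/x-www-form-urlencoded data.
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; ExampleClient/1.0)'); // placeholder value
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow server redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);      // give up after $timeout seconds
    return $ch;
}
```

The returned handle is then passed to curl_exec(), which returns the response body or false on failure (in which case curl_error($ch) gives the reason), and finally to curl_close(). Returning the handle instead of executing it immediately also lets the same helper feed the batch processing described in the next section.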
Basic use of the cURL batch processing function
Of course, cURL does not stop here; many more options can be found in the manual. Beyond that, the most powerful part of cURL is its batch processing.
cURL batch processing is fairly easy to understand. The general steps are:
1. $mh = curl_multi_init(); // initialize a batch handle.
2. curl_multi_add_handle($mh, $ch); // add a configured $ch handle to the batch handle.
3. curl_multi_exec($mh, $running); // execute the $mh handle and write its running status to the $running variable.
4. While $running is true, keep calling curl_multi_exec() in a loop.
5. After the loop ends, iterate over the handles in $mh and use curl_multi_getcontent() to obtain the return value of each handle.
6. Use curl_multi_remove_handle() to remove each handle from $mh.
7. Use curl_multi_close() to close the $mh batch handle.
The code is as follows:
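    <?php
    // $chArr is assumed to be an array of cURL handles already configured with curl_setopt().
    $mh = curl_multi_init();                        //1
    foreach ($chArr as $k => $ch) {
        curl_multi_add_handle($mh, $ch);            //2
    }
    $running = null;
    do {
        curl_multi_exec($mh, $running);             //3
    } while ($running > 0);                         //4
    foreach ($chArr as $k => $ch) {
        $result[$k] = curl_multi_getcontent($ch);   //5
        curl_multi_remove_handle($mh, $ch);         //6
    }
    curl_multi_close($mh);                          //7
    ?>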
High CPU usage during cURL batch processing
However, when we run a large number of handles we find a very serious problem: CPU usage stays at almost 100% during execution and the system nearly freezes. The reason is that as long as $running > 0, that is, as long as the batch has not completely finished, the loop calls curl_multi_exec($mh, $running) over and over without pausing. An experiment demonstrates this:
We add an echo "a"; statement before the curl_multi_exec($mh, $running) call inside the loop, make 50 requests to Baidu, and look at the output.
Judging by the size of the scroll bar (it is already at its smallest), the number of a's printed is well over 500, so we have found the culprit behind the CPU usage.
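The experiment above can be sketched with a counter in place of echo "a". This is an illustrative sketch only: count_busy_iterations() is a hypothetical helper name, the 10-second timeout is an assumption, and the number it returns depends on the network and the machine.

```php
<?php
// Count how many times the naive loop calls curl_multi_exec().
// count_busy_iterations() is a hypothetical helper; $urls is any list of URLs.
function count_busy_iterations(array $urls): int
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);   // assumed timeout so the loop always ends
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    $iterations = 0;
    $running = null;
    do {
        $iterations++;                           // stands in for the echo "a" in the text
        curl_multi_exec($mh, $running);          // returns immediately, so the loop spins
    } while ($running > 0);

    foreach ($handles as $ch) {
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $iterations;
}
```

Passing 50 copies of the same URL, for example array_fill(0, 50, "http://www.baidu.com"), reproduces the setup described in the text. A large return value means the loop spent most of its time calling curl_multi_exec() with nothing to do.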
CPU optimization scheme for cURL batch processing
The fix is to use the curl_multi_select() function from the cURL library. Its prototype is:
int curl_multi_select(resource $mh [, float $timeout = 1.0])
It blocks until there is activity on any connection in the cURL batch. On success it returns the number of descriptors contained in the descriptor sets. On failure it returns -1 if the select failed, otherwise a timeout (from the underlying select system call).
We use curl_multi_select() to block the program while there is nothing to read or write.
We optimize steps 3 and 4 of the batch procedure so that the concurrent requests no longer spin the CPU while they wait.
Many readers may be puzzled by the code given in the manual (I was at first), so the code is shown below with explanations.
    $running = null;
    do {
        $mrc = curl_multi_exec($mh, $running);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    // This loop processes the $ch handles in the $mh batch for the first time and writes
    // the execution status of the batch to $running. While the return value equals
    // CURLM_CALL_MULTI_PERFORM, data is still being written or read, so the loop runs again.
    // Once the first round of reads and writes has completed, the return value becomes
    // CURLM_OK, execution leaves this loop and enters the large loop below.

    // $running is true: $ch handles in the $mh batch are still waiting to be processed.
    // $mrc == CURLM_OK: the last read or write on the handles has completed.
    while ($running && $mrc == CURLM_OK) {
        if (curl_multi_select($mh) != -1) {
            // A $ch handle in the $mh batch has activity; curl_multi_select($mh) != -1
            // means the program has left the blocking state.
            do {
                // Continue processing the $ch handles that are ready.
                $mrc = curl_multi_exec($mh, $running);
            } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        }
    }
The advantage of this approach is that once the $ch handles in the $mh batch have finished a read or write ($mrc == CURLM_OK), the program enters the blocking phase of curl_multi_select($mh) instead of calling curl_multi_exec() nonstop for the whole duration of the batch and wasting CPU resources.
CPU optimization results for cURL batch processing
The complete code is as follows:
    <?php
    // $chArr is assumed to be an array of cURL handles already configured with curl_setopt().
    $mh = curl_multi_init();
    foreach ($chArr as $k => $ch)
        curl_multi_add_handle($mh, $ch);
    $running = null;
    do {
        $mrc = curl_multi_exec($mh, $running);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    while ($running && $mrc == CURLM_OK) {
        if (curl_multi_select($mh) != -1) {
            do {
                $mrc = curl_multi_exec($mh, $running);
            } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        }
    }
    foreach ($chArr as $k => $ch) {
        $result[$k] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    ?>
Once again we add echo "a"; before the $mrc = curl_multi_exec($mh, $running) statement and look at the output.
Although "a" is still printed more than 50 times, CPU usage is far lower than before the optimization.
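For reuse, the optimized loop can be wrapped in a function. The following is a sketch under stated assumptions, not the manual's code: multi_fetch() is a hypothetical name, the 10-second transfer timeout is an assumption, and the short usleep() pause taken when curl_multi_select() returns -1 is a precaution I have added so that the loop cannot spin if select keeps failing.

```php
<?php
// Fetch several URLs concurrently and return the bodies, keyed like $urls.
// multi_fetch() is a hypothetical helper name, not part of the cURL extension.
function multi_fetch(array $urls, float $selectTimeout = 1.0): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed per-transfer timeout
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    $running = null;
    do {
        $mrc = curl_multi_exec($mh, $running);
        if ($mrc === CURLM_OK && $running > 0) {
            // Block until a connection has activity instead of spinning.
            if (curl_multi_select($mh, $selectTimeout) === -1) {
                usleep(1000);                           // assumption: brief pause if select fails
            }
        }
    } while ($running > 0 && ($mrc === CURLM_OK || $mrc === CURLM_CALL_MULTI_PERFORM));

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

A call such as multi_fetch(array_fill(0, 50, "http://www.baidu.com")) corresponds to the 50-request experiment in the text. The keys of the input array are preserved in the result, so each response can be matched to its request.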
Although the cURL functions are very powerful, there are still occasions to send POST requests with other functions, and those functions also help us understand what cURL does at a lower level. That is why this series devoted so much space to them.
This is the end of the series. I learned a lot while writing these posts. If you found this post helpful, please recommend it or follow me, and I will keep sharing summaries of my notes. If you have any questions, leave a comment below to discuss them. Thank you for reading.