PHP multi-threaded batch collection and download of pictures: implementation code (continued)

Source: Internet
Author: User

In my view, the likely cause of the poor results is this:
Some of the matched image URLs are not valid. The previous article only checked whether a URL was a relative path, but some absolute URLs are invalid as well.
Solution: add a check that determines whether an image URL is real and valid.

The code is as follows:

/**
 * Determine whether the URL is valid
 *
 * @param string $url
 * @return bool
 */
function relUrl($url) {
    // Only absolute http(s) URLs are checked
    if (substr($url, 0, 4) == 'http') {
        $headers = get_headers($url, true);
        // get_headers() returns false on failure
        if (is_array($headers) && count($headers) > 0) {
            // $headers[0] is the status line, e.g. "HTTP/1.1 200 OK"
            return (bool) preg_match('/200/', $headers[0]);
        }
        return false;
    }
    return false;
}

The get_headers function requests the URL and returns the response headers. The function above checks whether the server's response status is 200 to decide whether the URL is real and valid.
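
Matching '/200/' anywhere in the status line is fairly loose. As a minimal sketch, the status code can instead be parsed out of the first header line explicitly. The helper names `statusCodeFromLine` and `isOkStatusLine` are hypothetical and not part of the original code.

```php
<?php
/**
 * Extract the numeric status code from an HTTP status line such as
 * "HTTP/1.1 200 OK". Returns null when the line is not a status line.
 * (Hypothetical helper, not part of the original article.)
 */
function statusCodeFromLine($line)
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#i', (string) $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

/** True only when the status line carries exactly the code 200. */
function isOkStatusLine($line)
{
    return statusCodeFromLine($line) === 200;
}

var_dump(isOkStatusLine('HTTP/1.1 200 OK'));        // bool(true)
var_dump(isOkStatusLine('HTTP/1.1 404 Not Found')); // bool(false)
```

In relUrl, the first element of the array returned by get_headers could be passed to `isOkStatusLine` in place of the preg_match call.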

Test the image collection again.
The results were worse than before, and the run was even slower.

Analysis of the test results:
Although get_headers can determine whether a URL is valid, no time limit is set on the get_headers request here. When it hits a very slow URL resource, the thread stays occupied and all subsequent requests are blocked.
file_get_contents has the same problem. Slow URL resources hold the process for a long time, later work is blocked, and CPU usage stays high for a long time.
Solution:
Use curl's parallel ("multithreaded") requests. curl also lets you set a request timeout, so a very slow URL resource can be abandoned decisively and nothing blocks. Because the requests also run in parallel, efficiency should be relatively high. See: CURL learning and application [multithreading]. Let's test again.
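
To illustrate the timeout idea on a single URL, here is a minimal sketch of a validity check that gives up after a fixed number of seconds. The function name `urlRespondsInTime` and the 5-second default are assumptions made for this example; it is not the article's code.

```php
<?php
/**
 * Check a URL with a headers-only request that is abandoned after
 * $timeout seconds. (Hypothetical helper; name and default are assumptions.)
 */
function urlRespondsInTime($url, $timeout = 5)
{
    // Same rule as the article: only absolute http(s) URLs are considered.
    if (stripos($url, 'http') !== 0) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);             // request headers only, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // do not print anything
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // limit on connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // limit on the whole request
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);      // 0 when no response arrived
    curl_close($ch);
    return $code === 200;
}
```

A slow server then costs at most about `$timeout` seconds per URL instead of blocking indefinitely. Some servers answer headers-only requests differently from GET requests, so treat the result as a hint.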

Core code:

The code is as follows:

/**
 * Parallel requests with curl_multi (a method of the HttpImg class)
 *
 * @param array $array   URLs to request in parallel
 * @param int   $timeout timeout per request, in seconds
 * @return array response bodies, keyed like $array
 */
public function Curl_http($array, $timeout = 15) {
    $data = array();
    $conn = array();

    $mh = curl_multi_init(); // create the multi handle

    foreach ($array as $k => $url) {
        $conn[$k] = curl_init($url); // initialize one handle per URL

        curl_setopt($conn[$k], CURLOPT_TIMEOUT, $timeout); // set the request timeout
        curl_setopt($conn[$k], CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 5.01; Windows NT 5.0)');
        curl_setopt($conn[$k], CURLOPT_MAXREDIRS, 7); // follow at most 7 redirects
        curl_setopt($conn[$k], CURLOPT_HEADER, false); // leave headers out of the output, for efficiency
        curl_setopt($conn[$k], CURLOPT_FOLLOWLOCATION, 1); // follow 302 redirects
        curl_setopt($conn[$k], CURLOPT_RETURNTRANSFER, 1); // return the result as a string instead of printing it
        curl_setopt($conn[$k], CURLOPT_HTTPGET, true);

        curl_multi_add_handle($mh, $conn[$k]);
    }

    // Avoid a busy loop that burns CPU. This pattern follows examples found online.
    do {
        $mrc = curl_multi_exec($mh, $active); // $active is true while transfers are still running
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    while ($active and $mrc == CURLM_OK) {
        if (curl_multi_select($mh) == -1) {
            usleep(100); // select failed; pause briefly instead of spinning
        }
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }

    foreach ($array as $k => $url) {
        if (!curl_errno($conn[$k])) {
            $data[$k] = curl_multi_getcontent($conn[$k]); // response body as a string
            $header[$k] = curl_getinfo($conn[$k]); // transfer information for the request
        }
        curl_multi_remove_handle($mh, $conn[$k]); // detach from the multi handle first
        curl_close($conn[$k]); // then close the handle
    }

    curl_multi_close($mh);

    return $data;
}

// Receive parameters
$callback = $_GET['callback'];
$hrefs = $_GET['hrefs'];
$urlarray = explode(',', trim($hrefs, ','));
$date = date('Ymd', time());
// Instantiate
$img = new HttpImg();
$stime = $img->getMicrotime(); // start time

$data = $img->Curl_http($urlarray, 20); // HTML of the list pages
if (!is_dir('./img/' . $date)) {
    mkdir('./img/' . $date, 0777, true);
}
foreach ((array) $data as $k => $v) {
    preg_match_all('/(href|src)=(["\']?)([^ "\'>]+\.(jpg|png|gif))\2/i', $v, $matches[$k]);

    if (count($matches[$k][3]) > 0) {
        $dataimg = $img->Curl_http($matches[$k][3], 20); // binary data of all images
        $j = 0;
        foreach ((array) $dataimg as $kk => $vv) {
            if ($vv != '') {
                $rand = rand();
                $basename = time() . '_' . $rand . '.jpg'; // save as a jpg file
                $fname = './img/' . $date . '/' . $basename;
                file_put_contents($fname, $vv);
                $j++;
                echo 'Created image ' . $j . ': ' . $fname . '<br/>';
            }
        }
    } else {
        unset($matches[$k]);
    }
}
$etime = $img->getMicrotime(); // end time
echo 'Time used: ' . ($etime - $stime) . ' seconds';
exit;
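
The extraction step in the loop above can be tried in isolation. The following is a minimal sketch that pulls image URLs out of an HTML string with a similar pattern, keeps only absolute http(s) URLs, and drops duplicates. The function name `extractImageUrls` and the sample HTML are assumptions made for this example.

```php
<?php
/**
 * Return the image URLs found in href/src attributes of an HTML string.
 * Only absolute http(s) URLs are kept; duplicates are dropped.
 * (Hypothetical helper; the name is an assumption.)
 */
function extractImageUrls($html)
{
    // Group 1: optional quote, group 2: the URL, \1: the same closing quote.
    $pattern = '/(?:href|src)=(["\']?)([^ "\'>]+\.(?:jpg|png|gif))\1/i';
    preg_match_all($pattern, $html, $matches);

    $urls = array();
    foreach ($matches[2] as $url) {
        if (preg_match('#^https?://#i', $url)) {
            $urls[$url] = true; // array keys de-duplicate
        }
    }
    return array_keys($urls);
}

$html = '<img src="http://example.com/a.jpg"><a href=\'/b.png\'>b</a>'
      . '<img src="http://example.com/a.jpg"><img src=http://example.com/c.GIF>';
print_r(extractImageUrls($html));
```

For the sample string this keeps `http://example.com/a.jpg` once and `http://example.com/c.GIF`, and skips the relative path `/b.png`. Filtering before the download step means fewer wasted requests in Curl_http.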

Test results

Collecting about 260 images took about 337 seconds, a little over one second per image. The speed advantage also seems to become more obvious as the number of images grows.

Looking at the file names, which start with a timestamp: about 10 images can be generated within the same second.

Because of the 20-second request timeout, some generated images are visibly incomplete, meaning the image resource could not be fully downloaded within 20 seconds. You can adjust this timeout yourself.
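
One possible way to avoid saving truncated images is to compare the number of bytes received with the size the server announced. The sketch below works on the array returned by curl_getinfo() and uses its `size_download` and `download_content_length` entries. The function name `looksComplete` is an assumption, and the check can only help when the server sends a Content-Length header.

```php
<?php
/**
 * Decide whether a transfer looks complete, given the array returned by
 * curl_getinfo() for that handle. (Hypothetical helper, not from the article.)
 */
function looksComplete(array $info)
{
    $expected = isset($info['download_content_length'])
        ? (float) $info['download_content_length'] : -1.0;
    $received = isset($info['size_download'])
        ? (float) $info['size_download'] : 0.0;

    if ($received <= 0) {
        return false; // nothing was downloaded
    }
    if ($expected < 0) {
        return true;  // no Content-Length announced, so nothing to compare against
    }
    return $received >= $expected;
}

var_dump(looksComplete(array('download_content_length' => 1000, 'size_download' => 400)));  // bool(false)
var_dump(looksComplete(array('download_content_length' => 1000, 'size_download' => 1000))); // bool(true)
```

In Curl_http, the `$header[$k]` array already holds this information for each handle, so a caller could skip file_put_contents for any image where the check fails.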
