PHP code for batch collecting and downloading pictures

Source: Internet
Author: User

Design concept

Collecting images from a single web page is simple, but doing it in bulk means first fetching the list pages, extracting the URLs they link to, and then collecting each of those pages one by one. Matching the list-page URLs in PHP turned out to be troublesome: the first list page contains many irrelevant URLs, and filtering them out with a regular expression was a real problem for a regex beginner like me. After looking at the structure of the list page, I decided to use jQuery to get the URLs instead, since its selectors make this easy.

jQuery gets the URLs -> ajax passes them to the PHP file -> PHP iterates over the URL parameter -> the images on each page are collected and saved

jQuery program
The code is as follows:
<script src="http://www.cztv.com/uibase/jquery.js"></script>
<script>
$(document).ready(function () {
    var hrefs = '';
    // Take the first link inside each .f_folder element
    $('.f_folder>a').each(function (i) {
        var href = $('.f_folder:eq(' + i + ')>a:eq(0)').attr('href');
        if (typeof href != 'undefined') {
            hrefs += href + ',';
        }
    });
    // callback=? makes jQuery send the request as JSONP (cross-origin)
    $.getJSON("http://www.*****.com/365/getimg.php?hrefs=" + encodeURIComponent(hrefs) + "&callback=?", function (data) {
        // alert(data.info);
    });
});
</script>


Here the URLs are joined into a single ','-separated string and passed as one parameter. getJSON is used because the request is cross-origin. For some common getJSON problems, see "Several problems encountered with $.getJSON".
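The receiving side of this hand-off can be sketched as follows. This is only an illustration under assumptions: the helper names parseHrefs and jsonpResponse are hypothetical and do not appear in the original program, which never actually sends a response back to the callback.

```php
<?php
// Hypothetical helper: split the comma-separated "hrefs" value into clean URLs.
function parseHrefs(string $hrefs): array {
    $urls = array_map('trim', explode(',', $hrefs));
    // Drop the empty entry left by the trailing comma, then drop duplicates.
    return array_values(array_unique(array_filter($urls, 'strlen')));
}

// Hypothetical helper: wrap a payload in the JSONP callback name sent by $.getJSON.
function jsonpResponse(string $callback, array $payload): string {
    // Accept only identifier-like callback names to avoid script injection.
    $safe = preg_match('/^[A-Za-z_$][\w$.]*$/', $callback) ? $callback : 'callback';
    return $safe . '(' . json_encode($payload) . ');';
}

$urls = parseHrefs('a.php,b.php,');
echo jsonpResponse('jQuery123_456', ['info' => count($urls) . ' pages queued']);
```

With a response in this form, the commented-out alert(data.info) in the jQuery program would have something to display.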

PHP collection program
The code is as follows:
<?php
// Collect images from the 365 site
error_reporting(E_ALL ^ E_NOTICE);
set_time_limit(0); // 0 = no execution time limit

/**
 * Get the current time in seconds (with microseconds)
 */
function getMicrotime() {
    list($usec, $sec) = explode(" ", microtime());
    return ((float) $usec + (float) $sec);
}
$stime = getMicrotime();

$callback = $_GET['callback'];
$hrefs = $_GET['hrefs'];
$urlarray = explode(',', $hrefs);

// Download all images linked from the page at the given url
function getimgs($url) {
    $dirname = basename($url, ".php");
    // Check the same path that mkdir() creates
    if (!file_exists('2017/' . $dirname)) {
        mkdir('2017/' . $dirname, 0777, true);
    }
    clearstatcache();
    $data = file_get_contents($url);
    preg_match_all("/(href|src)=([\"|']?)([^ \"'>]+\.(jpg|png|PNG|JPG|gif))\\2/i", $data, $matches);
    // $matches[3] = array_unique($matches[3]);
    unset($data);
    $i = 0;

    if (count($matches[3]) > 0) {
        foreach ($matches[3] as $k => $v) {
            // Only handle absolute urls; relative paths are skipped
            if (substr($v, 0, 4) == 'http') {
                $ext = pathinfo($v, PATHINFO_EXTENSION); // image extension
                if (!file_exists('2017/' . $dirname . '/' . $k . '.' . $ext)) {
                    file_put_contents('2017/' . $dirname . '/' . $k . '.' . $ext, file_get_contents($v));
                    $i++;
                } else {
                    unset($v);
                }
                clearstatcache();
            } else {
                unset($v);
            }
        }
    }
    unset($matches);
    return $i; // always return a count, even when nothing matched
}

$j = 0;
foreach ($urlarray as $k => $v) {
    if ($v != '') {
        $j += getimgs($v);
    }
}
$etime = getMicrotime();
echo "Collected " . $j . " images in total. ";
echo "Time used: " . ($etime - $stime) . " seconds.";


For performance reasons, the variables used in the getimgs function are unset once they are no longer needed, to release memory.

Key points involved

Determine whether the url is a valid absolute image url
if (substr($v, 0, 4) == 'http') is a simple check that the matched image url is an absolute url. The collected image paths may be relative, and I simply skip those, although you could also resolve them to absolute urls. Another problem is that even an absolute url may fail to download, because you cannot know whether the image still exists; the url may be dead. If you want to verify strictly that an image url is reachable, see my earlier article on PHP methods for checking whether a remote url is valid, which covers three ways to do it.
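The matching and filtering step can be tried in isolation on a sample string. The sketch below is an assumption-laden illustration: extractImageUrls is a hypothetical helper, the sample HTML is made up, and the check is tightened to "starts with http:// or https://" rather than the plain four-character comparison used in the program.

```php
<?php
// Hypothetical helper: return the absolute http(s) image URLs found in an HTML string.
function extractImageUrls(string $html): array {
    // Group 1 is the optional quote, group 2 is the URL ending in an image extension.
    $pattern = '/(?:href|src)=(["\']?)([^ "\'>]+\.(?:jpg|png|gif))\1/i';
    preg_match_all($pattern, $html, $matches);
    $urls = [];
    foreach ($matches[2] as $candidate) {
        // Keep absolute URLs only; relative paths are discarded, as in the article.
        if (preg_match('#^https?://#i', $candidate)) {
            $urls[] = $candidate;
        }
    }
    return array_values(array_unique($urls));
}

$sample = '<img src="http://example.com/a.jpg"><img src="/rel/b.png">'
        . '<a href=\'https://example.com/c.GIF\'>c</a>';
print_r(extractImageUrls($sample));
```

Here the relative path /rel/b.png is matched by the regular expression but dropped by the prefix check, which mirrors the behaviour described above.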

Get the image format

$ext = pathinfo($v, PATHINFO_EXTENSION); // image extension

pathinfo() is used here. There are seven ways to get the file format; recommended article: "Seven methods for determining the image format in PHP".
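One thing to watch with pathinfo() on a url is a trailing query string, which would end up inside the extension. A minimal sketch of one way around that, assuming a hypothetical helper named imageExtension (not part of the original program):

```php
<?php
// Hypothetical helper: get the lower-cased file extension of an image URL.
function imageExtension(string $url): string {
    // Take only the path part, so a query string or fragment is not
    // mistaken for part of the extension.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    return strtolower(pathinfo($path, PATHINFO_EXTENSION));
}

echo imageExtension('http://example.com/img/photo.JPG?size=large');
```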

Download and save locally

file_put_contents('2017/' . $dirname . '/' . $k . '.' . $ext, file_get_contents($v));
The file_put_contents() function writes a string to a file.
It is the same as calling fopen(), fwrite(), and fclose() in turn.
The file_get_contents() function reads an entire file into a string.

This works because the server allows file_get_contents() to open remote urls. If the server disables that, you can use curl instead, which is more powerful than file_get_contents(). I recommend the article "CURL learning and application (with multiple threads)"; with curl you can download and save several files in parallel.
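A single-file curl download might look like the sketch below. It assumes the curl extension is loaded; curlDownload, its parameters, and the timeout values are hypothetical choices for illustration, and the parallel (curl_multi) variant mentioned above is not shown.

```php
<?php
// Hypothetical helper: download $url to $dest with cURL; returns true on success.
function curlDownload(string $url, string $dest, int $timeout = 30): bool {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_CONNECTTIMEOUT => 10,    // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_FAILONERROR    => true,  // treat HTTP status >= 400 as a failure
    ]);
    $body = curl_exec($ch);
    if ($body === false || $body === '') {
        return false;
    }
    return file_put_contents($dest, $body) !== false;
}
```

In the collection loop this could stand in for the file_put_contents(..., file_get_contents($v)) call, for example curlDownload($v, '2017/' . $dirname . '/' . $k . '.' . $ext), with the return value deciding whether to increment the counter.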

Clearing the file status cache

PHP caches the information returned by some file functions, such as file_exists() and filesize(), to improve performance. But sometimes, for example when the same file is checked several times in one script and it may be deleted or modified while the script runs, you need to clear the file status cache to get correct results. The clearstatcache() function does this. See the official PHP manual for details.
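A small sketch of the pattern: check a file, change it, clear the cache, then check again. The temporary file and its contents are made up for the example.

```php
<?php
$file = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($file, 'abc');
$before = filesize($file);                      // this result may now be cached

file_put_contents($file, 'defgh', FILE_APPEND); // the file grows by 5 bytes
clearstatcache(true, $file);                    // drop cached status for this file
$after = filesize($file);                       // fresh value from the filesystem

echo "before: $before bytes, after: $after bytes\n";
unlink($file);
```

Passing the filename limits the clearing to that one file, which is cheaper than clearing the whole cache inside a loop over many images.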

Program execution time calculation

The code is as follows:
/**
 * Get the current time in seconds (with microseconds)
 */
function getMicrotime() {
    list($usec, $sec) = explode(" ", microtime());
    return ((float) $usec + (float) $sec);
}
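The same measurement can be written more briefly with microtime(true), which returns the time as a float directly. A minimal sketch, with usleep() standing in for the collection work:

```php
<?php
$start = microtime(true);              // float seconds since the Unix epoch
usleep(50000);                         // placeholder for the real work (50 ms)
$elapsed = microtime(true) - $start;   // elapsed wall-clock time in seconds
printf("time used: %.3f seconds\n", $elapsed);
```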


For reference, see the blog article "Getting PHP page execution time, database read/write count, function call count, and so on (ThinkPHP)".

Finally, let's take a look at the result:



409 images were collected in 214 seconds, which works out to roughly half a second per image, or about 2 images per second. The images total about 62 MB.

At that rate, an hour (60*60 seconds) would yield roughly 6,900 pictures.
