Collecting the pictures of a site one page at a time is too troublesome, so instead I collect its list page: get the URLs listed there and then collect those pages one by one. Matching the list page's URLs in PHP is troublesome, though. The list page contains many invalid URLs, which is a real problem for a regex novice like me. After looking at the structure of the list page, I decided to use jQuery to get the URLs, since jQuery's selectors are very powerful.
jQuery gets the URLs, then Ajax passes them to the corresponding PHP file, which iterates over the URL parameter and saves the pictures of each page with the single-page capture function.
jQuery program:
Here the URLs are joined into one string separated by ',' and passed as the URL parameter. getJSON is used because the request is cross-domain; for a few common problems with getJSON, see "Several problems encountered with $.getJSON".
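A minimal client-side sketch of this step in plain JavaScript. The endpoint collect.php and the parameter name urls are hypothetical, because the original jQuery listing is not reproduced here. The function keeps only absolute http(s) links, removes duplicates, joins the rest with ',' and appends callback=?. That last part makes jQuery's $.getJSON send a JSONP request for cross-domain use.

```javascript
// Sketch only: "collect.php" and the "urls" parameter name are hypothetical.
function buildCollectUrl(endpoint, hrefs) {
  // Keep only absolute http(s) links and drop duplicates, since the
  // list page contains many invalid or repeated URLs.
  const valid = [...new Set(hrefs)].filter((h) => /^https?:\/\//i.test(h));
  // Join with ',' as the article does; encode so ':', '/', '&' etc. survive.
  const query = 'urls=' + encodeURIComponent(valid.join(','));
  // A literal "callback=?" tells jQuery's $.getJSON to use JSONP.
  return endpoint + '?' + query + '&callback=?';
}

// In the browser, with jQuery loaded, something like this (selector is hypothetical):
// const hrefs = $('a.item').map((i, a) => a.href).get();
// $.getJSON(buildCollectUrl('http://example.com/collect.php', hrefs), (data) => console.log(data));
```

On the PHP side, $_GET values arrive already URL-decoded, so explode(',', $_GET['urls']) would recover the list. A JSONP response also has to be wrapped in the callback name the client sent. Joining with ',' assumes that no URL contains a comma of its own.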
PHP collection program:
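<?php
// Grab 365 pics
error_reporting(E_ALL ^ E_NOTICE);
set_time_limit(0); // remove PHP's execution time limit

/**
 * Get current time
 */
function getmicrotime() {
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

// Get all pictures of the specified URL
function getimgs($url) {
    $dirname = basename($url, ".php");
    if (!file_exists('365/' . $dirname)) {
        mkdir('365/' . $dirname, 0777, true); // create 365/ as well if missing
    }
    clearstatcache();
    $data = file_get_contents($url);
    // Group 3 captures the image URL (jpg/png/gif, case-insensitive)
    preg_match_all('/(href|src)=(["\']?)([^"\'>\s]+\.(jpg|png|gif))\2/i', $data, $matches);
    $matches[3] = array_unique($matches[3]);
    unset($data);
    $i = 0;
    if (count($matches[3]) > 0) {
        foreach ($matches[3] as $k => $v) {
            // Simple check that it is an absolute URL, not a relative path
            if (substr($v, 0, 4) == 'http') {
                $ext = pathinfo($v, PATHINFO_EXTENSION);
                // Download and save to local disk
                file_put_contents('365/' . $dirname . '/' . $k . '.' . $ext, file_get_contents($v));
                $i++;
            }
        }
    }
    unset($matches);
    return $i; // number of pictures saved
}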
Performance: the variables used in getimgs() are released with unset() after use, to free up memory.
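To see what the image pattern actually captures, here is a cleaned-up form of it ported to JavaScript. This is a sketch for experimentation and is not part of the author's program. Group 3 holds the URL, as in $matches[3], and a Set stands in for array_unique().

```javascript
// JavaScript port of the PHP pattern; the g flag is required by matchAll.
const IMG_RE = /(href|src)=(["']?)([^"'>\s]+\.(jpg|png|gif))\2/gi;

function extractImageUrls(html) {
  const urls = [];
  for (const m of html.matchAll(IMG_RE)) {
    urls.push(m[3]); // group 3 is the URL, like $matches[3] in PHP
  }
  return [...new Set(urls)]; // drop duplicates, like array_unique()
}

const sample = [
  '<img src="http://a.com/x.jpg">',
  "<a href='/rel/y.PNG'>y</a>",
  '<img src="http://a.com/x.jpg">',
  '<img src=http://a.com/z.gif alt=z>',
].join('\n');

console.log(extractImageUrls(sample));
// → ['http://a.com/x.jpg', '/rel/y.PNG', 'http://a.com/z.gif']
```

The \2 backreference makes the closing quote match the opening one, and it also allows unquoted attributes. Of these three results, only the first and the third would pass the substr($v, 0, 4) == 'http' check.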
A few knowledge points involved:
Determining whether the image URL is a standard, valid URL
if (substr($v, 0, 4) == 'http') only does a simple check that the matched image URL is a standard (absolute) URL. A captured picture may use a relative path, and here I simply skip such images. Of course, you could also convert the path into a full image URL instead. Another problem is that even a URL in standard format may not be collectable, because you do not know whether the picture still exists; the image URL may be dead. If you want to check more strictly that an image URL is real and valid, I recommend my earlier article "Three ways for PHP to determine whether a remote URL is valid".
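If you prefer to keep relative images rather than skip them, one option is to resolve each path against the URL of the page it came from. This is a swapped-in technique, not what the author does. The sketch uses the standard WHATWG URL class available in browsers and Node.js, and toAbsoluteUrl is a hypothetical helper name. It only normalizes the URL. It does not check that the image still exists, which needs an actual request.

```javascript
// Resolve a possibly-relative image path against the page it came from.
function toAbsoluteUrl(src, pageUrl) {
  try {
    const u = new URL(src, pageUrl);
    // Only accept http(s); this rejects e.g. "data:" or "javascript:" links.
    return u.protocol === 'http:' || u.protocol === 'https:' ? u.href : null;
  } catch {
    return null; // not parseable even with a base URL
  }
}

const page = 'http://example.com/list/page.php';
console.log(toAbsoluteUrl('/img/a.jpg', page)); // http://example.com/img/a.jpg
console.log(toAbsoluteUrl('b.png', page));      // http://example.com/list/b.png
```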
Here the pathinfo() function is used to get the file extension. I have summarized 7 ways to get a file's format; recommended article: "Seven ways for PHP to determine the image format".
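pathinfo($v, PATHINFO_EXTENSION) works on the string alone. For a URL such as http://example.com/a.jpg?w=100 it would therefore return "jpg?w=100". The article's regular expression only captures URLs that end in the extension, so this matters only if the pattern is loosened. The following JavaScript sketch takes the extension from the URL's path component and lowercases it. imageExtension is a hypothetical name.

```javascript
// Extension of the last path segment of a URL, ignoring ?query and #hash.
function imageExtension(imageUrl) {
  const { pathname } = new URL(imageUrl); // pathname excludes query and hash
  const base = pathname.slice(pathname.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  return dot === -1 ? '' : base.slice(dot + 1).toLowerCase();
}

console.log(imageExtension('http://example.com/a/b.JPG?w=100')); // jpg
```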
Downloading and saving locally
file_put_contents('365/' . $dirname . '/' . $k . '.' . $ext, file_get_contents($v));
The file_put_contents() function writes a string to a file.
It is equivalent to calling fopen(), fwrite() and fclose() in turn.
The file_get_contents() function reads an entire file (here, a remote URL) into a string.
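Here is the same save step sketched in Node.js for comparison. fs.writeFileSync() plays the role of file_put_contents(), which opens, writes and closes the file. mkdirSync with recursive: true creates the target folder along with any missing parents. saveImage and its parameters are hypothetical names, and baseDir stands in for the article's 365 folder.

```javascript
const fs = require('fs');
const path = require('path');

// Mirrors: file_put_contents('365/' . $dirname . '/' . $k . '.' . $ext, $bytes)
function saveImage(baseDir, dirname, k, ext, bytes) {
  const dir = path.join(baseDir, dirname);
  fs.mkdirSync(dir, { recursive: true }); // no error if it already exists
  const file = path.join(dir, `${k}.${ext}`);
  fs.writeFileSync(file, bytes); // open, write, close in one call
  return file;
}
```

With Node 18 or later, the bytes could come from Buffer.from(await (await fetch(v)).arrayBuffer()).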
This relies on the server allowing file_get_contents() to open remote URLs (the allow_url_fopen setting). If the server disables this, you can use cURL instead, which is more powerful than file_get_contents(); recommended reading: "cURL learning and application (with multithreading)". With cURL's multi interface, several images can be downloaded concurrently, which is more efficient.
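The benefit of the curl_multi approach is that several downloads are in flight at once instead of one after another. The JavaScript sketch below shows the same idea as bounded concurrency. It does not use the cURL API. runWithConcurrency is a hypothetical helper that runs an async worker over a list, with at most limit calls pending at a time.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function runWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const i = next++; // claim an index; safe because JS is single-threaded
      results[i] = await worker(items[i], i);
    }
  }
  const n = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: n }, () => lane()));
  return results; // in input order, regardless of completion order
}
```

In a downloader, the worker would fetch one image URL and write it to disk. It should catch its own errors, because a single rejected call makes the Promise.all here reject.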
Clearing the file status cache
The clearstatcache() function clears the file status cache. To improve performance, PHP caches the results of some file functions, such as file_exists() and stat(). If a script checks the same file several times, and the file may be deleted or modified while the script runs, you need to clear this cache to get correct results. That is what clearstatcache() is for. Official manual:
Calculating program execution time
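/**
 * Get current time
 */
function getmicrotime() {
    // microtime() returns a string "msec sec"; add the two parts as floats.
    // Since PHP 5.0, microtime(true) returns the same value as a float directly.
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}
// Usage: $start = getmicrotime(); /* ...collect... */ echo getmicrotime() - $start;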
For more, you can refer to this blog post: "Get PHP page execution time, database read/write count, function calls, etc. (ThinkPHP)".
Finally, let's look at the result:
It took 409 seconds to collect 214 pictures, about 2 seconds per picture, with a total size of about 62 MB. At that rate,
one hour (60*60 seconds) is enough to download about 1800 photos of beautiful women.
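A quick check of these figures, using only the numbers from the run above. The estimate of 1800 per hour comes from rounding to 2 seconds per picture. Without rounding, the rate is slightly higher:

```latex
\begin{aligned}
\frac{409\ \text{s}}{214\ \text{pictures}} &\approx 1.91\ \text{s/picture} \\
\frac{3600 \cdot 214}{409} &\approx 1884\ \text{pictures/hour} \quad \left(\tfrac{3600}{2} = 1800 \text{ with the rounded rate}\right) \\
\frac{62\ \text{MB}}{214\ \text{pictures}} \approx 0.29\ \text{MB/picture}, &\qquad \frac{62\ \text{MB}}{409\ \text{s}} \approx 0.15\ \text{MB/s}
\end{aligned}
```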