Batch-collecting and downloading pictures with PHP: implementation code

Design ideas

Collecting the pictures of one web page at a time is too tedious, so I collect from the list page directly: get the URLs in the list, then collect them one by one. Matching the URLs of the list page with PHP is troublesome, though, because the list page contains many invalid URLs, which is a problem for a beginner like me. After looking at the structure of the list page, I decided to get the URLs with jQuery instead; jQuery's selectors are very powerful.

jQuery gets the URLs -> Ajax passes the URLs to the corresponding PHP file -> PHP iterates over the URL parameter -> each page is fetched and its pictures are saved

jQuery program

<script src="http://www.cztv.com/uibase/jquery.js"></script>
<script>
$(document).ready(function () {
    var hrefs = '';
    // Take the first link inside each .f_folder element
    $('.f_folder').each(function () {
        var href = $(this).children('a').eq(0).attr('href');
        if (typeof href !== 'undefined') {
            hrefs += href + ',';
        }
    });
    // "callback=?" makes jQuery send a JSONP request, which works across domains
    $.getJSON("http://www.****.com/365/getimg.php?hrefs=" + encodeURIComponent(hrefs) + "&callback=?", function (data) {
        alert(data.info);
    });
});
</script>


Here the URLs are joined into a single comma-separated string so they can be passed in one parameter. $.getJSON is used because the request is cross-domain; for several common problems with getJSON, see the article "Several problems encountered with $.getJSON".
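On the receiving side, the comma-separated string has to be split again, and the trailing comma leaves an empty piece behind. The following is a minimal sketch of that step, not the article's own code: the helper name parse_hrefs is hypothetical, and the extra filtering (valid http/https URLs only, duplicates removed) is an assumption about what getimg.php would want.

```php
<?php
// Hypothetical helper: turn the comma-separated "hrefs" parameter into a clean list.
function parse_hrefs(string $hrefs): array
{
    $urls = [];
    foreach (explode(',', $hrefs) as $href) {
        $href = trim($href);
        // Skip empty pieces (the trailing comma produces one) and anything
        // that is not a syntactically valid URL.
        if ($href === '' || filter_var($href, FILTER_VALIDATE_URL) === false) {
            continue;
        }
        // Only http and https pages can be fetched by the collector.
        if (!preg_match('#^https?://#i', $href)) {
            continue;
        }
        $urls[] = $href;
    }
    // Remove duplicates and reindex from 0.
    return array_values(array_unique($urls));
}

$list = parse_hrefs('http://example.com/a.php,http://example.com/b.php,,http://example.com/a.php,');
print_r($list);
```

With a helper like this, the main loop no longer needs its own check for empty strings.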

PHP collection program

<?php
// Grab the 365 pictures
error_reporting(E_ALL ^ E_NOTICE);
set_time_limit(0); // Remove the PHP execution time limit
/**
 * Get the current time
 */
function getmicrotime() {
    list($usec, $sec) = explode(" ", microtime());
    return ((float) $usec + (float) $sec);
}
$stime = getmicrotime();

$callback = $_GET['callback'];
$hrefs = $_GET['hrefs'];
$urlarray = explode(',', $hrefs);

// Get all pictures of the specified URL
function getimgs($url) {
    $dirname = basename($url, ".php");
    if (!file_exists('365/' . $dirname)) {
        mkdir('365/' . $dirname, 0777, true);
    }
    clearstatcache();
    $data = file_get_contents($url);
    preg_match_all('/(href|src)=(["\']?)([^ "\'>]+\.(jpg|png|gif))\2/i', $data, $matches);
    $matches[3] = array_unique($matches[3]);
    unset($data);
    $i = 0;

    if (count($matches[3]) > 0) {
        foreach ($matches[3] as $k => $v) {
            // Simple check that this is an absolute URL rather than a relative path
            if (substr($v, 0, 4) == 'http') {

                $ext = pathinfo($v, PATHINFO_EXTENSION); // Picture extension

                if (!file_exists('365/' . $dirname . '/' . $k . '.' . $ext)) {
                    file_put_contents('365/' . $dirname . '/' . $k . '.' . $ext, file_get_contents($v));
                    $i++;
                } else {
                    unset($v);
                }
                clearstatcache();
            } else {
                unset($v);
            }
        }
    }
    unset($matches);
    return $i; // Number of pictures saved, 0 if nothing matched
}

$j = 0;
foreach ($urlarray as $k => $v) {
    if ($v != '') {
        $j += getimgs($v);
    }
}
$etime = getmicrotime();
$info = "Collected " . $j . " pictures in total, taking " . ($etime - $stime) . " seconds";
// Answer as JSONP so that the $.getJSON callback receives data.info
echo preg_replace('/[^\w$.]/', '', $callback) . '(' . json_encode(array('info' => $info)) . ');';

Performance consideration: the variables used in the getimgs function are unset as soon as they are no longer needed, to free up memory.

Key points in the design

Determining whether it is a valid absolute picture URL
if (substr($v, 0, 4) == 'http') is only a simple check that the matched picture URL is an absolute URL, because the path found in the page may be relative. Here I simply skip such pictures, although you could also restore them to absolute picture URLs. Another problem is that even a well-formed URL may not be collectable, because you do not know whether the picture still exists; the URL may already be dead. If you want to check more strictly whether a picture URL is really valid, see my earlier article "Three ways for PHP to determine whether a remote URL is valid".
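Restoring a relative path to an absolute URL, which the article mentions but does not do, could look roughly like this. It is a minimal sketch covering only the common cases (absolute, protocol-relative, root-relative and directory-relative paths), not a full RFC 3986 resolver; the function name to_absolute_url and the example.com URLs are hypothetical.

```php
<?php
// Hypothetical helper: resolve an image path found in a page against that page's URL.
function to_absolute_url(string $pageUrl, string $src): string
{
    // Already absolute: http://... or https://...
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }
    $base = parse_url($pageUrl);
    $scheme = $base['scheme'] ?? 'http';
    $host = $base['host'] ?? '';
    if (isset($base['port'])) {
        $host .= ':' . $base['port'];
    }
    // Protocol-relative: //cdn.example.com/a.jpg
    if (substr($src, 0, 2) === '//') {
        return $scheme . ':' . $src;
    }
    // Root-relative paths start at the site root; others start in the page's directory.
    if (substr($src, 0, 1) === '/') {
        $path = $src;
    } else {
        $pagePath = $base['path'] ?? '/';
        $dir = substr($pagePath, 0, (int) strrpos($pagePath, '/') + 1);
        $path = $dir . $src;
    }
    // Collapse "." and ".." segments.
    $segments = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($segments);
        } elseif ($segment !== '.' && $segment !== '') {
            $segments[] = $segment;
        }
    }
    return $scheme . '://' . $host . '/' . implode('/', $segments);
}

echo to_absolute_url('http://example.com/365/list.php', 'img/a.jpg'), "\n";
```

Inside getimgs, such a helper would be called with $url and $v before the substr check, so that relative paths are collected instead of skipped.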

Getting the picture format

$ext = pathinfo($v, PATHINFO_EXTENSION); // Picture extension

The pathinfo function is used here. Altogether there are about seven ways to get a file's format; recommended article: "Seven ways for PHP to determine the format of a picture".
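In the collector above the regular expression only matches URLs that end in the extension, so pathinfo alone is enough. If the same idea is reused for URLs that may carry a query string, pathinfo would return the query as part of the extension. A small sketch of one way around that, assuming a hypothetical helper named image_extension:

```php
<?php
// Hypothetical helper: lower-case file extension of an image URL,
// ignoring any ?query or #fragment part.
function image_extension(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return ''; // no path component, or the URL could not be parsed
    }
    return strtolower(pathinfo($path, PATHINFO_EXTENSION));
}

$url = 'http://example.com/pics/01.JPG?size=large';
echo image_extension($url), "\n";                  // extension taken from the path only
echo pathinfo($url, PATHINFO_EXTENSION), "\n";     // includes the query string
```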

Downloading and saving locally

file_put_contents('365/' . $dirname . '/' . $k . '.' . $ext, file_get_contents($v));
The file_put_contents() function writes a string to a file.
This is the same as calling fopen(), fwrite(), and fclose() in turn.
The file_get_contents() function reads an entire file into a string.

This works because the server allows file_get_contents to open remote URLs. If the server disables it, you can use curl instead, which is more powerful than file_get_contents; recommended reading: "Curl learning and application (with multithreading)". You can use curl to download and store the pictures, and the result may well be better.
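As an illustration of the curl alternative, here is a minimal sketch of a download function built on PHP's curl extension. The name download_with_curl, the timeout values and the decision to follow redirects are assumptions made for this example, not part of the original program.

```php
<?php
// Hypothetical helper: download $url to $dest with the curl extension.
// Returns true on success, false on any transfer or HTTP error.
function download_with_curl(string $url, string $dest, int $timeout = 30): bool
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_CONNECTTIMEOUT => 10,       // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout, // seconds allowed for the whole transfer
        CURLOPT_FAILONERROR    => true,     // treat HTTP status >= 400 as a failure
    ]);
    $body = curl_exec($ch);
    if ($body === false || $body === '') {
        return false;
    }
    return file_put_contents($dest, $body) !== false;
}

// Possible use inside getimgs(), replacing the file_get_contents() call:
// download_with_curl($v, '365/' . $dirname . '/' . $k . '.' . $ext);
```

Unlike a bare file_get_contents($v), this gives the script a timeout and a way to tell a failed download from a successful one, so a dead picture URL does not leave an empty file behind.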

Clearing the file status cache

The clearstatcache() function clears the file status cache. PHP caches the return information of certain file functions to provide better performance. But sometimes, for example when the same file is checked several times in one script and the file may be deleted or modified while the script runs, you need to clear the file status cache to get the correct result. To do this, call clearstatcache(). See the official PHP manual for details.
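A small sketch of the situation described above: a file is checked, changed, and checked again within one script. The temporary file and its contents are made up for the example.

```php
<?php
$file = tempnam(sys_get_temp_dir(), 'stat');

file_put_contents($file, 'abc');
$before = filesize($file);       // the stat result for $file is now cached

file_put_contents($file, 'abcdef');
clearstatcache(true, $file);     // drop the cached stat data for this file only
$after = filesize($file);        // the size is read from disk again

echo $before, ' -> ', $after, "\n";
unlink($file);
```

Passing the file name limits the clearing to that one file, which is cheaper than clearing the whole cache on every loop iteration as the collector does.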

Program Execution Time Calculation

/**
 * Get the current time
 */
function getmicrotime() {
    list($usec, $sec) = explode(" ", microtime());
    return ((float) $usec + (float) $sec);
}


For more, see the blog article "Get PHP page execution time, database read and write counts, and number of function calls" (ThinkPHP).

Finally, the result:
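The same measurement can be written more briefly: called with true, microtime returns the current Unix time as a float, so the explode step is not needed. A minimal sketch, in which the usleep call is only a stand-in for the real collection work:

```php
<?php
// microtime(true) returns seconds since the Unix epoch as a float.
$start = microtime(true);

usleep(50000); // stand-in for the real work (sleeps for 50 ms)

$elapsed = microtime(true) - $start;
printf("Spent %.3f seconds\n", $elapsed);
```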



It took 409 seconds to collect 214 pictures, roughly 2 seconds per picture, and the total size of the pictures is about 62 MB. At that rate, one hour (60 × 60 = 3600 seconds) is enough to download about 1,800 pictures.
