Design ideas
Grabbing each picture page one at a time would be too troublesome, so instead I collect the list page, extract the URLs from it, and then crawl them one by one. But matching the URLs of the list page with PHP is also troublesome: the list page contains many invalid URLs, which is a problem for a rookie like me. Looking at the structure of the list page, I decided to use jQuery to extract the URLs; jQuery's selectors are powerful enough for this.
jQuery collects the URLs, then Ajax passes them to the corresponding PHP file, which traverses the URL parameter and, for each single page, grabs and saves the pictures.
jQuery program
The code is as follows:
<script src="http://www.cztv.com/uibase/jquery.js"></script>
<script>
$(document).ready(function () {
    var hrefs = '';
    $('.f_folder>a').each(function (i) {
        var href = $('.f_folder:eq(' + i + ')>a:eq(0)').attr('href');
        if (href !== undefined) {
            hrefs += href + ',';
        }
    });
    $.getJSON("http://www.****.com/365/getimg.php?hrefs=" + hrefs + "&callback=?", function (data) {
        alert(data.info);
    });
});
</script>
Here the URLs are joined into one comma-separated string before being passed along. getJSON is used because cross-domain access is needed; for common problems with getJSON, see "$.getJSON encountered several problems".
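Because `$.getJSON` with a `callback=?` parameter performs a JSONP request, the PHP side has to wrap its JSON output in the callback name that jQuery generates. The collector script above reads `$callback` but never shows the response step, so here is a minimal sketch of my own (the whitelist check on the callback name is my addition, not part of the original article; the `info` key mirrors the `data.info` field the jQuery handler alerts):

```php
<?php
// Minimal JSONP responder sketch. The 'info' key mirrors the data.info
// field that the jQuery success handler reads.
function jsonp_response($callback, array $data) {
    // Allow only safe identifier characters in the callback name
    // (my own precaution, not from the original article).
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $callback)) {
        $callback = 'callback'; // fall back to a safe default name
    }
    return $callback . '(' . json_encode($data) . ');';
}
```

The collection script could then end with something like `echo jsonp_response($_GET['callback'], array('info' => "Collected $j pictures"));`.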
PHP collection program
The code is as follows:
<?php
// Grab the 365 pictures
error_reporting(E_ALL ^ E_NOTICE);
set_time_limit(0); // disable the PHP execution time limit

/**
 * Get the current time
 */
function getMicrotime() {
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

$stime = getMicrotime();
$callback = $_GET['callback'];
$hrefs = $_GET['hrefs'];
$urlarray = explode(',', $hrefs);

// Get all pictures of the specified URL
function getImgs($url) {
    $dirname = basename($url, ".php");
    if (!file_exists('365/' . $dirname)) {
        mkdir('365/' . $dirname, 0777, true);
    }
    clearstatcache();
    $data = file_get_contents($url);
    preg_match_all("/(href|src)=([\"']?)([^\"'>]+\.(jpg|png|gif))\\2/i", $data, $matches);
    $matches[3] = array_unique($matches[3]);
    unset($data);
    $i = 0;
    foreach ($matches[3] as $k => $v) {
        // Simple check that this is an absolute URL rather than a relative path
        if (substr($v, 0, 4) == 'http') {
            $ext = pathinfo($v, PATHINFO_EXTENSION); // picture extension
            if (!file_exists('365/' . $dirname . '/' . $k . '.' . $ext)) {
                file_put_contents('365/' . $dirname . '/' . $k . '.' . $ext, file_get_contents($v));
                $i++;
            } else {
                unset($v);
            }
            clearstatcache();
        } else {
            unset($v);
        }
    }
    unset($matches);
    return $i;
}

$j = 0;
foreach ($urlarray as $k => $v) {
    if ($v != '') {
        $j += getImgs($v);
    }
}
$etime = getMicrotime();
echo "Collected a total of " . $j . " pictures, ";
echo "taking " . ($etime - $stime) . " seconds";
Performance note: the variables used inside getImgs() are released with unset() as soon as they are no longer needed, to free memory.
Several knowledge points in the design
Determining whether an image URL is standard and valid
`if (substr($v, 0, 4) == 'http')` simply checks whether the matched image URL is an absolute URL, because a matched image may use a relative path. Here I simply skip collecting such pictures, though you could also resolve them back into absolute image paths. Another problem is that even a URL in the standard format may fail to download, because you cannot know whether the picture still exists; the image URL may already be invalid. If you want to check more strictly whether an image URL is genuinely valid, see my earlier article "Three ways in which PHP can determine whether a remote URL is valid".
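I won't reproduce that article here, but one common approach is to inspect the HTTP status line returned by get_headers(). The sketch below is my own illustration, not the code from the linked article:

```php
<?php
// Sketch of a stricter remote-URL check. get_headers() issues a
// request and returns the response headers, so the status line
// tells us whether the image actually exists.
function status_is_ok($statusLine) {
    // "HTTP/1.1 200 OK" -> true, "HTTP/1.1 404 Not Found" -> false
    return (bool)preg_match('#^HTTP/\S+\s+2\d\d#', $statusLine);
}

function remote_image_exists($url) {
    $headers = @get_headers($url); // false on connection failure
    return $headers !== false && status_is_ok($headers[0]);
}
```

You could call `remote_image_exists($v)` before the download in getImgs(), at the cost of one extra request per picture.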
Getting the picture format

$ext = pathinfo($v, PATHINFO_EXTENSION); // picture extension

Here the pathinfo() method is used; in total there are seven ways to get a file's format. Recommended article: "Seven ways to determine the format of a picture in PHP".
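For comparison, here are two of those ways side by side: pathinfo(), which the collector uses, and strrchr(), one alternative. The URL is just an illustrative example:

```php
<?php
// Two common ways to read a picture's extension from its URL.
$v = 'http://example.com/images/photo.jpg'; // example URL

$ext1 = pathinfo($v, PATHINFO_EXTENSION); // "jpg"
$ext2 = substr(strrchr($v, '.'), 1);      // "jpg" (text after the last dot)
```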
Downloading and saving to local disk

file_put_contents('365/' . $dirname . '/' . $k . '.' . $ext, file_get_contents($v));

The file_put_contents() function writes a string to a file; it is equivalent to calling fopen(), fwrite(), and fclose() in turn. The file_get_contents() function reads an entire file into a string.
This works because my server allows file_get_contents; if a server disables this function, you can use cURL instead, a tool more powerful than file_get_contents. Recommended reading: "cURL learning and application (with multithreading)". Downloading and saving with cURL also gives better results.
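A minimal sketch of the cURL alternative, assuming the cURL extension is enabled; the function name, timeout, and option choices here are illustrative rather than taken from the original article:

```php
<?php
// Fetch a remote file with cURL instead of file_get_contents().
function curl_fetch($url, $timeout = 10) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // give up after $timeout seconds
    $data = curl_exec($ch);                         // false on failure
    curl_close($ch);
    return $data;
}
```

The save step then becomes `file_put_contents($path, curl_fetch($v))`, ideally with a `=== false` check before writing.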
Clearing the file status cache

The clearstatcache() function clears the file status cache. PHP caches the results of certain file functions to provide better performance. But sometimes, for example when checking the same file several times in one script while the file may be deleted or modified during execution, you need to clear the file status cache to get correct results; that is what clearstatcache() is for (see the official manual).
Program Execution Time Calculation
The code is as follows:
/**
 * Get the current time
 */
function getMicrotime() {
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}
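As an aside, microtime() accepts a true argument (standard since PHP 5) that returns the float directly, so the helper above can be shortened:

```php
<?php
// microtime(true) returns the current Unix timestamp as a float,
// making the explode()-based helper unnecessary on PHP 5+.
$stime = microtime(true);
// ... the collection work would run here ...
$etime = microtime(true);
$elapsed = $etime - $stime; // seconds elapsed, as a float
```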
You can also refer to this blog article: "Get PHP page execution time, database read/write counts, and function call counts" (ThinkPHP).
Finally, the results: 409 seconds to collect 214 pictures, roughly 2 seconds per saved picture, with a total picture size of about 62 MB. At that rate, one hour (60*60 seconds) would download about 1,800 pictures of beautiful women.