I recently studied some crawler-related code. Put simply, a crawler is a program that visits pages, processes the information on them to some degree, and thereby achieves its intended goal. After studying some reference code found online, I wrapped up a PHP script that fetches a URL and saves the images referenced by the `<img>` tags on that page. Without further ado, here is the code:
```php
<?php
class GetImg {
    public function __construct($url = 'http://blog.csdn.net/wang_jingxiang/article/details/48647') {
        $ret   = $this->setRequest($url);   // fetch the page HTML
        $total = $this->image($ret);        // collect the image URLs
        foreach ($total as $pic) {
            $this->savePics($pic);          // download and store each one
        }
    }

    // Parse the HTML and return the src URL of every <img> tag
    public function image($html) {
        preg_match_all("/<img([^>]*)\s*src=('|\")([^'\"]+)('|\")/", $html, $matches); // quoted src
        // preg_match_all("/<img([^>]*)\ssrc=([^\s>]+)/", $html, $matches);           // unquoted src
        $matches = array_unique($matches[0]); // remove duplicate values in the array
        foreach ($matches as $key => $val) {
            $matches[$key] = $this->stringSolve($val);
        }
        return $matches;
    }

    // Cut the bare URL out of a matched <img ... src="..."> fragment
    public function stringSolve($str) {
        $pos1 = stripos($str, '"');
        $pos2 = stripos($str, '"', $pos1 + 2);
        return substr($str, $pos1 + 1, $pos2 - $pos1 - 1);
    }

    // Download one image and save it under pics/ with an MD5-based file name
    public function savePics($pic) {
        if (!is_dir('pics')) {
            mkdir('pics');                  // make sure the target directory exists
        }
        $rt = $this->setRequest($pic);
        $fp = fopen('pics' . '/' . md5($pic) . '.jpg', 'a');
        fwrite($fp, $rt);
        fclose($fp);
    }

    // Fetch a URL with cURL and return the response body as a string
    public function setRequest($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_HEADER, 0);         // exclude response headers from the result
        $ret = curl_exec($ch);
        curl_close($ch);
        return $ret;
    }
}

$test = new GetImg;
?>
```
Network access is done with PHP's cURL functions, wrapped in the `setRequest()` method. The other important piece is the parsing method, `image()`, which matches the `<img>` tags and recovers each image's source URL; finally `savePics()` is called to download each image and save it locally.
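To make the parsing step easier to see in isolation, here is a hedged standalone sketch of the same idea: run `preg_match_all` over a small HTML snippet (double-quoted `src` attributes only) and deduplicate the results. The `$html` value is invented for illustration.

```php
<?php
// Sample input invented for illustration; note the duplicate a.png
$html = '<p><img src="http://example.com/a.png">'
      . '<img src="http://example.com/a.png">'
      . '<img src="http://example.com/b.jpg"></p>';

// Capture group 1 holds the bare URL directly
preg_match_all('/<img[^>]*\ssrc="([^"]+)"/', $html, $matches);
$urls = array_unique($matches[1]); // drop the duplicate a.png entry

print_r($urls); // keeps a.png (once, key 0) and b.jpg (key 2)
```

Capturing the URL in a group this way skips the extra quote-trimming step that `stringSolve()` performs on the full matched tag; it is a small simplification, not the exact code of the class above.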
The whole script is fairly simple and only implements the basic function. To turn it into a more convenient image crawler, one could add an HTML front end: visually enter the URL of the page to be crawled, then preview all the images pulled from that page, filter the ones you want, and save them locally.
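As a rough sketch of that front end (the file name `GetImg.php` and the form field name `url` are my own assumptions, not part of the original), a single PHP page could show a URL form and, on submit, hand the page to the class above:

```php
<?php
// Hypothetical single-file front end for the crawler class.
// Assumes the GetImg class was saved separately as GetImg.php.
if (!empty($_POST['url'])) {
    require 'GetImg.php';
    new GetImg($_POST['url']);      // crawl the submitted page
    echo 'Done, images saved under pics/';
} else {
    // No URL submitted yet: show the input form
    echo '<form method="post">
            <input type="text" name="url" placeholder="Page URL to crawl">
            <input type="submit" value="Fetch images">
          </form>';
}
?>
```

The visual preview-and-filter step the paragraph describes would still need extra work, for example listing the saved files as thumbnails with checkboxes before keeping them.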