How to Use PHP to Crawl the Images Referenced in CSS: Detailed Code

Source: Internet
Author: User
Tags: readfile
One. Crawling the images in the CSS:
> 1. First, the preparation:
> The original CSS path is stored in the $url variable, and the content of the CSS is saved in abc.css.
> Because multiple CSS files are often involved, I do not fill in a single CSS path directly.
> Instead, I merge the contents of the several CSS files and put them all into the abc.css file.
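
The merge step described above could be sketched as follows. This is only a minimal illustration, not the author's code: the function name `mergeCssFiles` and its parameters are hypothetical, and reading a remote URL this way assumes `allow_url_fopen` is enabled.

```php
<?php
// Hypothetical helper: concatenate several CSS sources into one local file.
// $sources may hold local paths, or URLs if allow_url_fopen is enabled.
function mergeCssFiles(array $sources, string $target): int
{
    $merged = '';
    foreach ($sources as $source) {
        // @ suppresses the warning for unreadable sources; they are skipped.
        $css = @file_get_contents($source);
        if ($css === false) {
            continue;
        }
        $merged .= $css . "\n";
    }
    // file_put_contents returns the number of bytes written, or false on failure.
    return (int) file_put_contents($target, $merged);
}
```

Calling it with the list of CSS paths and 'abc.css' as the target produces the single file that the rest of the steps read.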

$data = file_get_contents('abc.css');

> Next, read the contents of the CSS file into the $data variable, then use a regular expression to extract the domain name.
> This is needed because many image files use a root-relative path, for example /img/1.gif as opposed to img/1.gif.
> The original address of the CSS is http://www.jb51.net/css/, so the two paths above point to different locations: a root-relative path is resolved against the domain, while an ordinary relative path is resolved against the directory of the CSS file.

> The first file is at /upload/201109/20110926143903807.gif, because its path is relative to the site root.
> The second is at /upload/201109/20110926143903169.gif, because its path is an ordinary relative path.
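
To make the difference concrete, here is a small sketch of extracting the domain and building both addresses. The regular expression and the variable names `$domain`, `$cssDir`, `$rootRelative` and `$dirRelative` are my own assumptions, not taken from the original script; the CSS address is the one from the text.

```php
<?php
// CSS address from the text; assumed to be an absolute http(s) URL with a path.
$url = 'http://www.jb51.net/css/';

// Extract scheme and host, i.e. everything before the first "/" of the path.
preg_match('#^(https?://[^/]+)#i', $url, $m);
$domain = $m[1];

// Directory of the CSS file: everything up to and including the last "/".
$cssDir = substr($url, 0, strrpos($url, '/') + 1);

// /img/1.gif is root-relative, so it hangs off the domain.
$rootRelative = $domain . '/img/1.gif';
// img/1.gif is an ordinary relative path, so it hangs off the CSS directory.
$dirRelative = $cssDir . 'img/1.gif';

echo $rootRelative, "\n", $dirRelative, "\n";
```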

> 2. Create a folder to store the images:
> Here I use is_dir to check whether the folder exists; if it does, there is no need to create it a second time.
> By the way, the is_file function checks whether a path exists and is a regular file.
> file_exists() is said to be slightly better, though; there has been some discussion of this on webmasterworld.com.

if (!is_dir('img')) { mkdir('img'); }
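
How the three checks differ can be seen in a short experiment. This is a sketch under assumptions: the demo folder lives in the system temp directory under a made-up name, and the file contains placeholder bytes rather than a real image.

```php
<?php
// Demo paths under the system temp directory (hypothetical names).
$dir  = sys_get_temp_dir() . '/img_demo_' . uniqid();
$file = $dir . '/1.gif';

// Create the folder only if it is not there yet.
if (!is_dir($dir)) {
    mkdir($dir);
}
file_put_contents($file, 'GIF89a'); // placeholder bytes, not a real image

$checks = [
    'is_dir(dir)'      => is_dir($dir),      // true: it is a directory
    'is_file(dir)'     => is_file($dir),     // false: it exists, but is not a regular file
    'file_exists(dir)' => file_exists($dir), // true: files and directories both count
    'is_file(file)'    => is_file($file),    // true: a regular file
];
foreach ($checks as $label => $value) {
    echo $label, ' => ', var_export($value, true), "\n";
}
```

For a folder check, is_dir is the most specific of the three, which is why it suits the mkdir guard.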

> 3. Extract the relative image addresses with a regular expression:

$regex = '/url\(\'{0,1}\"{0,1}(.*?)\'{0,1}\"{0,1}\)/';
// This regular expression matches the image addresses. Three forms have to be considered: url(1.gif), url('1.gif') and url("1.gif").
// All three are valid, so the regular expression above is used to extract the 1.gif inside.
// \'{0,1} means that a single quote may occur once or not at all; \"{0,1} means the same for a double quote.
// The middle part must use a lazy match, otherwise the result is 1.gif" instead of 1.gif.
preg_match_all($regex, $data, $result);
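
A quick way to check this step is to run the pattern over a small sample. The sketch below uses the pattern as I read it from this step; the sample CSS and the `$greedyRegex` variant are made up for illustration.

```php
<?php
// The pattern from this step: optional quote, lazy capture, optional quote.
$regex = '/url\(\'{0,1}\"{0,1}(.*?)\'{0,1}\"{0,1}\)/';

// Made-up sample covering the three ways of writing url(...).
$data = 'a { background: url(1.gif); } '
      . 'b { background: url(\'2.gif\'); } '
      . 'c { background: url("3.gif"); }';

preg_match_all($regex, $data, $result);
print_r($result[1]); // the addresses captured by the first group

// Same pattern with a greedy group, to show why the lazy match matters.
$greedyRegex = '/url\(\'{0,1}\"{0,1}(.*)\'{0,1}\"{0,1}\)/';
preg_match_all($greedyRegex, 'url("1.gif")', $greedy);
print_r($greedy[1]); // the closing quote is swallowed into the capture
```

With the lazy group, `$result[1]` holds the three bare file names; with the greedy group, the capture keeps the trailing double quote.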

> 4. Process these images:

> First use a loop to go through the array of first-group matches extracted by the regular expression.
> Here the first group corresponds to the first pair of parentheses in the regular expression, and so on.

foreach ($result[1] as $val) {}

> Then test each value with a regular expression, because a case like /upload/201109/20110926143903807.gif must also be considered.
> That one uses the full path, rather than /img/1.gif or img/1.gif.
> So check for it separately first, then distinguish the other two and see whether the value is /img/1.gif or img/1.gif.
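
The three-way check inside the loop might look like the sketch below. The helper name `resolveImageUrl` is hypothetical, and I am assuming that "full path" means an address starting with http:// or https://. It does not handle `../` segments, `data:` URIs or protocol-relative `//` addresses.

```php
<?php
// Hypothetical helper: turn a value captured from the CSS into a full address.
// $cssUrl is assumed to be an absolute http(s) URL of the CSS file or folder.
function resolveImageUrl(string $val, string $cssUrl): string
{
    $val = trim($val);

    // Case 1: already a full address, use it unchanged.
    if (preg_match('#^https?://#i', $val)) {
        return $val;
    }

    if (!preg_match('#^(https?://[^/]+)#i', $cssUrl, $m)) {
        throw new InvalidArgumentException('CSS URL must be absolute');
    }

    // Case 2: root-relative path such as /img/1.gif, resolved against the domain.
    if ($val !== '' && $val[0] === '/') {
        return $m[1] . $val;
    }

    // Case 3: ordinary relative path such as img/1.gif, resolved against the CSS directory.
    return substr($cssUrl, 0, strrpos($cssUrl, '/') + 1) . $val;
}

foreach (['/img/1.gif', 'img/1.gif'] as $val) {
    echo resolveImageUrl($val, 'http://www.jb51.net/css/'), "\n";
}
```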

<?php
// $url is the full address of the remote image and must not be empty.
// $filename is the name the image is saved under.
// By default the image is placed in the same directory as this script.
function GrabImage($url, $filename = "") {
    // Return false if $url is empty.
    if ($url == "") { return false; }
    $ext = strrchr($url, "."); // Get the image extension
    if ($ext != ".gif" && $ext != ".jpg" && $ext != ".bmp") {
        echo "Format not supported!";
        return false;
    }
    // If no name is given, use the timestamp as the file name.
    if ($filename == "") { $filename = time() . $ext; }
    // Start capturing
    ob_start();
    readfile($url);
    $img = ob_get_contents();
    ob_end_clean();
    $size = strlen($img);
    $fp2 = fopen($filename, "a");
    fwrite($fp2, $img);
    fclose($fp2);
    return $filename;
}
// Test
GrabImage("http://www.php.cn/images/logo.gif", "as.gif");

ob_start: Turn on output buffering
This function will turn output buffering on. While output buffering is active, no output is sent from the script (other than headers); instead the output is stored in an internal buffer.
//
readfile: Reads a file and writes it to the output buffer
Returns the number of bytes read from the file. If an error occurs, false is returned and an error message is displayed, unless the function is called as @readfile().
//

ob_get_contents: Return the contents of the output buffer
This will return the contents of the output buffer without clearing it, or FALSE if output buffering isn't active.
//
ob_end_clean(): Clean (erase) the output buffer and turn off output buffering.
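
The four calls work together as a capture pattern, which can be tried without any network access. In this sketch a temporary local file with placeholder bytes stands in for the remote image; the variable names are assumptions.

```php
<?php
// A local stand-in for the remote image (placeholder bytes, not a real image).
$src = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($src, 'fake image bytes');

ob_start();                 // from here on, output goes into the buffer
$bytes = readfile($src);    // readfile "prints" the file, so it lands in the buffer
$img = ob_get_contents();   // copy the buffer into a string
ob_end_clean();             // discard the buffer and stop buffering

// $img now holds the file contents; nothing was sent to the browser.
echo 'captured ', strlen($img), ' bytes', "\n";
unlink($src);
```

The same bytes could also be fetched directly with file_get_contents($src); the buffering approach is simply the one this article uses.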
