Grabbing images with PHP's Snoopy class

Source: Internet
Author: User
After using PHP's Snoopy class for two days, I found it very useful. To get all the links in a requested page, call fetchlinks directly; to get all the text, use fetchtext (internally it also relies on regular expressions). It has more features as well, such as simulating form submission.

How to use:

    1. First download the Snoopy class from http://sourceforge.net/projects/snoopy/
    2. Instantiate an object, then call the appropriate method to get the information from the crawled page

Example:

    include 'snoopy/snoopy.class.php';
    $snoopy = new Snoopy();
    $sourceURL = "http://xxxxxxxxx";
    $snoopy->fetchlinks($sourceURL);
    $a = $snoopy->results;

Snoopy does not provide a method for getting all the image addresses in a page, and I needed the image addresses from every article listed on a page. So I wrote one myself; the important part is really just the matching pattern.

    // Regular expression that matches an image (the pattern body is not preserved here)
    $reTag = "//i";

Because my requirement is rather specific, I only need to grab images whose address is hard-coded starting with http:// (images hosted on the site itself may be hotlink-protected, so I want to download them locally first).
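
To make the requirement concrete, here is a minimal sketch of a pattern that would fit it. This is an assumption, not the pattern used in the original code: it captures the absolute http:// address without its extension in group 1 and the extension (gif or jpg) in group 2, which is the shape the rest of the code expects. The sample markup and host names are made up.

```php
<?php
// Hypothetical pattern, not the author's original.
// Group 1: absolute http:// address without extension. Group 2: extension.
$reTag = '/<img\b[^>]*\bsrc=["\'](http:\/\/[^"\'>]+)\.(gif|jpg)["\']/i';

// Made-up sample markup for illustration.
$html = '<p><img src="http://img.example.com/pics/101.jpg" alt="a">'
      . '<img src="/local/5.gif">'
      . '<IMG class="x" SRC="http://img.example.com/face/7.GIF"></p>';

preg_match_all($reTag, $html, $matchResult);

// The relative address /local/5.gif is skipped; the other two are kept.
foreach ($matchResult[1] as $i => $name) {
    echo $name . '.' . $matchResult[2][$i] . PHP_EOL;
}
```

Because of the `i` flag the extension can come back in upper case (`GIF`), so a later comparison such as `$suffix == "gif"` would probably want `strtolower()` applied first.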

    1. Fetch the specified page and filter out all the expected article addresses;
    2. Loop over the article addresses from step 1 and apply the image-matching regular expression to each page to collect every image address that fits the rule;
    3. Save each image according to its suffix and ID (only GIF and JPG here); if the image file already exists, delete it first and then save it again.
 
  
 
    <?php
    include 'snoopy/snoopy.class.php';

    $snoopy = new Snoopy();
    $sourceURL = "http://xxxxxxxxx";
    $snoopy->fetchlinks($sourceURL);
    $a = $snoopy->results;

    // Filter: keep only the article addresses, i.e. links ending in <digits>.html
    $re = "/\d+\.html$/";
    foreach ($a as $tmp) {
        if (preg_match($re, $tmp)) {
            getImgURL($tmp);
        }
    }

    // Fetch one article page and save every image that matches $reTag.
    function getImgURL($siteName) {
        $snoopy = new Snoopy();
        $snoopy->fetch($siteName);
        $fileContent = $snoopy->results;

        // Regular expression that matches an image; it must capture the address
        // without extension in group 1 and the extension in group 2
        $reTag = "//i";
        if (preg_match($reTag, $fileContent)) {
            $ret = preg_match_all($reTag, $fileContent, $matchResult);
            for ($i = 0, $len = count($matchResult[1]); $i < $len; ++$i) {
                saveImgURL($matchResult[1][$i], $matchResult[2][$i]);
            }
        }
    }

    // Download one image and store it under a folder chosen by its suffix.
    function saveImgURL($name, $suffix) {
        $url = $name . "." . $suffix;
        echo "Requested image address: " . $url . "\n";

        $imgSavePath = "e:/xxx/style/images/";
        // The image ID is the run of digits after the last slash
        $imgId = preg_replace("/^.+\/(\d+)$/", "\\1", $name);
        if ($suffix == "gif") {
            $imgSavePath .= "emotion";
        } else {
            $imgSavePath .= "topic";
        }
        $imgSavePath .= ("/" . $imgId . "." . $suffix);

        // If the file already exists, delete it before saving again
        if (is_file($imgSavePath)) {
            unlink($imgSavePath);
            echo "File " . $imgSavePath . " already exists and will be deleted\n";
        }

        $imgFile = file_get_contents($url);
        $flag = file_put_contents($imgSavePath, $imgFile);
        if ($flag) {
            echo "File " . $imgSavePath . " saved successfully\n";
        }
    }
    ?>

When crawling web pages with PHP (content, images, links), I think the most important thing is the regular expression, which extracts the data you want from the content according to the rules you specify. The idea is actually fairly simple and only a handful of functions are involved; for fetching the content itself you can simply call a class someone else has written.

One thing I had been wondering about earlier, though: PHP does not seem to offer a way to do the following. Suppose a file has N lines (N very large) and the lines that match some rule need their content replaced, for example line 3 is aaa and needs to become bbbbb. The common approaches when a file has to be modified are:

    1. Read the entire file at once (or read it line by line), save the converted result in a temporary file, then replace the original file with it
    2. Read line by line, use fseek to control the position of the file pointer, then write with fwrite

Approach 1 cannot read everything in one go when the file is large, and reading line by line, writing a temporary file and then replacing the original does not feel very efficient. Approach 2 is fine when the replacement string is no longer than the text it replaces, but when it is longer it "crosses the boundary" and corrupts the data of the next line as well (there is nothing like the "selection" concept in JavaScript, where a selected range can be replaced with new content).
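
For reference, approach 1 can be written so that only one line is held in memory at a time. The following is a minimal sketch under stated assumptions: the function name `rewriteLines` and its callback shape are made up for illustration, the sample file is created on the fly, and line endings are normalized to "\n".

```php
<?php
/**
 * Rewrite $path line by line through a temporary file (approach 1).
 * $transform receives (line without EOL, 1-based line number) and returns
 * the new line, or null to drop the line. Returns true on success.
 * Hypothetical helper, for illustration only.
 */
function rewriteLines(string $path, callable $transform): bool
{
    $in = fopen($path, 'rb');
    if ($in === false) {
        return false;
    }
    // Temp file in the same directory, so rename() stays on one filesystem.
    $tmpPath = tempnam(dirname($path), 'rw_');
    $out = ($tmpPath === false) ? false : fopen($tmpPath, 'wb');
    if ($out === false) {
        fclose($in);
        return false;
    }

    $lineNo = 0;
    while (($line = fgets($in)) !== false) {
        $lineNo++;
        $new = $transform(rtrim($line, "\r\n"), $lineNo);
        if ($new !== null) {
            fwrite($out, $new . "\n");
        }
    }
    fclose($in);
    fclose($out);

    // Replace the original file with the rewritten copy.
    return rename($tmpPath, $path);
}

// Usage: turn the line "aaa" into "bbbbb" in a small sample file.
$file = tempnam(sys_get_temp_dir(), 'demo_');
file_put_contents($file, "111\n222\naaa\n444\n");
rewriteLines($file, function (string $line, int $n) {
    return $line === 'aaa' ? 'bbbbb' : $line;
});
echo file_get_contents($file);
```

The cost is one full sequential copy of the file, but the replacement may be longer or shorter than the original line, and returning null deletes a line.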

Here is the code to experiment with approach 2:
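
A minimal sketch of such an experiment (not the author's listing), assuming a small made-up four-line file in which the line aaa is overwritten in place with the longer string bbbbb:

```php
<?php
// Build a small sample file; the content is made up for illustration.
$file = tempnam(sys_get_temp_dir(), 'seek_');
file_put_contents($file, "111\n222\naaa\n444\n");

$fp = fopen($file, 'r+b');      // read and write, no truncation
$lineStart = 0;                 // byte offset where the current line begins
while (($line = fgets($fp)) !== false) {
    if (rtrim($line, "\n") === 'aaa') {
        fseek($fp, $lineStart); // move back to the start of the line just read
        fwrite($fp, 'bbbbb');   // 5 bytes written over a 3-byte line
        break;
    }
    $lineStart = ftell($fp);    // the next line starts here
}
fclose($fp);

$result = file_get_contents($file);
// $result is now "111\n222\nbbbbb44\n": the two extra bytes overwrote the
// newline of line 3 and the first character of line 4.
echo $result;
```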

 
  

After a line is read, the file pointer actually points to the beginning of the next line. fseek moves the pointer back to the beginning of the line just read, and fwrite then performs the replacement. Because fwrite overwrites bytes in place and is not limited to the length of the original line, it affects the data of the next line. What I want is to operate on this one line only, for example to delete it or to replace the whole line with a single 1. The example above does not meet that requirement; maybe I just haven't found the right method ...
