Grabbing pictures with the PHP Snoopy collection class: an example

After using the Snoopy class in PHP for two days, I found it very useful. To get all the links in a requested page, just call fetchlinks; to get all the text, use fetchtext (internally it also relies on regular expressions); and there are more features, such as simulating form submission.


How to use:

First, download the Snoopy class from: http://sourceforge.net/projects/snoopy/
Then instantiate an object and call the appropriate method to get the crawled page's information.

The code is as follows:
include 'snoopy/snoopy.class.php';

$snoopy = new Snoopy();

$sourceURL = "http://www.jb51.net";
$snoopy->fetchlinks($sourceURL);

$a = $snoopy->results;

Snoopy does not provide a method to get all the image addresses in a webpage, and I needed to grab the image addresses from the list of all articles on a page. So I wrote one myself; the key part is really just the regular expression used for matching.
The code is as follows:
// Regular expression to match images (one possible pattern; src must start
// with http://, and the image name and suffix are captured separately)
$reTag = "/<img[^>]+src=\"(http:\/\/[^\"]+)\.(gif|jpg)\"/i";


Because the requirement is rather special, we only need to grab images whose address starts with http:// (the images on the target site may have hotlink protection, so we want to fetch them to the local disk):

1. Crawl the specified page and filter out all the expected article addresses;

2. Loop over the article addresses from step 1 and run the image regular expression against each page to get all image addresses in the page that fit the rules;

3. Save each image locally according to its suffix and ID (only GIF and JPG here); if the image file already exists, delete it first, then save.

The code is as follows:

<?php
include 'snoopy/snoopy.class.php';

$snoopy = new Snoopy();

$sourceURL = "http://xxxxx";
$snoopy->fetchlinks($sourceURL);

$a = $snoopy->results;
$re = "/\d+\.html$/";

// Filter: keep only the expected article addresses
foreach ($a as $tmp) {
    if (preg_match($re, $tmp)) {
        getImgUrl($tmp);
    }
}

function getImgUrl($siteName) {
    $snoopy = new Snoopy();
    $snoopy->fetch($siteName);

    $fileContent = $snoopy->results;

    // Regular expression to match images (one possible pattern; src must start
    // with http://, group 1 is the image name, group 2 the suffix)
    $reTag = "/<img[^>]+src=\"(http:\/\/[^\"]+)\.(gif|jpg)\"/i";

    if (preg_match($reTag, $fileContent)) {
        $ret = preg_match_all($reTag, $fileContent, $matchResult);

        for ($i = 0, $len = count($matchResult[1]); $i < $len; ++$i) {
            saveImgUrl($matchResult[1][$i], $matchResult[2][$i]);
        }
    }
}

function saveImgUrl($name, $suffix) {
    $url = $name . "." . $suffix;

    echo "Requested picture address: " . $url . "<br/>";

    $imgSavePath = "e:/xxx/style/images/";
    $imgId = preg_replace("/^.+\/(\d+)$/", "\\1", $name);
    if ($suffix == "gif") {
        $imgSavePath .= "emotion";
    } else {
        $imgSavePath .= "topic";
    }
    $imgSavePath .= ("/" . $imgId . "." . $suffix);

    if (is_file($imgSavePath)) {
        unlink($imgSavePath);
        echo "<br/>File " . $imgSavePath . " already exists; deleting it first<br/>";
    }

    $imgFile = file_get_contents($url);
    $flag = file_put_contents($imgSavePath, $imgFile);

    if ($flag) {
        echo "<br/>File " . $imgSavePath . " saved successfully<br/>";
    }
}
?>
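The matching step can be tried in isolation. Below is a minimal sketch, assuming a pattern that follows the stated rules (only src attributes beginning with http:// match; the image name and suffix are captured separately); the sample HTML and example.com address are invented for illustration:

```php
<?php
// One possible image-matching pattern: capture group 1 is the image name
// (everything up to the last dot), group 2 is the suffix (gif or jpg only).
$reTag = "/<img[^>]+src=\"(http:\/\/[^\"]+)\.(gif|jpg)\"/i";

// Invented sample page content for illustration.
$fileContent = '<p><img src="http://example.com/topic/123.jpg" /></p>'
             . '<p><img src="/local/456.gif" /></p>';  // relative src: skipped

preg_match_all($reTag, $fileContent, $matchResult);

// $matchResult[1] holds the names, $matchResult[2] the suffixes.
for ($i = 0, $len = count($matchResult[1]); $i < $len; ++$i) {
    echo $matchResult[1][$i] . " -> " . $matchResult[2][$i] . "\n";
}
// Only the absolute http:// image matches:
// http://example.com/topic/123 -> jpg
```

Because the name and suffix land in separate capture groups, the two values can be passed straight to a save routine like the one above.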

When using PHP to crawl web pages (content, images, links), I think the most important part is the regular expression (extracting the desired data from the content according to the specified rules). The idea itself is actually quite simple, and only a handful of methods are used (for crawling content you can also just call a class someone else has written).
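As a small illustration of that point, the article-address filtering loop from the code above can also be collapsed into a single preg_grep call. A minimal sketch; the link array is invented, standing in for $snoopy->results:

```php
<?php
// Stand-in for $snoopy->results (invented example links).
$a = [
    "http://xxxxx/list.html",
    "http://xxxxx/825392.html",
    "http://xxxxx/about",
];

// Keep only addresses that end in digits followed by ".html"
// (preg_grep preserves the original array keys).
$articles = preg_grep("/\d+\.html$/", $a);

print_r(array_values($articles));  // only the 825392.html address survives
```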

But I previously thought PHP could not easily do the following: a file has n lines (n very large), and a line matching some rule needs its content replaced, e.g. line 3 is "aaa" and needs to become "bbbbb". The common approaches when you need to modify a file:

1. Read the whole file at once (or line by line), write the converted result to a temporary file, then replace the original file with it;

2. Read line by line, use fseek to control the file pointer position, then fwrite the replacement in place.

With scenario 1, when the file is large, reading it in one go is not feasible (and reading line by line, writing a temporary file, then replacing the original does not feel very efficient either). Scenario 2 has no problem as long as the replacement string is no longer than the original, but when it is longer it "crosses over": the write corrupts the next line's data as well (you cannot simply swap in new content, since there is no "selection" concept like in JavaScript).

Here is the code for experimenting with scenario 2. The code is as follows:
<?php
$mode = "r+";
$filename = "D:/file.txt";
$fp = fopen($filename, $mode);
if ($fp) {
    $i = 1;
    while (!feof($fp)) {
        $str = fgets($fp);
        echo $str;
        if ($i == 1) {
            $len = strlen($str);
            fseek($fp, -$len, SEEK_CUR); // move the pointer back a line
            fwrite($fp, "123");
        }
        $i++;
    }
    fclose($fp);
}
?>

It first reads a line, at which point the file pointer actually points to the beginning of the next line; fseek is then used to move the pointer back to the beginning of the previous line, and fwrite performs the replacement. Because it is an overwrite, if no length is constrained it affects the next line's data as well, whereas what I want is to operate only on this one line (for example, delete it or replace the whole line with a single "1"). The example above cannot meet that requirement; perhaps I just haven't found the right method yet...
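For completeness, scenario 1 can be sketched so that line lengths never matter: write each (possibly rewritten) line to a temporary file, then swap the temporary file in for the original. This is only a sketch; the file names and sample content are invented:

```php
<?php
// Scenario 1: replace line 3 ("aaa") with a longer string ("bbbbb") via a
// temporary file, so the new line's length never has to match the old one.
$filename = "file.txt";
$tmpname  = $filename . ".tmp";

file_put_contents($filename, "line1\nline2\naaa\n");  // invented sample input

$in  = fopen($filename, "r");
$out = fopen($tmpname, "w");
$i = 1;
while (($str = fgets($in)) !== false) {
    if ($i == 3) {            // the line to rewrite
        $str = "bbbbb\n";
    }
    fwrite($out, $str);       // every line goes to the temp file
    $i++;
}
fclose($in);
fclose($out);

rename($tmpname, $filename);  // swap the rewritten file in for the original

echo file_get_contents($filename);
```

Unlike the fseek/fwrite experiment, nothing here can "cross over" into the next line, since the original file is never written in place.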

