Introduction to Snoopy, a PHP Class for Fetching Web Content

Snoopy is a PHP class that mimics a web browser: it fetches web content and submits forms. Official website: http://snoopy.sourceforge.net/

Some features of Snoopy:

    • Fetches the content of a web page: fetch()
    • Fetches the text content of a web page (HTML tags removed): fetchtext()
    • Fetches the links and forms of a web page: fetchlinks(), fetchform()
    • Supports proxy hosts
    • Supports basic username/password authentication
    • Supports setting the user agent, referer, cookies and header content
    • Follows browser redirects, with a configurable redirect depth
    • Expands links found on a page into fully qualified URLs (default)
    • Submits form data and retrieves the result
    • Follows HTML frames
    • Passes cookies along on redirects

Snoopy requires PHP 4 or later. Because it is a plain PHP class that needs no extension support, it is a good choice when the server does not support cURL.

Class methods

1. fetch($URI)

This method fetches the content of a web page. The $URI parameter is the URL of the page to fetch. The results of the fetch are stored in $this->results.

If you are fetching a page with frames, Snoopy follows each frame and stores the frames as an array in $this->results.

    <?php
    include "Snoopy.class.php";
    $snoopy = new Snoopy;
    $snoopy->fetch($url);   // get all content ($url is the address of the page to fetch)
    echo $snoopy->results;  // show results
    ?>

2. fetchtext($URI)

This method is similar to fetch(), except that it removes HTML tags and other unrelated data and returns only the text content of the page.

    <?php
    include "Snoopy.class.php";
    $snoopy = new Snoopy;
    $snoopy->fetchtext($url);  // get text content ($url is the address of the page to fetch)
    echo $snoopy->results;     // show results
    ?>
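To make "removes HTML tags and returns only the text" concrete, here is a minimal sketch of that kind of reduction in plain PHP. The function name htmlToText is hypothetical and this is not Snoopy's own code; it only illustrates the idea using standard PHP functions.

```php
<?php
// Hypothetical helper, not part of Snoopy: reduce an HTML string to plain text.
function htmlToText(string $html): string
{
    // Drop <script> and <style> blocks entirely, including their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // Remove the remaining tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo htmlToText('<p>Hello <b>world</b> &amp; friends</p><script>var x = 1;</script>'), "\n";
```

The sample call should print the visible text only, without markup or script content.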

3. fetchform($URI)

This method is similar to fetch(), except that it strips out other HTML and unrelated data and returns only the form elements of the page.

4. fetchlinks($URI)

This method is similar to fetch(), except that it strips out HTML tags and other unrelated data and returns only the links on the page. By default, relative links are automatically completed and converted to full URLs.
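The link completion mentioned above can be pictured with a short sketch. The helper expandLink is hypothetical and much simpler than what a real browser or Snoopy does: it handles only absolute, root-relative and plain path-relative links, and ignores cases such as "../" segments or "mailto:" links.

```php
<?php
// Hypothetical helper, not Snoopy's actual code: turn a link found on a page
// into a full URL, given the URL of the page it was found on.
function expandLink(string $link, string $pageUrl): string
{
    // Already absolute (has a scheme): leave untouched.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $link)) {
        return $link;
    }
    $parts = parse_url($pageUrl);
    $root  = $parts['scheme'] . '://' . $parts['host']
           . (isset($parts['port']) ? ':' . $parts['port'] : '');
    // Root-relative link: append to scheme and host.
    if (substr($link, 0, 1) === '/') {
        return $root . $link;
    }
    // Path-relative link: resolve against the directory of the page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}

echo expandLink('/images/taoav.gif', 'http://www.4wei.cn/news/index.html'), "\n";
echo expandLink('logo.png', 'http://www.4wei.cn/news/index.html'), "\n";
```

The page address and the file name logo.png are made up for the example; only the /images/taoav.gif case comes from this article.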

5. submit($URI, $formvars)

This method submits a form to the address specified by $URI. $formvars is an array holding the form parameters.
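As a rough illustration of what an array of form parameters turns into on the wire, the sketch below encodes a sample array as an application/x-www-form-urlencoded body with PHP's built-in http_build_query(). The field names and values are placeholders, and this is not a claim about how Snoopy encodes the data internally.

```php
<?php
// Hypothetical form fields; the names and values are placeholders.
$formvars = [
    'username' => 'admin',
    'comment'  => 'hello world & more',
];

// http_build_query() URL-encodes each pair and joins the pairs with "&",
// producing an application/x-www-form-urlencoded request body.
$body = http_build_query($formvars);

echo $body, "\n";
```

Special characters such as spaces and "&" inside a value are escaped, so the receiving script can split the body back into the original fields.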

6. submittext($URI, $formvars)

This method is similar to submit(), except that it removes HTML tags and other unrelated data and returns only the text content of the page that comes back after the submission (for example, the page shown after logging in).

7. submitlinks($URI)

This method is similar to submit(), except that it removes HTML tags and other unrelated data and returns only the links on the resulting page. By default, relative links are automatically completed and converted to full URLs.

Class attributes (default value in parentheses)

    • $host: the host to connect to
    • $port: the port to connect to
    • $proxy_host: the proxy host to use, if any
    • $proxy_port: the proxy port to use, if any
    • $agent: the user agent to masquerade as (Snoopy v0.1)
    • $referer: referer information to send, if any
    • $cookies: cookies to send, if any
    • $rawheaders: other header information to send, if any
    • $maxredirs: maximum number of redirects to follow, 0 = none allowed (5)
    • $offsiteok: whether or not to allow off-site redirects (true)
    • $expandlinks: whether to complete links into full addresses (true)
    • $user: user name for authentication, if any
    • $pass: password for authentication, if any
    • $accept: HTTP Accept types (image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*)
    • $error: where errors are reported, if any
    • $response_code: response code returned from the server
    • $headers: header information returned from the server
    • $maxlength: maximum length of the returned data
    • $read_timeout: timeout for read operations (requires PHP 4 Beta 4+), set to 0 for no timeout
    • $timed_out: true if a read operation timed out (requires PHP 4 Beta 4+)
    • $maxframes: maximum number of frames to follow
    • $status: the HTTP status of the fetch
    • $temp_dir: temporary directory that the web server can write to (/tmp)
    • $curl_path: path to the curl binary, set to false if there is no curl binary

Demo

    include "Snoopy.class.php";
    $snoopy = new Snoopy;

    $snoopy->proxy_host = "my.proxy.host";  // placeholder proxy host
    $snoopy->proxy_port = "8080";

    $snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";
    $snoopy->referer = "http://www.4wei.cn";

    $snoopy->cookies["SessionID"] = "238472834723489";
    $snoopy->cookies["favoriteColor"] = "RED";

    $snoopy->rawheaders["Pragma"] = "no-cache";

    $snoopy->maxredirs = 2;
    $snoopy->offsiteok = false;
    $snoopy->expandlinks = false;

    $snoopy->user = "joe";
    $snoopy->pass = "bloe";

    if ($snoopy->fetchtext("http://www.4wei.cn")) {
        echo "<pre>" . htmlspecialchars($snoopy->results) . "</pre>\n";
    } else {
        echo "error fetching document: " . $snoopy->error . "\n";
    }

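Get the content of a specified URL:

    <?php
    include "Snoopy.class.php";
    $snoopy = new Snoopy;
    $snoopy->fetch($url);   // get all content ($url is the address of the page to fetch)
    echo $snoopy->results;  // show results
    // Optional alternatives:
    // $snoopy->fetchtext($url);   // get text content (HTML removed)
    // $snoopy->fetchlinks($url);  // get links
    // $snoopy->fetchform($url);   // get forms
    ?>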

Form submission:

    <?php
    include "Snoopy.class.php";
    $snoopy = new Snoopy;
    // $action is the address the form is submitted to; $formvars is the array of submitted fields
    $snoopy->submit($action, $formvars);
    echo $snoopy->results;  // the result returned after the form submission
    // Optional alternatives:
    // $snoopy->submittext($action, $formvars);   // return only the text, with HTML stripped
    // $snoopy->submitlinks($action, $formvars);  // return only the links
    ?>

With form submission working, a lot becomes possible. Next we spoof the IP and the browser:

    <?php
    include "Snoopy.class.php";
    $snoopy = new Snoopy;
    // $action and $formvars are the form address and fields, as in the previous example
    $snoopy->cookies["PHPSESSID"] = 'fc106b1918bd522cc863f36890e6fff7';        // spoof the session ID
    $snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";  // spoof the browser
    $snoopy->referer = "http://www.4wei.cn";                // spoof the referring page (HTTP_REFERER)
    $snoopy->rawheaders["Pragma"] = "no-cache";             // cache-related HTTP header
    $snoopy->rawheaders["x_forwarded_for"] = "127.0.0.101"; // spoof the IP
    $snoopy->submit($action, $formvars);
    echo $snoopy->results;
    ?>

So we can spoof the session, the browser and the IP, which opens up a lot of possibilities. For example, a poll that is protected by a verification code and an IP check can be voted on repeatedly.

PS: The IP spoofing here only forges an HTTP header. An IP obtained from REMOTE_ADDR cannot be forged this way, but code that reads the IP from HTTP headers (the kind that tries to see through proxies) will accept whatever IP you supply.
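The difference described in the note can be sketched from the server's point of view. The function clientIp below is hypothetical, and the $server array stands in for PHP's $_SERVER; the addresses are sample values. It shows why a header-derived address is only as trustworthy as the client that sent it.

```php
<?php
// Hypothetical server-side helper. $server stands in for PHP's $_SERVER array.
function clientIp(array $server, bool $trustForwardedHeader): string
{
    // Header-derived value: supplied by the client, so it can be anything.
    if ($trustForwardedHeader && !empty($server['HTTP_X_FORWARDED_FOR'])) {
        // The header may hold a comma-separated chain; take the first entry.
        $chain = explode(',', $server['HTTP_X_FORWARDED_FOR']);
        return trim($chain[0]);
    }
    // Address of the TCP peer: set by the web server, not by request headers.
    return $server['REMOTE_ADDR'];
}

// Simulated request: real connection from 203.0.113.7, forged header value.
$request = [
    'REMOTE_ADDR'          => '203.0.113.7',
    'HTTP_X_FORWARDED_FOR' => '127.0.0.101',
];

echo clientIp($request, true), "\n";   // the forged value
echo clientIp($request, false), "\n";  // the real peer address
```

A site that relies on the first variant for vote limiting sees the forged address; one that relies on REMOTE_ADDR does not.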

As for handling the verification code, briefly: first open the page in an ordinary browser and find the session ID that goes with the verification code. Note down both the session ID and the code value, then have Snoopy send them.

Principle: because the session ID is the same, the verification code the server expects is the same as the one first displayed.

Sometimes we may need to forge more than that; Snoopy has thought of this for us:

    <?php
    $snoopy->proxy_host = "my.proxy.host";  // placeholder proxy host
    $snoopy->proxy_port = "8080";           // use a proxy
    $snoopy->maxredirs = 2;                 // number of redirects to follow
    $snoopy->expandlinks = true;            // whether to complete links; often used when scraping
    // For example, the link /images/taoav.gif becomes the full link http://www.4wei.cn/images/taoav.gif
    $snoopy->maxframes = 5;                 // maximum number of frames allowed
    // Note: when fetching frames, $snoopy->results is an array
    $snoopy->error;                         // holds the error message
    ?>

A more complete example:

    <?php
    /**
     * You need the snoopy.class.php from
     * http://snoopy.sourceforge.net/
     */
    include("snoopy.class.php");

    $snoopy = new Snoopy;

    // need a proxy?
    // $snoopy->proxy_host = "my.proxy.host";
    // $snoopy->proxy_port = "8080";

    // set browser and referer:
    $snoopy->agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
    $snoopy->referer = "http://www.jonasjohn.de/";

    // set some cookies:
    $snoopy->cookies["SessionID"] = '238472834723489';
    $snoopy->cookies["favoriteColor"] = "blue";

    // set a raw header:
    $snoopy->rawheaders["Pragma"] = "no-cache";

    // set some internal variables:
    $snoopy->maxredirs = 2;
    $snoopy->offsiteok = false;
    $snoopy->expandlinks = false;

    // set username and password (optional)
    // $snoopy->user = "joe";
    // $snoopy->pass = "bloe";

    // fetch the text of the website www.google.com:
    if ($snoopy->fetchtext("http://www.google.com")) {
        // other methods: fetch, fetchform, fetchlinks, submittext and submitlinks

        // response code:
        print "response code: " . $snoopy->response_code . "<br/>\n";

        // print the headers (foreach replaces each(), which was removed in PHP 8):
        print "<b>Headers:</b><br/>";
        foreach ($snoopy->headers as $key => $val) {
            print $key . ": " . $val . "<br/>\n";
        }
        print "<br/>\n";

        // print the text of the website:
        print "<pre>" . htmlspecialchars($snoopy->results) . "</pre>\n";
    } else {
        print "Snoopy: error while fetching document: " . $snoopy->error . "\n";
    }

Using the Snoopy class for a simple image scraper:

    <?php
    include "Snoopy.class.php";
    $snoopy = new Snoopy;

    // $sourceURL is the address of the page whose links are scanned
    $snoopy->fetchlinks($sourceURL);  // get the links on the page
    $a = $snoopy->results;            // the resulting list of links
    $re = '/\d+\.html$/';             // pattern for the pages we want

    // filter the links, keeping the address of the requested page
    foreach ($a as $tmp) {
        if (preg_match($re, $tmp)) {
            $aa = $tmp;
        }
    }
    if (isset($aa)) {
        getImgURL($aa);
    }

    function getImgURL($siteName) {
        $snoopy = new Snoopy();
        $snoopy->fetch($siteName);
        $fileContent = $snoopy->results;  // content of the filtered page

        // Regular expression matching images. The original pattern was lost;
        // this one is an assumption: group 1 = URL without extension, group 2 = extension.
        $reTag = '/<img[^>]+src="(https?:\/\/[^"]+)\.(jpg|jpeg|png|gif)"/i';

        if (preg_match($reTag, $fileContent)) {
            // filter the pictures
            $ret = preg_match_all($reTag, $fileContent, $matchResult);
            for ($i = 0, $len = count($matchResult[1]); $i < $len; ++$i) {
                saveImgURL($matchResult[1][$i], $matchResult[2][$i]);
            }
        }
    }

    function saveImgURL($name, $suffix) {
        $url = $name . "." . $suffix;
        echo "Requested picture address: " . $url . "<br/>";

        $imgSavePath = "E:/123/images/";  // where pictures are saved
        $imgId = mt_rand();               // generate a random file name
        if ($suffix == "gif") {           // sort into different folders by picture type
            $imgSavePath .= "emotion";
        } else {
            $imgSavePath .= "topic";
        }
        $imgSavePath .= ("/" . $imgId . "." . $suffix);  // assemble the file name to save

        if (is_file($imgSavePath)) {      // if the file already exists, delete it
            unlink($imgSavePath);
            echo "<p>File " . $imgSavePath . " already exists and will be deleted</p>";
        }

        $imgFile = file_get_contents($url);                 // read the remote file
        $flag = file_put_contents($imgSavePath, $imgFile);  // write it locally
        if ($flag) {
            echo "<p>File " . $imgSavePath . " saved successfully</p>";
        }
    }
    ?>
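The two matching steps of the scraper can be tried without any network access. The sketch below runs a link filter and an image pattern against invented sample data; the example.com addresses, the sample markup and the image pattern itself are assumptions made for illustration, since the article's original image pattern is not recoverable.

```php
<?php
// Link filter: keep only URLs that end in digits followed by ".html".
$links = ['http://example.com/post/123.html', 'http://example.com/about.html'];
$pages = array_values(array_filter($links, function ($u) {
    return preg_match('/\d+\.html$/', $u) === 1;
}));

// Sample markup standing in for a fetched page (invented for illustration).
$html = '<p><img src="http://example.com/pics/cat.gif" alt="cat">'
      . '<img class="x" src="http://example.com/pics/dog.jpeg"></p>';

// Assumed pattern: group 1 is the URL without its extension, group 2 the extension.
$pattern = '/<img[^>]+src="(https?:\/\/[^"]+)\.(jpg|jpeg|png|gif)"/i';
preg_match_all($pattern, $html, $m);

// Reassemble each image URL from its two captured parts.
$urls = [];
for ($i = 0, $len = count($m[1]); $i < $len; ++$i) {
    $urls[] = $m[1][$i] . '.' . $m[2][$i];
}

print_r($pages);
print_r($urls);
```

Splitting the URL into name and extension is what lets a scraper sort files by type before saving them.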
