Introduction to Snoopy, a PHP Class for Fetching Web Pages

Source: Internet
Author: User

Snoopy is a PHP class that simulates some simple features of a web browser: it can fetch web page content, submit forms, and so on. To run correctly, Snoopy requires PHP 4 or later with PCRE (Perl Compatible Regular Expressions) support, which a basic LAMP setup provides. Because it is a plain PHP class, it needs no PHP extension, which makes it a good option when the server does not support curl.

Features of Snoopy:

1. Fetches the content of a web page (fetch)

2. Fetches the text content of a web page, with HTML tags removed (fetchtext)

3. Fetches the links and forms of a web page (fetchlinks, fetchform)

4. Supports proxy hosts

5. Supports basic user name/password authentication

6. Supports setting the user agent, referer, cookies, and header content

7. Supports browser redirects, and can limit the redirect depth

8. Can expand the links in a web page to fully qualified URLs (the default)

9. Submits data and gets the returned result

10. Supports following HTML frames

11. Supports passing cookies on redirects

Snoopy can be downloaded from: http://sourceforge.net/projects/snoopy/

Snoopy class methods:

fetch($URI)

This method fetches the content of a web page. The $URI parameter is the URL of the page to fetch. The fetched content is stored in $this->results. If the page is a frameset, Snoopy follows each frame and stores the frames' content as an array in $this->results.

fetchtext($URI)

This method is similar to fetch(), except that it removes the HTML tags and other extraneous data and returns only the text content of the page.
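To make "text only" concrete, here is a minimal sketch of reducing HTML to plain text in PHP. It is not Snoopy's own code: the function name htmlToText is hypothetical, Snoopy's internal stripping rules may differ in detail, and the sketch assumes PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of Snoopy: reduce an HTML string to plain text.
function htmlToText(string $html): string
{
    // Drop <script> and <style> blocks together with their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // Remove the remaining tags, then turn entities such as &amp; back into characters.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}
```

For example, htmlToText('<p>Hello &amp; <b>welcome</b></p>') would yield the string "Hello & welcome".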

fetchform($URI)

This method is similar to fetch(), except that it removes the HTML tags and other extraneous data and returns only the form content of the page.

fetchlinks($URI)

This method is similar to fetch(), except that it removes the HTML tags and other extraneous data and returns only the links in the page. By default, relative links are automatically expanded to full URLs.
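The expansion of relative links can be sketched roughly as follows. This is an illustration in plain PHP, not Snoopy's implementation: expandLink and extractLinks are hypothetical names, the base URL is assumed to be an absolute http or https URL, "./" and "../" segments are left unnormalized, and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper, not part of Snoopy: resolve $link against the page URL $base.
function expandLink(string $link, string $base): string
{
    // Already absolute: the link carries its own scheme (http:, mailto:, ...).
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $link)) {
        return $link;
    }
    $parts = parse_url($base);
    // Protocol-relative link such as //cdn.example.com/x.js: reuse the base scheme.
    if (strpos($link, '//') === 0) {
        return $parts['scheme'] . ':' . $link;
    }
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    // Root-relative link such as /about.
    if (strpos($link, '/') === 0) {
        return $origin . $link;
    }
    // Path-relative link: append to the directory of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $link;
}

// Hypothetical helper: collect the href values of <a> tags and expand each one.
function extractLinks(string $html, string $base): array
{
    preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']/i', $html, $matches);
    $expanded = array();
    foreach ($matches[1] as $link) {
        $expanded[] = expandLink($link, $base);
    }
    return $expanded;
}
```

With a base of http://example.com/docs/index.html, a link written as news/1.html would expand to http://example.com/docs/news/1.html, while /about would expand to http://example.com/about.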

submit($URI, $formvars)

This method submits a form to the address specified by $URI. $formvars is an array holding the form parameters.
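As an illustration of what such an array of form parameters turns into on the wire, the sketch below encodes it as an application/x-www-form-urlencoded body using PHP's built-in http_build_query(). This assumes the ordinary URL-encoded form type; Snoopy builds the request body internally, and the helper name encodeFormVars is hypothetical.

```php
<?php
// Hypothetical helper, not part of Snoopy: encode form fields as a URL-encoded body.
function encodeFormVars(array $formvars): string
{
    // Pass '&' explicitly so the result does not depend on the arg_separator.output ini setting.
    return http_build_query($formvars, '', '&');
}

$formvars = array('username' => 'admin', 'pwd' => 'p@ss word');
echo encodeFormVars($formvars), "\n"; // prints username=admin&pwd=p%40ss+word
```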

submittext($URI, $formvars)

This method is similar to submit(), except that it removes the HTML tags and other extraneous data and returns only the text content of the page that comes back after the submission.

submitlinks($URI, $formvars)

This method is similar to submit(), except that it removes the HTML tags and other extraneous data and returns only the links in the page that comes back. By default, relative links are automatically expanded to full URLs.

Snoopy class properties (default values in parentheses):

$host the host to connect to
$port the port to connect to
$proxy_host the proxy host to use, if any
$proxy_port the proxy port to use, if any
$agent the user agent to masquerade as (Snoopy v0.1)
$referer referer information to pass, if any
$cookies cookies to pass, if any
$rawheaders other header information to pass, if any
$maxredirs maximum number of redirects to follow, 0 = none allowed (5)
$offsiteok whether to allow redirects off-site (true)
$expandlinks whether to expand all links to fully qualified URLs (true)
$user user name for authentication, if any
$pass password for authentication, if any
$accept HTTP Accept types (image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*)
$error where the error occurred, if any
$response_code response code returned from the server
$headers headers returned from the server
$maxlength maximum length of data to return
$read_timeout timeout for read operations (requires PHP 4 Beta 4+); set to 0 for no timeout
$timed_out true if a read operation timed out (requires PHP 4 Beta 4+)
$maxframes maximum number of frames to follow
$status HTTP status of the fetch
$temp_dir temporary directory that the web server can write to (/tmp)
$curl_path path to the curl binary; set to false if there is no curl binary
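Several of these properties are meant to be set before a fetch and read after it. The sketch below shows one possible way to combine them; the wrapper name fetchWithLimits and its return format are hypothetical, and $client is assumed to be a Snoopy instance (any object with the same properties and a fetch() method would behave the same way here).

```php
<?php
// Hypothetical wrapper, not part of Snoopy: fetch a URL with a timeout and a redirect limit.
function fetchWithLimits($client, $url, $timeoutSeconds = 10, $maxRedirects = 2)
{
    $client->read_timeout = $timeoutSeconds; // 0 would mean no timeout
    $client->maxredirs    = $maxRedirects;   // 0 would disallow redirects
    $client->offsiteok    = false;           // do not follow redirects to other sites

    $client->fetch($url);

    if ($client->timed_out) {
        return array('ok' => false, 'reason' => 'read timed out');
    }
    if ($client->error) {
        return array('ok' => false, 'reason' => $client->error);
    }
    return array(
        'ok'       => true,
        'body'     => $client->results,
        'response' => $client->response_code,
    );
}
```

With the real class, this could be called as fetchWithLimits(new Snoopy, $url) after including the Snoopy class file, and the 'ok' key checked before using 'body'.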

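Snoopy usage examples:

(1) Fetching the content of a given URL

$url = 'http://www.Alixixi.com';
include 'Snoopy.class.php'; // the Snoopy class file; adjust the path as needed
$snoopy = new Snoopy;
$snoopy->fetch($url); // fetch the full page content
echo $snoopy->results; // display the results
// Alternatives to fetch(); each one also stores its output in $snoopy->results:
// $snoopy->fetchtext($url);  // text content only (HTML removed)
// $snoopy->fetchlinks($url); // all links on the page
// $snoopy->fetchform($url);  // the page's form information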

(2) Submitting a form

include 'Snoopy.class.php';
$snoopy = new Snoopy;
$formvars['username'] = 'admin';
$formvars['pwd'] = 'admin';
$action = 'http://www.Alixixi.com'; // form submission address
$snoopy->submit($action, $formvars); // $formvars is the array of submitted fields
echo $snoopy->results; // the response returned after the form is submitted
// Alternatives to submit():
// $snoopy->submittext($action, $formvars);  // returns only the text, with HTML stripped
// $snoopy->submitlinks($action, $formvars); // returns only the links

(3) Spoofing request details with Snoopy

$formvars['username'] = 'admin';
$formvars['pwd'] = 'admin';
$action = 'http://www.Alixixi.com';
include 'Snoopy.class.php';
$snoopy = new Snoopy;
$snoopy->cookies['PHPSESSID'] = 'fc206b1918bd522cc863p36890e6notef7'; // spoof the session ID
$snoopy->agent = '(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)'; // spoof the browser
$snoopy->referer = 'http://www.Alixixi.com'; // spoof the referring page (HTTP_REFERER)
$snoopy->rawheaders['Pragma'] = 'no-cache'; // cache-related HTTP header
$snoopy->rawheaders['X_FORWARDED_FOR'] = '127.0.0.1'; // spoof the client IP
$snoopy->submit($action, $formvars);
echo $snoopy->results;
