Snoopy is a PHP class that simulates a web browser. It can fetch web content and submit forms, which makes it useful for writing scrapers and other content-collection programs. This article explains in detail how to use Snoopy.
Some features of Snoopy:
Fetch the content of a web page: fetch
Fetch the text content of a web page (HTML tags removed): fetchtext
Fetch the links and forms of a web page: fetchlinks, fetchform
Supports proxy hosts
Supports basic username/password authentication
Supports setting user_agent, referer (the referring page), cookies and header content
Supports browser redirects, with control over the redirect depth
Can expand links in a web page into fully qualified URLs (default)
Submits data and gets the return value
Supports following HTML frames
Supports passing cookies on redirects
Snoopy requires PHP 4 or later. Because it is a plain PHP class that needs no extra server support, it is a good choice when the server does not support curl.
Snoopy class methods and examples:
fetch($URI)
This is the method used to fetch the contents of a web page.
The $URI parameter is the URL of the page to fetch.
The results of the fetch are stored in $this->results.
If the page uses frames, Snoopy follows each frame and stores the contents as an array in $this->results.
fetchtext($URI)
This method is similar to fetch(), except that it strips HTML tags and other irrelevant data and returns only the text content of the page.
fetchform($URI)
This method is similar to fetch(), except that it strips HTML tags and other irrelevant data and returns only the form content of the page.
fetchlinks($URI)
This method is similar to fetch(), except that it strips HTML tags and other irrelevant data and returns only the links in the page.
By default, relative links are automatically completed and converted to full URLs.
submit($URI, $formvars)
This method submits a form to the address specified by $URI. $formvars is an array holding the form parameters.
submittext($URI, $formvars)
This method is similar to submit(), except that it strips HTML tags and other irrelevant data and returns only the text content of the page returned after submission (for example, after logging in).
submitlinks($URI, $formvars)
This method is similar to submit(), except that it strips HTML tags and other irrelevant data and returns only the links in the page.
By default, relative links are automatically completed and converted to full URLs.
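To make "completed and converted to full URLs" concrete, here is a minimal sketch of link expansion in plain PHP. It is not Snoopy's own code: the function name expand_link is hypothetical, the sketch assumes PHP 7 or later, and it handles only absolute, root-relative and simple document-relative links (no "../" segments or protocol-relative links).

```php
<?php
// Hypothetical helper, not part of Snoopy: resolve a link found in a page
// against the URL of that page. Only the common cases are handled.
function expand_link(string $link, string $pageUrl): string
{
    // Already absolute (starts with a scheme such as http:// or https://).
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $link)) {
        return $link;
    }

    $parts = parse_url($pageUrl);
    $root = $parts['scheme'] . '://' . $parts['host']
          . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Root-relative link such as /images/taoav.gif.
    if (strpos($link, '/') === 0) {
        return $root . $link;
    }

    // Document-relative link: resolve against the directory of the page.
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}

$page = 'http://www.jb51.net/article/51250.htm';
echo expand_link('/images/taoav.gif', $page), "\n"; // http://www.jb51.net/images/taoav.gif
echo expand_link('52000.htm', $page), "\n";         // http://www.jb51.net/article/52000.htm
```

With Snoopy itself, this behaviour is switched on or off through the $expandlinks property rather than by calling a helper like this.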
Snoopy class properties (default value in parentheses):
$host: Host to connect to
$port: Port to connect to
$proxy_host: Proxy host to use, if any
$proxy_port: Proxy port to use, if any
$agent: User agent to masquerade as (Snoopy v0.1)
$referer: Referer information, if any
$cookies: Cookies, if any
$rawheaders: Other header information, if any
$maxredirs: Maximum number of redirects, 0 = none allowed (5)
$offsiteok: Whether or not to allow off-site redirects (true)
$expandlinks: Whether or not to expand links to full URLs (true)
$user: User name for authentication, if any
$pass: Password for authentication, if any
$accept: HTTP Accept types (image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*)
$error: Error message, if any
$response_code: Response code returned from the server
$headers: Header information returned from the server
$maxlength: Maximum length of returned data
$read_timeout: Timeout for read operations (requires PHP 4 Beta 4+); set to 0 for no timeout
$timed_out: True if a read operation timed out (requires PHP 4 Beta 4+)
$maxframes: Maximum number of frames to follow
$status: HTTP status of the fetch
$temp_dir: Temporary directory that the web server can write to (/tmp)
$curl_path: Path to the curl binary; set to false if there is no curl binary
Here is an example:
<?php
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->proxy_host = "www.jb51.net"; // proxy host name (placeholder), without a scheme
$snoopy->proxy_port = "80";
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";
$snoopy->referer = "http://www.jb51.net";
$snoopy->cookies["SessionID"] = "238472834723489l"; // cookie values are strings
$snoopy->cookies["FavoriteColor"] = "RED";
$snoopy->rawheaders["Pragma"] = "no-cache";
$snoopy->maxredirs = 2;
$snoopy->offsiteok = false;
$snoopy->expandlinks = false;
$snoopy->user = "Joe";
$snoopy->pass = "Bloe";
if ($snoopy->fetchtext("http://www.jb51.net")) {
    // Escape the result so it is displayed as text, not rendered as HTML
    echo "<pre>" . htmlspecialchars($snoopy->results) . "</pre>\n";
} else {
    echo "Error fetching document: " . $snoopy->error . "\n";
}
?>
Getting the content of a specified URL
<?php
$url = "http://www.jb51.net";
include("snoopy.php");
$snoopy = new Snoopy;
$snoopy->fetch($url);  // get all content
echo $snoopy->results; // show the results
// The following alternatives are also available:
// $snoopy->fetchtext($url);  // get the text content (HTML removed)
// $snoopy->fetchlinks($url); // get the links
// $snoopy->fetchform($url);  // get the forms
?>
Form submission
<?php
include "snoopy.php";
$snoopy = new Snoopy;
$formvars["username"] = "admin";
$formvars["pwd"] = "admin";
$action = "http://www.jb51.net"; // form submission address
$snoopy->submit($action, $formvars); // $formvars is the array of submitted fields
echo $snoopy->results; // the result returned after the form is submitted
// The following alternatives are also available:
// $snoopy->submittext($action, $formvars);  // return only the text, stripped of HTML, after submission
// $snoopy->submitlinks($action, $formvars); // return only the links after submission
?>
Once you can submit forms, you can do a lot more. Next we will spoof the IP address and the browser.
Spoofing the browser
<?php
$formvars["username"] = "Lanfengye";
$formvars["pwd"] = "Lanfengye";
$action = "http://www.jb51.net";
include "snoopy.php";
$snoopy = new Snoopy;
$snoopy->cookies["PHPSESSID"] = 'fc106b1918bd522cc863f36890e6fff7'; // spoof the session ID
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)"; // spoof the browser
$snoopy->referer = "http://www.jb51.net"; // spoof the referring page (HTTP_REFERER)
$snoopy->rawheaders["Pragma"] = "no-cache"; // HTTP header that controls caching
$snoopy->rawheaders["x_forwarded_for"] = "127.0.0.101"; // spoof the IP
$snoopy->submit($action, $formvars);
echo $snoopy->results;
?>
So we can spoof the session, the browser and the IP address, which opens up many possibilities.
For example, a vote that is protected by a verification code and an IP check can be cast repeatedly.
PS: The IP spoofing here really only spoofs an HTTP header, so an IP address obtained through REMOTE_ADDR cannot be spoofed.
Only code that reads the IP address from HTTP headers (usually to see through proxies) can be given an arbitrary IP.
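The difference between the two ways of reading the client IP can be sketched from the server's point of view. This is an illustration only: the function client_ip, its $trustForwardedHeader flag and the 203.0.113.7 address are hypothetical, and the $server array stands in for PHP's $_SERVER.

```php
<?php
// Hypothetical server-side helper for illustration (not part of Snoopy).
// $server stands in for PHP's $_SERVER array.
function client_ip(array $server, bool $trustForwardedHeader): string
{
    if ($trustForwardedHeader && !empty($server['HTTP_X_FORWARDED_FOR'])) {
        // The header may hold a comma-separated chain; take the first entry.
        $chain = explode(',', $server['HTTP_X_FORWARDED_FOR']);
        return trim($chain[0]);
    }
    // REMOTE_ADDR comes from the TCP connection, not from a request header.
    return $server['REMOTE_ADDR'];
}

// A request as the server might see it when the client set the header itself.
$request = [
    'REMOTE_ADDR'          => '203.0.113.7', // example address of the real connection
    'HTTP_X_FORWARDED_FOR' => '127.0.0.101', // value sent by the client
];

echo client_ip($request, true), "\n";  // 127.0.0.101 (the header is believed)
echo client_ip($request, false), "\n"; // 203.0.113.7 (the header is ignored)
```

A server that relies on REMOTE_ADDR, or that accepts the forwarded header only from proxies it controls, is therefore not affected by a client-supplied header.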
As for how to get past the verification code, briefly:
First open the page in a normal browser and find the session ID that corresponds to the verification code,
note both the session ID and the verification code value,
then forge them with Snoopy.
Principle: because the session ID is the same, the verification code the server expects is the same as the one entered the first time.
Sometimes we may need to forge more than this, and Snoopy has thought of that for us:
<?php
$snoopy->proxy_host = "www.jb51.net"; // proxy host name (placeholder), without a scheme
$snoopy->proxy_port = "8080"; // use a proxy
$snoopy->maxredirs = 2; // number of redirects
$snoopy->expandlinks = true; // whether to expand links to full URLs; often used when scraping
// For example, the link /images/taoav.gif is expanded to http://www.jb51.net/images/taoav.gif
$snoopy->maxframes = 5; // maximum number of frames allowed
// Note that when frames are fetched, $snoopy->results is an array
$snoopy->error; // holds the error message
?>
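Because $snoopy->results can be either a string or, when frames are fetched, an array, calling code may want to treat both cases the same way. A minimal sketch follows, assuming a hypothetical helper results_as_list and stand-in values in place of a real fetch:

```php
<?php
// Hypothetical helper (not part of Snoopy): return the fetched content as a
// list of documents, whether $results is one string or an array of frames.
function results_as_list($results): array
{
    return is_array($results) ? array_values($results) : [(string) $results];
}

// Stand-in values; with Snoopy these would come from $snoopy->results.
$single = '<html>page</html>';
$framed = ['<html>frame 1</html>', '<html>frame 2</html>'];

foreach (results_as_list($framed) as $i => $doc) {
    echo "Document $i: ", strlen($doc), " bytes\n";
}
echo count(results_as_list($single)), "\n"; // 1
```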
Source: http://www.jb51.net/article/51250.htm
PHP scraping class Snoopy: a detailed introduction (Snoopy usage tutorial)