Snoopy is a PHP class that simulates the functions of a web browser. It can fetch web page content and submit forms, which makes it useful for writing scrapers and data-collection programs. This article explains how to use Snoopy in detail.
Some features of Snoopy:
Fetches the content of a web page (fetch)
Fetches the text of a web page with the HTML tags removed (fetchtext)
Fetches the links and forms of a web page (fetchlinks, fetchform)
Supports proxy hosts
Supports basic username/password authentication
Supports setting the user agent, referer, cookies, and header content
Supports browser redirects and lets you control the redirect depth
Expands links on a page into fully qualified URLs (default)
Submits form data and retrieves the response
Supports following HTML frames
Supports passing cookies on redirects
PHP 4 or later is required. Because Snoopy is a plain PHP class that needs no extensions, it is a good choice when the server does not support curl.
Snoopy class methods:
fetch($URI)
This method fetches the content of a web page.
$URI is the fully qualified URL of the page to fetch.
The fetched content is stored in $this->results.
If the page is a frameset, Snoopy follows each frame and stores the frame contents as an array in $this->results.
fetchtext($URI)
This method is similar to fetch(). The only difference is that it strips HTML tags and other irrelevant data and returns only the text of the page.
fetchform($URI)
This method is similar to fetch(). The only difference is that it strips HTML tags and other irrelevant data and returns only the form elements of the page.
fetchlinks($URI)
This method is similar to fetch(). The only difference is that it strips HTML tags and other irrelevant data and returns only the links in the page.
By default, relative links are expanded to fully qualified URLs.
submit($URI, $formvars)
This method submits a form to the URL specified by $URI. $formvars is an array that holds the form parameters.
submittext($URI, $formvars)
This method is similar to submit(). The only difference is that it strips HTML tags and other irrelevant data and returns only the text of the page returned after submission.
submitlinks($URI, $formvars)
This method is similar to submit(). The only difference is that it strips HTML tags and other irrelevant data and returns only the links in the page returned after submission.
By default, relative links are expanded to fully qualified URLs.
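To make the link expansion described above concrete, here is a minimal sketch of resolving a link against the URL of the page it came from. The function name expandLink is hypothetical and this is not Snoopy's own implementation; it only handles full URLs, root-relative paths, and simple relative paths, and it ignores ".." segments, query strings, and protocol-relative links.

```php
<?php
// Hypothetical helper, not part of Snoopy: resolve $link against $pageUrl.
function expandLink($pageUrl, $link)
{
    // Already a full URL (scheme://...): return it unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $link)) {
        return $link;
    }

    $parts = parse_url($pageUrl);
    $root = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }

    // Root-relative link such as /images/taoav.gif.
    if (strpos($link, '/') === 0) {
        return $root . $link;
    }

    // Relative link: append it to the directory of the current page.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}

// Prints http://www.jb51.net/images/taoav.gif
echo expandLink('http://www.jb51.net/news/index.html', '/images/taoav.gif'), "\n";
// Prints http://www.jb51.net/news/page2.html
echo expandLink('http://www.jb51.net/news/index.html', 'page2.html'), "\n";
```

When the expansion is switched off, a scraper would have to do something along these lines itself before it could request the collected links.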
Snoopy class properties (default values are in parentheses):
$host: the host to connect to
$port: the port to connect to
$proxy_host: the proxy host to use, if any
$proxy_port: the proxy port to use, if any
$agent: the user agent to masquerade as (Snoopy v0.1)
$referer: referer information to send, if any
$cookies: cookies to send, if any
$rawheaders: other header information to send, if any
$maxredirs: maximum number of redirects to follow; 0 = none allowed (5)
$offsiteok: whether to allow redirects to other sites (true)
$expandlinks: whether to expand links to fully qualified URLs (true)
$user: authentication username, if any
$pass: authentication password, if any
$accept: HTTP accept types (image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*)
$error: where error messages are stored, if any
$response_code: response code returned by the server
$headers: headers returned by the server
$maxlength: maximum length of the returned data
$read_timeout: timeout for read operations (requires PHP 4 Beta 4+); set to 0 for no timeout
$timed_out: true if a read operation timed out (requires PHP 4 Beta 4+)
$maxframes: maximum number of frames to follow
$status: HTTP status of the fetch
$temp_dir: temporary directory that the web server can write to (/tmp)
$curl_path: path to the cURL binary; set to false if there is no cURL binary
The following is an example:
<?php
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->proxy_host = "www.jb51.net"; // proxy host name, without the http:// scheme
$snoopy->proxy_port = "80";
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";
$snoopy->referer = "http://www.jb51.net";
$snoopy->cookies["SessionID"] = "238472821323489";
$snoopy->cookies["favoriteColor"] = "RED";
$snoopy->rawheaders["Pragma"] = "no-cache";
$snoopy->maxredirs = 2;
$snoopy->offsiteok = false;
$snoopy->expandlinks = false;
$snoopy->user = "joe";
$snoopy->pass = "bloe";
if ($snoopy->fetchtext("http://www.jb51.net"))
{
    echo "<PRE>" . htmlspecialchars($snoopy->results) . "</PRE>\n";
}
else
{
    echo "error fetching document: " . $snoopy->error . "\n";
}
?>
Fetching the content of a specified URL
<?php
$url = "http://www.jb51.net";
include("snoopy.php");
$snoopy = new Snoopy;
$snoopy->fetch($url); // get all content
echo $snoopy->results; // display the result
// Optional alternatives to fetch():
// $snoopy->fetchtext($url);  // get the text content only (HTML removed)
// $snoopy->fetchlinks($url); // get the links
// $snoopy->fetchform($url);  // get the forms
?>
Form submission
<?php
include "snoopy.php";
$snoopy = new Snoopy;
$formvars["username"] = "admin";
$formvars["pwd"] = "admin";
$action = "http://www.jb51.net"; // form submission address
$snoopy->submit($action, $formvars); // $formvars is the array of submitted fields
echo $snoopy->results; // the response returned after the form is submitted
// Optional alternatives to submit():
// $snoopy->submittext($action, $formvars);  // return only the text, with HTML removed
// $snoopy->submitlinks($action, $formvars); // return only the links
?>
Now that we can submit forms, a lot more becomes possible. Next, we spoof the IP address and the browser.
Spoofing the browser
<?php
$formvars["username"] = "lanfengye";
$formvars["pwd"] = "lanfengye";
$action = "http://www.jb51.net";
include "snoopy.php";
$snoopy = new Snoopy;
$snoopy->cookies["PHPSESSID"] = 'fc0000b1918bd522cc863f000090e6fff7'; // spoof the session ID
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)"; // spoof the browser
$snoopy->referer = "http://www.jb51.net"; // spoof the source page address (HTTP_REFERER)
$snoopy->rawheaders["Pragma"] = "no-cache"; // cache-related HTTP header
$snoopy->rawheaders["X-Forwarded-For"] = "127.0.0.101"; // spoof the IP address (seen by PHP as HTTP_X_FORWARDED_FOR)
$snoopy->submit($action, $formvars);
echo $snoopy->results;
?>
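To see what these settings amount to on the wire, the sketch below builds the request headers by hand. It is a simplified illustration and not Snoopy's actual code; the function name buildRequestHeaders and its parameters are hypothetical.

```php
<?php
// Hypothetical illustration, not Snoopy's code: turn client-side settings
// into the raw header lines of an HTTP request.
function buildRequestHeaders($host, $path, $agent, $referer, $cookies, $rawheaders)
{
    $lines = array(
        "GET $path HTTP/1.0",
        "Host: $host",
        "User-Agent: $agent",
        "Referer: $referer",
    );

    // All cookies travel in a single Cookie header as name=value pairs.
    if (count($cookies) > 0) {
        $pairs = array();
        foreach ($cookies as $name => $value) {
            $pairs[] = $name . '=' . urlencode($value);
        }
        $lines[] = 'Cookie: ' . implode('; ', $pairs);
    }

    // Extra headers are sent exactly as given.
    foreach ($rawheaders as $name => $value) {
        $lines[] = "$name: $value";
    }

    // A blank line ends the header section.
    return implode("\r\n", $lines) . "\r\n\r\n";
}

echo buildRequestHeaders(
    'www.jb51.net',
    '/',
    '(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)',
    'http://www.jb51.net',
    array('PHPSESSID' => 'fc0000b1918bd522cc863f000090e6fff7'),
    array('Pragma' => 'no-cache', 'X-Forwarded-For' => '127.0.0.101')
);
```

Every line in the output is chosen by the client, which is why a server cannot treat the user agent, referer, cookies, or forwarded-for header as proof of who is calling.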
With the code above we can spoof the session, the browser, and the IP address, which opens up many possibilities.
For example, a vote that is protected by a verification code and an IP check can be submitted this way.
PS: the spoofed IP address here is really just an HTTP header, so an IP address that the server reads from REMOTE_ADDR cannot be spoofed this way.
Only an IP address that the server reads from HTTP headers (which servers do in order to see through proxies) can be set to whatever we like.
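The difference can be sketched from the server's point of view. The helper below is hypothetical (clientIp is not a PHP or Snoopy function), and the $server array stands in for PHP's $_SERVER; the addresses are made up for illustration.

```php
<?php
// Hypothetical helper: how a server-side script might decide the "client IP".
// $server stands in for PHP's $_SERVER array.
function clientIp($server, $trustForwardedHeader)
{
    if ($trustForwardedHeader && !empty($server['HTTP_X_FORWARDED_FOR'])) {
        // The header may hold a comma-separated chain; take the first entry.
        $chain = explode(',', $server['HTTP_X_FORWARDED_FOR']);
        return trim($chain[0]);
    }
    // REMOTE_ADDR is the address of the connecting peer; no request header sets it.
    return $server['REMOTE_ADDR'];
}

$request = array(
    'REMOTE_ADDR' => '203.0.113.7',          // assumed real connection address
    'HTTP_X_FORWARDED_FOR' => '127.0.0.101', // value sent by the client
);

echo clientIp($request, true), "\n";  // prints 127.0.0.101: the spoofed header wins
echo clientIp($request, false), "\n"; // prints 203.0.113.7: the connection address
```

A server that trusts the header sees the spoofed value; a server that only reads REMOTE_ADDR is unaffected.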
Here is a brief description of how to deal with the verification code:
First, open the page in a normal browser and find the session ID that corresponds to the verification code.
Write down both the session ID and the verification code value.
Next, use Snoopy to submit the form with that session ID and code.
Principle: because the session ID is the same, the verification code the server expects is the same as the one noted the first time.
Sometimes we need to spoof even more, and Snoopy has thought of that for us as well.
<?php
include "snoopy.php";
$snoopy = new Snoopy;
$snoopy->proxy_host = "www.jb51.net"; // proxy host name, without the http:// scheme
$snoopy->proxy_port = "8080"; // use a proxy
$snoopy->maxredirs = 2; // maximum number of redirects
$snoopy->expandlinks = true; // whether to complete links; often used when scraping
// For example, the link /images/taoav.gif is expanded to http://www.jb51.net/images/taoav.gif
$snoopy->maxframes = 5; // maximum number of frames to follow
// When fetching frames, $snoopy->results is an array.
echo $snoopy->error; // error message, if any
?>
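Because the results are a string for an ordinary page and an array when frames are followed, code that processes them may want to treat both cases uniformly. A minimal sketch, assuming a hypothetical helper named resultsToList that is not part of Snoopy, with made-up frame contents:

```php
<?php
// Hypothetical helper: normalise a fetch result to a list of documents,
// whether it is a single string or an array of frame contents.
function resultsToList($results)
{
    return is_array($results) ? array_values($results) : array($results);
}

// Stand-in for $snoopy->results after fetching a page with two frames.
$results = array('<html>frame 1</html>', '<html>frame 2</html>');

foreach (resultsToList($results) as $i => $doc) {
    echo "document $i: ", strlen($doc), " bytes\n";
}
```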