Snoopy, a powerful PHP page-fetching class: usage examples

Source: Internet
Author: User
Tags curl get ip php class response code
Download Address: http://www.jb51.net/codes/33397.html

Some features of Snoopy:

1. Fetches the content of web pages (fetch)
2. Fetches the text of a page with the HTML tags removed (fetchtext)
3. Fetches the links and forms of a page (fetchlinks, fetchform)
4. Supports proxy hosts
5. Supports basic username/password authentication
6. Supports setting the user agent, referer, cookies, and header content
7. Supports browser redirects and lets you control the redirect depth
8. Expands links found in the page into fully qualified URLs (default)
9. Submits data and retrieves the returned values
10. Supports following HTML frames
11. Supports passing cookies on redirects

Snoopy requires PHP 4 or later. Because it is a plain PHP class that needs no extensions, it is a good choice when the server does not support curl.

Class methods:

fetch($URI)
-----------

This is the method used to fetch the content of a web page.
$URI is the URL of the page to fetch.
The fetched results are stored in $this->results.
If the page you are fetching uses frames, Snoopy follows each frame, collects the frames in an array, and stores that array in $this->results.

fetchtext($URI)
---------------

This method is similar to fetch(), except that it strips the HTML tags and other irrelevant data and returns only the text content of the page.
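
As a rough illustration of what "stripping the HTML tags" means, here is a minimal sketch built on PHP's strip_tags() and html_entity_decode(). The helper name page_to_text() is hypothetical, the code assumes PHP 7 or later, and it is not Snoopy's own implementation, which may treat scripts, entities, and whitespace differently.

```php
<?php
// Hypothetical helper: reduce an HTML string to plain text.
// This is an illustration only, not the code Snoopy uses internally.
function page_to_text(string $html): string
{
    // Drop <script> and <style> blocks together with their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // Remove the remaining tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo page_to_text('<p>Hello <b>world</b> &amp; friends</p><script>var x = 1;</script>'), "\n";
```

With Snoopy itself you would simply call fetchtext() and read $snoopy->results; the sketch only shows the kind of transformation involved.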

fetchform($URI)
---------------

This method is similar to fetch(), except that it strips the HTML tags and other irrelevant data and returns only the form elements of the page.

fetchlinks($URI)
----------------

This method is similar to fetch(), except that it strips the HTML tags and other irrelevant data and returns only the links in the page.
By default, relative links are automatically expanded into full URLs.
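
To make the idea of expanding relative links concrete, here is a simplified sketch. The function expand_link() is a hypothetical name, not part of Snoopy, and the logic is deliberately reduced: it assumes PHP 7 or later and ignores "../" segments, protocol-relative links, and port numbers, all of which a real implementation has to handle.

```php
<?php
// Hypothetical helper: turn a link found in a page into a full URL.
// Simplified on purpose: no "../" handling, no protocol-relative links, no ports.
function expand_link(string $link, string $pageUrl): string
{
    // Already absolute (has a scheme such as http:// or https://).
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $link)) {
        return $link;
    }
    $parts = parse_url($pageUrl);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    // Root-relative link: attach it directly to the host.
    if ($link !== '' && $link[0] === '/') {
        return $root . $link;
    }
    // Path-relative link: resolve against the directory of the current page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}

echo expand_link('/images/taoav.gif', 'http://www.jb51.net/codes/33397.html'), "\n";
echo expand_link('list.html', 'http://www.jb51.net/codes/33397.html'), "\n";
```

The first call attaches the root-relative path to the host, and the second resolves the bare file name against the /codes/ directory of the page it was found on.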

submit($URI, $formvars)
-----------------------

This method submits a form to the address specified by $URI. $formvars is an array that holds the form parameters.
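
For readers unfamiliar with how an array of form parameters travels over HTTP, the sketch below shows the usual application/x-www-form-urlencoded encoding, using PHP's built-in http_build_query(). The field values are made up for illustration, and Snoopy's own encoding code may differ in detail.

```php
<?php
// An array of form fields, in the shape that would be passed to submit().
// The password value is invented to show how special characters are encoded.
$formvars = [
    'username' => 'admin',
    'pwd'      => 'p@ss word',
];

// http_build_query() produces the application/x-www-form-urlencoded
// encoding that a browser uses for an ordinary POST form.
$body = http_build_query($formvars);

echo $body, "\n"; // username=admin&pwd=p%40ss+word
```

Each key/value pair is joined with "=", pairs are joined with "&", and reserved characters are percent-encoded (spaces become "+").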

submittext($URI, $formvars)
---------------------------

This method is similar to submit(). The only difference is that it strips the HTML tags and other irrelevant data and returns only the text content of the page that comes back after the submission (for example, after logging in).

submitlinks($URI, $formvars)
----------------------------

This method is similar to submit(), except that it strips the HTML tags and other irrelevant data and returns only the links in the page.
By default, relative links are automatically expanded into full URLs.

Class properties (default values in parentheses):

$host the host to connect to
$port the port to connect to
$proxy_host the proxy host to use, if any
$proxy_port the proxy port to use, if any
$agent the user agent to masquerade as (Snoopy v0.1)
$referer referer information, if any
$cookies cookies, if any
$rawheaders other header information, if any
$maxredirs maximum number of redirects, 0 = not allowed (5)
$offsiteok whether to allow redirects off-site (true)
$expandlinks whether to expand links into fully qualified URLs (true)
$user user name for authentication, if any
$pass password for authentication, if any
$accept HTTP accept types (image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*)
$error where the error occurred, if any
$response_code response code returned from the server
$headers headers returned from the server
$maxlength maximum length of the returned data
$read_timeout timeout for read operations (requires PHP 4 Beta 4+); set to 0 for no timeout
$timed_out true if a read operation timed out (requires PHP 4 Beta 4+)
$maxframes maximum number of frames to follow
$status HTTP status of the fetch
$temp_dir temporary directory that the web server can write to (/tmp)
$curl_path path to the curl binary; set to false if there is no curl binary

Here is a demo:

<?php
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->proxy_host = "www.jb51.net";  // host name only, without the scheme
$snoopy->proxy_port = "80";
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";
$snoopy->referer = "http://www.jb51.net";
$snoopy->cookies["SessionID"] = "238472834723489";
$snoopy->cookies["FavoriteColor"] = "RED";
$snoopy->rawheaders["Pragma"] = "no-cache";
$snoopy->maxredirs = 2;
$snoopy->offsiteok = false;
$snoopy->expandlinks = false;
$snoopy->user = "Joe";
$snoopy->pass = "Bloe";
if ($snoopy->fetchtext("http://www.jb51.net")) {
    echo "<PRE>" . htmlspecialchars($snoopy->results) . "</PRE>\n";
} else {
    echo "Error fetching document: " . $snoopy->error . "\n";
}
?>

Here are some code snippets:
1. Get the content of a given URL

<?php
$url = "http://www.jb51.net";
include("snoopy.php");
$snoopy = new Snoopy;
$snoopy->fetch($url);   // fetch the whole page
echo $snoopy->results;  // show the result
// Alternatively, use one of the following:
// $snoopy->fetchtext($url);   // get the text content (HTML removed)
// $snoopy->fetchlinks($url);  // get the links
// $snoopy->fetchform($url);   // get the forms
?>

2. Form submission

<?php
include "snoopy.php";
$snoopy = new Snoopy;
$formvars["username"] = "admin";
$formvars["pwd"] = "admin";
$action = "http://www.jb51.net";      // form submission address
$snoopy->submit($action, $formvars);  // $formvars is the array to submit
echo $snoopy->results;                // the result returned after the form is submitted
// Alternatively, use one of the following:
// $snoopy->submittext($action, $formvars);   // return only the text, with HTML removed
// $snoopy->submitlinks($action, $formvars);  // return only the links
?>

Once you can submit forms, you can do a lot more. Next, we spoof the IP address and the browser.
3. Spoofing

<?php
$formvars["username"] = "admin";
$formvars["pwd"] = "admin";
$action = "http://www.jb51.net";
include "snoopy.php";
$snoopy = new Snoopy;
$snoopy->cookies["PHPSESSID"] = 'fc106b1918bd522cc863f36890e6fff7';  // spoof the session ID
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";  // spoof the browser
$snoopy->referer = "http://www.jb51.net";  // spoof the referring page (HTTP_REFERER)
$snoopy->rawheaders["Pragma"] = "no-cache";  // cache-related HTTP header
$snoopy->rawheaders["X-Forwarded-For"] = "127.0.0.101";  // spoof the IP
$snoopy->submit($action, $formvars);
echo $snoopy->results;
?>

So we can spoof the session, the browser, and the IP address, which opens up a lot of possibilities.
For example, on a poll protected by a verification code and an IP check, you could keep voting.
PS: The IP spoofing here only forges an HTTP header, so an IP address obtained through REMOTE_ADDR cannot be spoofed this way.
Only code that reads the IP address from HTTP headers (typically to see through proxies) can be given an address of your choosing.
As for the verification code, the approach is simple:
First open the page in a normal browser and find the corresponding session ID.
Note down both the session ID and the verification code value.
Then use Snoopy to forge the request.
Principle: because the session ID is the same, the expected verification code is the same as the one shown on the first visit.
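
The difference between REMOTE_ADDR and a header-derived address can be illustrated from the server side. The sketch below is an assumption-laden illustration, not code from any real site: both function names are hypothetical, $server stands in for PHP's $_SERVER array, and 203.0.113.7 is a documentation-range address chosen for the example.

```php
<?php
// Hypothetical server-side helpers, shown only to illustrate the difference.
// $server stands in for PHP's $_SERVER array.

// Trusts a client-supplied header: this is the variant that can be fooled.
function ip_from_header(array $server): string
{
    if (!empty($server['HTTP_X_FORWARDED_FOR'])) {
        // The header may hold a comma-separated chain; take the first entry.
        return trim(explode(',', $server['HTTP_X_FORWARDED_FOR'])[0]);
    }
    return $server['REMOTE_ADDR'];
}

// Uses only the address of the TCP connection, which a header cannot change.
function ip_from_connection(array $server): string
{
    return $server['REMOTE_ADDR'];
}

// A request whose real connection comes from 203.0.113.7 but which
// carries a forged X-Forwarded-For header.
$server = [
    'REMOTE_ADDR'          => '203.0.113.7',
    'HTTP_X_FORWARDED_FOR' => '127.0.0.101',
];

echo ip_from_header($server), "\n";     // 127.0.0.101
echo ip_from_connection($server), "\n"; // 203.0.113.7
```

A site that wants its IP check to hold should therefore rely on REMOTE_ADDR, or trust forwarding headers only when they come from a proxy it controls.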
4. Sometimes we need to forge even more; Snoopy has thought of that for us as well.

<?php
$snoopy->proxy_host = "www.jb51.net";
$snoopy->proxy_port = "8080";  // use a proxy
$snoopy->maxredirs = 2;        // number of redirects to follow
$snoopy->expandlinks = true;   // whether to complete links; often used when scraping
// For example, the link /images/taoav.gif is expanded to its full form http://www.jb51.net/images/taoav.gif
$snoopy->maxframes = 5;        // maximum number of frames allowed
// Note: when fetching frames, $snoopy->results is an array
echo $snoopy->error;           // print the error message
?>
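
Because the results can be either a single string or an array of frames, calling code may want to treat both cases uniformly. The following is a minimal sketch under that assumption; results_as_list() is a hypothetical helper, and the two sample values merely stand in for what $snoopy->results might hold.

```php
<?php
// Hypothetical helper: normalise a fetch result to a list of documents,
// whether it arrived as a single string or as an array (one entry per frame).
function results_as_list($results): array
{
    return is_array($results) ? array_values($results) : [$results];
}

// Stand-ins for $snoopy->results in the two cases.
$plainPage  = '<html>single page</html>';
$framedPage = ['<html>frame one</html>', '<html>frame two</html>'];

foreach ([$plainPage, $framedPage] as $results) {
    foreach (results_as_list($results) as $i => $doc) {
        echo $i, ': ', strlen($doc), " bytes\n";
    }
}
```

Wrapping the single-string case in a one-element array lets the same loop handle framed and unframed pages.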
