Snoopy, a powerful PHP web-scraping class: usage and example code

Source: Internet
Author: User
Download Address: http://www.jb51.net/codes/33397.html

Some features of Snoopy:

1. Fetches the content of a web page (fetch)
2. Fetches the text of a web page with the HTML tags removed (fetchtext)
3. Fetches the links and forms of a web page (fetchlinks, fetchform)
4. Supports proxy hosts
5. Supports basic username/password authentication
6. Supports setting user_agent, referer, cookies and header content
7. Supports browser redirects, with a controllable redirect depth
8. Expands links found in a page into fully qualified URLs (the default)
9. Submits data and retrieves the return value
10. Supports following HTML frames
11. Supports passing cookies on redirects

Requirements: PHP 4 or later. Because Snoopy is a plain PHP class that needs no extension support on the server, it is the best choice when the server does not support curl.

Class methods:

fetch($URI)
-----------

This method fetches the content of a web page.
The $URI parameter is the URL of the page to fetch.
The result of the fetch is stored in $this->results.
If you are fetching a frameset, Snoopy follows each frame and stores the results as an array in $this->results.
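
Because the result can be either a single string or an array of frame documents, calling code may want to treat both cases the same way. The following is a minimal sketch of that idea; results_as_list() is a hypothetical helper written for this illustration and is not part of Snoopy.

```php
<?php
// Hypothetical helper (not part of Snoopy): always work with a list of documents.
function results_as_list($results): array
{
    // A frameset fetch yields an array; a plain page yields a single string.
    return is_array($results) ? $results : [$results];
}

// Stand-in value for $snoopy->results after fetching an ordinary page.
foreach (results_as_list('<html>single page</html>') as $i => $doc) {
    echo "document $i: ", strlen($doc), " bytes\n";
}
```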

fetchtext($URI)
---------------

This method behaves like fetch(), except that it removes HTML tags and other irrelevant data and returns only the text content of the page.
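
To give a rough idea of what reducing a page to its text involves, here is a minimal sketch in plain PHP. It is not Snoopy's implementation; html_to_text() is a hypothetical helper, and the sketch assumes PHP 7 or later and UTF-8 input.

```php
<?php
// Hypothetical helper (not Snoopy's own code): reduce an HTML string to plain text.
function html_to_text(string $html): string
{
    // Drop script and style elements together with their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // Replace every remaining tag with a space so adjacent words stay separated.
    $text = preg_replace('/<[^>]+>/', ' ', $html);
    // Decode entities such as &amp; into the characters they stand for.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo html_to_text('<h1>Title</h1><script>var x = 1;</script><p>Tom &amp; Jerry</p>'), "\n";
```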

fetchform($URI)
---------------

This method behaves like fetch(), except that it removes HTML tags and other irrelevant data and returns only the form elements of the page.

fetchlinks($URI)
----------------

This method behaves like fetch(), except that it removes HTML tags and other irrelevant data and returns only the links in the page.
By default, relative links are automatically completed and converted to full URLs.
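
The completion of relative links can be sketched in a few lines of plain PHP. This is an illustration only, not Snoopy's own code: expand_link() is a hypothetical helper that handles just the two simple cases (root-relative and document-relative links) and ignores "../" segments, protocol-relative links and non-HTTP schemes such as mailto:.

```php
<?php
// Hypothetical helper (not Snoopy's own code): resolve a link against the page URL.
function expand_link(string $link, string $pageUrl): string
{
    // Already absolute (starts with a scheme): leave it untouched.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $link)) {
        return $link;
    }
    $parts = parse_url($pageUrl);
    $root  = $parts['scheme'] . '://' . $parts['host']
           . (isset($parts['port']) ? ':' . $parts['port'] : '');
    // Root-relative link: attach it to scheme and host.
    if (substr($link, 0, 1) === '/') {
        return $root . $link;
    }
    // Document-relative link: attach it to the directory of the page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}

echo expand_link('/images/taoav.gif', 'http://www.jb51.net/codes/33397.html'), "\n";
```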

submit($URI, $formvars)
-----------------------

This method submits a form to the URL specified by $URI. $formvars is an array holding the form parameters.
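
As an illustration of what a form array looks like on the wire, the sketch below encodes one with PHP's built-in http_build_query(). It assumes the common application/x-www-form-urlencoded encoding; Snoopy builds the request body internally, so this only shows the general format, and the field values are made up for the example.

```php
<?php
// Form fields as they would be passed in $formvars (example values).
$formvars = ['username' => 'admin', 'pwd' => 'p@ss word'];

// Encode as application/x-www-form-urlencoded, the usual body of a form POST.
$body = http_build_query($formvars);

echo $body, "\n";
```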

submittext($URI, $formvars)
---------------------------

This method behaves like submit(), except that it removes HTML tags and other irrelevant data and returns only the text content of the page received after the submission.

submitlinks($URI, $formvars)
----------------------------

This method behaves like submit(), except that it removes HTML tags and other irrelevant data and returns only the links in the page.
By default, relative links are automatically completed and converted to full URLs.

Class properties (default values in parentheses):

$host           the host to connect to
$port           the port to connect to
$proxy_host     the proxy host to use, if any
$proxy_port     the proxy host port to use, if any
$agent          the user agent to masquerade as (Snoopy v0.1)
$referer        referer information to pass, if any
$cookies        cookies to pass, if any
$rawheaders     other header information to pass, if any
$maxredirs      maximum number of redirects, 0 = not allowed (5)
$offsiteok      whether or not to allow redirects off-site (true)
$expandlinks    whether to expand links to full URLs (true)
$user           user name for authentication, if any
$pass           password for authentication, if any
$accept         HTTP Accept types (image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*)
$error          the error message, if any
$response_code  response code returned from the server
$headers        header information returned from the server
$maxlength      maximum length of the returned data
$read_timeout   timeout for read operations (requires PHP 4 Beta 4+); set to 0 for no timeout
$timed_out      true if a read operation timed out (requires PHP 4 Beta 4+)
$maxframes      maximum number of frames allowed to be followed
$status         HTTP status of the fetch
$temp_dir       temporary directory that the web server can write to (/tmp)
$curl_path      path to the curl binary; set to false if there is no curl binary

Here is a demo:

<?php
include "Snoopy.class.php";
$snoopy = new Snoopy;

$snoopy->proxy_host = "www.jb51.net"; // proxy host name, without a scheme
$snoopy->proxy_port = "80";
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";
$snoopy->referer = "http://www.jb51.net";
$snoopy->cookies["SessionID"] = "238472834723489";
$snoopy->cookies["FavoriteColor"] = "RED";
$snoopy->rawheaders["Pragma"] = "no-cache";
$snoopy->maxredirs = 2;
$snoopy->offsiteok = false;
$snoopy->expandlinks = false;
$snoopy->user = "Joe";
$snoopy->pass = "Bloe";

if ($snoopy->fetchtext("http://www.jb51.net")) {
    // Escape the fetched text before printing it into an HTML page.
    echo "<pre>" . htmlspecialchars($snoopy->results) . "</pre>\n";
} else {
    echo "Error fetching document: " . $snoopy->error . "\n";
}
?>

Here are some code snippets:

1. Get the content of a given URL

<?php
$url = "http://www.jb51.net";
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->fetch($url);   // fetch the full content
echo $snoopy->results;  // show the result
// The following alternatives are also available:
// $snoopy->fetchtext($url);   // get the text content (HTML removed)
// $snoopy->fetchlinks($url);  // get the links
// $snoopy->fetchform($url);   // get the forms
?>

2. Form submission

<?php
include "Snoopy.class.php";
$snoopy = new Snoopy;
$formvars["username"] = "admin";
$formvars["pwd"] = "admin";
$action = "http://www.jb51.net";      // form submission address
$snoopy->submit($action, $formvars);  // $formvars is the array of submitted fields
echo $snoopy->results;                // the result returned after the form is submitted
// The following alternatives are also available:
// $snoopy->submittext($action, $formvars);   // return only the text, with HTML removed
// $snoopy->submitlinks($action, $formvars);  // return only the links
?>

Now that the form has been submitted, a lot more is possible. Next we disguise the IP and the browser.

3. Disguise

<?php
$formvars["username"] = "admin";
$formvars["pwd"] = "admin";
$action = "http://www.jb51.net";
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->cookies["PHPSESSID"] = "fc106b1918bd522cc863f36890e6fff7";  // spoof the session ID
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";  // spoof the browser
$snoopy->referer = "http://www.jb51.net";  // spoof the source page address (HTTP_REFERER)
$snoopy->rawheaders["Pragma"] = "no-cache";  // cache-related HTTP header
$snoopy->rawheaders["X-Forwarded-For"] = "127.0.0.101";  // spoof the IP
$snoopy->submit($action, $formvars);
echo $snoopy->results;
?>

So the session, the browser and the IP can all be disguised, which makes many things possible.
For example, a poll that is protected by a verification code and an IP check can be voted in repeatedly.
PS: Disguising the IP here really means forging an HTTP header, so an IP obtained through REMOTE_ADDR cannot be forged this way.
Only sites that read the IP from HTTP headers (a technique meant to see through proxies) can be given an IP of your own choosing.
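
The difference between the two sources of the IP can be seen from the server's side. The sketch below is a hypothetical server-side helper (client_ip() and the sample addresses are assumptions made for this illustration) showing why a value taken from the X-Forwarded-For header is only as trustworthy as the client that sent it, while REMOTE_ADDR reflects the actual connection.

```php
<?php
// Hypothetical server-side helper: decide which client IP to believe.
function client_ip(array $server, bool $trustForwardedHeader): string
{
    if ($trustForwardedHeader && !empty($server['HTTP_X_FORWARDED_FOR'])) {
        // The header is a comma-separated list whose first entry is client-supplied.
        $first = explode(',', $server['HTTP_X_FORWARDED_FOR'])[0];
        return trim($first);
    }
    // REMOTE_ADDR comes from the TCP connection, not from a request header.
    return $server['REMOTE_ADDR'];
}

// Stand-in for $_SERVER on a request that carries a forged header.
$request = ['REMOTE_ADDR' => '203.0.113.7', 'HTTP_X_FORWARDED_FOR' => '127.0.0.101'];

echo client_ip($request, true), "\n";   // the forged header value
echo client_ip($request, false), "\n";  // the address of the real connection
```

A site that relies on the IP for anything important would use the second form, or trust the header only when the request arrives from a proxy it controls.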
As for the verification code, briefly:
First open the page in a normal browser and find the SessionID that corresponds to the verification code,
and note down both the SessionID and the verification code value.
Next, use Snoopy to forge the request.
Principle: because the SessionID is the same, the verification code is the same as the one entered the first time.
4. Sometimes we need to forge even more; Snoopy has thought of that for us as well

<?php
// Continues from the snippets above, where $snoopy was created.
$snoopy->proxy_host = "www.jb51.net";
$snoopy->proxy_port = "8080";  // use a proxy
$snoopy->maxredirs = 2;        // number of redirects
$snoopy->expandlinks = true;   // whether to complete links; often used when scraping
// For example, the link /images/taoav.gif is expanded to the full link http://www.jb51.net/images/taoav.gif
$snoopy->maxframes = 5;        // maximum number of frames allowed
// Note: when frames are fetched, $snoopy->results is an array
echo $snoopy->error;           // the error message, if any
?>
