Snoopy is a PHP class that simulates a web browser: it fetches web content and submits forms.
Here are some Snoopy features:
Easily fetches the content of web pages
Easily fetches the text of a page (with HTML tags removed)
Easily fetches the links in a page
Supports fetching through a proxy host
Supports basic user name/password authentication
Supports setting the user agent, referer, cookies, and other header content
Supports browser redirects, with control over the redirect depth
Can expand the links in a page into fully qualified URLs
Easily submits data and retrieves the result
Supports tracking HTML frames
Supports passing cookies along on redirects
Snoopy class methods:
fetch($URI)
Fetches the contents of a web page. The $URI parameter is the URL of the page to fetch, and the result is stored in $this->results. If the page is a frameset, Snoopy follows each frame and stores the results as an array in $this->results.
fetchtext($URI)
Similar to fetch(), except that it strips HTML tags and other irrelevant data and returns only the text content of the page.
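Snoopy's own tag stripping is PHP code inside the class; as a rough illustration of what "text only" means, here is a minimal Python sketch built on the standard library's html.parser. The names TextExtractor, page_text, SKIP_TAGS and BLOCK_TAGS are hypothetical, and the choice to drop script/style contents and to treat some block tags as word separators is an assumption of this sketch, not a description of Snoopy's exact rules.

```python
from html.parser import HTMLParser

# Assumption: contents of these tags are not visible text.
SKIP_TAGS = {"script", "style"}
# Assumption: these block-level tags should separate words once tags are removed.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "td", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects the text of a page, dropping tags and script/style contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and friends into text
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def page_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


print(page_text("<p>Hello, <b>world</b>!</p><p>Bye</p>"))  # Hello, world! Bye
```

Inline tags such as `<b>` are removed without adding a space, so "world" and "!" stay joined, while paragraph boundaries become a single space.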
fetchform($URI)
Similar to fetch(), except that it discards everything other than the form markup and returns only the form content of the page.
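To picture what "only the form content" keeps, here is a minimal Python sketch using html.parser. It is not Snoopy's implementation: the set of tags treated as form content (FORM_TAGS) is an assumption, and unlike fetchform(), which returns markup, this hypothetical page_forms helper returns each form-related tag with its attributes as a (tag, dict) pair.

```python
from html.parser import HTMLParser

# Assumption: the tags that count as "form content" for this sketch.
FORM_TAGS = {"form", "input", "select", "option", "textarea"}


class FormExtractor(HTMLParser):
    """Keeps only form-related tags and their attributes, dropping the rest."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> also arrive here, because
        # HTMLParser's default handle_startendtag calls handle_starttag.
        if tag in FORM_TAGS:
            self.elements.append((tag, dict(attrs)))


def page_forms(html):
    parser = FormExtractor()
    parser.feed(html)
    parser.close()
    return parser.elements


html = ('<h1>Login</h1><form action="/login" method="post">'
        '<input name="user"><input type="password" name="pass"/>'
        '<p>Forgot your password?</p></form>')
for tag, attrs in page_forms(html):
    print(tag, attrs)
# form {'action': '/login', 'method': 'post'}
# input {'name': 'user'}
# input {'type': 'password', 'name': 'pass'}
```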
fetchlinks($URI)
Similar to fetch(), except that it returns only the links in the page. By default, relative links are expanded into full URLs.
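Expanding relative links means resolving each href against the URL of the page it came from. A minimal Python sketch of that idea, assuming only `<a href>` links are of interest, uses html.parser together with urllib.parse.urljoin. The names LinkExtractor, page_links and the expand flag (standing in for Snoopy's $expandlinks) are hypothetical; this is not Snoopy's own link-matching code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href values of <a> tags, optionally made absolute."""

    def __init__(self, base_url, expand=True):
        super().__init__()
        self.base_url = base_url
        self.expand = expand  # plays the role of Snoopy's $expandlinks
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # urljoin resolves relative paths against the page URL and
                # leaves already-absolute URLs unchanged.
                self.links.append(urljoin(self.base_url, href) if self.expand else href)


def page_links(html, base_url, expand=True):
    parser = LinkExtractor(base_url, expand)
    parser.feed(html)
    parser.close()
    return parser.links


html = ('<a href="page2.html">Next</a> <a href="/about">About</a> '
        '<a href="../up.html">Up</a> <a href="http://other.org/">Elsewhere</a>')
print(page_links(html, "http://example.com/docs/index.html"))
# ['http://example.com/docs/page2.html', 'http://example.com/about',
#  'http://example.com/up.html', 'http://other.org/']
```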
submit($URI, $formvars)
Submits a form to the URL specified by $URI. $formvars is an array of form field names and values.
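Submitting a form comes down to encoding the field array as an application/x-www-form-urlencoded POST body. The following Python sketch shows that encoding with urllib.parse.urlencode and builds, but does not send, a urllib.request.Request. The helper name build_submit_request is hypothetical, and representing multi-valued fields as a Python list is an assumption of the sketch rather than how Snoopy's PHP arrays are laid out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_submit_request(uri, formvars):
    """Encode form fields as a POST body, the way a browser submits a form."""
    # doseq=True turns a list value into repeated fields: tags=a&tags=b
    body = urlencode(formvars, doseq=True).encode("ascii")
    return Request(
        uri,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_submit_request(
    "http://example.com/login",
    {"user": "alice", "pass": "s3cret", "tags": ["a", "b"]},
)
print(req.get_method(), req.full_url)  # POST http://example.com/login
print(req.data)  # b'user=alice&pass=s3cret&tags=a&tags=b'
# urllib.request.urlopen(req) would actually send it; omitted here.
```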
submittext($URI, $formvars)
Similar to submit(), except that it strips HTML tags and other irrelevant data and returns only the text content of the page returned after the submission (for example, the page shown after logging in).
submitlinks($URI, $formvars)
Similar to submit(), except that it returns only the links in the page returned after the submission. By default, relative links are expanded into full URLs.
Class properties (default values in parentheses):
$host the host to connect to
$port the port to connect to
$proxy_host the proxy host to use, if any
$proxy_port the proxy port to use, if any
$agent the User-Agent string to send (Snoopy v0.1)
$referer the Referer header to send, if any
$cookies the cookies to send, if any
$rawheaders other headers to send, if any
$maxredirs maximum number of redirects to follow; 0 = no redirects (5)
$offsiteok whether to allow redirects to other sites (true)
$expandlinks whether to expand links into fully qualified URLs (true)
$user the user name for authentication, if any
$pass the password for authentication, if any
$accept the HTTP Accept types (image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*)
$error the error message, if any
$response_code the response code returned by the server
$headers the headers returned by the server
$maxlength the maximum length of the returned data
$read_timeout the timeout for read operations (requires PHP 4 Beta 4+); set to 0 for no timeout
$timed_out true if a read operation timed out (requires PHP 4 Beta 4+)
$maxframes the maximum number of frames to follow
$status the HTTP status of the request
$temp_dir a temporary directory the web server can write to (/tmp)
$curl_path the path to the curl binary; set to false if curl is not available
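Several of these properties simply become request headers. As an illustration, here is a minimal Python sketch showing how $agent, $referer, $cookies, $rawheaders and $user/$pass could map onto User-Agent, Referer, Cookie, extra headers and a Basic Authorization header. The helper build_headers and its parameter names are hypothetical (password replaces pass, which is a Python keyword), and the exact header formatting Snoopy produces may differ.

```python
import base64


def build_headers(agent, referer=None, cookies=None, rawheaders=None,
                  user=None, password=None):
    """Assemble request headers from Snoopy-style settings (hypothetical helper)."""
    headers = {"User-Agent": agent}
    if referer:
        headers["Referer"] = referer
    if cookies:
        # Cookies travel in a single header as name=value pairs separated by "; ".
        headers["Cookie"] = "; ".join(f"{name}={value}" for name, value in cookies.items())
    if user is not None and password is not None:
        # Basic authentication: base64 of "user:password" (RFC 7617).
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = "Basic " + token
    if rawheaders:
        headers.update(rawheaders)  # extra headers, like $rawheaders
    return headers


h = build_headers("Snoopy v0.1", referer="http://example.com/",
                  cookies={"sid": "abc", "lang": "en"},
                  user="Aladdin", password="open sesame")
print(h["Cookie"])         # sid=abc; lang=en
print(h["Authorization"])  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```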
Snoopy official website: http://sourceforge.net/projects/snoopy/