Simple_html_dom with Snoopy

Source: Internet
Author: User

https://github.com/samacs/simple_html_dom

Snoopy's defining traits are that it is "big" and "complete": it fetches everything on a page, so it works well as the first step of a scraping job. The next step is to use simple_html_dom to carefully pick out just the parts you want. Of course, if you are particularly good at regular expressions and enjoy writing them, you can do the extraction with regex matching instead.

simple_html_dom is essentially a DOM parser. PHP itself also provides several ways to parse HTML, but simple_html_dom is more specialized: it is a single class that covers most of the features you are likely to want.

Create the target document object (the target page) from a URL or a file name:
$html = file_get_html('...'); // a URL
$html = file_get_html('...'); // or a local file name

Use a string as the target page. You can fetch the page with Snoopy first and then pass the result in here for processing:
$myhtml = str_get_html('...'); // the HTML string
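The hand-off from Snoopy to simple_html_dom described above can be sketched roughly as follows. This is only a minimal sketch: the two `require_once` paths are assumptions about where the libraries live, and `fetch_page` and `parse_page` are hypothetical helper names invented for the illustration, not part of either library.

```php
<?php
// Assumed file locations; adjust to wherever the two libraries are installed.
require_once 'Snoopy.class.php';
require_once 'simple_html_dom.php';

// Hypothetical helper: fetch a page with Snoopy and return the raw HTML,
// or null when the fetch fails.
function fetch_page($url)
{
    $snoopy = new Snoopy();
    if (!$snoopy->fetch($url)) {
        return null;
    }
    return $snoopy->results; // Snoopy stores the fetched content here
}

// Hypothetical helper: turn an HTML string into a simple_html_dom object,
// or null when the string could not be parsed.
function parse_page($pageHtml)
{
    $dom = str_get_html($pageHtml);
    return $dom ? $dom : null;
}
```

A caller would pass the string returned by `fetch_page($url)` to `parse_page()` and check both results for null before querying the document.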

Find all the images; find() returns an array:
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';

Find all the links:
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';

The find method is very handy; it usually returns an array of element objects. When looking for a target element, you can select it by class, by ID, or by other attributes.

Find the target div by its class attribute. The second parameter of find() is the index into the returned array; counting starts at 0, so 0 means the first match:
$target_div = $html->find('div.targetclass', 0);

To check whether the result is what you want, just echo it:
echo $target_div;
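To make the selector options concrete, here is a small sketch that selects by class, by ID, and by attribute value. The sample markup and the variable names are invented for the illustration, and the `require_once` path is an assumption.

```php
<?php
require_once 'simple_html_dom.php'; // assumed path to the library

// Sample markup invented for this illustration.
$sample = '<div id="main" class="targetclass">first</div>'
        . '<div class="targetclass">second</div>'
        . '<a href="/about" title="About">About us</a>';

$dom = str_get_html($sample);

// Without an index, find() returns an array of every match.
$matchCount = count($dom->find('div.targetclass'));

// With an index, find() returns a single element (index 1 is the second match).
$secondText = $dom->find('div.targetclass', 1)->plaintext;

// Select by ID and by attribute value.
$idText    = $dom->find('#main', 0)->innertext;
$aboutHref = $dom->find('a[title=About]', 0)->href;

// When an index is given and nothing matches, find() returns null.
$missing = $dom->find('div.nosuchclass', 0);

// Release the parsed document once the plain values have been copied out.
$dom->clear();
unset($dom);
```

Because an indexed find() can return null, it is safer to check the result before reading properties such as `plaintext` from it.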

The key point: once you have finished with the parsed object, be sure to destroy it. Otherwise the PHP page may hang for 30 seconds or so, depending on your server's time limit. To destroy it:
$html->clear();
unset($html);
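The cleanup matters most when several pages are parsed in one script run. The sketch below parses a list of HTML strings and releases each document before moving on; the `$pages` array is hypothetical stand-in data for pages that were already fetched, and the `require_once` path is an assumption.

```php
<?php
require_once 'simple_html_dom.php'; // assumed path to the library

// Hypothetical input: HTML strings that were already fetched (for instance by Snoopy).
$pages = array(
    '<img src="a.png"><img src="b.png">',
    '<img src="c.png">',
);

$sources = array();
foreach ($pages as $pageHtml) {
    $dom = str_get_html($pageHtml);
    if (!$dom) {
        continue; // skip anything that could not be parsed
    }
    foreach ($dom->find('img') as $img) {
        $sources[] = $img->src; // copy the plain string out before clearing
    }
    // Destroy the document before the next iteration.
    $dom->clear();
    unset($dom);
}

print_r($sources);
```

Copying the attribute values into a plain array first means nothing still refers to the document's nodes once clear() has been called.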

What I think simple_html_dom does best is making scraping as easy to control as working with the DOM in JS. An English manual is included in the download package. The listing below maps the JS-style DOM method names to their simple_html_dom equivalents.

Method (DOM-style name)                           Equivalent
array $e->getAllAttributes()                      array $e->attr
string $e->getAttribute($name)                    string $e->attribute
void $e->setAttribute($name, $value)              void $e->attribute = $value
bool $e->hasAttribute($name)                      bool isset($e->attribute)
void $e->removeAttribute($name)                   void $e->attribute = null
element $e->getElementById($id)                   mixed $e->find("#$id", 0)
mixed $e->getElementsById($id [, $index])         mixed $e->find("#$id" [, int $index])
element $e->getElementByTagName($name)            mixed $e->find($name, 0)
mixed $e->getElementsByTagName($name [, $index])  mixed $e->find($name [, int $index])
element $e->parentNode()                          element $e->parent()
mixed $e->childNodes([$index])                    mixed $e->children([int $index])
element $e->firstChild()                          element $e->first_child()
element $e->lastChild()                           element $e->last_child()
element $e->nextSibling()                         element $e->next_sibling()
element $e->previousSibling()                     element $e->prev_sibling()
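As a rough illustration of the mapping listed above, the sketch below calls a few of the DOM-style names next to their native equivalents on the same document. The markup and variable names are invented for the example, and the `require_once` path is an assumption.

```php
<?php
require_once 'simple_html_dom.php'; // assumed path to the library

// Sample markup invented for this illustration.
$dom = str_get_html('<ul id="menu"><li>one</li><li>two</li></ul>');

// DOM-style name and native equivalent for looking up an element by ID.
$list      = $dom->getElementById('menu');
$sameList  = $dom->find('#menu', 0);
$sameTag   = ($list->tag === $sameList->tag);

// Child and sibling navigation, both naming styles.
$firstDomStyle = $list->firstChild()->plaintext;
$firstNative   = $list->first_child()->plaintext;
$lastText      = $list->lastChild()->plaintext;
$nextText      = $list->firstChild()->nextSibling()->plaintext;

// Walking back up, and checking for an attribute.
$parentTag = $list->firstChild()->parentNode()->tag;
$hasId     = $list->hasAttribute('id');

// Release the document once the plain values have been copied out.
$dom->clear();
unset($dom);
```

Either naming style can be used; the DOM-style names are mostly a convenience for readers who are used to the JavaScript API.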
