https://github.com/samacs/simple_html_dom
Snoopy's strength is that it is "big and complete": it fetches everything in one go, so it works well as the first step of a scraping job. The next step is to use simple_html_dom to carefully pick out the parts you actually want. Of course, if you are particularly good at regular expressions and enjoy writing them, you can also use regular expressions to match what you have scraped.
simple_html_dom is essentially a DOM parser. PHP also provides some built-in parsing methods, but simple_html_dom is arguably more specialized: it is a single class that covers many of the features you are likely to want.
Create the target document object, that is, the target page, from a URL or a file name:
$html = file_get_html('...'); // from a URL
$html = file_get_html('...'); // from a local file name
Use a string as the target page. You can fetch the page with Snoopy and then pass the result in here for processing.
$myhtml = str_get_html('...');
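The Snoopy-then-parse flow described above can be sketched roughly as follows. This is only a minimal sketch under some assumptions: the file names Snoopy.class.php and simple_html_dom.php, the helper name fetch_dom and the example.com URL are placeholders chosen for illustration, not something the original setup prescribes.

```php
<?php
// Assumption: both libraries sit next to this script under these file names.
require_once 'Snoopy.class.php';
require_once 'simple_html_dom.php';

/**
 * Hypothetical helper: fetch a page with Snoopy, then parse the raw
 * HTML string with simple_html_dom. Returns null if either step fails.
 */
function fetch_dom($url)
{
    $snoopy = new Snoopy();
    if (!$snoopy->fetch($url)) {
        return null; // the fetch itself failed
    }
    // $snoopy->results holds the fetched page as a string.
    $dom = str_get_html($snoopy->results);
    return $dom ? $dom : null; // str_get_html returns false for unusable input
}

$html = fetch_dom('http://example.com/'); // placeholder URL
if ($html !== null) {
    foreach ($html->find('a') as $link) {
        echo $link->href, "\n";
    }
    // Release the parser once the data has been read.
    $html->clear();
    unset($html);
}
```

Keeping the fetch and the parse in separate steps means Snoopy can handle the request side while simple_html_dom only ever sees a plain string.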
Find all images; find returns an array:
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';
Find all links:
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';
The find method is very convenient; it usually returns an array of element objects. When looking for a target element, you can select it by class, by id, or by other attributes.
Find the target div by its class attribute. The second parameter of find is the index of the element to return from the result array; indexing starts at 0, so 0 returns the first match.
$target_div = $html->find('div.targetclass', 0);
To check whether the result is what you want, just echo it.
echo $target_div;
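To make the class, id and attribute lookups more concrete, here is a small sketch. The markup is made up purely for illustration, and the path to simple_html_dom.php is an assumption.

```php
<?php
require_once 'simple_html_dom.php'; // assumed path to the library

// Hypothetical markup, used only for illustration.
$html = str_get_html(
    '<div id="main"><div class="targetclass">first</div>'
    . '<div class="targetclass">second</div>'
    . '<a href="/a.html" title="A">A</a><a name="anchor">no href</a></div>'
);

$by_id      = $html->find('#main', 0);           // by id
$first_div  = $html->find('div.targetclass', 0); // by class, first match
$second_div = $html->find('div.targetclass', 1); // by class, second match
$with_href  = $html->find('a[href]');            // every <a> that has an href attribute
$by_value   = $html->find('a[title=A]', 0);      // by attribute value
$missing    = $html->find('div.nosuchclass', 0); // null when nothing matches

echo $first_div->plaintext, "\n";
echo $second_div->plaintext, "\n";
echo count($with_href), "\n";

$html->clear();
```

Note that when an index is passed and nothing matches, find returns null rather than an empty array, so it is worth checking the result before reading properties from it.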
The key point: once you have created the scraping object, be sure to destroy it when you are done. Otherwise the PHP page may hang for 30 seconds or so, depending on your server's time limit. To destroy it:
$html->clear();
unset($html);
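One way to make the cleanup hard to forget is to wrap parsing in a small function that copies out plain strings and releases the parser before returning. This is just a sketch: the function name image_sources and the library path are assumptions made for illustration.

```php
<?php
require_once 'simple_html_dom.php'; // assumed path to the library

/**
 * Hypothetical helper: return the src of every <img> in $markup,
 * releasing the parser before returning.
 */
function image_sources($markup)
{
    $dom = str_get_html($markup);
    if (!$dom) {
        return array(); // nothing to parse
    }

    $sources = array();
    foreach ($dom->find('img') as $img) {
        $sources[] = $img->src; // keep plain strings, not node objects
    }

    // Release the parsed tree as soon as the data has been copied out.
    $dom->clear();
    unset($dom);

    return $sources;
}

print_r(image_sources('<p><img src="a.png"><img src="b.png"></p>'));
```

This matters most when many pages are processed in one request, since each parsed document would otherwise stay in memory until the script ends.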
What I think simple_html_dom does particularly well is that it makes working with scraped content as easy as manipulating the DOM in JS. An English manual is available in the download package provided below.
DOM-style method | simple_html_dom equivalent
array $e->getAllAttributes() | array $e->attr
string $e->getAttribute($name) | string $e->attribute
void $e->setAttribute($name, $value) | void $e->attribute = $value
bool $e->hasAttribute($name) | bool isset($e->attribute)
void $e->removeAttribute($name) | void $e->attribute = null
element $e->getElementById($id) | mixed $e->find("#$id", 0)
mixed $e->getElementsById($id [, $index]) | mixed $e->find("#$id" [, int $index])
element $e->getElementByTagName($name) | mixed $e->find($name, 0)
mixed $e->getElementsByTagName($name [, $index]) | mixed $e->find($name [, int $index])
element $e->parentNode() | element $e->parent()
mixed $e->childNodes([$index]) | mixed $e->children([int $index])
element $e->firstChild() | element $e->first_child()
element $e->lastChild() | element $e->last_child()
element $e->nextSibling() | element $e->next_sibling()
element $e->previousSibling() | element $e->prev_sibling()
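A few of the mappings listed above can be tried out on a small document. The markup and the library path below are assumptions made for illustration only.

```php
<?php
require_once 'simple_html_dom.php'; // assumed path to the library

// Hypothetical markup, used only for illustration.
$html = str_get_html(
    '<ul id="menu"><li><a href="/one">One</a></li><li><a href="/two">Two</a></li></ul>'
);

$menu  = $html->find('#menu', 0);
$first = $menu->first_child();   // first <li>
$next  = $first->next_sibling(); // second <li>
$last  = $menu->last_child();    // also the second <li> in this markup
$link  = $next->children(0);     // the <a> inside the second <li>

echo $link->href, "\n";          // read an attribute as a property
echo $link->parent()->tag, "\n"; // tag name of the parent element

// The DOM-style names are aliases for the same operations.
$same_href = $link->getAttribute('href');
$same_tag  = $link->parentNode()->tag;

// Assigning to the property sets the attribute.
$link->href = '/two.html';
echo $link->href, "\n";

$html->clear();
```

Which naming style to use is mostly a matter of taste; the underscore names are shorter, while the camelCase names will look familiar to anyone coming from JS.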