This article describes how to use Symfony's Crawler component in Laravel to parse HTML. Readers who need it can refer to the notes below.
The Crawler's full name is DomCrawler, and it is a component of the Symfony framework. Annoyingly, DomCrawler has no Chinese documentation, and Symfony has not translated this part either, so developing with DomCrawler means feeling your way forward bit by bit. What follows is a summary of what I learned along the way.
First, installation:
composer require symfony/dom-crawler
composer require symfony/css-selector
css-selector provides CSS selector support; some Crawler methods use CSS selectors to pick nodes.
The example used in the manual is:
use Symfony\Component\DomCrawler\Crawler;
$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <body>
        <p class="message">Hello World!</p>
        <p>Hello Crawler!</p>
    </body>
</html>
HTML;
$crawler = new Crawler($html);
foreach ($crawler as $domElement) {
    var_dump($domElement->nodeName);
}
The printed result is:
string 'html' (length=4)
This is because the nodeName of the root node of this HTML is html. My English is not good, so when I first used it I thought the program was wrong ...
In actual use, passing the markup straight to new Crawler($html) can produce garbled characters, which is probably related to the page encoding. To avoid this, initialize the Crawler first and then add the content:
$crawler = new Crawler();
$crawler->addHtmlContent($html);
The second parameter of addHtmlContent is the charset, which defaults to UTF-8.
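As a rough sketch of how the charset parameter might be used, the snippet below feeds the Crawler a page that is not UTF-8. The ISO-8859-1 markup is a made-up sample, and the script assumes it is saved as UTF-8, that the mbstring extension is available, and that Composer's autoloader lives at vendor/autoload.php.

```php
<?php
// Assumption: dependencies were installed with Composer in this directory.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical sample: simulate a page delivered as ISO-8859-1 bytes.
$html = mb_convert_encoding(
    '<html><body><p>Café</p></body></html>',
    'ISO-8859-1',
    'UTF-8'
);

$crawler = new Crawler();
// Tell the Crawler which charset the raw markup uses.
$crawler->addHtmlContent($html, 'ISO-8859-1');

// The extracted text comes back as UTF-8.
$text = $crawler->filterXPath('//body/p')->text();
echo $text;
```

If the charset of a page is unknown, it has to be detected first (for example from the response headers); that step is not shown here.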
Other examples can be found in the official documentation, http://symfony.com/doc/current/components/dom_crawler.html
Here are a few notes from work on the methods I have tried.
filterXPath(string $xpath): according to the manual, the argument is an XPath expression. It is often used to select block elements such as p.
echo $crawler->filterXPath('//body/p')->text();
echo $crawler->filterXPath('//body/p')->last()->text();
This outputs the text of the first and the last p tags.
var_dump($crawler->filterXPath('//body')->html());
This outputs the HTML inside body.
foreach ($crawler->filterXPath('//body/p') as $i => $node) {
    $c = new Crawler($node);
    echo $c->filter('p')->text();
}
filterXPath returns a set of DOMElement nodes that can be iterated over, and each DOMElement can be wrapped in a new Crawler object for further parsing.
$nodeValues = $crawler->filterXPath('//body/p')->each(function (Crawler $node, $i) {
    return $node->text();
});
Crawler provides an each loop that takes a closure, which simplifies the code. Note that with this form $nodeValues is an array and needs further processing.
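A minimal sketch of that further processing might look like the following. It reuses the two-paragraph sample markup from the manual; the autoloader path and the choice of implode as the processing step are my own assumptions.

```php
<?php
// Assumption: dependencies were installed with Composer in this directory.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = '<html><body><p class="message">Hello World!</p><p>Hello Crawler!</p></body></html>';

$crawler = new Crawler();
$crawler->addHtmlContent($html);

// each() collects the closure's return values into a plain PHP array.
$nodeValues = $crawler->filterXPath('//body/p')->each(function (Crawler $node, $i) {
    return trim($node->text());
});

// Process the array like any other array, for example by joining it.
$joined = implode(' | ', $nodeValues);
echo $joined; // Hello World! | Hello Crawler!
```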
Other usage:
echo $crawler->filterXPath('//body/p')->attr('class');
This gets the value of the class attribute of the first p tag, "message".
$crawler->filterXPath('//p[@class="style"]')->filter('a')->attr('href');
$crawler->filterXPath('//p[@class="style"]')->filter('a>img')->extract(array('alt', 'href'));
These are some of the ways to get tag attributes.
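To make the attribute calls concrete, here is a small sketch run against made-up markup (the link, image, and attribute values are hypothetical, as is the autoloader path). It reads src rather than href from the image, since an img tag normally has no href.

```php
<?php
// Assumption: dependencies were installed with Composer in this directory.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup for illustration only.
$html = '<html><body><p class="style">'
      . '<a href="/home"><img src="logo.png" alt="Logo"></a>'
      . '</p></body></html>';

$crawler = new Crawler();
$crawler->addHtmlContent($html);

$block = $crawler->filterXPath('//p[@class="style"]');

// attr() reads one attribute from the first matched node.
$href = $block->filter('a')->attr('href');

// extract() with several attributes returns one sub-array per matched node.
$rows = $block->filter('a > img')->extract(array('alt', 'src'));

var_dump($href); // string "/home"
var_dump($rows); // one row: ['Logo', 'logo.png']
```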
filter and filterXPath differ in what they accept: according to the manual, filter takes a CSS selector, while filterXPath takes an XPath expression. I have not fully worked out the finer points, so which one fits a given case is something to try out in actual development.
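As a small sketch of the difference, the snippet below selects the same node once with a CSS selector and once with an XPath expression. It assumes symfony/css-selector is installed and that the autoloader is at vendor/autoload.php; the markup is the manual's sample.

```php
<?php
// Assumption: dependencies were installed with Composer in this directory.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = '<html><body><p class="message">Hello World!</p><p>Hello Crawler!</p></body></html>';

$crawler = new Crawler();
$crawler->addHtmlContent($html);

// CSS selector: requires the symfony/css-selector package.
$viaCss = $crawler->filter('body > p.message')->text();

// XPath expression selecting the same paragraph.
$viaXPath = $crawler->filterXPath('//body/p[@class="message"]')->text();

var_dump($viaCss === $viaXPath); // bool(true)
```

For simple cases like this the two are interchangeable, so the choice is mostly a matter of which syntax reads more naturally.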
Overall I find DomCrawler more useful than Simple HTML DOM, though perhaps my use of it so far is fairly superficial.
The above covers only the basic functions of Crawler. For the rest of its methods, please refer to the Crawler section of the Symfony manual:
Http://api.symfony.com/3.2/Symfony/Component/DomCrawler/Crawler.html
The main problem with Crawler is that there are too few examples. The function manual does not include usage examples, so the only way to learn it is to explore in actual use ...