Parsing HTML using Symfony's crawler component in Laravel


In English, "crawler" literally means something that crawls, which is where the name for programs that crawl web pages comes from ... -_-!


Recently I have been writing a web-crawling system with Laravel. Before that I parsed HTML with Simple_html_dom, but since I'm using Laravel now, it is natural to use Composer packages for the job, which also feels more polished ...


Off-topic: simple_html_dom can apparently be installed with Composer, but its code does not follow the PSR coding standards, especially for autoloading, which affects the vendor code structure. On GitHub there is an improved version that supports the PSR standards, sunra/php-simple-html-dom-parser, which does not appear to be written by the original author.


Crawler's full name is DomCrawler, a component of the Symfony framework. Annoyingly, DomCrawler has no Chinese documentation, and Symfony has not translated this part either, so developing with DomCrawler meant feeling my way forward bit by bit. Here is a summary of what I learned along the way.


First, install the components:

```shell
composer require symfony/dom-crawler
composer require symfony/css-selector
```

css-selector is the CSS selector component. Some methods, such as filter(), select nodes with CSS selectors and need it installed.


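The example in the manual is:

```php
use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <body>
        <p class="message">Hello World!</p>
        <p>Hello Crawler!</p>
    </body>
</html>
HTML;

$crawler = new Crawler($html);

foreach ($crawler as $domElement) {
    var_dump($domElement->nodeName);
}
```

The printed result is:

```
string 'html' (length=4)
```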

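The nodeName of this node is "html", and my English isn't great, so when I first used it I thought the program was wrong ... In fact, a Crawler built from a whole page holds only the root <html> element, so the loop runs once and prints its name.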
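The following sketch shows why the loop prints only "html". A Crawler built from a whole page holds just the root element, and you reach the paragraphs by filtering down from it. It assumes symfony/dom-crawler was installed with Composer and that the script sits next to `vendor/`. Inside a Laravel app the autoloader is already loaded, so the require line can be dropped.

```php
<?php
// Sketch: what a Crawler built from a full page actually contains.
// Assumption: Composer packages installed; script lives next to vendor/.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <body>
        <p class="message">Hello World!</p>
        <p>Hello Crawler!</p>
    </body>
</html>
HTML;

$crawler = new Crawler($html);

// Iterating a Crawler yields \DOMElement objects; here there is only one,
// the document's root <html> element.
$rootNames = [];
foreach ($crawler as $domElement) {
    $rootNames[] = $domElement->nodeName;
}

// Filtering walks down from that root to the elements you actually want.
$paragraphs = $crawler->filterXPath('//body/p');
$paragraphCount = count($paragraphs);

echo implode(',', $rootNames), ' ', $paragraphCount, PHP_EOL;
```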


In actual use, new Crawler($html) can produce garbled text, which seems to be related to the page encoding. Instead, initialize an empty Crawler first and then add the content:

```php
$crawler = new Crawler();
$crawler->addHtmlContent($html);
```

The second parameter of addHtmlContent() is the charset, which defaults to UTF-8, so a page in another encoding can be passed as, for example, addHtmlContent($html, 'GBK').
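Here is a sketch of passing the charset explicitly for a page served in GBK, which is common on older Chinese sites. The GBK bytes are produced locally with mb_convert_encoding() only to simulate a downloaded page. The mbstring extension and the Composer packages are assumed to be available.

```php
<?php
// Sketch: parsing non-UTF-8 HTML by telling addHtmlContent() its charset.
// Assumption: symfony/dom-crawler installed via Composer, mbstring enabled.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Simulate a GBK-encoded page (in real use these bytes come from the HTTP response).
$utf8Page = '<html><body><p>你好，世界</p></body></html>';
$gbkPage  = mb_convert_encoding($utf8Page, 'GBK', 'UTF-8');

$crawler = new Crawler();
// Second argument: the charset the bytes are actually in (default 'UTF-8').
$crawler->addHtmlContent($gbkPage, 'GBK');

// DomCrawler converts the content before parsing, so text() comes back as UTF-8.
$text = $crawler->filterXPath('//body/p')->text();
echo $text, PHP_EOL;
```

The design choice here is to declare the encoding at the point where the bytes enter the Crawler. Without it, the Crawler falls back to guessing, which is one plausible source of the garbled output described above.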


Other examples can be found in the official documentation, http://symfony.com/doc/current/components/dom_crawler.html


Below I record a few things from my own work, as I try them out.


The filterXPath(string $xpath) method: as the manual says, its parameter is an XPath expression. It is most often used to select blocks such as p and div.

```php
echo $crawler->filterXPath('//body/p')->text();
echo $crawler->filterXPath('//body/p')->last()->text();
```

This outputs the text of the first and the last p element.

```php
var_dump($crawler->filterXPath('//body')->html());
```

This outputs the HTML inside body.
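The sketch below combines the calls above and adds one practical guard: text() and html() throw an exception when the query matched nothing. It reuses the manual's sample page, redefined here so the script runs on its own, and assumes the Composer packages are installed.

```php
<?php
// Sketch: reading text and inner HTML, and guarding against empty matches.
// Assumption: symfony/dom-crawler installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <body>
        <p class="message">Hello World!</p>
        <p>Hello Crawler!</p>
    </body>
</html>
HTML;

$crawler = new Crawler();
$crawler->addHtmlContent($html);

$paragraphs = $crawler->filterXPath('//body/p');
$first = $paragraphs->text();          // text() reads only the first matched node
$last  = $paragraphs->last()->text();  // last() narrows the list to the final node

// html() returns the inner HTML of the first matched node (here: <body>).
$bodyHtml = $crawler->filterXPath('//body')->html();

// On an empty node list text()/html() throw \InvalidArgumentException,
// so check count() before reading from a query that may match nothing.
$headings = $crawler->filterXPath('//body/h1');
$heading  = count($headings) > 0 ? $headings->text() : '';
```

Recent Symfony versions also accept a default value, as in text(''), which is returned instead of throwing. The count() check works regardless of version.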


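```php
foreach ($crawler->filterXPath('//body/p') as $i => $node) {
    // $node is a \DOMElement; wrap it in a new Crawler to keep using the Crawler API
    $c = new Crawler($node);
    echo $c->filter('p')->text();
}
```

Iterating over the result of filterXPath() yields DOMElement objects rather than Crawler objects. Each DOMElement can be parsed further by wrapping it in a new Crawler object.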
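Each iterated node is a plain \DOMElement, so you can either use PHP's native DOM properties on it directly or wrap it in a new Crawler. The sketch below does both side by side on the manual's sample page. It assumes the Composer packages are installed.

```php
<?php
// Sketch: the two ways to work with nodes yielded by iterating a Crawler.
// Assumption: symfony/dom-crawler installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <body>
        <p class="message">Hello World!</p>
        <p>Hello Crawler!</p>
    </body>
</html>
HTML;

$crawler = new Crawler();
$crawler->addHtmlContent($html);

$rows = [];
foreach ($crawler->filterXPath('//body/p') as $i => $node) {
    // Native DOM API: textContent and getAttribute() ('' if the attribute is absent).
    $nativeText = $node->textContent;
    $class      = $node->getAttribute('class');

    // Crawler API: wrap the \DOMElement and keep chaining Crawler methods.
    $wrapped = new Crawler($node);

    $rows[] = [$i, $class, $nativeText, $wrapped->text()];
}
```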


```php
$nodeValues = $crawler->filterXPath('//body/p')->each(function (Crawler $node, $i) {
    return $node->text();
});
```

Crawler also provides each(), which loops with a closure and keeps the code short. Note that $nodeValues is then an array of the closure's return values, which still needs further processing.
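This sketch illustrates the "further processing" step. each() returns an ordinary PHP array, so standard array functions can clean it up. The page here is the manual's sample with an extra empty <p>, a hypothetical addition to show filtering. The Composer packages are assumed to be installed.

```php
<?php
// Sketch: collecting values with each() and post-processing the array.
// Assumption: symfony/dom-crawler installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <body>
        <p class="message">Hello World!</p>
        <p></p>
        <p>Hello Crawler!</p>
    </body>
</html>
HTML;

$crawler = new Crawler();
$crawler->addHtmlContent($html);

// each() calls the closure once per matched node and returns the results, in order.
$nodeValues = $crawler->filterXPath('//body/p')->each(function (Crawler $node, $i) {
    return $node->text();
});

// Drop empty strings, then reindex so the keys are 0..n-1 again.
$nonEmpty = array_values(array_filter($nodeValues, function ($text) {
    return $text !== '';
}));
```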


Other usage

```php
echo $crawler->filterXPath('//body/p')->attr('class');
```

This prints "message", the value of the class attribute of the first p element.
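Like text(), attr() reads only the first matched node. The sketch below uses eq() to pick another node and shows what attr() returns when the attribute is missing. It uses the manual's sample page and assumes the Composer packages are installed.

```php
<?php
// Sketch: attr() on the first node versus a specific node chosen with eq().
// Assumption: symfony/dom-crawler installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <body>
        <p class="message">Hello World!</p>
        <p>Hello Crawler!</p>
    </body>
</html>
HTML;

$crawler = new Crawler();
$crawler->addHtmlContent($html);

$paragraphs = $crawler->filterXPath('//body/p');

$firstClass  = $paragraphs->attr('class');        // first <p> has class="message"
$secondClass = $paragraphs->eq(1)->attr('class'); // second <p> has no class: null
```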

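```php
// href of the first link inside <div class="style">
$crawler->filterXPath('//div[@class="style"]')->filter('a')->attr('href');
// alt and src of every image wrapped in a link inside that div (img has no href)
$crawler->filterXPath('//div[@class="style"]')->filter('a > img')->extract(['alt', 'src']);
```

These are some of the ways to get tag attributes. attr() reads one attribute from the first matched node, while extract() returns one entry per matched node.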
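The shape of extract()'s result depends on how many attribute names you pass. With several names each entry is an array of values; with one name each entry is a plain string. The markup below is a hypothetical stand-in for a real page with a div.style block. The script assumes both Composer packages are installed, because filter() needs css-selector.

```php
<?php
// Sketch: attr() versus extract() on links and images inside one block.
// Assumption: symfony/dom-crawler and symfony/css-selector installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup for illustration only.
$html = <<<'HTML'
<html><body>
<div class="style">
  <a href="/a.html"><img src="/a.png" alt="First"></a>
  <a href="/b.html"><img src="/b.png" alt="Second"></a>
</div>
</body></html>
HTML;

$crawler = new Crawler();
$crawler->addHtmlContent($html);

$box = $crawler->filterXPath('//div[@class="style"]');

$firstHref = $box->filter('a')->attr('href');             // first link only
$images    = $box->filter('a > img')->extract(['alt', 'src']); // one [alt, src] pair per image
$hrefs     = $box->filter('a')->extract(['href']);         // one name: flat list of strings
```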


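filter() and filterXPath() differ in the kind of expression they accept. According to the manual, filter() takes a CSS selector, which the CssSelector component converts to XPath internally, while filterXPath() takes an XPath expression. Both search within the nodes the current Crawler already holds, so filterXPath('//div[@class="style"]')->filter('a') finds the links inside that div. The finer differences are worth testing during actual development.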
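One difference worth knowing shows up with class names. The CSS selector `div.style` matches any div whose class list contains "style". The XPath test `@class="style"` requires the attribute to equal exactly "style". The markup is hypothetical, and both Composer packages are assumed to be installed.

```php
<?php
// Sketch: CSS class matching (filter) versus exact attribute matching (filterXPath).
// Assumption: symfony/dom-crawler and symfony/css-selector installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup: the second div has two classes.
$html = <<<'HTML'
<html><body>
<div class="style"><a href="/a.html">A</a></div>
<div class="style wide"><a href="/b.html">B</a></div>
</body></html>
HTML;

$crawler = new Crawler();
$crawler->addHtmlContent($html);

// CSS: ".style" matches the class token, so both divs qualify.
$viaCss = $crawler->filter('div.style a')->extract(['href']);

// XPath: @class="style" compares the whole attribute string, so only the first div qualifies.
$viaXPath = $crawler->filterXPath('//div[@class="style"]//a')->extract(['href']);
```

To get the CSS behaviour in XPath you would need an expression such as `contains(concat(' ', normalize-space(@class), ' '), ' style ')`, which is the kind of XPath the CssSelector component generates for you.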


Overall, DomCrawler feels more useful than Simple HTML DOM, and the difference will probably become clearer the more I use it.


The above covers only Crawler's basic functions. For the rest, see the Symfony API reference:

http://api.symfony.com/3.2/Symfony/Component/DomCrawler/Crawler.html

The main problem with Crawler is that there are too few examples. The API reference shows no usage examples, so you can only find things out by actually using it ...


A few examples are in Symfony's DomCrawler documentation:

http://symfony.com/doc/current/components/dom_crawler.html


This article is from the "Remember Something" blog, please be sure to keep this source http://daweilang.blog.51cto.com/9806748/1885807
