Using PHP instead of JS to work with the DOM: the idea and example code

Source: Internet
Author: User
Tags: xpath
The background is fairly simple: I needed to write the data from a navigation page into a database. The intuitive approach is to parse the HTML file, and the common way to do that is to match it with PHP regular expressions. But that is hard to develop and maintain, and the resulting code is not very readable.

The navigation page's data is laid out in a DOM tree. With JS it could be handled easily in a few loops, but JS depends on the browser, which makes database operations very difficult. In fact, PHP ships with a ready-made class library for adding, deleting, modifying, and querying DOM tree nodes; here are some notes on it.

This involves two classes: DOMDocument and DOMXPath.

The idea is straightforward: use DOMDocument to convert an HTML file into a DOM tree data structure, then use a DOMXPath instance to search the tree for specific nodes. From each such node you can traverse its subtree to get the desired results.
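The three steps can be sketched on a small inline document. This is only an illustration under stated assumptions: the markup in $html is a hypothetical stand-in for a real navigation page, and loadHTML() is used in place of a file so the snippet runs on its own.

```php
<?php
// Hypothetical inline markup standing in for a real navigation page.
$html = <<<HTML
<html><body>
<dl class="fix"><dt>News</dt><dd><a href="http://example.com/a">Site A</a></dd></dl>
<dl class="other"><dt>Skip</dt></dl>
</body></html>
HTML;

// Step 1: parse the markup into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($html);

// Step 2: use a DOMXPath instance to locate specific nodes.
$xpath = new DOMXPath($dom);
$dls = $xpath->query('//dl[@class="fix"]');

// Step 3: walk the subtree of each matching node.
$labels = [];
foreach ($dls as $dl) {
    foreach ($dl->childNodes as $child) {
        $text = trim($child->textContent);
        if ($text !== '') {
            $labels[] = $text; // skip whitespace-only nodes
        }
    }
}

print_r($labels);
```

Only the first <dl> matches the query, so $labels ends up holding the text of its <dt> and <dd> children.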

There is a navigation HTML file "./hao.html" in the current directory.

To extract the Chinese text of all the <a> tags, the PHP code is as follows:

<?php
// Convert the HTML/XML file into a DOM tree
$dom = new DOMDocument();
$dom->loadHTMLFile("hao.html");

// Get all <dl> elements whose class is "fix"
$xpath = new DOMXPath($dom);

// Other query patterns, for reference:
// Example 1: every element that has an id attribute
// $elements = $xpath->query("//*[@id]");
//
// Example 2: the div with a given id, by absolute path
// $elements = $xpath->query("/html/body/div[@id='yourtagidhere']");
//
// Example 3: same as above with a wildcard
// $elements = $xpath->query("*/div[@id='yourtagidhere']");

$dls = $xpath->query('//dl[@class="fix"]');

foreach ($dls as $dl) {
    // Print the text of each child node, tab-separated, one <dl> per line
    $spans = $dl->childNodes;
    foreach ($spans as $span) {
        echo trim($span->textContent) . "\t";
    }
    echo "\n";
}
?>
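If only the <a> elements are of interest, the XPath expression can target them directly instead of walking every child node. The following is a minimal sketch, not the author's code: extractLinks() and the sample markup are hypothetical names introduced for illustration, and the libxml error handling is an assumption about how one might cope with imperfect real-world HTML.

```php
<?php
/**
 * Hypothetical helper: collect the text and href of every <a>
 * found inside a <dl class="fix"> element.
 */
function extractLinks(string $html): array
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query('//dl[@class="fix"]//a') as $a) {
        $links[] = [
            'text' => trim($a->textContent),
            'href' => $a->getAttribute('href'),
        ];
    }
    return $links;
}

// Hypothetical sample markup; the last link sits outside the <dl>.
$sample = '<html><body><dl class="fix"><dt>Search</dt>'
    . '<dd><a href="http://example.com/one">One</a> '
    . '<a href="http://example.com/two">Two</a></dd></dl>'
    . '<p><a href="http://example.com/skip">Skip</a></p></body></html>';

foreach (extractLinks($sample) as $link) {
    echo $link['text'], "\t", $link['href'], "\n";
}
```

Note that @class="fix" matches only when the attribute value is exactly "fix"; an element with several class names would need a different predicate.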

Running the hao.html script prints one line for each matching <dl> element, with the text of its child nodes separated by tabs.

Note: when loading HTML, DOMDocument assumes Latin-1 (ISO-8859-1) encoding by default, so when processing UTF-8 encoded Chinese text, the HTML file must declare its encoding in the <head> with the following tag:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

If the tag is placed elsewhere in the file, or shortened to <meta content="charset=utf-8">, the encoding is not recognized.
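The effect of the encoding declaration can be checked with a short script. This is a sketch under assumptions: the markup is a hypothetical one-link page, the script itself is saved as UTF-8, and loadHTML() is used on a string rather than loading a file.

```php
<?php
// Hypothetical UTF-8 page that declares its encoding in <head>.
$html = '<html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    . '</head><body><a href="#">中文导航</a></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// With the declaration present, the link text comes back as UTF-8.
$xpath = new DOMXPath($dom);
$text = $xpath->query('//a')->item(0)->textContent;

echo $text, "\n";
```

Removing the <meta> line from $html is a quick way to see how the parser treats the same bytes without a declared encoding.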
