Manipulating the DOM with PHP instead of JS

Source: Internet
Author: User
Tags: xpath

Simply put, I need to extract the data from a navigation page and write it to a database. The most intuitive approach is to parse the HTML file. A common way to do this in PHP is to match the content with regular expressions. However, such code is hard to develop and maintain, and its readability is poor.

The data on a navigation page is arranged in a regular pattern in the DOM tree. With JS, a few loops are enough to walk through it, but JS depends on the browser, which makes working with the database difficult. In fact, PHP has a ready-made class library for adding, deleting, modifying, and querying nodes in a DOM tree. Here are some notes on it.

Two classes are involved: DOMDocument and DOMXPath.

The idea is straightforward: use DOMDocument to convert an HTML file into a DOM tree data structure, then use a DOMXPath instance to search the tree for the desired nodes. From there, we can traverse the subtree of each matched node to get the desired result.
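The three steps above can be sketched with an inline HTML string instead of a file. The sample markup, the example URLs, and the helper name extractLinks are made up for illustration; only DOMDocument and DOMXPath come from the text. This is a minimal sketch, not a definitive implementation.

```php
<?php
// Hypothetical helper (name and signature are illustrative only):
// returns text/href pairs for every <a> inside <dl> elements of a given class.
// Assumes $class contains no double quotes, since it is placed in the XPath as-is.
function extractLinks(string $html, string $class): array
{
    // Step 1: parse the HTML into a DOM tree.
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Step 2: search the tree with XPath.
    $xpath = new DOMXPath($dom);
    $anchors = $xpath->query('//dl[@class="' . $class . '"]//a');

    // Step 3: walk the matched nodes and collect the results.
    $links = [];
    foreach ($anchors as $a) {
        $links[] = [
            'text' => trim($a->textContent),
            'href' => $a->getAttribute('href'),
        ];
    }
    return $links;
}

// Sample markup, assumed for this sketch only.
$html = '<html><body>'
      . '<dl class="fix"><dt>News</dt>'
      . '<dd><a href="http://a.example/">Site A</a></dd>'
      . '<dd><a href="http://b.example/">Site B</a></dd></dl>'
      . '<dl class="other"><dd><a href="http://c.example/">Site C</a></dd></dl>'
      . '</body></html>';

// Print one tab-separated line per link found under <dl class="fix">.
foreach (extractLinks($html, 'fix') as $link) {
    echo $link['text'], "\t", $link['href'], "\n";
}
```

Returning an array rather than echoing inside the helper keeps the parsing separate from whatever is done with the rows afterwards, such as writing them to a database.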

Here is one of the simplest possible demos.

The current folder contains a navigation HTML file, for example "./hao.html".

Now we need to get the Chinese text content of all the <a> tags. The PHP code is as follows:

<?php
// Convert the HTML/XML file into a DOM tree
$dom = new DOMDocument();
$dom->loadHTMLFile("hao.html");

// Other query examples:
// Example 1: for everything with an id
// $elements = $xpath->query("//*[@id]");
// Example 2: for node data in a selected id
// $elements = $xpath->query("/html/body/div[@id='yourTagIdHere']");
// Example 3: same as above with wildcard
// $elements = $xpath->query("*/div[@id='yourTagIdHere']");
$xpath = new DOMXPath($dom);

// Get all dl tags whose class is "fix"
$dls = $xpath->query('//dl[@class="fix"]');
foreach ($dls as $dl) {
    // Property names are case-sensitive: childNodes, textContent
    $spans = $dl->childNodes;
    foreach ($spans as $span) {
        echo trim($span->textContent) . "\t";
    }
    echo "\n";
}
?>


The output results are as follows:


Note: DOMDocument's default encoding is Latin-1 (ISO-8859-1), so when handling UTF-8 encoded Chinese text, the HTML file needs the following tag in its <head>:

<meta http-equiv="content-type" content="text/html; charset=utf-8">

If the tag is placed elsewhere, or if you only write <meta content="charset=utf-8">, it will not be recognized.
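The effect of the meta tag can be checked with a small sketch that parses the same UTF-8 text with and without it. The sample strings and the helper name firstLinkText are made up for illustration. The exact garbling seen without the tag may depend on the libxml2 version PHP was built against, so no expected output is shown for that case.

```php
<?php
// Sample UTF-8 markup, assumed for this sketch only.
$meta = '<meta http-equiv="content-type" content="text/html; charset=utf-8">';
$body = '<body><a href="#">导航</a></body>';

// Hypothetical helper: parse the HTML and return the text of the first <a>.
function firstLinkText(string $html): string
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $node = $xpath->query('//a')->item(0);
    return $node === null ? '' : $node->textContent;
}

// Same body, once with the charset declared in <head> and once without.
$withMeta    = firstLinkText('<html><head>' . $meta . '</head>' . $body . '</html>');
$withoutMeta = firstLinkText('<html><head></head>' . $body . '</html>');

echo "with meta:    ", $withMeta, "\n";
echo "without meta: ", $withoutMeta, "\n";
```

With the tag present, the Chinese text should come back unchanged. Comparing the two lines on your own PHP build shows whether the declaration is needed there.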

Copyright notice: This is an original article by the blog author and may not be reproduced without consent.
