Put simply, I needed to organize the data of a navigation page and write it to a database. The intuitive approach is to parse the HTML file. A common way to do that is to match the markup with PHP regular expressions. However, such code is hard to develop and maintain, and its readability is poor.
The data on a navigation page is laid out regularly in the DOM tree. With JS, a few loops are enough to walk through it, but JS depends on the browser, which makes writing to a database difficult. In fact, PHP has a ready-made class library for adding, deleting, modifying, and querying nodes in a DOM tree. Here are some notes on it.
Two classes are involved: DOMDocument and DOMXPath.
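As a rough illustration of the add, delete, and query operations these classes offer, here is a minimal sketch. The markup, the "nav" id, and the item names are invented for the example and do not come from the navigation page discussed in this post.

```php
<?php
// Hypothetical markup, invented for this sketch.
$html = '<html><body><ul id="nav"><li>news</li><li>mail</li></ul></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Query: find the list by its id.
$list = $xpath->query('//ul[@id="nav"]')->item(0);

// Add: append a new <li> element to the list.
$list->appendChild($dom->createElement('li', 'maps'));

// Delete: remove the first <li> of the list.
$first = $xpath->query('li', $list)->item(0);
$list->removeChild($first);

// Collect the text of the remaining items.
$items = [];
foreach ($xpath->query('li', $list) as $li) {
    $items[] = $li->textContent;
}
echo implode(',', $items), "\n"; // prints "mail,maps"
```

The second argument of query() is a context node, so the expression 'li' is evaluated relative to the list rather than to the whole document.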
The idea is straightforward: DOMDocument converts an HTML file into a DOM tree data structure. A DOMXPath instance then searches that tree for the desired nodes, and we traverse the subtree of each matched node to get the result we want.
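The three steps can be sketched on an in-memory string. This is only an illustration under assumptions: the dt/dd layout, the category name, and the example.com links are hypothetical stand-ins for a real navigation page, and loadHTML() is used in place of a file so the snippet is self-contained.

```php
<?php
// Hypothetical navigation markup, invented for this sketch.
$html = <<<HTML
<html><body>
<dl class="fix">
  <dt>search</dt>
  <dd><a href="http://example.com/a">site a</a></dd>
  <dd><a href="http://example.com/b">site b</a></dd>
</dl>
</body></html>
HTML;

// Step 1: convert the HTML into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($html);

// Step 2: search the tree for the nodes of interest.
$xpath = new DOMXPath($dom);
$dls = $xpath->query('//dl[@class="fix"]');

// Step 3: walk each matched subtree and collect one row per link.
$rows = [];
foreach ($dls as $dl) {
    $category = trim($xpath->query('dt', $dl)->item(0)->textContent);
    foreach ($xpath->query('.//a', $dl) as $a) {
        $rows[] = [$category, trim($a->textContent), $a->getAttribute('href')];
    }
}

// Print one tab-separated line per collected row.
foreach ($rows as $row) {
    echo implode("\t", $row), "\n";
}
```

Collecting the rows into an array first, instead of echoing inside the loop, keeps the parsing separate from whatever is done with the data afterwards, such as writing it to a database.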
Here is one of the simplest possible demos.
The current folder contains a navigation HTML file, "./hao.html".
Now we want to get the Chinese text of all the <a> tags. The PHP code is as follows:
<?php
// Convert the HTML/XML file into a DOM tree
$dom = new DOMDocument();
$dom->loadHTMLFile("hao.html");

// Example 1: for everything with an id
// $elements = $xpath->query("//*[@id]");
// Example 2: for node data in a selected id
// $elements = $xpath->query("/html/body/div[@id='yourtagidhere']");
// Example 3: same as above with wildcard
// $elements = $xpath->query("*/div[@id='yourtagidhere']");

// Get all dl tags whose class is "fix"
$xpath = new DOMXPath($dom);
$dls = $xpath->query('//dl[@class="fix"]');
foreach ($dls as $dl) {
    // Print the text of every child node, separated by tabs
    $spans = $dl->childNodes;
    foreach ($spans as $span) {
        echo trim($span->textContent) . "\t";
    }
    echo "\n";
}
?>
The output results are as follows:
Note: DOMDocument's default encoding for HTML is Latin-1 (ISO-8859-1), so when handling UTF-8 encoded Chinese, the HTML file needs the following tag inside its <head>:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
If the tag is placed elsewhere, or written only as <meta content="charset=utf-8">, it is not recognized.
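The effect of the meta tag can be checked with a small experiment. This is a sketch under assumptions: firstParagraphText() is a hypothetical helper written for this example, the script itself is assumed to be saved as UTF-8, and the exact result without the tag may vary with the underlying libxml version.

```php
<?php
// Helper (hypothetical, for this sketch): parse an HTML string and
// return the text of its first <p> element.
function firstParagraphText(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    return $xpath->query('//p')->item(0)->textContent;
}

$meta = '<meta http-equiv="content-type" content="text/html; charset=utf-8">';
$withMeta = '<html><head>' . $meta . '</head><body><p>导航</p></body></html>';
$withoutMeta = '<html><head></head><body><p>导航</p></body></html>';

// With the meta tag, the Chinese text comes back intact.
echo firstParagraphText($withMeta), "\n";
// Without it, the text may come back garbled.
echo firstParagraphText($withoutMeta), "\n";
```

Comparing the two lines of output is a quick way to see whether a given HTML file declares its encoding in a form the parser picks up.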
Copyright notice: This is an original article by the blog author and may not be reproduced without consent.
Using PHP instead of JS to work with the DOM