1. Fixing garbled text
Sure enough, I ran into garbled text, even though every character was UTF-8 encoded, as the documentation requires:
$html = 'How are you doing';
$dom = new DOMDocument();
@$dom->loadHTML($html);
echo $dom->documentElement->nodeValue;
However, if you switch to loadXML():
$html = 'How are you doing';
$dom = new DOMDocument();
@$dom->loadXML($html);
echo $dom->documentElement->nodeValue;
there is no problem. I later found that loadHTML() relies on the charset declared in a meta tag in the HTML. If there is no such tag, the input is treated as ISO-8859-1, so non-ASCII UTF-8 text comes out garbled. To fix this, prepend a meta tag like this to the string:
$meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
@$dom->loadHTML($meta . $html);
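As a minimal sketch of this workaround, the snippet below parses the same string twice, once as-is and once with a charset declaration prepended. The sample text and the variable names are assumptions for illustration only, and whether the undeclared version actually comes out garbled depends on the libxml2 build behind your PHP installation.

```php
<?php
// Hypothetical sample input: any non-ASCII UTF-8 text will do.
$html = '<p>你好吗</p>';
// Charset declaration prepended to the input (the workaround described above).
$meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

// Parse without a declared charset; older libxml2 builds fall back to ISO-8859-1.
$plain = new DOMDocument();
@$plain->loadHTML($html);
$undeclared = $plain->documentElement->nodeValue;

// Parse with the charset declared up front.
$fixed = new DOMDocument();
@$fixed->loadHTML($meta . $html);
$clean = $fixed->documentElement->nodeValue;

echo $undeclared, "\n";
echo $clean, "\n";
```

Comparing the two echoed lines on your own setup shows whether the parser needed the declaration.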
2. Recursion
HTML/XML is a recursive (tree) structure, so it has to be traversed recursively:
function _pretty_html_node($node) {
    // Conditions that end the recursion:
    // 1. XML_TEXT_NODE
    // 2. XML_ELEMENT_NODE
    // 3. There are no child nodes
    foreach ($node->childNodes as $n) {
        $child_text .= _pretty_html_node($n);
    }
    // Then handle each tag differently
    switch ($tag) {
        case 'a':
            $href = $node->getAttribute('href');
            $text .= "$child_text";
            ...
    }
    return $text;
}
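The outline above can be sketched as a small runnable function. Note that node_to_text() is a hypothetical helper written for illustration, not the author's _pretty_html_node(), and the per-tag output formats (link target in parentheses, newline after a paragraph) are assumptions.

```php
<?php
// Hypothetical helper: renders a DOM subtree as plain text,
// appending the target of each link in parentheses.
function node_to_text(DOMNode $node): string
{
    // Base case: a text node contributes its own value.
    if ($node->nodeType === XML_TEXT_NODE) {
        return $node->nodeValue;
    }
    // Anything that is not an element (comments, etc.) contributes nothing.
    if ($node->nodeType !== XML_ELEMENT_NODE) {
        return '';
    }
    // Recursive case: collect the text of all children first.
    // A node without children simply yields an empty string here.
    $childText = '';
    foreach ($node->childNodes as $child) {
        $childText .= node_to_text($child);
    }
    // Then treat each tag differently (these formats are assumptions).
    switch (strtolower($node->nodeName)) {
        case 'a':
            return $childText . ' (' . $node->getAttribute('href') . ')';
        case 'p':
            return $childText . "\n";
        default:
            return $childText;
    }
}

$dom = new DOMDocument();
@$dom->loadHTML('<p>See <a href="http://example.com/">this page</a> now</p>');
$result = node_to_text($dom->documentElement);
echo $result;
```

Returning the string from each branch, rather than appending to a shared variable, keeps every call independent of its callers.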
3. Handling escaped characters
For a text node, its nodeValue has to be escaped with htmlspecialchars() before it is written back out. When the HTML/XML is read, entities are decoded, so for example &gt; is already > in memory.
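A short sketch of that round trip, using a made-up input string: the parser hands back decoded text, and htmlspecialchars() restores the entities for output.

```php
<?php
$dom = new DOMDocument();
// Hypothetical input containing escaped characters.
@$dom->loadHTML('<p>1 &lt; 2 &amp;&amp; 3 &gt; 2</p>');

$p = $dom->getElementsByTagName('p')->item(0);
// The text node's value is already decoded in memory.
$raw = $p->firstChild->nodeValue;
// Escape it again before emitting it as HTML.
$escaped = htmlspecialchars($raw);

echo $raw, "\n";
echo $escaped, "\n";
```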
Download Source: pretty_html.php