PHP + tidy: perfect XHTML error correction and filtering

Source: Internet
Author: User
Tags html cleanup tidy word wrap
Input and output
Input and output are arguably the basic functions of most websites: users submit data, and the site outputs that data for other people to browse.

Take the blog, currently so popular, as an example. Here the input is the article the author writes, and the output is the post page generated for others to read.
The problem is that user input is usually not under your control. It may be badly formatted or contain code with security implications, yet what the site finally outputs must be correct HTML. That requires error correction and filtering of whatever the user has entered.

Never trust the user's input
You might say: these days there are WYSIWYG editors everywhere, FCKeditor, TinyMCE ... you could list a whole bunch of them. Yes, they can all generate standard XHTML automatically, but as a web developer you must have heard the rule "never trust data submitted by users."

Therefore, it is necessary to correct and filter the user input data.

Need for better error correction and filtering
So far I have not seen a satisfactory implementation. The ones I have come across are inefficient, give poor results, or have obvious flaws of one kind or another. To give a well-known example: WordPress is a very widely used blog system, easy to operate, powerful, and backed by rich plugin support. But the TinyMCE it integrates and the pile of overly clever error-correction and filtering code in its back end are quite a headache: forced replacement of half-width characters, overly conservative substitution rules, and so on. It can be difficult to get a piece of code like the ones in this article to display correctly.

By the way, this blog runs on WordPress. To get the code in this article to display correctly, I searched online a lot and tried several plugins, and in the end I dug into its source and commented out some of the filter rules just to get a barely decent result -.-b

Of course, I do not want to blame WordPress too much; I just want to point out that it could do better.

What is tidy and how does it work?
An excerpt from the tidy man page describes it:

Tidy reads HTML, XHTML and XML files and writes cleaned up markup. For HTML variants, it detects and corrects many common coding errors and strives to produce visually equivalent markup that is both W3C compliant and works on most browsers. A common use of Tidy is to convert plain HTML to XHTML. For generic XML files, Tidy is limited to correcting basic well-formedness errors and pretty printing.

Put simply, tidy cleans up HTML code and produces clean, standards-compliant markup; it supports HTML, XHTML and XML. Tidy also ships as a library, tidylib, which makes it easy to use its capabilities from other applications. Fortunately, PHP has a corresponding tidy module.
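As a quick illustration of that cleanup, here is a minimal sketch that passes a fragment with unclosed tags through `tidy_repair_string`. The helper name `repairFragment` is hypothetical, and the sketch assumes the tidy extension may or may not be loaded, so it falls back to returning the input unchanged.

```php
<?php
// Hypothetical helper name; it is not part of the tidy extension.
function repairFragment(string $html): string
{
    // Without the tidy extension, hand the input back untouched.
    if (!function_exists('tidy_repair_string')) {
        return $html;
    }
    $config = [
        'output-xhtml'   => true, // emit XHTML rather than plain HTML
        'show-body-only' => true, // return only the contents of <body>
    ];
    $fixed = tidy_repair_string($html, $config, 'utf8');
    // tidy_repair_string returns false on failure.
    return is_string($fixed) ? $fixed : $html;
}

// Both <p> and <b> are left open in the input.
echo repairFragment('<p>unclosed <b>bold text');
```

With tidy loaded, the printed fragment should have both elements closed; the exact whitespace depends on the tidy version and configuration.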

Dude, why PHP again?
Well, about that ... I am ashamed to say it is because PHP is the little bit I know -.-v
Fortunately, this article is not just code. It also includes some analysis of the process, and sharing that is more useful than pasting code.

Using Tidy in PHP
To use tidy in PHP you need to install the tidy module, that is, load the tidy.so PHP extension. The installation steps are omitted here; it is purely manual work. Once phpinfo() shows "Tidy support enabled", you are set.

With this module in place, almost everything tidy offers can be used from PHP. Ordinary HTML cleanup becomes incredibly easy, and you can even generate a parse tree of the document and work with the individual HTML nodes, much as you would manipulate the DOM on the client side. Specific code follows below, or you can consult the official PHP manual.
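Besides reading the phpinfo() page, the same check can be done from code. The sketch below is one way to do it; the function name `tidyAvailable` is made up for this example.

```php
<?php
// Hypothetical helper: true when the tidy extension is loaded
// and the function used later in this article exists.
function tidyAvailable(): bool
{
    return extension_loaded('tidy') && function_exists('tidy_repair_string');
}

echo tidyAvailable()
    ? "tidy is available\n"
    : "tidy is not loaded; enable the extension in php.ini\n";
```

On the command line, `php -m` lists the loaded modules and gives the same information.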
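To make the parse tree idea concrete, here is a small sketch that walks the tree under the body node and collects the element names it meets. It relies on the `name` and `child` properties and the `isText()` method of `tidyNode`; the helper `collectTagNames` is hypothetical, and the code does nothing if tidy is not loaded.

```php
<?php
// Hypothetical helper: collect element names under a node, depth first.
function collectTagNames($node, array &$names): void
{
    // Text nodes carry content, not a tag name.
    if (!$node->isText()) {
        $names[] = $node->name;
    }
    // `child` is null for leaf nodes, otherwise an array of tidyNode objects.
    foreach ($node->child ?? [] as $child) {
        collectTagNames($child, $names);
    }
}

$names = [];
if (function_exists('tidy_parse_string')) {
    $doc  = tidy_parse_string('<p>Hello <b>world</b></p>', ['output-xhtml' => true], 'utf8');
    $body = $doc ? tidy_get_body($doc) : null;
    if ($body !== null) {
        collectTagNames($body, $names);
    }
}
print_r($names);
```

The same recursive shape is what a filtering pass needs: visit a node, decide what to do with it, then descend into its children.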

PHP + tidy implementation of error correction and filtering
That was a lot of background, and perhaps too long-winded. Code that solves the actual problem is the most direct.

1. Simple error-correcting implementation

function Htmlfix($html)
{
    // Without the tidy extension, return the input unchanged.
    if (!function_exists('tidy_repair_string'))
        return $html;

    // Repair the HTML with tidy
    $str = tidy_repair_string($html,
        array('output-xhtml' => true),
        'utf8');
    // Parse the repaired markup into a tidy document
    $doc = tidy_parse_string($str,
        array('output-xhtml' => true),
        'utf8');
    $s = '';

    // Children of <body>; null if there is no body or it is empty
    $body  = $doc ? tidy_get_body($doc) : null;
    $nodes = $body ? $body->child : null;

    if (!is_array($nodes)) {
        return $s;
    }

    // Concatenate the markup of each top-level node
    foreach ($nodes as $n) {
        $s .= $n->value;
    }
    return $s;
}
The code above cleans up XHTML that may not be well formed and outputs standard XHTML (both input and output are UTF-8 encoded). It is not the most streamlined implementation, because I wrote it out in more detail than necessary so that it lines up with the filtering version below.

2. Advanced implementation: Error correction + filtering

Features:

XHTML error correction, with standard XHTML as output.
Filtering of unsafe code without affecting how the content is displayed, including cleanup of unsafe code in style and JavaScript.
Insertion of tags into overlong strings for browser-compatible word wrapping; see the discussion of line breaking for long text in web pages.
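The word-wrap feature can be sketched on its own. The source does not name the tag it inserts, so the `<wbr />` used here is an assumption, as are the helper name `wrapLongWords` and the default limit of 20 characters. The function expects the unescaped plain text of a single text node, not markup, and returns escaped HTML.

```php
<?php
// Hypothetical helper: insert a break opportunity into every run of
// non-whitespace characters longer than $max. The <wbr /> tag is an
// assumption; the source text does not say which tag it uses.
function wrapLongWords(string $text, int $max = 20): string
{
    // Split into alternating non-whitespace and whitespace runs.
    $parts = preg_split('/(\s+)/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        // Invalid UTF-8: fall back to plain escaping.
        return htmlspecialchars($text);
    }
    $out = '';
    foreach ($parts as $part) {
        if ($part === '' || preg_match('/^\s+$/u', $part)) {
            $out .= $part; // whitespace already allows a line break
            continue;
        }
        // Cut the run into chunks of at most $max characters.
        $chunks = preg_split('/(.{' . $max . '})/us', $part, -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
        // Escape each chunk separately so entities are never split.
        $out .= implode('<wbr />', array_map('htmlspecialchars', $chunks));
    }
    return $out;
}

echo wrapLongWords('aaaaaaaaaa', 4), "\n"; // aaaa<wbr />aaaa<wbr />aa
```

Escaping after chunking, rather than before, keeps a break from ever landing inside an entity such as `&lt;`.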
function Htmlfixsafe($html)
{
    // Without the tidy extension, return the input unchanged.
    if (!function_exists('tidy_repair_string'))
        return $html;

    // Tidy configuration
    $conf = array(
        'output-xhtml'     => true,
        'drop-empty-paras' => false,
        'join-classes'     => true,
        'show-body-only'   => true,
    );

    // Repair
    $str = tidy_repair_string($html, $conf, 'utf8');
    // Build the parse tree
    $str = tidy_parse_string($str, $conf, 'utf8');

    $s = '';

    // Get the body node
    $body = @tidy_get_body($str);

    // _dumpnode: check each node and filter the output
    function _dumpnode($node, &$s) {

        // Check the node name; if it is ...
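For a rough idea of what per-node filtering can look like, the following is a minimal sketch and not the author's `_dumpnode`. The helper name `dumpNodeSafe`, the list of dropped elements, and the choice to remove `style` attributes outright (rather than clean them, as the feature list describes) are all assumptions made for illustration.

```php
<?php
// Sketch only: hypothetical helper, not the author's _dumpnode.
function dumpNodeSafe($node, string &$out): void
{
    static $dropped = ['script', 'style', 'iframe', 'object', 'embed'];
    static $void    = ['br', 'hr', 'img', 'input'];

    if ($node->isText()) {
        $out .= $node->value; // text as serialized by tidy
        return;
    }
    // Skip comments, processing instructions and other non-element nodes.
    if ($node->type !== TIDY_NODETYPE_START && $node->type !== TIDY_NODETYPE_STARTEND) {
        return;
    }
    $name = strtolower($node->name);
    if (in_array($name, $dropped, true)) {
        return; // drop the element together with its subtree
    }
    $attrs = '';
    foreach ($node->attribute ?? [] as $key => $value) {
        $key = strtolower($key);
        // Drop event handlers, inline styles and javascript: URLs.
        if (strncmp($key, 'on', 2) === 0 || $key === 'style'
            || preg_match('/^\s*javascript:/i', (string) $value)) {
            continue;
        }
        // double_encode = false avoids re-escaping existing entities.
        $attrs .= ' ' . $key . '="'
            . htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8', false) . '"';
    }
    if (in_array($name, $void, true)) {
        $out .= "<{$name}{$attrs} />";
        return;
    }
    $out .= "<{$name}{$attrs}>";
    foreach ($node->child ?? [] as $child) {
        dumpNodeSafe($child, $out);
    }
    $out .= "</{$name}>";
}

$out = '';
if (function_exists('tidy_parse_string')) {
    $html = '<p onclick="x()">hello<script>alert(1)</script></p>';
    $doc  = tidy_parse_string($html, ['output-xhtml' => true], 'utf8');
    $body = $doc ? tidy_get_body($doc) : null;
    foreach ($body->child ?? [] as $child) {
        dumpNodeSafe($child, $out);
    }
}
echo $out, "\n";
```

A blocklist like this is only a starting point; a production filter would more likely allow a known-safe set of elements and attributes and reject everything else.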
