Experts teach you how to use HTML Purifier to prevent bad code

Source: Internet
Author: User

Security risks need attention from the design stage onward, and this is especially true when building websites. Most websites cannot avoid handling HTML, so making sure that HTML entered by users is safe and valid is a key step in avoiding security problems. This article introduces HTML Purifier, a tool that helps you keep bad code out. You can think of it as an HTML cleaner.

Functions and features

HTML Purifier helps you ensure that HTML is valid. It filters out cross-site scripting (XSS) and other malicious content, so you can let users paste HTML into your site without the risk that they insert malicious code that would run in the browser of anyone who views the page. It can be used with CodeIgniter, Drupal, MODx, Phorum, Joomla!, and WordPress.

HTML Purifier takes a whitelist approach to security: every part of a valid document must be explicitly allowed, rather than scanning for known malicious HTML the way a blacklist does. The smoke test page lists explicitly what is allowed and shows related scenarios. An important goal of HTML Purifier is to understand fully what valid HTML is: which elements may be nested inside which other elements, and which values are valid for each attribute of a given element. The software also supports CSS. Compared with other HTML filtering tools, this is where its advantages lie.
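To see why a blacklist is fragile, consider the sketch below. It is a deliberately naive filter written only for this illustration (the function name naive_blacklist is made up, and this is not how HTML Purifier works): it strips script elements, the one attack it knows about, and is then fed two other well-known script-injection vectors.

```php
<?php
// Deliberately naive blacklist filter, for illustration only.
// This is NOT HTML Purifier's approach; the function name is hypothetical.
function naive_blacklist(string $html): string
{
    // Remove <script>...</script> blocks, the only pattern this filter knows.
    return preg_replace('#<script\b[^>]*>.*?</script>#is', '', $html);
}

$payloads = [
    '<script>alert(1)</script>',                // known to the blacklist
    '<img src="x" onerror="alert(1)">',         // event-handler attribute
    '<a href="javascript:alert(1)">click</a>',  // javascript: URI
];

foreach ($payloads as $payload) {
    $out = naive_blacklist($payload);
    echo ($out === $payload ? 'passed through: ' : 'altered:        '), $payload, "\n";
}
```

Only the first payload is changed. The event-handler attribute and the javascript: URI pass through untouched, which is the gap a whitelist closes by rejecting anything that is not explicitly allowed.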

Note, however, that there are no HTML Purifier packages for Linux distributions such as Ubuntu, Fedora, and openSUSE. The software can be installed with PEAR, which is quick and lets you move to the latest version later with PEAR's upgrade command. PEAR also makes it easier to include HTML Purifier in your scripts, because you do not have to specify any path.

Installation and Use

To install HTML Purifier through PEAR, first install the php-pear package, then use the pear command to install HTML Purifier. The following commands install HTML Purifier into /usr/share/pear/HTMLPurifier.

pear channel-discover htmlpurifier.org
pear install hp/HTMLPurifier

For the author, using HTML Purifier at this point produced an error in the Apache log file saying that the Cache.SerializerPath path did not exist. The software tries to use the following directory as a writable path for cached content: /usr/share/pear/HTMLPurifier/DefinitionCache/Serializer. You can disable this cache, as described in detail in the INSTALL file. Alternatively, you can create that directory under /usr, or you can point HTML Purifier at a cache location better suited to volatile content. The third option is shown below:

# mkdir -p /var/cache/HTMLPurifier
# chown apache /var/cache/HTMLPurifier
# chmod o-rwx /var/cache/HTMLPurifier
# ls -ld /var/cache/HTMLPurifier
drwxr-x--- 2 apache root 4096 2008-06-25 14:25 /var/cache/HTMLPurifier

Unfortunately, the default value of SerializerPath is encoded in HTMLPurifier/ConfigSchema/schema.ser. This is a serialized file with length-prefixed values, so it is not easy to edit by hand. The best solution is to change the path through a configuration object in your PHP code or, better still, to write your own PHP function that creates the configuration objects for your website.
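Such a site-wide function might look like the sketch below. The function names site_purifier_settings and site_purifier_config are hypothetical; the directive names are HTML Purifier's, and the values mirror the ones used in this article. Passing null instead of a directory is assumed to mean "disable the cache", which is done by setting Cache.DefinitionImpl to null.

```php
<?php
// Hypothetical helper names; only the directive names come from HTML Purifier.

/**
 * Site-wide directive values, kept in one place.
 * Pass null as $cacheDir to turn the definition cache off instead.
 */
function site_purifier_settings(?string $cacheDir = '/var/cache/HTMLPurifier'): array
{
    $settings = [
        'Core.Encoding'  => 'ISO-8859-1',
        'HTML.TidyLevel' => 'heavy',
    ];
    if ($cacheDir === null) {
        // A null Cache.DefinitionImpl disables caching.
        $settings['Cache.DefinitionImpl'] = null;
    } else {
        // Writable directory outside /usr for the serialized definitions.
        $settings['Cache.SerializerPath'] = $cacheDir;
    }
    return $settings;
}

/** Build a configuration object; HTML Purifier must already be loaded. */
function site_purifier_config(?string $cacheDir = '/var/cache/HTMLPurifier'): HTMLPurifier_Config
{
    $config = HTMLPurifier_Config::createDefault();
    foreach (site_purifier_settings($cacheDir) as $directive => $value) {
        $config->set($directive, $value);
    }
    return $config;
}

// Usage:
//   require_once 'HTMLPurifier.auto.php';
//   $purifier = new HTMLPurifier(site_purifier_config());
```

Keeping the values in a plain array means every page of the site gets the same policy, and the cache location is changed in exactly one place.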

The following is a simple index.php file that uses HTML Purifier to clean HTML content submitted by a form on the same page. Note that htmlspecialchars is not called for security here, but only so that the HTML the user entered is fully visible as text inside the pre element.

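# cd /var/www/html
# mkdir HTMLPurifierTest
# chown ben:apache HTMLPurifierTest
# chmod +s HTMLPurifierTest
# su -l ben
$ cd /var/www/html/HTMLPurifierTest
$ vi index.php

<?php
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'ISO-8859-1');
$config->set('HTML.TidyLevel', 'heavy');
$config->set('Cache.SerializerPath', '/var/cache/HTMLPurifier');
$purifier = new HTMLPurifier($config);
?>
<html>
<body>
<?php
// Purify the submitted HTML and show the result as text inside a pre element.
if (!empty($_GET['query'])) {
    $clean = $purifier->purify($_GET['query']);
    echo '<pre>' . htmlspecialchars($clean, ENT_QUOTES, 'ISO-8859-1') . '</pre>';
}
?>
<p>Enter your nastiest HTML below!</p>
<form name="myfrom" action="index.php">
<input type="text" name="query">
</form>
</body>
</html>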

If you want to explicitly restrict the HTML elements that a user can enter, you can use the HTML.ForbiddenElements configuration directive, as shown below. In this example, any bold, italic, or preformatted tags are stripped from the submitted HTML. Alternatively, you can take the whitelist route and use HTML.AllowedElements to specify explicitly which elements are valid.

$config->set('HTML.ForbiddenElements', 'b,i,pre');
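The whitelist alternative might look like the following configuration sketch. The directive names HTML.AllowedElements and HTML.AllowedAttributes are HTML Purifier's; the particular list of elements is only an example chosen for illustration, and the script assumes the PEAR install and the /var/cache/HTMLPurifier directory described earlier.

```php
<?php
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('Cache.SerializerPath', '/var/cache/HTMLPurifier');

// Example policy (an assumption): allow a handful of text-level elements only.
$config->set('HTML.AllowedElements', 'p,a,em,strong,ul,ol,li');
// Of all attributes, keep only href on links.
$config->set('HTML.AllowedAttributes', 'a.href');

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<p>Hello <b>bold</b> <em>world</em></p>'), "\n";
```

With a whitelist, an element you did not think of is rejected by default, whereas a ForbiddenElements list only removes the elements you remembered to name.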

HTML Purifier supports filtering and rewriting URIs (Uniform Resource Identifiers), either before or after the main validation pass. URI filters let you change or reject the URIs that appear in HTML submitted by users.

One of the URI filters is the host blacklist, which lets you block given host names. Be careful when using it, however: a URL (Uniform Resource Locator) is rejected as soon as any blacklist entry appears anywhere in its host name, not only at the end. Fortunately, the code of the host blacklist class is very short, so you can easily define your own class that only tests whether a host ends with a specific suffix.
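A suffix-only variant could be sketched as follows. The helper host_has_suffix and the class SuffixHostBlacklist are hypothetical names invented for this example; HTMLPurifier_URIFilter, its filter() method, and addFilter() are part of HTML Purifier. The class is declared only when the library is already loaded, so the helper can be tried on its own.

```php
<?php
/**
 * True when $host equals $suffix or ends with ".$suffix".
 * Hypothetical helper, not part of HTML Purifier.
 */
function host_has_suffix(string $host, string $suffix): bool
{
    $host = strtolower($host);
    $suffix = strtolower(ltrim($suffix, '.'));
    if ($host === $suffix) {
        return true;
    }
    $tail = '.' . $suffix;
    return strlen($host) > strlen($tail) && substr($host, -strlen($tail)) === $tail;
}

if (class_exists('HTMLPurifier_URIFilter')) {
    /** Rejects URIs whose host ends with one of the given suffixes. */
    class SuffixHostBlacklist extends HTMLPurifier_URIFilter
    {
        public $name = 'SuffixHostBlacklist';
        protected $suffixes;

        public function __construct(array $suffixes)
        {
            $this->suffixes = $suffixes;
        }

        public function filter(&$uri, $config, $context)
        {
            if ($uri->host === null) {
                return true; // relative URI, nothing to compare
            }
            foreach ($this->suffixes as $suffix) {
                if (host_has_suffix($uri->host, $suffix)) {
                    return false; // returning false drops the URI
                }
            }
            return true;
        }
    }
}

// Usage, after loading HTML Purifier and creating $config:
//   $uriDef = $config->getDefinition('URI');
//   $uriDef->addFilter(new SuffixHostBlacklist(['spam.example']), $config);
```

Matching on whole labels means that www.spam.example is blocked while notspam.example is not, which a plain substring test cannot distinguish.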


The problem with the cache directory is probably a limitation of PEAR. At least the fact that HTML Purifier fails to run and produces a lengthy error message forces us to think about where volatile cache files should be stored, instead of simply using a path under /usr.

HTML Purifier protects against invalid or malicious HTML input. The whitelist can be combined with URI filtering to block attempts by users to enter data that is not allowed. URI filtering is a useful feature: if you are an administrator who allows anonymous posting, it can help reduce forum spam. For example, you can enforce this policy: people who post anonymously may only link to your own website, and anyone who wants to link to other websites must register.
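One way to express that policy is sketched below, using HTML Purifier's URI.Host and URI.DisableExternal directives. The function name link_policy_settings and the idea of passing in a registration flag are assumptions made for this example; how your site decides that a poster is registered is up to you.

```php
<?php
/**
 * URI directives for one request. Hypothetical helper, not part of HTML Purifier.
 */
function link_policy_settings(bool $isRegistered, string $siteHost): array
{
    // URI.Host tells HTML Purifier which host counts as "this website".
    $settings = ['URI.Host' => $siteHost];
    if (!$isRegistered) {
        // Anonymous posters: reject URIs that point at other hosts.
        $settings['URI.DisableExternal'] = true;
    }
    return $settings;
}

// Usage, assuming HTML Purifier is loaded and $config came from
// HTMLPurifier_Config::createDefault():
//   foreach (link_policy_settings(false, 'www.example.com') as $directive => $value) {
//       $config->set($directive, $value);
//   }
```

Registered users get the same configuration minus the external-link restriction, so the rest of the filtering stays identical for both groups.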

In short, HTML Purifier is a practical tool and well worth trying.
