A Python rich-text XSS filter: implementation and usage

Source: Internet
Author: User

Python web development has become mainstream, but in some areas it has fewer third-party modules and libraries than PHP and Node.js.

Take XSS filter components as an example: PHP has the well-known "HTML Purifier" (http://htmlpurifier.org/) as well as the lesser-known filter component "XssHtml" (http://phith0n.github.io/XssHtml).

A library called "html-purifier" can also be installed with Python's pip, but it is very different from the PHP one. It is only responsible for removing tags and attributes in the HTML that are not in the whitelist.

Note that it does not filter XSS; it only removes tags and attributes that are not in the whitelist. In other words, JavaScript carried inside a whitelisted attribute (for example, a javascript: URL in an href) is not filtered.

So I had to write a Python XSS filter myself, for use in my own future Python projects.

Here is how it is implemented.

I. Parsing HTML

HTML is parsed with the HTMLParser class that ships with Python. In Python 2 the module is named HTMLParser; in Python 3 it is html.parser.

To use HTMLParser, write your own class that inherits from it and implement handle_starttag, handle_startendtag, handle_endtag, handle_data, and so on.

For example, the handle_starttag method is called when the parser enters a tag. By implementing it, we get the tag currently being processed and all of its attributes (attrs).
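A minimal sketch of these callbacks, assuming the standard-library parser with a Python 2 fallback import. The class name TagLogger and the sample markup are made up for illustration and are not part of the author's module.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class TagLogger(HTMLParser):
    """Hypothetical helper: records every callback the parser fires."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # tag is the lower-cased tag name; attrs is a list of (name, value) pairs
        self.events.append(("start", tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # fired for self-closing tags such as <br/>
        self.events.append(("startend", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # text between tags
        self.events.append(("data", data))


logger = TagLogger()
logger.feed('<a href="http://example.com">hi</a><br/>')
logger.close()
for event in logger.events:
    print(event)
```

Each callback only receives the pieces of the markup, so the filter is free to decide what to re-emit and what to drop.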

In this method we can check whether the tag and its attrs are in the whitelist, and give special treatment to certain tags and attributes.
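The whitelist check might look roughly like the sketch below. It is not the author's code: the class name WhitelistFilter, the ALLOWED_TAGS table, and the get_html method are assumptions chosen for illustration, and the whitelist is deliberately tiny.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2

# Illustrative whitelist only: tag name -> attribute names allowed on it.
ALLOWED_TAGS = {"a": ["href", "title"], "b": [], "p": [], "img": ["src", "alt"]}


def escape(text):
    # minimal HTML escaping for text and double-quoted attribute values
    return (text.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace('"', "&quot;"))


class WhitelistFilter(HTMLParser):
    """Hypothetical filter: re-emits only whitelisted tags and attributes."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.result = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop tags that are not whitelisted
        kept = [(k, v) for k, v in attrs
                if k in ALLOWED_TAGS[tag] and v is not None]
        rendered = "".join(' %s="%s"' % (k, escape(v)) for k, v in kept)
        self.result.append("<%s%s>" % (tag, rendered))

    def handle_startendtag(self, tag, attrs):
        # treat <img ... /> the same way as <img ...>
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.result.append("</%s>" % tag)

    def handle_data(self, data):
        self.result.append(escape(data))

    def get_html(self):
        return "".join(self.result)


f = WhitelistFilter()
f.feed('<p onclick="x()">hi <font color="red">there</font></p>')
f.close()
print(f.get_html())
```

Note that a whitelist alone is not enough: an allowed href can still carry a dangerous value, which is why the special cases in the following sections are needed.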

II. Special handling of links

Some attributes can execute JavaScript code through the javascript: pseudo-protocol, such as the href of a and the src of embed, so they need special handling: check whether the value starts with http://, https://, or ftp://, and if it does not, force "http://" onto the front.

This defends against potential XSS injection through those attributes.
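The check described above can be sketched as follows. The function name force_safe_url is hypothetical, and the case-insensitive match and whitespace stripping are assumptions of this sketch rather than confirmed behavior of the module.

```python
import re

# Schemes named in the text: http, https and ftp.
_SAFE_SCHEME = re.compile(r"^(?:https?|ftp)://", re.IGNORECASE)


def force_safe_url(value):
    """Hypothetical helper: keep values that start with an allowed scheme,
    otherwise prefix them with http:// so they cannot be a javascript: URL."""
    value = value.strip()
    if _SAFE_SCHEME.match(value):
        return value
    return "http://" + value


print(force_safe_url("https://example.com/a"))
print(force_safe_url("javascript:alert(1)"))
```

Because this is an allowlist of prefixes rather than a blocklist of "javascript:", obfuscated variants of the scheme are neutralized the same way. The cost is that relative URLs are also rewritten, which may or may not be acceptable for a given editor.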

III. Special handling of embed

embed is a tag for embedding media files such as SWF, and in principle a rich-text editor may sometimes be allowed to insert Flash. However, we need to make sure that no JavaScript code can be executed from the Flash file, and that it cannot send HTTP requests (which can easily lead to CSRF attacks).

So the filter forces allowscriptaccess=never and allownetworking=none onto every embed tag.
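One way to force these two attributes is sketched below. The helper name harden_embed_attrs is hypothetical; it assumes attrs arrives as the list of (name, value) pairs that handle_starttag receives.

```python
# Attributes forced onto every <embed>, as described in the text.
FORCED_EMBED_ATTRS = (("allowscriptaccess", "never"), ("allownetworking", "none"))


def harden_embed_attrs(attrs):
    """Hypothetical helper: drop user-supplied copies of the forced
    attributes (in any letter case) and append the safe values."""
    forced_names = set(name for name, _ in FORCED_EMBED_ATTRS)
    kept = [(k, v) for k, v in attrs if k.lower() not in forced_names]
    kept.extend(FORCED_EMBED_ATTRS)
    return kept


print(harden_embed_attrs([("src", "http://example.com/a.swf"),
                          ("AllowScriptAccess", "always")]))
```

Removing the user's own values first matters: if both copies were emitted, which one wins would depend on the browser.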

IV. Escaping double quotes when splicing tags and attributes, so a value cannot break out into new markup

I previously found an XSS vulnerability (CVE-2015-1433) in Roundcube Webmail. After checking the whitelist, it spliced HTML tags and attributes back together without filtering double quotes, so part of an attribute value could become a new attribute name, resulting in XSS.

So I use self.__htmlspecialchars to process attribute values and prevent this.
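A sketch of the idea, modeled on PHP's htmlspecialchars. The module's private method may escape a different set of characters; the function names here are illustrative only.

```python
def htmlspecialchars(text):
    """Sketch of a PHP-style htmlspecialchars; not the module's own method."""
    return (text.replace("&", "&amp;")   # must run first
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;")
                .replace("'", "&#x27;"))


def render_attr(name, value):
    # the value is always wrapped in double quotes, so an escaped value
    # cannot close the quote and start a new attribute
    return ' %s="%s"' % (name, htmlspecialchars(value))


payload = 'x" onmouseover="alert(1)'
print(render_attr("title", payload))
```

Without the escaping step, the payload above would close the title attribute early and inject an onmouseover handler, which is the class of bug described for Roundcube.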

Finally, the module is easy to use. The simplest demo is as follows:

    import pxfilter

    parser = pxfilter.XssHtml()
    parser.feed("...")  # pass in the rich-text HTML to be filtered
    parser.close()
    html = parser.getHtml()
    print(html)

After that, adjust it according to the instructions in the source code. GitHub project: https://github.com/phith0n/python-xss-filter

I built a demo with web.py. You are welcome to test it and submit issues: http://python-xss-filter.leavesongs.com/. Both its functionality and its security still need everyone's suggestions.
