[Jsoup Learning Notes] Sanitizing untrusted HTML (to prevent XSS attacks)
Problem
When building a website, you often let users post comments. Some malicious users may embed scripts in their comments. These scripts can break the behavior of the whole page. More seriously, they can be used to steal confidential information. In that case you need to clean the HTML to prevent cross-site scripting (XSS) attacks.
Method
Use the jsoup HTML cleaner (the Jsoup.clean method). Configure it with a Whitelist that specifies which tags and attributes are allowed.
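import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist; // renamed Safelist in jsoup 1.14.1 and later
String unsafe = "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
String safe = Jsoup.clean(unsafe, Whitelist.basic());
// now: <p><a href="http://example.com/" rel="nofollow">Link</a></p>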
Description
XSS (cross-site scripting) is an attack in which a malicious user injects HTML code into a web page. The attack was originally abbreviated CSS, but the name XSS is now used to avoid confusion with Cascading Style Sheets. When another user views the page, the embedded code runs in that user's browser, and the attacker achieves their goal against that user. XSS is a passive attack: it only takes effect when a victim views the page. Because it is passive and harder to exploit, many people overlook its dangers. A common defense is to let users enter only plain text, but this makes for a poor user experience.
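The mechanism described above can be shown with a minimal sketch that uses only the JDK. The names `CommentRendering`, `renderUnsafe`, `renderPlainText` and `escapeHtml` are hypothetical, chosen for illustration only.

Concatenating a raw comment into the page puts the attacker's `<script>` into the markup the browser executes. Escaping HTML metacharacters is the "plain text only" defense. It is safe, but every bit of formatting is lost.

```java
// Hypothetical illustration; no library involved.
final class CommentRendering {

    // Vulnerable: the raw comment becomes part of the page's markup.
    static String renderUnsafe(String comment) {
        return "<div class=\"comment\">" + comment + "</div>";
    }

    // "Plain text only": escape HTML metacharacters so the browser displays them as text.
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    static String renderPlainText(String comment) {
        return "<div class=\"comment\">" + escapeHtml(comment) + "</div>";
    }

    public static void main(String[] args) {
        String comment = "Nice post!<script>alert(document.cookie)</script>";
        System.out.println(renderUnsafe(comment));    // the <script> tag reaches the browser intact
        System.out.println(renderPlainText(comment)); // the tag is shown as harmless text
    }
}
```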
A better solution is to use a WYSIWYG rich text editor such as CKEditor or TinyMCE. These editors output HTML and let users format their content visually. They can validate input on the client side, but that is not safe enough. You must also validate the input and remove harmful HTML on the server side to ensure that the HTML submitted to your website is safe. Otherwise, attackers can bypass the client-side JavaScript validation and inject unsafe HTML into your website.
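Here is a rough sketch of the server-side step described above. It assumes jsoup 1.14.1 or later, where `Whitelist` is named `Safelist`; on older versions, substitute `Whitelist`. `CommentService` and `acceptComment` are hypothetical names. The forged input stands in for a request sent straight to the server, so no client-side editor validation ever ran.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Hypothetical server-side handler; assumes jsoup 1.14.1+ on the classpath.
final class CommentService {
    // The same policy is applied to every submission, whatever editor produced it.
    private static final Safelist POLICY = Safelist.basic();

    static String acceptComment(String submittedHtml) {
        // isValid reports whether cleaning would remove anything.
        if (!Jsoup.isValid(submittedHtml, POLICY)) {
            System.err.println("Comment contained disallowed markup; cleaning it");
        }
        // Always store the cleaned version, never the raw input.
        return Jsoup.clean(submittedHtml, POLICY);
    }

    public static void main(String[] args) {
        // Simulates a request that bypassed the editor's JavaScript checks.
        String forged = "<p>Hi</p><img src=x onerror=\"alert(1)\">";
        System.out.println(acceptComment(forged));
    }
}
```

This sketch cleans unconditionally, which is one design choice. A site could instead reject any comment for which `Jsoup.isValid` returns false and ask the user to resubmit.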
jsoup's whitelist cleaner filters user-submitted HTML on the server side and outputs only the tags and attributes that the whitelist marks as safe.
jsoup provides a set of predefined Whitelist configurations (none, simpleText, basic, basicWithImages and relaxed) that meet most requirements. You can modify them if necessary, but do so carefully.
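Here is a sketch of carefully modifying a predefined configuration, again assuming jsoup 1.14.1+ (`Safelist`). `CustomizedBasic` is a hypothetical name. Starting from `basic()` and adding only the heading tags you need keeps everything else filtered out, including `<script>`. Each addition widens what an attacker can submit. For that reason, this sketch does not add `<script>`, `<style>` or event-handler attributes such as `onclick`.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Hypothetical example; assumes jsoup 1.14.1+ on the classpath.
final class CustomizedBasic {
    // basic() plus two heading tags; nothing else is loosened.
    static final Safelist POLICY = Safelist.basic().addTags("h2", "h3");

    static String clean(String html) {
        return Jsoup.clean(html, POLICY);
    }

    public static void main(String[] args) {
        // The heading and paragraph survive; the script element and its contents are removed.
        System.out.println(clean("<h2>Title</h2><script>alert(1)</script><p>Body</p>"));
    }
}
```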
This cleaner is very useful, not only to avoid XSS attacks, but also to limit the range of tags that users can enter.
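To restrict users to a few formatting tags, you can use a stricter predefined configuration such as `simpleText()`. It keeps only simple inline tags (`b`, `em`, `i`, `strong`, `u`). This minimal sketch again assumes jsoup 1.14.1+, and the class name `TextOnly` is hypothetical. Links and paragraphs are removed, but their text content survives.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Hypothetical example; assumes jsoup 1.14.1+ on the classpath.
final class TextOnly {
    static String clean(String html) {
        // Disallowed elements are dropped; their text children are kept.
        return Jsoup.clean(html, Safelist.simpleText());
    }

    public static void main(String[] args) {
        System.out.println(clean("<p><a href=\"http://example.com/\">Link</a> and <b>bold</b></p>"));
    }
}
```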
See also
- See the XSS cheat sheet for examples of why regular expressions cannot reliably filter out XSS. A cleaner based on a real HTML parser and a safe whitelist is the right choice.
- See Cleaner to learn how to get a Document object back instead of a string.
- See Whitelist to learn how to create a custom whitelist.
- Learn more about the nofollow link attribute.
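The point about regular expressions can be made concrete with a deliberately naive filter. `NaiveRegexFilter` is hypothetical and is not how jsoup works. It deletes `<script>...</script>` blocks in a single regex pass, and two classic inputs get through it.

```java
import java.util.regex.Pattern;

// Hypothetical, deliberately naive filter: NOT safe, shown only to illustrate the problem.
final class NaiveRegexFilter {
    // Tries to delete <script>...</script> blocks, case-insensitively, across newlines.
    private static final Pattern SCRIPT_BLOCK =
            Pattern.compile("<script[^>]*>.*?</script>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String strip(String html) {
        return SCRIPT_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // 1. No <script> tag at all: the payload lives in an event-handler attribute.
        String viaAttribute = "<img src=x onerror=\"alert(1)\">";
        // 2. Deleting the inner fragments glues the outer pieces into a working <script> tag.
        String viaNesting = "<scr<script></script>ipt>alert(1)</scr<script></script>ipt>";

        System.out.println(strip(viaAttribute)); // unchanged
        System.out.println(strip(viaNesting));   // <script>alert(1)</script>
    }
}
```

A parser-based cleaner avoids both problems because it works on the parsed element tree and keeps only what the whitelist allows, rather than guessing at text patterns.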
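Here is a sketch of the `Cleaner` route for getting a `Document` back, assuming jsoup 1.14.1+ (`Safelist`). `DocumentCleaning` and its methods are hypothetical names. `Cleaner.clean(Document)` returns a new `Document` containing only allowed content, and the dirty document is not modified. You can keep querying the result with selectors instead of re-parsing a string. `Cleaner.isValid` reports whether anything would have to be removed.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Safelist;

// Hypothetical example; assumes jsoup 1.14.1+ on the classpath.
final class DocumentCleaning {
    static Document cleanToDocument(String bodyHtml) {
        Document dirty = Jsoup.parseBodyFragment(bodyHtml);
        Cleaner cleaner = new Cleaner(Safelist.basic());
        return cleaner.clean(dirty); // a new Document; 'dirty' is left as it was
    }

    static boolean isClean(String bodyHtml) {
        // True only if every tag and attribute in the body is allowed by the safelist.
        return new Cleaner(Safelist.basic()).isValid(Jsoup.parseBodyFragment(bodyHtml));
    }

    public static void main(String[] args) {
        String unsafe = "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
        Document clean = cleanToDocument(unsafe);
        System.out.println(clean.body().html());
        System.out.println(clean.select("a").attr("rel")); // enforced by basic()
        System.out.println(isClean(unsafe));
    }
}
```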
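Here is a sketch of a custom whitelist built from scratch, assuming jsoup 1.14.1+ (`Safelist`). `CustomSafelist` is a hypothetical name. The policy allows four tags and only the `href` attribute on `<a>`, restricted to the `http` and `https` protocols. It also enforces `rel="nofollow"` on links, the same attribute that `basic()` adds in the earlier output. That attribute asks search engines not to pass ranking credit through user-submitted links.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Hypothetical example; assumes jsoup 1.14.1+ on the classpath.
final class CustomSafelist {
    // Built from an empty safelist: anything not listed here is removed.
    static final Safelist POLICY = new Safelist()
            .addTags("p", "b", "i", "a")
            .addAttributes("a", "href")
            .addProtocols("a", "href", "http", "https")     // drops javascript: and other schemes
            .addEnforcedAttribute("a", "rel", "nofollow");  // the cleaner adds rel="nofollow" to links

    static String clean(String html) {
        return Jsoup.clean(html, POLICY);
    }

    public static void main(String[] args) {
        String input = "<p><a href=\"javascript:alert(1)\">bad</a> "
                + "<a href=\"https://example.com/\">good</a> <span>x</span></p>";
        System.out.println(clean(input));
    }
}
```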