Some research on using regular expressions to filter scripts (ASP.NET + C#)
Source: Internet
Author: User
Some websites, especially forums (BBS), need to let users enter HTML-style markup to enrich the look of a page, while keeping scripts from running so that malicious code cannot execute.
Of course, the HtmlEncode and HtmlDecode methods cannot be used for this, because encoding the input would disable even the basic HTML markup that users are supposed to keep.
I searched the Internet and did not find a good solution, but I did collect some examples of script attacks:
1. Code contained in <script> tags
2. Code in <a href=javascript:...>
3. Code in the on... event attributes of other basic elements
4. Attacks caused by loading other pages in iframe and frameset elements
With this material, things become much simpler. Write a simple method that removes the code above using regular expressions:
public string WipeScript(string html)
{
    System.Text.RegularExpressions.Regex regex1 = new System.Text.RegularExpressions.Regex(@"<script[\s\S]+</script *>", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    System.Text.RegularExpressions.Regex regex2 = new System.Text.RegularExpressions.Regex(@" href *= *[\s\S]*script *:", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    System.Text.RegularExpressions.Regex regex3 = new System.Text.RegularExpressions.Regex(@" on[\s\S]*=", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    System.Text.RegularExpressions.Regex regex4 = new System.Text.RegularExpressions.Regex(@"<iframe[\s\S]+</iframe *>", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    System.Text.RegularExpressions.Regex regex5 = new System.Text.RegularExpressions.Regex(@"<frameset[\s\S]+</frameset *>", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    html = regex1.Replace(html, ""); // remove <script>...</script> blocks
    html = regex2.Replace(html, ""); // remove href=javascript: attributes (<a>)
    html = regex3.Replace(html, " _disibledevent="); // neutralize on... event attributes of other elements
    html = regex4.Replace(html, ""); // remove iframe
    html = regex5.Replace(html, ""); // remove frameset
    return html;
}
Pass this method HTML that may contain scripts, and the code it returns is the cleaned code.
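As a quick check of the method against the four attack categories listed earlier, a minimal console sketch might look like the following. It restates the five patterns through the static Regex.Replace overload so that the program compiles on its own. The class name WipeScriptDemo and the sample inputs are assumptions made for illustration and are not part of the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WipeScriptDemo
{
    // Compact restatement of the method above, using the same five patterns.
    public static string WipeScript(string html)
    {
        const RegexOptions opts = RegexOptions.IgnoreCase;
        html = Regex.Replace(html, @"<script[\s\S]+</script *>", "", opts);
        html = Regex.Replace(html, @" href *= *[\s\S]*script *:", "", opts);
        html = Regex.Replace(html, @" on[\s\S]*=", " _disibledevent=", opts);
        html = Regex.Replace(html, @"<iframe[\s\S]+</iframe *>", "", opts);
        html = Regex.Replace(html, @"<frameset[\s\S]+</frameset *>", "", opts);
        return html;
    }

    public static void Main()
    {
        // Hypothetical inputs, one per attack category.
        string[] samples =
        {
            "<b>hi</b><script>alert(1)</script>",
            "<a href=\"javascript:alert(1)\">x</a>",
            "<img src=a.gif onclick=alert(1)>",
            "<iframe src=evil.htm></iframe><i>ok</i>",
        };

        // Print each input next to what the filter returns for it.
        foreach (string sample in samples)
        {
            Console.WriteLine(sample);
            Console.WriteLine("  -> " + WipeScript(sample));
        }
    }
}
```

Running it also shows that the href pattern deletes the attribute only up to and including javascript: and leaves the rest of the tag behind, so the result is not necessarily tidy HTML.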
I ran a few simple tests and the method met my requirements, but I still have some questions:
Are there other kinds of script attacks? Is there a better solution?
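One limitation that simple tests can miss is that the [\s\S]+ quantifier is greedy: when a post contains two script blocks, the match runs from the first <script to the last </script> and the legitimate markup between them is removed as well. The sketch below compares the original pattern with a lazy variant. The names GreedyVersusLazy, StripScriptsGreedy, and StripScriptsLazy are hypothetical, and the lazy pattern is only one possible adjustment, not a complete defence against the attacks listed above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVersusLazy
{
    // Pattern from the method above: greedy, so it runs to the LAST </script>.
    public static string StripScriptsGreedy(string html) =>
        Regex.Replace(html, @"<script[\s\S]+</script *>", "", RegexOptions.IgnoreCase);

    // Hypothetical variant: the lazy quantifier *? stops at the FIRST </script>.
    public static string StripScriptsLazy(string html) =>
        Regex.Replace(html, @"<script[\s\S]*?</script *>", "", RegexOptions.IgnoreCase);

    public static void Main()
    {
        string html = "<script>a()</script><p>keep me</p><script>b()</script>";

        // The greedy pattern swallows the paragraph between the two blocks.
        Console.WriteLine("[" + StripScriptsGreedy(html) + "]"); // prints []

        // The lazy pattern removes each block separately and keeps the paragraph.
        Console.WriteLine("[" + StripScriptsLazy(html) + "]");   // prints [<p>keep me</p>]
    }
}
```

The same greedy behaviour applies to the href, on..., iframe, and frameset patterns, so a similar adjustment would need to be tested for each of them.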