This article focuses on principles for defending against XSS attacks. It assumes you understand the basics of XSS; if you do not, see these two articles: Stored and Reflected XSS Attack and DOM Based XSS.
Attackers can exploit an XSS vulnerability to send attack scripts to users. The user's browser executes such a script because it has no way of knowing that the script is untrusted. As far as the browser is concerned, the script comes from a trusted server, so it can openly access cookies and other sensitive information that the browser stores for the current website, and may even learn what software is installed on the user's computer. These scripts can also rewrite the HTML page to carry out phishing attacks.
XSS vulnerabilities have many causes and can be exploited in many ways, but if we follow the defense principles described in this article, we can still prevent XSS attacks.
Someone may ask: since the core of XSS defense is encoding untrusted data on output, and popular Web frameworks (such as Rails) already HTML-encode untrusted data by default, do we still need to spend time studying XSS defense? The answer is yes. For untrusted data placed in the body of an HTML page, HTML encoding is enough to defend against XSS, and even HTML-encoded data placed in an attribute of an HTML tag does not create an XSS vulnerability (provided the attribute value is correctly quoted). However, if you put HTML-encoded data into other contexts, such as a SCRIPT tag, an event-handling attribute, CSS, or a URL, HTML encoding alone is no longer enough. In particular, never insert untrusted data directly into the following locations:
<script>... Do not insert untrusted data directly here ...</script>   directly inside a SCRIPT tag
<!-- ... Do not insert untrusted data directly here ... -->   inside an HTML comment
<div ...do not insert untrusted data directly here...="..."></div>   as the attribute name of an HTML tag
<div name="... Do not insert untrusted data directly here ..."></div>   as the attribute value of an HTML tag
<...do not insert untrusted data directly here... href="..."></a>   as the name of an HTML tag
<style>... Do not insert untrusted data directly here ...</style>   directly inside CSS
Most importantly, never include untrusted third-party JavaScript in the page. Once it is included, such a script can manipulate your HTML page, steal sensitive information, or launch phishing attacks.
Principle 2: HTML Entity encoding of untrusted data inserted between HTML tags
Note that inserting untrusted data between HTML tags is different from inserting it into an HTML tag's attribute, because the two cases require different kinds of encoding. When you need to insert untrusted data between HTML tags, the first thing to do is HTML-entity-encode it. For example, we often put user-submitted data inside DIV, P, and TD tags; that data is untrusted and must be HTML-entity-encoded. Many Web frameworks provide HTML entity encoding functions that we only need to call, and some frameworks, such as Rails, go further and HTML-entity-encode all data inserted into the HTML page by default. This does not completely defend against XSS, but it does reduce the burden on developers.
<body>... HTML-entity-encode untrusted data before inserting it here ...</body>
<div>... HTML-entity-encode untrusted data before inserting it here ...</div>
<p>... HTML-entity-encode untrusted data before inserting it here ...</p>
Likewise, HTML-entity-encode untrusted data before inserting it between any other HTML tags.
[Encoding Rules]
So what does HTML entity encoding involve? The following six special characters must be encoded:
& –> &amp;
< –> &lt;
> –> &gt;
" –> &quot;
' –> &#x27;
/ –> &#x2F;
Note the following:
Encoding the single quote (') as &apos; is not recommended, because &apos; is not a standard HTML 4 entity (it comes from XML and XHTML); use &#x27; instead.
The slash (/) needs to be encoded because it is very useful to an attacker for closing the current HTML tag.
We recommend the ESAPI library provided by OWASP, which offers a set of strict functions for various kinds of security encoding. In this case, you can use:
String encodedContent = ESAPI.encoder().encodeForHTML(request.getParameter("input"));
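To make the rule concrete, here is a minimal sketch of the six-character replacement written in plain Java. The class name HtmlEntityEncoder is hypothetical and is not part of ESAPI; it only illustrates the mapping and is not a substitute for the library call.

```java
// Illustrative sketch only; HtmlEntityEncoder is a hypothetical name, not an ESAPI class.
final class HtmlEntityEncoder {
    private HtmlEntityEncoder() {}

    /** Replaces the six special characters (& < > " ' /) with HTML entities. */
    static String encode(String untrusted) {
        if (untrusted == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(untrusted.length() + 16);
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break; // not &apos;
                case '/':  out.append("&#x2F;"); break; // stops the tag from being closed
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the encoded form of a typical attack string.
        System.out.println(encode("<script>alert('xss')</script>"));
    }
}
```

The encoded result is safe to place between tags such as DIV or P, but not inside a SCRIPT tag, CSS, or a URL, which need the other encodings described in this article.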
Principle 3: HTML attribute encoding is performed on untrusted data inserted into HTML attributes.
This principle means that when you insert untrusted data into the value part (data value) of an HTML attribute (such as width, name, or value), you should HTML-attribute-encode the data. However, it does not apply to the event-handling attributes of HTML tags (such as onmouseover); for those, use the JavaScript encoding described in Principle 4 below.
<div attr=... HTML-attribute-encode untrusted data before inserting it here ...></div>   attribute value without quotation marks
<div attr='... HTML-attribute-encode untrusted data before inserting it here ...'></div>   attribute value in single quotation marks
<div attr="... HTML-attribute-encode untrusted data before inserting it here ..."></div>   attribute value in double quotation marks
[Encoding Rules]
All characters other than letters and digits are encoded, as long as the character's code is less than 256. The encoded output format is &#xHH; (it starts with &#x, HH is the character's hexadecimal value, and a semicolon terminates it).
The rules are this strict because developers sometimes forget to quote attribute values. If an attribute value is not enclosed in quotation marks, attackers can easily close the current attribute and insert an attack script. For example, if the attribute is unquoted and the data is not strictly encoded, a single space character is enough to close the current attribute. Consider the following attack:
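Suppose the HTML code looks like this (attr stands for any unquoted attribute and $INPUT for the untrusted data):
<div attr=$INPUT>... Content ...</div>
An attacker can construct input like this:
X onmouseover="javascript:alert(/xss/)"
The final HTML code in the user's browser will then look like this:
<div attr=X onmouseover="javascript:alert(/xss/)">... Content ...</div>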
As soon as the user's mouse moves over the DIV, the attack script written by the attacker is triggered. In this example the script only pops up an alert box, which is little more than a prank, but in real attacks attackers use more destructive scripts, such as the following XSS attack that steals the user's cookies:
X/>
Besides the space character, the following characters can also close an unquoted attribute:
% * + , - / ; < = > ^ | ` (the backtick, which IE treats as a single quotation mark)
You can use functions provided by ESAPI to encode HTML attributes:
String encodedContent = ESAPI.encoder().encodeForHTMLAttribute(request.getParameter("input"));
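The &#xHH; rule can be sketched in a few lines of plain Java. The class name HtmlAttributeEncoder is hypothetical and is not the ESAPI implementation. As an assumption of this sketch, characters with a code of 256 or above are passed through unchanged, since the rule above only covers codes below 256, and control characters are not given any special treatment.

```java
// Illustrative sketch only; HtmlAttributeEncoder is a hypothetical name, not an ESAPI class.
final class HtmlAttributeEncoder {
    private HtmlAttributeEncoder() {}

    /** Encodes every non-alphanumeric character below 256 as &#xHH;. */
    static String encode(String untrusted) {
        if (untrusted == null) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            boolean alphanumeric = (c >= '0' && c <= '9')
                    || (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z');
            if (alphanumeric || c >= 256) {
                out.append(c); // assumption: codes >= 256 are left as-is
            } else {
                out.append("&#x").append(String.format("%02X", (int) c)).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The space and "=" are encoded, so the input cannot start a new attribute.
        System.out.println(encode("x onmouseover=alert(1)"));
    }
}
```

Because the space, the equals sign, and the quotation marks are all encoded, the input stays inside the attribute value even when the developer forgot to quote the attribute.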
Principle 4: Perform SCRIPT encoding on untrusted data when it is inserted into the SCRIPT.
This principle mainly targets dynamically generated JavaScript code, including script blocks and the event-handling attributes of HTML tags (event handlers such as onmouseover and onload). There is only one safe way to insert untrusted data into JavaScript code: JavaScript-encode the data and place it only in a quoted data value. For example:
<script>alert('... JavaScript-encode untrusted data before inserting it here ...')</script>   data value in single quotation marks
<script>x = "... JavaScript-encode untrusted data before inserting it here ..."</script>   data value in double quotation marks
<div onmouseover="x='... JavaScript-encode untrusted data before inserting it here ...'"></div>   quoted event-handling attribute whose data value is also quoted
Inserting untrusted data anywhere else in JavaScript code is very risky, because attackers can easily inject attack code.
Note that some JavaScript functions are extremely dangerous from an XSS point of view. Even JavaScript-encoded untrusted data still produces an XSS vulnerability when passed to them, for example:
<script>window.setInterval('... Even JavaScript-encoded untrusted data is still an XSS vulnerability here ...');</script>
[Encoding Rules]
All characters other than letters and digits are encoded, as long as the character's code is less than 256. The encoded output format is \xHH (it starts with \x, and HH is the character's hexadecimal value).
When you encode untrusted data, do not take the shortcut of escaping special characters with a backslash (\), for example turning a double quotation mark into \". This is not reliable. When the browser parses the page, it parses the HTML first and the JavaScript afterwards, so inside an HTML attribute the double quotation mark may be consumed by the HTML parser and end the attribute value regardless of the backslash. An attacker can also neutralize the added backslash. Either way, the quotation mark breaks out of the data value and the XSS attack can continue. For example:
Assume that the code snippet is as follows:
<script>var message = " $VAR ";</script>
The attacker enters the following content:
\"; alert('xss');//
If you simply escape the double quotation mark by replacing it with \", the attacker's input turns the code into:
<script>var message = " \\"; alert('xss');// ";</script>
The backslash supplied by the attacker escapes the backslash added by the filter, so when the browser parses the script the double quotation mark is no longer escaped and closes the string opened by the first double quotation mark. The alert('xss') that follows is then treated as ordinary JavaScript code and is executed.
You can use functions provided by ESAPI for JavaScript encoding:
String encodedContent = ESAPI.encoder().encodeForJavaScript(request.getParameter("input"));
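The difference between backslash escaping and strict \xHH encoding can be sketched as follows. The class and method names are hypothetical and are not the ESAPI implementation. As an assumption that goes beyond the rule stated above, this sketch writes characters with a code of 256 or above as \uHHHH instead of leaving them untouched.

```java
// Illustrative sketch only; JavaScriptValueEncoder is a hypothetical name, not an ESAPI class.
final class JavaScriptValueEncoder {
    private JavaScriptValueEncoder() {}

    /** The unreliable shortcut: only puts a backslash in front of double quotes. */
    static String naiveEscape(String untrusted) {
        return untrusted.replace("\"", "\\\"");
    }

    /** Encodes every non-alphanumeric character below 256 as \xHH. */
    static String encode(String untrusted) {
        if (untrusted == null) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            boolean alphanumeric = (c >= '0' && c <= '9')
                    || (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z');
            if (alphanumeric) {
                out.append(c);
            } else if (c < 256) {
                out.append(String.format("\\x%02X", (int) c));
            } else {
                out.append(String.format("\\u%04X", (int) c)); // assumption of this sketch
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String attack = "\\\"; alert('xss');//"; // the attacker input from the example
        System.out.println(naiveEscape(attack)); // still contains an unescaped quote
        System.out.println(encode(attack));      // no quote or backslash pair survives
    }
}
```

With strict encoding the output contains no literal quotation mark at all, so neither the HTML parser nor the JavaScript parser can be tricked into ending the data value early.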
Principle 5: Perform CSS encoding on untrusted data when inserting it into the Style attribute.
When you want to insert untrusted data into a stylesheet, a style tag, or a style attribute, you need to CSS-encode it. CSS is traditionally thought of as responsible only for page styling, but it is far more powerful than we tend to assume and can be used in a variety of attacks, so do not take placing untrusted data in CSS lightly. Only allow untrusted data in the value part of a CSS property, and encode it properly. In addition, avoid placing untrusted data in complex properties such as url and behavior. In particular, the expression property recognized by IE can execute JavaScript, so placing untrusted data there is not recommended.
<style>selector { property: ... CSS-encode untrusted data before inserting it here ... }</style>
<style>selector { property: "... CSS-encode untrusted data before inserting it here ..." }</style>
<span style="property: ... CSS-encode untrusted data before inserting it here ...">...</span>
[Encoding Rules]
All characters other than letters and digits are encoded, as long as the character's code is less than 256. The encoded output format is \HH (it starts with \, and HH is the character's hexadecimal value).
As with the preceding principles, when encoding untrusted data do not take the shortcut of simply escaping special characters such as double quotation marks, because attackers can find ways to bypass such restrictions.
You can use the functions provided by ESAPI for CSS encoding:
String encodedContent = ESAPI.encoder().encodeForCSS(request.getParameter("input"));
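A minimal sketch of CSS encoding in plain Java follows. The class name CssValueEncoder is hypothetical and is not the ESAPI implementation. One deliberate deviation from the \HH format above: a CSS escape may contain up to six hexadecimal digits, so a two-digit escape followed by a character such as "a" would be read as one longer escape. This sketch therefore pads every escape to six digits, which is an assumption of the example and not something the text above prescribes. Characters with a code of 256 or above are passed through unchanged.

```java
// Illustrative sketch only; CssValueEncoder is a hypothetical name, not an ESAPI class.
final class CssValueEncoder {
    private CssValueEncoder() {}

    /** Encodes every non-alphanumeric character below 256 as a six-digit CSS escape. */
    static String encode(String untrusted) {
        if (untrusted == null) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            boolean alphanumeric = (c >= '0' && c <= '9')
                    || (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z');
            if (alphanumeric || c >= 256) {
                out.append(c); // assumption: codes >= 256 are left as-is
            } else {
                // Six digits so that a following hex-like character is not absorbed.
                out.append(String.format("\\%06X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // ";" and "}" are encoded, so the input cannot end the declaration or the rule.
        System.out.println(encode("red;}"));
    }
}
```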
Principle 6: perform URL encoding on untrusted data inserted into HTML URLs
When you need to insert untrusted data into the URL on the HTML page, you need to encode the data as follows:
[Encoding Rules]
All characters other than letters and digits are encoded, as long as the character's code is less than 256. The encoded output format is %HH (it starts with %, and HH is the character's hexadecimal value).
When URL-encoding, pay special attention to the following two points:
1) The URL attribute must be enclosed in quotation marks. Otherwise, attackers can easily break out of the current attribute and insert attack code after it.
2) Do not URL-encode an entire URL. When untrusted data is inserted as the whole value of href, src, or another URL-based attribute, you need to validate the protocol at the start of the data instead. Otherwise, attackers can change the URL's protocol, for example from http: to data: or javascript:.
You can use the functions provided by ESAPI for URL encoding:
String encodedContent = ESAPI.encoder().encodeForURL(request.getParameter("input"));
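The %HH rule can be sketched in plain Java as follows. The class name UrlValueEncoder is hypothetical and is not the ESAPI implementation. As an assumption of this sketch, the input is first converted to UTF-8 bytes and every non-alphanumeric byte is percent-encoded, which also covers characters outside the range described above. It is meant for a single value inside a URL, such as one query parameter, not for a whole URL.

```java
// Illustrative sketch only; UrlValueEncoder is a hypothetical name, not an ESAPI class.
final class UrlValueEncoder {
    private UrlValueEncoder() {}

    /** Percent-encodes every byte that is not an ASCII letter or digit. */
    static String encode(String untrusted) {
        if (untrusted == null) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        // Assumption: UTF-8 is the page and URL encoding.
        for (byte b : untrusted.getBytes(java.nio.charset.StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean alphanumeric = (v >= '0' && v <= '9')
                    || (v >= 'a' && v <= 'z')
                    || (v >= 'A' && v <= 'Z');
            if (alphanumeric) {
                out.append((char) v);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The encoded value can no longer add parameters or change the URL structure.
        System.out.println("http://example.com/search?q=" + encode("a b&c=d"));
    }
}
```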
ESAPI also provides functions for validating untrusted data. Here we can use them to check whether the untrusted data really is a URL:
<%
String userProvidedURL = request.getParameter("userProvidedURL");
boolean isValidURL = ESAPI.validator().isValidInput("URLContext", userProvidedURL, "URL", 255, false);
if (isValidURL) {
%>
<a href="<%= ESAPI.encoder().encodeForHTMLAttribute(userProvidedURL) %>"></a>
<%
}
%>
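The protocol check from point 2 above can also be sketched with the standard java.net.URI class. The class name UrlSchemeCheck and the choice to allow only http and https are assumptions of this example, not rules taken from ESAPI; adjust the allowed list to what your application really needs.

```java
// Illustrative sketch only; UrlSchemeCheck is a hypothetical helper.
final class UrlSchemeCheck {
    private UrlSchemeCheck() {}

    /** Returns true only for absolute URLs whose scheme is http or https. */
    static boolean hasAllowedScheme(String untrustedUrl) {
        if (untrustedUrl == null) {
            return false;
        }
        try {
            java.net.URI uri = new java.net.URI(untrustedUrl.trim());
            String scheme = uri.getScheme();
            // Relative URLs have no scheme and are rejected in this sketch.
            return scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
        } catch (java.net.URISyntaxException e) {
            // Anything that does not parse as a URI is treated as invalid.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasAllowedScheme("https://example.com/page"));
        System.out.println(hasAllowedScheme("javascript:alert(1)"));
    }
}
```

A URL that passes this check still has to be HTML-attribute-encoded before it is written into href or src.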
Principle 7: Use the XSS rule engine for encoding filtering when Rich Text is used
Web applications commonly provide rich text input, for example for BBS posts and blog articles. Rich text submitted by users usually contains HTML tags and may even contain JavaScript, so without appropriate encoding and filtering it creates an XSS vulnerability. However, we cannot forbid rich text input just because we are afraid of XSS vulnerabilities; that would badly hurt the user experience.
Because rich text is a special case, we can use an XSS rule engine to filter the user's input: only safe HTML tags such as <b>, <i>, and <p> are allowed, and everything else is HTML-encoded. Note that content filtered by the rule engine may only be placed inside <div>, <p>, and other safe HTML tags. Do not place it in the attribute values of HTML tags, in HTML event-handling attributes, or in a <script> tag.
Recommended XSS rule filtering engines: OWASP AntiSamy or OWASP Java HTML Sanitizer
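To illustrate the idea of "allow a few safe tags, HTML-encode everything else", here is a deliberately tiny sketch in plain Java. The class name RichTextAllowlist is hypothetical, and the approach (encode everything, then restore only exact, attribute-free <b>, <i>, and <p> tags) is an assumption chosen for brevity. It does not balance tags or support attributes, and it is not how the engines named above work internally; real applications should use one of those engines.

```java
// Illustrative sketch only; RichTextAllowlist is a hypothetical helper, not a real rule engine.
final class RichTextAllowlist {
    private RichTextAllowlist() {}

    private static final String[] ALLOWED_TAGS = {"b", "i", "p"};

    /** HTML-encodes the input, then restores only attribute-free allowed tags. */
    static String sanitize(String untrusted) {
        if (untrusted == null) {
            return "";
        }
        // Step 1: encode everything. "&" must be replaced first.
        String encoded = untrusted
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#x27;");
        // Step 2: restore exact lowercase tags such as <b> and </b>.
        // A tag with attributes (e.g. <b onclick=...>) never matches and stays encoded.
        for (String tag : ALLOWED_TAGS) {
            encoded = encoded
                    .replace("&lt;" + tag + "&gt;", "<" + tag + ">")
                    .replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
        }
        return encoded;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("<b>hi</b><script>alert(1)</script>"));
    }
}
```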
Summary
XSS vulnerabilities can occur in many places, and the cause differs from place to place. For XSS defense we therefore need to do the right thing in the right place, that is, choose the encoding according to where the untrusted data will be placed. For example, HTML encoding is required when placing data between <div> tags, HTML attribute encoding is required when placing data in an attribute of a <div> tag, and so on.
XSS attacks are constantly evolving. The principles above cover almost all XSS scenarios that can arise in Web applications, but we still cannot be complacent. To make Web applications more secure, we can use other defenses to strengthen XSS protection or to limit the damage:
Validate the data entered by users. For example, a text box for an email address should accept only a correctly formatted email address, and a text box for a mobile phone number should accept only digits in the correct format. This validation must be performed at least on the server side, so that bypassing browser-side validation does not help an attacker. To improve the user experience and reduce server load, it is best to perform the same validation in the browser as well.
Add the HttpOnly flag to cookies. Many XSS attacks aim to steal user cookies, which often contain authentication information such as the session ID. Once a cookie is stolen, an attacker can impersonate the user and take over the account. Cookie theft generally relies on JavaScript reading the cookie. The HttpOnly flag tells the browser that the marked cookie must not be accessible to scripts, so even if the Web application has an XSS vulnerability, the cookie remains protected and the damage is reduced.
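As a rough illustration of the HttpOnly flag, the sketch below builds the value of a Set-Cookie response header by hand. The class name SessionCookieHeader, the Path=/ and Secure attributes, and the restriction on allowed characters are assumptions of this example. In a servlet container you would normally call Cookie.setHttpOnly(true) (available since Servlet 3.0) instead of building the header yourself.

```java
// Illustrative sketch only; SessionCookieHeader is a hypothetical helper.
final class SessionCookieHeader {
    private SessionCookieHeader() {}

    /** Builds a Set-Cookie header value that scripts cannot read (HttpOnly). */
    static String build(String name, String value) {
        // Assumption: restrict name and value to simple characters so that
        // separators such as ";" or line breaks cannot be smuggled into the header.
        if (name == null || value == null
                || !name.matches("[A-Za-z0-9_]+")
                || !value.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("unexpected characters in cookie name or value");
        }
        return name + "=" + value + "; Path=/; Secure; HttpOnly";
    }

    public static void main(String[] args) {
        System.out.println("Set-Cookie: " + build("SESSIONID", "abc123"));
    }
}
```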
Web applications are becoming more and more complex and are prone to many kinds of vulnerabilities, not just XSS. There is no silver bullet that solves every security problem at once; each kind of vulnerability needs its own defense.
I hope the principles described in this article help you defend against XSS attacks. If you have any opinions or questions about XSS attacks or defenses, please leave a message to discuss them. Thank you.