Recently I have been studying XSS attack and defense, especially DOM XSS, and the questions have gradually shifted toward the order in which the browser encodes and decodes content.
Today I was stood up, so I ended up spending two hours in a KFC reading up on the topic, and it suddenly clicked.
References first:
1. http://www.freebuf.com/articles/web/43285.html
2. http://www.freebuf.com/articles/web/10121.html
3. http://www.wooyun.org/whitehats/%E5%BF%83%E4%BC%A4%E7%9A%84%E7%98%A6%E5%AD%90
4. Wu Hanqing (Dao Ge): "White Hat Talks Web Security" (白帽子讲Web安全)
5. Encoding Conversion Tool: http://app.baidu.com/app/enter?appid=280383
In XSS defense, the most common case is output placed in HTML tag content or HTML attributes. There the browser performs only HTML decoding, so HTML-encoding the output is enough to prevent XSS. If the output lands inside a script or a URL, however, it must also be encoded in other ways. This article looks at how to encode output variables in such nested contexts.
"Background: common encodings and why they exist"
HTML encoding
When rendering an HTML page, you sometimes need to display special characters such as "<" and "&". Because these characters have special meaning in HTML, they must be written in another form, which is why HTML encoding exists. HTML encoding simply converts characters into HTML entities. For example, to display <script>, you write "&lt;script&gt;" in the source. To prevent XSS, encode at least the following characters:
Character | HTML entity
< | &lt;
> | &gt;
' | &#039;
" | &quot;
& | &amp;
HTML encoding can also be done with numeric character references, written as "&#number;" with a decimal or hexadecimal (ASCII or Unicode) code point. For example, to display <script>, you can write "&#x003c;script&#x003e;" or "&#60;script&#62;".
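As a rough illustration of the character table above, the following sketch encodes the five characters in plain JavaScript. The helper name `htmlEncode` is an assumption made for this example, not a browser built-in; a real project would normally rely on the escaping provided by its template engine.

```javascript
// Minimal sketch: HTML-encode the five characters from the table.
// `htmlEncode` is a hypothetical helper name, not a standard API.
function htmlEncode(input) {
  const entities = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#039;',
  };
  // One pass over the input, so the '&' inside a generated entity
  // is never encoded a second time.
  return String(input).replace(/[&<>"']/g, (ch) => entities[ch]);
}

console.log(htmlEncode('<script>alert("x")</script>'));
// prints: &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```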
JavaScript encoding
JavaScript encoding (JavascriptEncode) works differently from HtmlEncode: it uses "\" to escape special characters. Characters can also be written as numeric escapes. JavaScript supports four escape forms:
1. Three octal digits, zero-padded on the left if needed; for example, "e" is encoded as "\145"
2. Two hexadecimal digits, zero-padded on the left if needed; for example, "e" is encoded as "\x65"
3. Four hexadecimal digits, zero-padded on the left if needed; for example, "e" is encoded as "\u0065"
4. C-style escapes for certain control characters (for example \n and \r)
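The hexadecimal escape forms listed above can be sketched as a small encoder. This is only one possible strategy, assumed for illustration: the hypothetical helper `jsEncode` leaves ASCII letters and digits alone and escapes everything else as `\xHH` or `\uHHHH`. Octal escapes are left out of the code because they are rejected in strict mode.

```javascript
// Minimal sketch of JavaScript string encoding.
// `jsEncode` is a hypothetical helper name, not a built-in.
function jsEncode(input) {
  // Without the `u` flag the regex works on UTF-16 code units,
  // so every match fits into \xHH or \uHHHH.
  return String(input).replace(/[^A-Za-z0-9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 0x100
      ? '\\x' + code.toString(16).padStart(2, '0')
      : '\\u' + code.toString(16).padStart(4, '0');
  });
}

// The escape forms decode back to the same character.
console.log('\x65' === 'e', '\u0065' === 'e'); // prints: true true

// Quotes and punctuation can no longer terminate a string literal.
console.log(jsEncode('";alert(1)//'));
// prints: \x22\x3balert\x281\x29\x2f\x2f
```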
URL encoding
URL parameters are passed as key=value pairs separated by the & symbol, for example /s?q=abc&ie=utf-8. If a value itself contains = or &, the server receiving the URL will misparse it, so these ambiguous characters must be escaped, that is, encoded. URL encoding replaces each conflicting character with % followed by its hexadecimal value; a space, for example, becomes %20.
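To make the ambiguity concrete, here is a small sketch using the standard `encodeURIComponent` function and the `URL` class. The sample value and the base address `http://example.invalid` are made up for the example.

```javascript
// A value that itself contains '=', '&' and a space.
const value = 'a=b&c d';

// Encode only the value, then place it in the query string.
const url = '/s?q=' + encodeURIComponent(value) + '&ie=utf-8';
console.log(url); // prints: /s?q=a%3Db%26c%20d&ie=utf-8

// Parsing the URL recovers the original value and keeps the second
// parameter separate. The base URL is a placeholder.
const parsed = new URL(url, 'http://example.invalid').searchParams;
console.log(parsed.get('q'));  // prints: a=b&c d
console.log(parsed.get('ie')); // prints: utf-8
```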
"Browser parsing principle"
When the browser receives an HTML file, it parses the document from the top. When it encounters JavaScript, it invokes the JavaScript parser. Code in event handlers such as onclick is skipped at this stage and is parsed only when the event fires.
"Example One"
<a href="#" onclick="{$value}">Nice weather</a>
When the browser parses the page, the content of onclick is first parsed as HTML. When the link is clicked, the JavaScript parser is invoked to parse $value.
So the decoding order is: HTML decoding -> JavaScript decoding.
So the correct defense is: JavaScript encoding -> HTML encoding.
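The onclick case above can be sketched as follows. The sketch assumes the value lands inside a quoted JavaScript string within the handler (the `greet('...')` call is hypothetical and not part of the original example), and `jsEncode`, `htmlEncode`, and `encodeForOnclickString` are helper names invented for illustration.

```javascript
// Hypothetical helpers for this sketch; neither is a browser built-in.
function jsEncode(input) {
  return String(input).replace(/[^A-Za-z0-9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 0x100
      ? '\\x' + code.toString(16).padStart(2, '0')
      : '\\u' + code.toString(16).padStart(4, '0');
  });
}

function htmlEncode(input) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#039;' };
  return String(input).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// The browser HTML-decodes first and JavaScript-decodes second,
// so the output side encodes in the reverse order.
function encodeForOnclickString(value) {
  return htmlEncode(jsEncode(value));
}

const value = "');alert(1);//";
const markup =
  '<a href="#" onclick="greet(\'' + encodeForOnclickString(value) + '\')">Nice weather</a>';
console.log(markup);
```

Because this `jsEncode` escapes every non-alphanumeric character, the HTML step has nothing left to change here. It matters when the JavaScript encoder is less aggressive, for example one that only backslash-escapes quotes.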
"Example Two"
<div id="BB"></div>
<script>
document.getElementById('BB').innerHTML = "{$value}";
</script>
When the browser parses the page, $value sits inside JavaScript, so it is JavaScript-decoded first. Once $value is assigned to innerHTML, it is HTML-decoded.
So the decoding order is: JavaScript decoding -> HTML decoding.
So the correct defense is: HTML encoding -> JavaScript encoding.
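For the innerHTML case, the encoding order is reversed. The sketch below uses the same hypothetical helpers (`jsEncode`, `htmlEncode`) plus an invented wrapper `encodeForInnerHtmlString`, and simulates the JavaScript decoding step with `new Function` to show that what reaches innerHTML is still HTML-encoded.

```javascript
// Hypothetical helpers for this sketch; neither is a browser built-in.
function htmlEncode(input) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#039;' };
  return String(input).replace(/[&<>"']/g, (ch) => entities[ch]);
}

function jsEncode(input) {
  return String(input).replace(/[^A-Za-z0-9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 0x100
      ? '\\x' + code.toString(16).padStart(2, '0')
      : '\\u' + code.toString(16).padStart(4, '0');
  });
}

// The browser JavaScript-decodes first and HTML-decodes second,
// so the output side HTML-encodes first and JavaScript-encodes second.
function encodeForInnerHtmlString(value) {
  return jsEncode(htmlEncode(value));
}

const value = '<img src=x onerror=alert(1)>';
const encoded = encodeForInnerHtmlString(value);
console.log('document.getElementById("BB").innerHTML = "' + encoded + '";');

// Simulate the JavaScript decoding step: the string that would be
// assigned to innerHTML is inert, HTML-encoded text.
const assigned = new Function('return "' + encoded + '"')();
console.log(assigned); // prints: &lt;img src=x onerror=alert(1)&gt;
```

If only JavaScript encoding were applied, the decoded string handed to innerHTML would be the raw `<img ...>` tag again, which is why the HTML step has to come first.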
"Example three"
<td onclick="OpenUrl('add.do?username={$value}');">11</td>
When the browser parses the page, $value is inside JavaScript, but because that JavaScript sits in an onclick attribute, it is HTML-decoded first. When the cell is clicked, it is JavaScript-decoded. Because $value is also part of a URL, it is finally URL-decoded.
So the decoding order is: HTML decoding -> JavaScript decoding -> URL decoding.
So the correct defense is: URL encoding -> JavaScript encoding -> HTML encoding.
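The three-layer case can be sketched the same way. The sketch assumes the whole URL is passed to `OpenUrl` as one quoted string; `jsEncode`, `htmlEncode`, and `encodeForUrlInOnclick` are hypothetical helper names, while `encodeURIComponent` is the standard built-in.

```javascript
// Hypothetical helpers for this sketch; neither is a browser built-in.
function jsEncode(input) {
  return String(input).replace(/[^A-Za-z0-9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 0x100
      ? '\\x' + code.toString(16).padStart(2, '0')
      : '\\u' + code.toString(16).padStart(4, '0');
  });
}

function htmlEncode(input) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#039;' };
  return String(input).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Decoding order is HTML -> JavaScript -> URL,
// so encoding order is URL -> JavaScript -> HTML.
function encodeForUrlInOnclick(value) {
  return htmlEncode(jsEncode(encodeURIComponent(value)));
}

const value = "x');alert(1);//&admin=1";

// encodeURIComponent alone leaves ' ( ) untouched, so the payload
// could still break out of the JavaScript string.
console.log(encodeURIComponent(value));
// prints: x')%3Balert(1)%3B%2F%2F%26admin%3D1

const encoded = encodeForUrlInOnclick(value);
console.log('<td onclick="OpenUrl(\'add.do?username=' + encoded + '\');">11</td>');
```

The intermediate output shows why URL encoding by itself is not enough in this context: the quote and parentheses survive it and are only neutralized by the JavaScript encoding step.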
"Write at the end"
The references above describe clearly how encoding that does not match the output context can be bypassed; this article is only meant to organize the ideas.
I am trying to hunt for vulnerabilities these days. ~\(^o^)/~
"Web Security" second bomb: Compound coding problem in XSS attack and defense