0x00 Preface
Before I begin, I would like to apologize for a recently published article (about cross-origin character set inheritance) that reached an outrageous conclusion; it was deleted promptly. I also hope that anyone who reposted it without testing will delete their copy, so that fewer people are misled about cross-origin character set inheritance. To make amends, I have put together this article, which I have long wanted to write. The topic still feels a little too big for me, which is why I kept putting it off. But a promise that has been made has to be kept sooner or later, so I am biting the bullet and writing it now. >.<
Let's work through the examples one by one and enter the world created by XSS and character sets.
0x01 XSS Based on UTF-7
Before getting started, here is a brief introduction to UTF-7. UTF-7 is an encoding that can represent all of Unicode using only 7-bit characters. It was mostly used in early e-mail environments and has since been removed from the Unicode specification. Apart from letters, digits, and some symbols, everything else is represented in a base64-based form. For example:
<div>我了个去！</div>
In UTF-7, this is:
+ADw-div+AD4-+YhFOhk4qU7v/AQ-+ADw-/div+AD4-
Similarly,
<script> alert("xss") </script>
It will become:
+ADw-script+AD4- alert(+ACI-xss+ACI-) +ADw-/script+AD4-
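The encoding above can be reproduced with a short sketch. This is not a full UTF-7 implementation: the `to_utf7` helper and its `DIRECT` set are illustrative names, and the sketch simply base64-encodes (as UTF-16BE, without padding) every character that is not a letter, digit, space, or one of the RFC 2152 "Set D" punctuation marks. Python's built-in encoder is not used here because it writes "<" and ">" directly; its decoder is used only to check the round trip.

```python
import base64

# Characters written directly in this sketch (letters, digits, space, and the
# "Set D" punctuation from RFC 2152); everything else is base64-encoded.
DIRECT = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789'(),-./:? "
)


def to_utf7(text: str) -> str:
    """Encode text as UTF-7, shifting every character outside DIRECT."""
    out = []
    run = []  # pending characters that go into one base64 block

    def flush() -> None:
        if run:
            raw = "".join(run).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("+" + b64 + "-")
            run.clear()

    for ch in text:
        if ch in DIRECT:
            flush()
            out.append(ch)
        else:
            run.append(ch)
    flush()
    return "".join(out)


encoded = to_utf7('<script> alert("xss") </script>')
print(encoded)
# Python's built-in UTF-7 decoder turns the result back into the original markup.
print(encoded.encode("ascii").decode("utf-7"))
```

The point to notice is that the encoded form contains only characters that a typical HTML filter leaves alone.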
From the examples above, it is easy to see that the encoded code no longer contains the characters we would expect: "<", ">", or double quotation marks. But how does this relate to XSS? The situations fall roughly into three categories:
(1) The character set is not set through either the response header or a meta tag.
In one case, IE's encoding is set to auto-detect, and IE uses certain BOM-like sequences, such as +ADw-, to decide that the current page is encoded as UTF-7 (this no longer works).
In the other case, even if IE's automatic encoding detection is not enabled, we can build a page whose character set is UTF-7 and include the target page in an iframe; the character set is then imposed on the target through the character set inheritance vulnerability.
<meta http-equiv='content-type' content='text/html;charset=UTF-7'><iframe src='http://example.com/target.html'></iframe>
Unfortunately, this iframe-based cross-domain character set inheritance vulnerability no longer works. Not long ago, MK pointed out this mistake in someone's slides:
"The issue in your slides, where a character set is set through an iframe by inheriting the character set of the top frame, no longer exists, because this kind of inheritance now requires both frames to be in the same domain."
(2) An unrecognized character set is specified.
In fact, in a now-deleted article (the author's test method and conclusion were both wrong, so he asked for it to be deleted), /fd said that utf8 is also accepted as a standard name. It did not use to be. So why are both UTF-8 and utf8 accepted now? Because there will always be careless people who make mistakes such as:
Writing UTF-8 as UTF8
Writing EUC-JP as EUC
In the past, browsers did not recognize values like these, which was the same as not setting a character set at all. For how to exploit this, see case (1).
(3) The output point comes before the meta tag, and the character set is specified by that meta tag.
The scenario looks roughly like this (the output point is in the title, before the meta tag):
<html>
BOM-based detection and iframe-based character set inheritance no longer work, so in case (3) we can consider injecting:
</title><meta charset=utf-7>
0x02 XSS Based on US-ASCII
In fact, US-ASCII-based XSS has a lot in common with UTF-7-based XSS. US-ASCII is also a 7-bit character set, covering digits and a small set of other characters: the 128 code points from 0x00 to 0x7F. But if you use Internet Explorer to open a document declared as US-ASCII, you will find that it does not only parse bytes from 0x00 to 0x7F. Bytes in the range 0x80 to 0xFF cannot be expressed in 7 bits, yet IE ignores the top bit and treats them as the equivalent characters in the range 0x00 to 0x7F.
That is to say, in this character set:
The double quotation mark 0x22 is equivalent to 0xA2, the left angle bracket 0x3C is equivalent to 0xBC, and the right angle bracket 0x3E is equivalent to 0xBE.
For example, save the following as an HTML file and choose Shift_JIS as the encoding (in Notepad, ANSI will do):
Note: in Shift_JIS, ｼ and ｾ are 0xBC and 0xBE, respectively.
Then open the file in Internet Explorer and you will see a small window pop up.
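As a rough sketch of the byte-level effect described above (not the author's sample file), the following shows how a payload written with half-width katakana turns into markup once the top bit of every byte is dropped. The `strip_high_bit` helper is a hypothetical stand-in for the behavior attributed to old Internet Explorer, and the payload string is an assumption chosen for illustration.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the top bit of every byte, mimicking the described US-ASCII handling."""
    return bytes(b & 0x7F for b in data)


# Half-width katakana U+FF7C and U+FF7E are the single bytes 0xBC and 0xBE
# in Shift_JIS, so the saved file contains no 0x3C or 0x3E at all.
payload = "\uff7cscript\uff7ealert(1)\uff7c/script\uff7e".encode("shift_jis")

print(payload)                  # bytes as stored on disk
print(strip_high_bit(payload))  # bytes as seen after the top bit is ignored
```

A filter that looks for 0x3C, 0x3E, or 0x22 in the stored bytes finds nothing to escape, which is why the two views differ in a dangerous way.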
0x03 Using Character Sets to Bypass the htmlspecialchars() Function
Having looked at these two character sets, we find that they have one thing in common: neither needs "<", ">", or double quotation marks. This brings PHP's htmlspecialchars() to mind. Although the iframe cross-origin character set inheritance vulnerability discussed below can no longer be exploited, I will describe a specific environment in which it can be reproduced.
This is the solution to an XSS hackme challenge described on kotowicz's blog in 2010. The goal of the challenge is to bypass the htmlspecialchars() function to achieve XSS. Importantly, the page does not set a character set through either the response header or a meta tag.
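To see why an escaping function of this kind does not help, here is a minimal sketch that uses Python's `html.escape` as a stand-in for PHP's htmlspecialchars() (an assumption: the two are not identical, but both escape `&`, `<`, `>`, and quotation marks). The payload string is illustrative.

```python
import html

# UTF-7 payload: it contains none of the characters the escaper looks for.
payload = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

escaped = html.escape(payload, quote=True)
print(escaped == payload)  # the escaper leaves the payload untouched

# If the page is later interpreted as UTF-7, the markup reappears.
print(escaped.encode("ascii").decode("utf-7"))
```

The escaping step is correct for the character set it assumes; the problem only appears when the browser can be made to decode the page as something else.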
The author mentions at the beginning that a POC can be built as follows:
In the IE6 era, character set inheritance problems were rampant, and a single iframe was enough to complete this challenge (I could not reproduce it with IE6 under PlayOnLinux, so you may need a sufficiently old environment to reproduce it). As noted, in IE8 an iframe can only inherit the character set within the same domain. But at that time there was a small bug that could be used to fool the browser.
The general idea is as follows:
// utf7exploit.html
The iframe loads a file on the same domain, and that file redirects to the target through a header (Location: somedomain), bypassing the restriction that iframes on different domains cannot inherit the character set. (>.< Those were wonderful times.) Unfortunately, this vulnerability has also been patched. If you want to try it yourself, you may need a Windows XP SP2 + IE7 environment to reproduce it.
0x04 Not All Cross-Origin Character Set Inheritance Requires an iframe
Now that we have seen how serious cross-origin character set inheritance can be, let's look at how the cool school in Japan plays it. This is a vulnerability that MK used to solve an XSS challenge from ZDResearch (CVE-2013-5612).
In versions earlier than Firefox 26, if a POST request was sent to a page with no charset set, the target page inherited the charset of the sending page, even across domains, and this could be exploited for XSS. In other words, we can send a POST request from any page to impose any charset on a page that has no charset.
The following is the XSS challenge from ZDResearch (charset not set): https://zdresearch.com/challenges/xss1/
This is the POST page constructed by MK:
http://l0.cm/zdresearch_xss_challenge.html
The specific POC code is as follows:
<meta charset="iso-2022-kr"><form action="https://zdresearch.com/challenges/xss1/" method="post"><input name="XSS" value="
When we send a POST request to https://zdresearch.com/challenges/xss1/ from a page whose charset is iso-2022-kr, the target page inherits our charset. Because in the ISO-2022-KR character set a string that starts with [0x0E] and ends with [0x0F] is treated as 2-byte characters, onmouseover is successfully injected into the target page. In a word... beautiful!
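The following is a minimal sketch of only the decoding primitive involved, not of MK's actual exploit. In ISO-2022-KR, the shift-out byte 0x0E switches the decoder to double-byte mode and the shift-in byte 0x0F switches it back, so ASCII-looking bytes in between are consumed as parts of 2-byte characters. The `single_byte_view` helper and the sample markup are hypothetical.

```python
SO, SI = 0x0E, 0x0F  # shift-out / shift-in control bytes


def single_byte_view(data: bytes) -> bytes:
    """Return only the bytes that remain in single-byte (ASCII) mode."""
    out = bytearray()
    double_byte = False
    for b in data:
        if b == SO:
            double_byte = True
        elif b == SI:
            double_byte = False
        elif not double_byte:
            out.append(b)
    return bytes(out)


# Hypothetical markup with attacker-supplied SO and SI bytes around b="2".
sample = b'a="1" \x0e b="2" \x0f c="3"'
print(single_byte_view(sample))
```

Quotation marks and angle brackets that fall inside the shifted region stop acting as markup, which is what lets attacker-controlled text end up in a different context than the page author intended.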
0x05 The Overlord School (MS13-037)
We have covered iframes and tried POST. Is there anything else? Of course! This one is again an article from MK's blog: [Forcing Internet Explorer's automatic encoding detection].
Test environment: Windows Vista SP2, IE9
Reproduction method: build a special page that forces automatic encoding detection
<script>
function go() {
  window.open("http://vulnerabledoma.in/r_slow?url=http://target/", "x") // sequence 1
  window.open("http://vulnerabledoma.in/h_back.html", "x") // sequence 2
}
</script>
<button onclick=go()>go</button>
These pages still exist, so if you are interested you can open them one by one to see what they contain. The part that may be hard to understand is the slight delay before the redirect in sequence 1. You can get a feel for it by visiting http://vulnerabledoma.in/r_slow?url=https://www.google.com/. The delay is there because the trick fails if the page goes back before the target page has fully loaded: no garbled text (mojibake) is produced. Adding a small delay before redirecting solves this problem perfectly.
Is garbled text a security issue? It can be. If you can place specific bytes at an output point on the target page, then with this POC the target page will switch character set based on your output, which leads to a vulnerability. (For example, including [0x1B]$)C may cause the page to be decoded as ISO-2022-KR.) In the end, MK earned 500 dollars from Google with this method. How much money someone else makes does not matter to us, but it does demonstrate the feasibility and value of this vulnerability.
0x06 CSP Bypass
Character sets seem capable of anything: they bypass htmlspecialchars(), they let hackers rack up one CVE after another, and they solve challenges and earn a little milk-powder money. But character sets can do even more. Let's look at a character-set-based CSP bypass (again from MK's blog).
Of course, there are some prerequisites for using this method:
• The HTTP response header does not set a charset
• We can inject 0x00 into the target page
• When the text before the output point is interpreted as UTF-16 (BE/LE), the resulting string can still be used as JavaScript
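The third prerequisite can be illustrated with a short sketch. The prefix `<html>` and the injected statement are assumptions for illustration only, and Python's `str.isidentifier()` is used as a rough proxy for "usable as a JavaScript identifier" (the two languages' identifier rules are similar but not identical).

```python
# Bytes the page emits before the output point (hypothetical, even length).
prefix = b"<html>"

# Attacker input: ASCII script text encoded as UTF-16BE, so every other byte
# is 0x00. This is why being able to inject 0x00 is a prerequisite.
injected = "=1;alert(1)//".encode("utf-16-be")

# What a UTF-16BE decoder makes of the page up to the end of the injection.
name = prefix.decode("utf-16-be")
as_utf16 = (prefix + injected).decode("utf-16-be")

print(as_utf16)
print(name.isidentifier())
```

Read as UTF-16BE, the six ASCII bytes of the prefix collapse into three CJK ideographs that behave like a variable name, so the injected text that follows forms an assignment and then a call.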
Isn't this scenario a little demanding? Haha! So I made a page like this.
Assume that this is a page with a stored XSS vulnerability into which some script has been injected.
The following is the actual POC page: http://vulnerabledoma.in/csp_utf16
The CSP challenge that /fd ran in the WooYun community is very similar to this one. In the end, /fd also posted his own solution in the thread. If you are interested, take a look: http://zone.wooyun.org/content/10596
0x07 Character "Ghosts"
Sometimes a character is like a ghost: we cannot perceive it at all. For example, early versions of Firefox ignored 0x80, and early versions of IE ignored 0x00. This is undoubtedly a headache, because to a filter, script is not equal to s[0x00]cript.
As another example, Chrome ignores certain characters in certain positions (this still works):
<a href="javascript:alert(1)">asd</a>
Sometimes a character does not just sit there silently; it destroys something, as in the following example:
If we use wide bytes, we can break through the following restriction:
But does this problem only occur in GBK? In fact, there are many character sets like this. The following is a Shift_JIS example. After changing the character set in the code above to Shift_JIS, the test results are as follows:
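As a rough illustration of the wide-byte effect in both GBK and Shift_JIS, the sketch below uses a hypothetical byte-level `addslashes` that knows nothing about the page's character set. The lead bytes 0xBF (GBK) and 0x83 (Shift_JIS) and the payload are assumptions chosen for illustration; Python's codecs stand in for the browser's decoder.

```python
def addslashes(data: bytes) -> bytes:
    """Byte-level escaping step that is unaware of the character set."""
    return data.replace(b"\\", b"\\\\").replace(b'"', b'\\"')


def decode_after_escape(codec: str, lead: int) -> str:
    """Escape attacker input that starts with a lead byte, then decode it."""
    user_input = bytes([lead]) + b'";alert(1)//'
    escaped = addslashes(user_input)  # lead byte, backslash, quote, ...
    return escaped.decode(codec)


for codec, lead in (("gbk", 0xBF), ("shift_jis", 0x83)):
    print(codec, decode_after_escape(codec, lead))
```

In both encodings 0x5C is a legal second byte of a 2-byte character, so the backslash added by the escaping step is absorbed into the preceding lead byte and the quotation mark is left unescaped.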
The list of problematic character sets goes far beyond this. At the OWASP international conference held in Tokyo, Masato Kinugawa's talk "The Complete Investigation of Encoding and Security" raised many such problems. A brief introduction to the talk follows.
(1) Investigation of character set support by various browsers
The author collected nearly 2,500 character set encoding names in advance and classified them by testing into character set names, aliases, and unrecognized names. For details about the survey method, see the following link:
http://masatokinugawa.l0.cm/2013/03/browser-support-encodings-list.html
The tests show that browsers support many character sets that are not normally used. The following is an overview of official character set names and aliases:
http://l0.cm/encodings/list/
And this is the support for character sets across browsers:
http://l0.cm/encodings/table/
Because there is so much content, only partial screenshots are included here: one for Chrome support and one for IE support.
As mentioned above, MK also considers UTF-7 the most dangerous, because ordinary filtering cannot stop this kind of XSS attack. Sadly, Microsoft still supports this encoding as of IE11. The good news is that Microsoft is considering whether to remove UTF-7 support in the next version, IE12.
After investigating encoding support across browsers, MK ran various tests on the encodings.
Using historically problematic behavior as a reference, the following three tests were run on each character set:
{TEST1} A specific byte ends up becoming a special character.
{TEST2} A specific byte destroys the text that follows it.
{TEST3} A specific byte is ignored.
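A test in the spirit of TEST2 can be sketched as follows. This is not MK's test harness: it probes Python's codecs, not browsers, so its results only hint at what a browser might do. The `swallowing_lead_bytes` name and the choice of the backslash as the default victim byte are assumptions.

```python
def swallowing_lead_bytes(codec: str, victim: int = 0x5C) -> list:
    """List high bytes that, placed before `victim`, make it vanish on decode."""
    found = []
    for lead in range(0x80, 0x100):
        text = bytes([lead, victim]).decode(codec, errors="replace")
        if chr(victim) not in text:
            found.append(lead)
    return found


for codec in ("utf-8", "shift_jis", "gbk"):
    leads = swallowing_lead_bytes(codec)
    print(codec, len(leads), [hex(b) for b in leads[:8]])
```

A well-behaved decoder for a given encoding should report few or no such bytes for ASCII metacharacters; encodings whose 2-byte characters allow ASCII-range second bytes report many.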
Some Test Results of TEST1:
Note: The first column is the browser, the second column is the character set, the third column is the test byte, and the fourth column is the rendered character.
Some Test Results of TEST2:
Note: The first column is the browser, the second column is the character set, the third column is destructive bytes, and the fourth column is the number of broken bytes.
Test results of TEST3
Note: The first column is the browser, the second column is the character set, and the third column is the byte that will be ignored.
If you want to view the complete results of these three tests, you can see here: http://l0.cm/encodings/
Finally, let's take a look at what these character set features can do.
(1) Bypassing the browser's Anti-XSS feature
An example of bypassing Chrome's Anti-XSS feature:
An example of bypassing IE's Anti-XSS feature:
(2) Self-XSS based on encoding switching
Note: the method is to manually switch the encoding to Shift_JIS (for details, see the Shift_JIS example in the previous section).
Although this is not a vulnerability, it is still a problem, and it exists whether or not you have set a character set. It is also hard to defend against: ordinary users simply cannot imagine that they could be attacked because they switched the encoding. Fortunately, NoScript can detect this problem ^_^.
0x08 Summary
As you can see, something as simple as a character set can lead to unexpected problems. This article focuses on XSS-related security issues, but we all know that the real impact goes beyond that. Vendors should pay attention to this issue and try to set a charset on every page, preferably in the HTTP response header where conditions permit, and not just in a meta tag. Users should also be aware of the encoding-switching XSS issue described above. If you use Firefox, installing NoScript is recommended as a defense. Do not casually follow other people's instructions to switch the encoding on a page. If there are errors in this article, I hope you will point them out, so that I can learn more and fewer people learn the wrong things.
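As a minimal sketch of the recommendation above to declare the charset in the HTTP response header, here is a tiny WSGI application. The `app` name and the page body are hypothetical; the only point is the `Content-Type` header carrying an explicit `charset`.

```python
def app(environ, start_response):
    """Tiny WSGI app that declares its charset in the response header."""
    body = "<!doctype html><title>demo</title><p>hello</p>".encode("utf-8")
    start_response(
        "200 OK",
        [
            # The charset is stated here, not only in a meta tag.
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

It can be served locally with the standard library, for example through `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()`.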
References
http://gihyo.jp/admin/serial/01/charcode
http://blog.kotowicz.net/2010/10/xss-hackme-challenge-solution-part-2.html
http://www.slideshare.net/ockeghem/owasp20134021-x
https://speakerdeck.com/appsecapac2014/the-complete-investigation-of-encoding-and-security
http://masatokinugawa.l0.cm/2013/12/CVE-2013-5612-encoding-inheritance-xss.html
http://masatokinugawa.l0.cm/2013/11/MS13-037-encoding-xss.html
http://masatokinugawa.l0.cm/2012/12/encoding-self-xss.html
http://masatokinugawa.l0.cm/2012/05/utf-16content-security-policy.html