A bug in the Java escape and unescape methods commonly copied from the Internet

Source: Internet
Author: User

Escape encoding converts a character to its hexadecimal Unicode value, prefixed with a % character to mark it as encoded; unescape reverses the conversion.
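
As a rough illustration of the notation only (not the article's method), the sketch below renders a single character in escaped form. The class name EscapeFormatDemo and the helper encode are hypothetical, and lowercase hex digits are chosen to match the Java code discussed later.

```java
class EscapeFormatDemo {
    // Hypothetical helper: renders one char in the escape notation.
    static String encode(char c) {
        if (c < 256) {
            // Codes 0-255 become % followed by 2 hex digits.
            return String.format("%%%02x", (int) c);
        }
        // Larger codes become %u followed by 4 hex digits.
        return String.format("%%u%04x", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(encode(' '));       // a space, code 0x20
        System.out.println(encode((char) 0x4e2d)); // a CJK character, code 0x4e2d
    }
}
```

Here the space is written as %20 and the character with code 0x4e2d as %u4e2d.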

No further explanation here; for details see http://www.jb51.net/article/23657.htm.

It was originally a JavaScript function and was later ported to Java; for the specific code see http://blog.sina.com.cn/s/blog_4bb52a160100d9tm.html. This is the version programmers most often copy and paste.

First, look at the escape source code:

/**
 * Java implementation of the JavaScript escape() function.
 *
 * @param src the string to encode
 * @return the escaped string
 */
public static String escape(String src) {
    int i;
    char j;
    StringBuffer tmp = new StringBuffer();
    tmp.ensureCapacity(src.length() * 6);
    for (i = 0; i < src.length(); i++) {
        j = src.charAt(i); // read the character at index i
        if (Character.isDigit(j) || Character.isLowerCase(j) || Character.isUpperCase(j))
            tmp.append(j); // 1. digits and letters are copied as-is
        else if (j < 256) {
            tmp.append("%"); // 2. codes below 256 get a % prefix
            if (j < 16)
                tmp.append("0"); // 3. codes below 16 are zero-padded to 2 hex digits
            tmp.append(Integer.toString(j, 16));
        } else {
            tmp.append("%u");
            tmp.append(Integer.toString(j, 16)); // 4. all other codes get a %u prefix
        }
    }
    return tmp.toString();
}

Now look at the unescape method:

public static String unescape(String src) {
    StringBuffer tmp = new StringBuffer();
    tmp.ensureCapacity(src.length());
    int lastPos = 0, pos = 0;
    char ch;
    while (lastPos < src.length()) {
        pos = src.indexOf("%", lastPos); // look for the next % sign
        if (pos == lastPos) {
            if (src.charAt(pos + 1) == 'u') {
                ch = (char) Integer.parseInt(src.substring(pos + 2, pos + 6), 16); // 5. after %u, read the next 4 hex digits and decode
                tmp.append(ch);
                lastPos = pos + 6;
            } else {
                ch = (char) Integer.parseInt(src.substring(pos + 1, pos + 3), 16); // 6. after any other %, read 2 hex digits [0-255] and decode
                tmp.append(ch);
                lastPos = pos + 3;
            }
        } else {
            if (pos == -1) {
                tmp.append(src.substring(lastPos));
                lastPos = src.length();
            } else {
                tmp.append(src.substring(lastPos, pos));
                lastPos = pos;
            }
        }
    }
    return tmp.toString();
}

The logic is simple: it parses 2-digit hex codes [0-255] and 4-digit hex codes [4096-65535], respectively.

But this raises two questions. Do characters with 3-digit hex codes [256-4095] not exist? And do codes wider than 4 hex digits exist? If either exists, this code has a serious bug that can make parsing fail.

Let's start with the first question:

Characters in East Asian languages, and in most other languages, have Unicode values that take 4 hex digits, but that does not mean 3-digit characters do not exist. For example, the Hindi word for yoga as given on Baidu Encyclopedia, योग, consists of 3 characters whose values each take only 3 hex digits. escape turns it into %u92f%u94b%u917, and the unescape code above fails on such input because it always reads 4 characters after %u and so swallows the following % sign.

The workaround is to guarantee that every code generated for a character above 255 is 4 hex digits wide.

The line marked as note 4 is changed to:
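
The failure can be reproduced with a small sketch. The class ThreeDigitFailureDemo and its two helpers are hypothetical names; they mirror only step 4 of escape and step 5 of unescape from the code above, not the full methods.

```java
class ThreeDigitFailureDemo {
    // Mirrors step 4 of escape(): %u prefix, hex digits without zero padding.
    static String escapeWide(char c) {
        return "%u" + Integer.toString(c, 16);
    }

    // Mirrors step 5 of unescape(): always reads 4 characters after "%u".
    static char decodeAt(String s, int pos) {
        return (char) Integer.parseInt(s.substring(pos + 2, pos + 6), 16);
    }

    public static void main(String[] args) {
        // The three Devanagari characters with codes 0x92f, 0x94b and 0x917.
        String escaped = escapeWide((char) 0x092f)
                + escapeWide((char) 0x094b)
                + escapeWide((char) 0x0917);
        System.out.println(escaped);
        try {
            decodeAt(escaped, 0); // tries to parse "92f%" as hex
        } catch (NumberFormatException e) {
            System.out.println("decode failed: " + e.getMessage());
        }
    }
}
```

The 4-character window after the first %u covers the three digits plus the next % sign, which is not a hex digit, so Integer.parseInt throws NumberFormatException.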

if (j < 4096) {
    tmp.append("0"); // pad 3-digit codes to 4 hex digits
}
tmp.append(Integer.toString(j, 16)); // 4. all other codes get a %u prefix

Or

tmp.append(String.format("%04x", (int) j)); // the cast to int is required: %x does not accept a char argument
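
Putting the fix together, the following is a minimal sketch of a round trip with the padding applied. The class name EscapeRoundTrip is hypothetical, StringBuilder is swapped in for StringBuffer, and String.format is used for both the 2-digit and 4-digit cases; otherwise the logic follows the two methods shown earlier.

```java
class EscapeRoundTrip {
    // escape() with every %u code zero-padded to 4 hex digits.
    static String escape(String src) {
        StringBuilder out = new StringBuilder(src.length() * 6);
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            if (Character.isDigit(c) || Character.isLowerCase(c) || Character.isUpperCase(c)) {
                out.append(c);
            } else if (c < 256) {
                out.append(String.format("%%%02x", (int) c));
            } else {
                out.append(String.format("%%u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // unescape() as in the article: 4 digits after %u, 2 digits after a plain %.
    static String unescape(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int lastPos = 0;
        while (lastPos < src.length()) {
            int pos = src.indexOf('%', lastPos);
            if (pos == lastPos) {
                if (src.charAt(pos + 1) == 'u') {
                    out.append((char) Integer.parseInt(src.substring(pos + 2, pos + 6), 16));
                    lastPos = pos + 6;
                } else {
                    out.append((char) Integer.parseInt(src.substring(pos + 1, pos + 3), 16));
                    lastPos = pos + 3;
                }
            } else if (pos == -1) {
                out.append(src.substring(lastPos)); // no more % signs: copy the rest
                lastPos = src.length();
            } else {
                out.append(src, lastPos, pos);      // copy the literal text before the next %
                lastPos = pos;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String word = new String(new char[] {0x092f, 0x094b, 0x0917});
        String escaped = escape(word);
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(word));
    }
}
```

With the padding in place, the three characters are written as %u092f%u094b%u0917 and decode back to the original string. Like the original, this sketch does not validate malformed input such as a trailing lone % sign.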

Second question:

Four hex digits represent 2 bytes. A Java char is a 16-bit UTF-16 code unit, so every value returned by charAt fits in 4 hex digits, and the code works. Characters beyond U+FFFF are stored as two char values (a surrogate pair), and each one is escaped as its own %u sequence, so they also survive the round trip. The code would become a problem only if it were changed to encode whole code points (UCS-4 values), which can need up to 6 hex digits. For most current scenarios, the 16-bit unit is sufficient.
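
To make the width question concrete, the sketch below escapes a character beyond U+FFFF one char at a time. The class SurrogateDemo and the helper escapeUnits are hypothetical; the point is only that a Java String stores such a character as two 16-bit char values, each of which fits in 4 hex digits, while the full code point does not.

```java
class SurrogateDemo {
    // Hypothetical helper: writes every 16-bit char as %u plus 4 hex digits.
    static String escapeUnits(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            out.append(String.format("%%u%04x", (int) s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the 16-bit range.
        String s = new String(Character.toChars(0x1F600));
        System.out.println(s.length());                           // number of char values
        System.out.println(escapeUnits(s));                       // one %u sequence per char
        System.out.println(Integer.toHexString(s.codePointAt(0))); // the full code point in hex
    }
}
```

The string has length 2, its escaped form is %ud83d%ude00, and the full code point 1f600 needs 5 hex digits, which a fixed 4-digit %u sequence could not hold.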
