A bug in the commonly copied Java escape and unescape methods found online

Last Update:2015-08-17 Source: Internet

Author: User


Escape and unescape encoding convert a character to and from its hexadecimal Unicode code, prefixed with a % character to mark it.
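As a small illustration of the format, the sketch below prints the escaped form of two sample characters. The class name and the hexOf helper are hypothetical and only serve this example; the lowercase hex digits match what the Java code in this article produces.

```java
class EscapeFormatDemo {
    // Hypothetical helper: the hexadecimal Unicode code of one char
    static String hexOf(char c) {
        return Integer.toString(c, 16);
    }

    public static void main(String[] args) {
        // A space (code 32) becomes % followed by 2 hex digits
        System.out.println("%" + hexOf(' '));       // %20
        // A character above 255 becomes %u followed by its hex code
        System.out.println("%u" + hexOf('\u4e2d')); // %u4e2d
    }
}
```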

No further explanation is given here; see http://www.jb51.net/article/23657.htm.

It was originally a JavaScript function and was later ported to Java. For a specific reference see http://blog.sina.com.cn/s/blog_4bb52a160100d9tm.html; this is the version programmers most often copy and paste.

First, look at the escape source code:

/**
 * Java implementation of the JavaScript escape() function.
 *
 * @param src the string to encode
 * @return the escaped string
 */
public static String escape(String src) {
    int i;
    char j;
    StringBuffer tmp = new StringBuffer();
    tmp.ensureCapacity(src.length() * 6);
    for (i = 0; i < src.length(); i++) {
        j = src.charAt(i); // read the character; its numeric code is used below
        if (Character.isDigit(j) || Character.isLowerCase(j) || Character.isUpperCase(j))
            tmp.append(j); // 1. Digits and letters are appended unchanged
        else if (j < 256) {
            tmp.append("%"); // 2. Codes below 256 get the % prefix
            if (j < 16)
                tmp.append("0"); // 3. Codes below 16 also get a leading 0, so the code is 2 characters wide
            tmp.append(Integer.toString(j, 16));
        } else {
            tmp.append("%u"); // 4. All other codes get the %u prefix
            tmp.append(Integer.toString(j, 16));
        }
    }
    return tmp.toString();
}

Now look at the unescape method:

public static String unescape(String src) {
    StringBuffer tmp = new StringBuffer();
    tmp.ensureCapacity(src.length());
    int lastPos = 0, pos = 0;
    char ch;
    while (lastPos < src.length()) {
        pos = src.indexOf("%", lastPos); // look for the next % sign
        if (pos == lastPos) {
            if (src.charAt(pos + 1) == 'u') {
                // 5. After %u, read the next 4 characters as hex and decode them
                ch = (char) Integer.parseInt(src.substring(pos + 2, pos + 6), 16);
                tmp.append(ch);
                lastPos = pos + 6;
            } else {
                // 6. After any other %, read 2 hex characters [0-255] and decode them
                ch = (char) Integer.parseInt(src.substring(pos + 1, pos + 3), 16);
                tmp.append(ch);
                lastPos = pos + 3;
            }
        } else {
            if (pos == -1) {
                tmp.append(src.substring(lastPos));
                lastPos = src.length();
            } else {
                tmp.append(src.substring(lastPos, pos));
                lastPos = pos;
            }
        }
    }
    return tmp.toString();
}

The code logic is simple: it parses characters whose hexadecimal codes are 2 digits wide [0-255] and 4 digits wide [4096-65535], respectively.

But this raises two questions. Do characters whose codes are 3 digits wide [256-4095] not exist? And do characters wider than 4 digits exist? If either kind exists, this code has a serious bug that causes parsing to fail.
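The ranges mentioned above can be checked directly. The sketch below prints the hex width that Integer.toString gives at each range boundary; the class name and the hexWidth helper are hypothetical.

```java
class HexWidthDemo {
    // Hypothetical helper: number of hex digits Integer.toString produces for a code
    static int hexWidth(int code) {
        return Integer.toString(code, 16).length();
    }

    public static void main(String[] args) {
        int[] codes = {255, 256, 4095, 4096, 65535};
        for (int code : codes) {
            // Prints the code, its hex form, and the hex width
            System.out.println(code + " -> " + Integer.toString(code, 16)
                    + " (width " + hexWidth(code) + ")");
        }
    }
}
```

Codes from 256 to 4095 come out 3 digits wide, which is the case the unescape method does not expect.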

Let's start with the first question:

Characters in East Asian languages, and in most other languages, have Unicode codes that are 4 digits wide in hexadecimal, but that does not mean 3-digit characters do not exist. For example, the Hindi word for yoga as given in Baidu Encyclopedia, योग, consists of 3 characters, each with a 3-digit hexadecimal code: %u92f%u94b%u917. The unescape code above fails on characters of this type.

The fix is to guarantee that every generated code above 255 is 4 digits wide.
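The failure can be reproduced with a condensed copy of the two methods. This is a sketch: the logic follows the article's code, but the class name EscapeBugDemo and the use of StringBuilder are choices made for this example.

```java
class EscapeBugDemo {
    // Condensed copy of the article's escape logic; the missing padding is kept on purpose
    static String escape(String src) {
        StringBuilder tmp = new StringBuilder();
        for (int i = 0; i < src.length(); i++) {
            char j = src.charAt(i);
            if (Character.isDigit(j) || Character.isLowerCase(j) || Character.isUpperCase(j)) {
                tmp.append(j);
            } else if (j < 256) {
                tmp.append(j < 16 ? "%0" : "%").append(Integer.toString(j, 16));
            } else {
                tmp.append("%u").append(Integer.toString(j, 16)); // no zero padding
            }
        }
        return tmp.toString();
    }

    // Condensed copy of the article's unescape logic
    static String unescape(String src) {
        StringBuilder tmp = new StringBuilder();
        int lastPos = 0;
        while (lastPos < src.length()) {
            int pos = src.indexOf("%", lastPos);
            if (pos == lastPos) {
                if (src.charAt(pos + 1) == 'u') {
                    // Always reads 4 characters after %u
                    tmp.append((char) Integer.parseInt(src.substring(pos + 2, pos + 6), 16));
                    lastPos = pos + 6;
                } else {
                    tmp.append((char) Integer.parseInt(src.substring(pos + 1, pos + 3), 16));
                    lastPos = pos + 3;
                }
            } else if (pos == -1) {
                tmp.append(src.substring(lastPos));
                lastPos = src.length();
            } else {
                tmp.append(src.substring(lastPos, pos));
                lastPos = pos;
            }
        }
        return tmp.toString();
    }

    public static void main(String[] args) {
        String yoga = "\u092f\u094b\u0917"; // the three Devanagari characters from the example
        String escaped = escape(yoga);
        System.out.println(escaped); // %u92f%u94b%u917
        try {
            unescape(escaped);
        } catch (NumberFormatException e) {
            // The 4-character read picks up "92f%", which is not valid hex
            System.out.println("unescape failed: " + e.getMessage());
        }
    }
}
```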

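The code at note 4 (highlighted in red in the original) is changed to:

tmp.append("%u"); // 4. All other codes get the %u prefix
tmp.append(String.format("%04x", (int) j)); // zero-pad to 4 hex digits; the cast is needed because %x does not accept a char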
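A minimal sketch of the padded branch in isolation, assuming the same lowercase hex output as the original code. The class name and the escapeWide helper are hypothetical. Note the (int) cast: String.format rejects a char argument for %x.

```java
class PaddedEscapeDemo {
    // Hypothetical helper for the %u branch: always emits exactly 4 hex digits
    static String escapeWide(char j) {
        return "%u" + String.format("%04x", (int) j);
    }

    public static void main(String[] args) {
        String escaped = escapeWide('\u092f') + escapeWide('\u094b') + escapeWide('\u0917');
        System.out.println(escaped); // %u092f%u094b%u0917

        // The fixed 4-character read used by unescape now lands on hex digits only
        char first = (char) Integer.parseInt(escaped.substring(2, 6), 16);
        System.out.println(first == '\u092f'); // true
    }
}
```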

Second question:

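Four hexadecimal digits represent 2 bytes. A Java char is a 16-bit UTF-16 code unit, so every value this code encodes fits in 4 hex digits, and the code works. Unicode itself is not limited to 2 bytes: characters above U+FFFF exist, but Java stores each of them as two surrogate chars, and the code escapes and decodes each half as its own %u sequence. A width of more than 4 would only become a problem if the code were changed to work on whole code points (UCS-4 style) rather than on chars. For most current scenarios the 16-bit handling is sufficient.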
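To check the 4-digit assumption against a character outside the 16-bit range, the sketch below escapes U+1F600 one char at a time. It relies on the fact that a Java char is a 16-bit UTF-16 code unit, so such a character is stored as two chars. The class name and the escapeUnits helper are hypothetical.

```java
class SurrogateWidthDemo {
    // Hypothetical helper: applies the %u form to every char of the string
    static String escapeUnits(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            out.append("%u").append(String.format("%04x", (int) s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // One code point above U+FFFF, stored as a surrogate pair
        String s = new String(Character.toChars(0x1F600));
        System.out.println(s.length());     // 2
        System.out.println(escapeUnits(s)); // %ud83d%ude00
    }
}
```

Each half is exactly 4 hex digits wide, so the fixed-width read in unescape still applies.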


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
