String Reversal in Java and the Related Character Encoding Issues

Last Update:2018-12-08 Source: Internet

Author: User


A straightforward string reversal can be written as follows:

    public String reverse(char[] value) {
        // Start at the middle and swap each char with its mirror position.
        for (int i = (value.length - 1) >> 1; i >= 0; i--) {
            char temp = value[i];
            value[i] = value[value.length - 1 - i];
            value[value.length - 1 - i] = temp;
        }
        return new String(value);
    }
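A quick usage sketch of the method above. The class name SimpleReverseDemo and the sample input are assumptions made for illustration. Note that the method reverses the array it receives in place, so the caller's array changes as well.

```java
public class SimpleReverseDemo {

    // The simple reversal from above: swap value[i] with its mirror position.
    public static String reverse(char[] value) {
        for (int i = (value.length - 1) >> 1; i >= 0; i--) {
            char temp = value[i];
            value[i] = value[value.length - 1 - i];
            value[value.length - 1 - i] = temp;
        }
        return new String(value);
    }

    public static void main(String[] args) {
        char[] chars = "hello".toCharArray();
        String reversed = reverse(chars);
        System.out.println(reversed);          // prints "olleh"
        // The input array itself was modified in place.
        System.out.println(new String(chars)); // prints "olleh"
    }
}
```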

This code is algorithmically correct. However, while reading the StringBuffer source code today, I found that its reverse method is quite subtle. The source code is as follows:

The code is as follows:

    public AbstractStringBuilder reverse() {
        boolean hasSurrogate = false;
        int n = count - 1;
        for (int j = (n - 1) >> 1; j >= 0; --j) {
            char temp = value[j];
            char temp2 = value[n - j];
            if (!hasSurrogate) {
                hasSurrogate = (temp >= Character.MIN_SURROGATE && temp <= Character.MAX_SURROGATE)
                    || (temp2 >= Character.MIN_SURROGATE && temp2 <= Character.MAX_SURROGATE);
            }
            value[j] = temp2;
            value[n - j] = temp;
        }
        if (hasSurrogate) {
            // Reverse back all valid surrogate pairs
            for (int i = 0; i < count - 1; i++) {
                char c2 = value[i];
                if (Character.isLowSurrogate(c2)) {
                    char c1 = value[i + 1];
                    if (Character.isHighSurrogate(c1)) {
                        value[i++] = c1;
                        value[i] = c2;
                    }
                }
            }
        }
        return this;
    }

This method is defined in AbstractStringBuilder, the parent class of StringBuffer, so its return type is AbstractStringBuilder. The subclass overrides it as follows:

    public synchronized StringBuffer reverse() {
        super.reverse();
        return this;
    }
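Because the subclass override declares StringBuffer as its return type, the result of reverse() can be used as a StringBuffer without a cast. A minimal sketch of what that looks like from the caller's side (the class name ChainingDemo, the helper reverseAndTag, and the sample text are illustrative assumptions):

```java
public class ChainingDemo {

    // Reverses the text in a StringBuffer, then appends a marker to the same buffer.
    public static String reverseAndTag(String text) {
        StringBuffer sb = new StringBuffer(text);
        // reverse() returns the buffer it was called on, typed as StringBuffer.
        StringBuffer same = sb.reverse();
        if (same != sb) {
            throw new IllegalStateException("expected the same instance");
        }
        same.append('!');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseAndTag("abc")); // prints "cba!"
    }
}
```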

Judging from the method body, the basic idea in the JDK source is the same: it traverses half of the string and swaps each char with its mirror counterpart. The difference is that, while swapping, it checks whether any char lies between Character.MIN_SURROGATE (\uD800) and Character.MAX_SURROGATE (\uDFFF). If such a char is found anywhere in the string, the method traverses the string once more from beginning to end and checks whether value[i] satisfies Character.isLowSurrogate(). If so, it goes on to check whether value[i + 1] satisfies Character.isHighSurrogate(). If that condition is also met, the chars at positions i and i + 1 are swapped. Some may wonder why this is necessary, since Java chars already use Unicode and a single char can hold a Chinese character. Why?
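One way to see what the second pass protects against is to run both approaches on a string that contains a character stored as two chars. The sketch below is only an experiment under stated assumptions: the class name SurrogateReverseDemo, the helper names, and the sample character U+20000 (written as the escapes \uD840\uDC00) are chosen for illustration and do not come from the JDK source.

```java
public class SurrogateReverseDemo {

    // Plain char-by-char reversal with no surrogate handling.
    public static String naiveReverse(String s) {
        char[] value = s.toCharArray();
        for (int i = 0, j = value.length - 1; i < j; i++, j--) {
            char temp = value[i];
            value[i] = value[j];
            value[j] = temp;
        }
        return new String(value);
    }

    // Delegates to the JDK, which swaps surrogate pairs back after reversing.
    public static String jdkReverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
        // 'a', then U+20000 (stored as the two chars \uD840 \uDC00), then 'b'.
        String s = "a\uD840\uDC00b";

        String naive = naiveReverse(s); // chars: b, \uDC00, \uD840, a
        String jdk = jdkReverse(s);     // chars: b, \uD840, \uDC00, a

        // In the naive result the low surrogate now comes first: the pair is broken.
        System.out.println(Character.isLowSurrogate(naive.charAt(1))); // prints "true"
        // The JDK result keeps the high surrogate first, so the code point survives.
        System.out.println(Character.isHighSurrogate(jdk.charAt(1)));  // prints "true"
        System.out.println(jdk.codePointAt(1) == 0x20000);             // prints "true"
    }
}
```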
A complete Unicode character is called a code point, while a Java char is a code unit. A String stores Unicode characters in UTF-16, which needs two chars to represent a character outside the Basic Multilingual Plane (a code point above U+FFFF), such as a Chinese character from one of the extended character sets. This representation is called a surrogate pair. The first char is the high surrogate and the second is the low surrogate. Note the following:
To determine whether a char lies in the surrogate range, use Character.isHighSurrogate() / Character.isLowSurrogate(). To obtain a complete Unicode code point from a high/low surrogate pair, use Character.toCodePoint() / codePointAt().
A code point may take one or two chars, so CharSequence.length() cannot be used directly to get the number of characters in a String. Use String.codePointCount() / Character.codePointCount() instead.
To locate the nth character in a String, n cannot be used as the offset directly. Instead, you must traverse from the beginning of the String, or use String/Character.offsetByCodePoints().
To find the previous character from the current position in a String, you cannot simply use offset--. Instead, use String.codePointBefore() / Character.codePointBefore(), or String/Character.offsetByCodePoints().
To find the next character from the current position, you cannot simply use offset++. You need to determine the length of the current code point before computing the next offset, or use String/Character.offsetByCodePoints().
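The notes above can be tried out with a few standard-library calls. The following is a minimal sketch, not the JDK's own approach: the class name CodePointDemo, the helper names, and the sample character U+20000 are assumptions for illustration. The last helper reverses a string by whole code points using codePointBefore() and Character.charCount(), which is simply another way to avoid splitting a surrogate pair.

```java
public class CodePointDemo {

    // Number of Unicode code points (not chars) in the string.
    public static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Code point at the given code point index (0-based), not the char index.
    public static int codePointAtIndex(String s, int n) {
        int offset = s.offsetByCodePoints(0, n);
        return s.codePointAt(offset);
    }

    // Walks backwards one code point at a time, so pairs are never split.
    public static String reverseByCodePoints(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int offset = s.length();
        while (offset > 0) {
            int cp = s.codePointBefore(offset); // code point ending just before offset
            out.appendCodePoint(cp);
            offset -= Character.charCount(cp);  // step back by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "a\uD840\uDC00b"; // 'a', U+20000, 'b'
        System.out.println(s.length());                    // prints "4"
        System.out.println(countCodePoints(s));            // prints "3"
        System.out.println(codePointAtIndex(s, 2) == 'b'); // prints "true"
        System.out.println(Character.toCodePoint('\uD840', '\uDC00') == 0x20000); // prints "true"
        System.out.println(reverseByCodePoints(s).equals("b\uD840\uDC00a"));       // prints "true"
    }
}
```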

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
