Truncating a string to a specified byte length, whether it contains Chinese characters, English letters, or digits

Last Update: 2018-12-05 Source: Internet

Author: User

When a web application displays a string in a browser, the available display width is limited, so the string often has to be truncated first. However, many popular languages, such as C# and Java, store strings internally as UTF-16 (historically UCS-2). In this encoding, every character in the Basic Multilingual Plane occupies two bytes, whether it is a Chinese character, an English letter, or a digit. A character count therefore does not reflect the display width, where a Chinese character takes roughly twice the space of a letter or digit. Consider a string that mixes Chinese characters, English letters, and digits:

String S = "A plus B equals C. If a equals 1 and B equals 2, C equals 3 ";

The preceding string contains both Chinese characters and English characters and numbers. If you want to intercept the first six bytes of characters, it should be "A plus B", but if you use the substring method to intercept the first six characters, it will be "A plus B equals C ". This problem occurs because the substring method treats double-byte Chinese characters as one byte character (ucs2 character. To solve this problem, first obtain the ucs2 encoded byte array of the string. The following code is as follows:

byte[] bytes = System.Text.Encoding.Unicode.GetBytes(s);
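To make the layout of that byte array concrete, here is a minimal sketch that prints the bytes of a two-character string. The class name Utf16BytesDemo and the helper Dump are hypothetical names chosen for illustration; the Chinese character is written as a \u escape so the result does not depend on the source file's encoding.

```csharp
using System;
using System.Text;

public static class Utf16BytesDemo
{
    // Hypothetical helper: returns the UTF-16 little-endian bytes of a string
    // as a space-separated list of decimal values.
    public static string Dump(string text)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(text);
        string[] parts = Array.ConvertAll(bytes, b => b.ToString());
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        // "\u52A0" is the Chinese character 加 (U+52A0).
        // 'a' becomes 97 0; 加 becomes 160 82 (0xA0 0x52, low byte first).
        Console.WriteLine(Dump("a\u52A0")); // prints "97 0 160 82"
    }
}
```

The letter has a zero second byte and the Chinese character does not, which is the property the truncation method relies on.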

Scanning starts from the first byte. Encoding.Unicode is little-endian, so for an English letter or a digit the first byte of the UCS-2 encoding is the corresponding ASCII code and the second byte is 0 (for example, "a" is encoded as 97 0), whereas for a Chinese character the second byte is not 0. These encoding rules make it possible to compute the actual number of bytes while scanning. The truncation method is registered as an extension method of the string class. The implementation is as follows:

using System.Text;

public static class StringExt
{
    /// <summary>
    /// Truncates a string that exceeds the specified byte length and appends "..." to the result.
    /// </summary>
    /// <param name="str">The string to format.</param>
    /// <param name="displayLength">The maximum number of bytes to display.</param>
    /// <returns>The original string if it fits, otherwise the truncated string followed by "...".</returns>
    public static string FormatStringLength(this string str, int displayLength)
    {
        // The truncated result.
        string subStr = string.Empty;
        // Byte length of the string in the default encoding. This assumes a double-byte
        // ANSI code page (.NET Framework on Chinese Windows); on .NET Core and later,
        // Encoding.Default is UTF-8.
        int nameLength = Encoding.Default.GetByteCount(str);
        if (nameLength > displayLength)
        {
            // Reserve three bytes for the trailing "...".
            displayLength = displayLength - 3;
            // Bytes counted so far in display terms: a Chinese character counts as two
            // bytes, a letter or digit as one. Compared with displayLength to end the loop.
            int currentLength = 0;
            // Number of UCS-2 bytes to keep. Unlike displayLength, every character occupies
            // two bytes here, whether it is a Chinese character or a letter.
            int subLength = 0;
            // UCS-2 (UTF-16 little-endian) byte array of the string.
            byte[] strBytes = Encoding.Unicode.GetBytes(str);
            for (; subLength < strBytes.Length && currentLength < displayLength; subLength++)
            {
                if (subLength % 2 == 0)
                {
                    // Even index: the first byte of a character. Count one byte now;
                    // the second byte decides whether to count another.
                    currentLength++;
                }
                else if (strBytes[subLength] > 0)
                {
                    // Odd index with a nonzero second byte: a Chinese character, which
                    // takes two bytes in the default encoding, so count one more.
                    currentLength++;
                }
            }
            // An odd subLength means the loop stopped between the two bytes of the last
            // character, so adjust it to an even number.
            if (subLength % 2 == 1)
            {
                if (strBytes[subLength] > 0)
                {
                    // Chinese character: its second byte would exceed the limit,
                    // so drop the half-cut character.
                    subLength = subLength - 1;
                }
                else
                {
                    // Letter or digit: the second byte takes no space in the default
                    // encoding, so keep the whole character.
                    subLength = subLength + 1;
                }
            }
            subStr = Encoding.Unicode.GetString(strBytes, 0, subLength) + "...";
        }
        else
        {
            // The length is not exceeded, so the string is returned unchanged.
            subStr = str;
        }
        return subStr;
    }
}

In the code above, if the scan stops between the two UCS-2 bytes of the last character, the outcome depends on that character: a letter or digit is kept whole, whereas a Chinese character that would be cut in half is removed.
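The boundary rule described above can also be expressed per character rather than per byte. The following is a minimal sketch of that alternative formulation, not the article's implementation: the names ByteTruncateSketch, Width, and TruncateByWidth are hypothetical, and it assumes, as the article does, that a character whose high byte is zero takes one byte and any other character takes two.

```csharp
using System;
using System.Text;

public static class ByteTruncateSketch
{
    // Width of one UTF-16 code unit under the article's rule:
    // high byte zero -> 1 byte, otherwise -> 2 bytes (assumption).
    public static int Width(char c)
    {
        return c <= 0xFF ? 1 : 2;
    }

    // Hypothetical helper: keeps as many whole characters as fit in maxBytes.
    public static string TruncateByWidth(string text, int maxBytes)
    {
        var kept = new StringBuilder();
        int used = 0;
        foreach (char c in text)
        {
            int w = Width(c);
            // A character that would be cut in half is dropped.
            if (used + w > maxBytes) break;
            kept.Append(c);
            used += w;
        }
        return kept.ToString();
    }

    public static void Main()
    {
        // "a\u52A0b\u7B49" is a, 加, b, 等 written with escapes.
        Console.WriteLine(TruncateByWidth("a\u52A0b\u7B49", 3) == "a\u52A0"); // prints True
        Console.WriteLine(TruncateByWidth("a\u52A0b\u7B49", 2) == "a");       // prints True
    }
}
```

With a limit of 2, the Chinese character after "a" would need bytes 2 and 3, so it is dropped rather than split. Like the byte-scanning version, this sketch treats each UTF-16 code unit independently and does not handle surrogate pairs.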

You can use the following code to truncate a string:

string substr = s.FormatStringLength(6); // substr is "a加...": three bytes of text ("a" and one Chinese character) plus "..."
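For comparison, the sketch below shows why Substring alone is not enough: six characters can occupy more than six bytes. The class SubstringVsBytes and the helper DisplayByteCount are hypothetical names, and the one-byte or two-byte rule is the same assumption the article makes about the default encoding.

```csharp
using System;

public static class SubstringVsBytes
{
    // Hypothetical helper: byte length under the article's rule, counting 1 for a
    // code unit whose high byte is zero and 2 for any other code unit.
    public static int DisplayByteCount(string text)
    {
        int total = 0;
        foreach (char c in text)
        {
            total += c <= 0xFF ? 1 : 2;
        }
        return total;
    }

    public static void Main()
    {
        // a, 加, b, 等, 于, c written with escapes.
        string s = "a\u52A0b\u7B49\u4E8Ec";
        string firstSix = s.Substring(0, 6);
        Console.WriteLine(firstSix.Length);            // prints 6
        Console.WriteLine(DisplayByteCount(firstSix)); // prints 9
    }
}
```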

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
