Truncating a string to a specified byte length, whether it contains Chinese characters, English letters, or digits

Source: Internet
Author: User

When a web application displays a string in the browser, the available width is limited, so the string often has to be truncated before it is displayed. However, many popular languages, such as C# and Java, store strings internally as UTF-16 (historically UCS-2). In this encoding every character in the Basic Multilingual Plane occupies two bytes, whether it is a Chinese character, an English letter, or a digit, so counting characters does not tell you how much space a mixed string takes up. Consider the following string, which mixes Chinese characters, English letters, and digits:

String S = "A plus B equals C. If a equals 1 and B equals 2, C equals 3 ";

The preceding string contains both Chinese characters and English characters and numbers. If you want to intercept the first six bytes of characters, it should be "A plus B", but if you use the substring method to intercept the first six characters, it will be "A plus B equals C ". This problem occurs because the substring method treats double-byte Chinese characters as one byte character (ucs2 character. To solve this problem, first obtain the ucs2 encoded byte array of the string. The following code is as follows:

byte[] bytes = System.Text.Encoding.Unicode.GetBytes(s);

Scanning starts from the first byte. Encoding.Unicode is little-endian, so for an English letter or digit the first byte of the UCS-2 encoding is the corresponding ASCII code and the second byte is 0; for example, the UCS-2 encoding of "a" is 97 0. For a Chinese character the second byte is not 0. These rules make it possible to count the number of bytes the string actually occupies when displayed. The truncation method is registered as an extension method of the string class. The implementation is as follows:

using System.Text;

public static class StringExt
{
    /// <summary>
    /// Formats a string that exceeds the specified length: the string is truncated and "..." is appended.
    /// </summary>
    /// <param name="str">The string to format.</param>
    /// <param name="displayLength">The length to display, in bytes.</param>
    /// <returns>The original string if it fits; otherwise the truncated string followed by "...".</returns>
    public static string FormatStringLength(this string str, int displayLength)
    {
        // The truncated string.
        string subStr = string.Empty;
        // Byte length of the string in the default encoding.
        int nameLenth = Encoding.Default.GetByteCount(str);

        if (nameLenth > displayLength)
        {
            // Subtract the length of "..." to get the number of bytes of text to keep.
            displayLength = displayLength - 3;
            // Display bytes counted so far: a Chinese character counts as two bytes,
            // a letter or digit as one. Compared with displayLength to leave the loop.
            int currentLength = 0;
            // Number of bytes to keep from the UCS-2 byte array. Unlike displayLength,
            // every character here occupies two bytes, Chinese or not.
            int subLength = 0;
            // UCS-2 (UTF-16, little-endian) byte array of the string.
            byte[] strBytes = Encoding.Unicode.GetBytes(str);

            for (; subLength < strBytes.GetLength(0) && currentLength < displayLength; subLength++)
            {
                // subLength is an index into the byte array. An even index is the first
                // byte of a character; an odd index is its second byte.
                if (subLength % 2 == 0)
                {
                    // First byte of a character: count one byte for now.
                    currentLength++;
                }
                else
                {
                    // Second byte of a character: it is 0 for a letter or digit
                    // (for example, "a" is 97 0) and non-zero for a Chinese character,
                    // which therefore counts one more byte, two in total.
                    if (strBytes[subLength] > 0)
                    {
                        currentLength++;
                    }
                }
            }

            // An odd subLength means the last character was only half taken,
            // so the count has to be adjusted to an even number.
            if (subLength % 2 == 1)
            {
                // strBytes[subLength] is the second byte of that last character.
                if (strBytes[subLength] > 0)
                {
                    // Chinese character: completing it would exceed the limit,
                    // so drop the half-cut character.
                    subLength = subLength - 1;
                }
                else
                {
                    // Letter or digit: the second byte takes no space in the
                    // default encoding, so complete the character.
                    subLength = subLength + 1;
                }
            }

            subStr = Encoding.Unicode.GetString(strBytes, 0, subLength) + "...";
        }
        else
        {
            // The length is not exceeded; return the string unchanged.
            subStr = str;
        }

        return subStr;
    }
}

 

In the code above, if the loop stops at an odd byte index, the last character has been only half taken. If that character is a letter or digit, its second byte takes no space in the default encoding, so the character is kept. If it is a Chinese character that would be cut in half, the character is dropped.

You can use the following code to truncate a string:

string subStr = s.FormatStringLength(6); // at most 6 bytes: up to 3 bytes of text followed by "..."

 
