Usually, a Chinese character occupies two bytes. In many languages, a Chinese character therefore counts as two characters in a string's length.
In C#, however, a Chinese character in a string counts as only one character. As a result, String.Substring(int startIndex, int length) often cuts at the wrong position when the length was specified with each Chinese character counted as two.
Recently, for work, I had to truncate strings to a specified length, and the specification counted one Chinese character as a length of 2. At first I thought the documentation was wrong; then I realized the mismatch came from how C# counts characters.
My first step was to search online for an algorithm. The ones I found all share the same basic idea: examine each character and judge it by its ASCII value. That approach is not only complicated but also error-prone, and it cannot cover all Chinese characters.
After a little study, I decided to use the simplest approach, one the .NET Framework already provides.
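The basic idea is to convert the string to byte[] and then convert the wanted range of bytes back to a string. It takes only two steps and is reliable, provided that Encoding.Default is a double-byte Chinese code page (such as GBK on Chinese Windows; on .NET Core it is UTF-8) and that the cut does not fall in the middle of a character.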
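public static string SubString(string tosub, int startindex, int length)
{
    // Encode the string with the system default code page.
    byte[] subbyte = System.Text.Encoding.Default.GetBytes(tosub);
    // Decode only the requested byte range back into a string.
    string sub = System.Text.Encoding.Default.GetString(subbyte, startindex, length);
    return sub;
}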
A brief note on the parameters: tosub is the string to take the substring from, startindex is the position at which to start, and length is the length to take. Both startindex and length are counted in bytes.
Usage is basically the same as string.Substring(int startIndex, int length).
[Reprint] C#: taking a substring of a specified length from a string containing Chinese characters -- Fine point