First, Introduction
To unify the world's character sets, the Unicode character set has become widely adopted, and Java supports it directly: a Java char stores one UTF-16 code unit, so both 'A' and '中' occupy two bytes.
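A minimal sketch of the two-bytes-per-char claim. It uses the standard `StandardCharsets.UTF_16BE` charset, plus the `"Unicode"` charset name used later in this article, which Java resolves as an alias of UTF-16 and which prefixes its output with a 2-byte byte-order mark. The class and method names are hypothetical, chosen only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration
class CharBytes {
    // Bytes used by a string in UTF-16BE (no byte-order mark): two per char
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Bytes produced by the "Unicode" charset (a Java alias of UTF-16),
    // which writes the big-endian byte-order mark FE FF before the data
    static byte[] unicodeBytes(String s) {
        return s.getBytes(Charset.forName("Unicode"));
    }

    public static void main(String[] args) {
        System.out.println(utf16Bytes("A"));      // 2
        System.out.println(utf16Bytes("\u4E2D")); // 2 (U+4E2D)
        byte[] b = unicodeBytes("A");
        System.out.println(b.length);             // 4: FE FF 00 41
        System.out.println(b[0] + " " + b[1]);    // -2 -1
    }
}
```

The leading -2, -1 pair is the byte-order mark that the truncation code below has to skip.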
Code point: the numeric value assigned to a character in the Unicode code table;
Code unit: a char in Java, the basic unit of the UTF-16 character encoding. Characters in the Basic Multilingual Plane take one code unit, while characters beyond it take two (a surrogate pair).
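To make the code point / code unit distinction concrete, the sketch below counts both for a string mixing an ASCII letter, a Chinese character, and an emoji outside the Basic Multilingual Plane. It relies only on the standard `String.length()`, `String.codePointCount`, and `String.codePointAt` methods; the class and helper names are hypothetical.

```java
// Hypothetical helper class for illustration
class CodeUnits {
    // Number of UTF-16 code units (Java chars) in s
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in s
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'A', U+4E2D, and U+1F600 (outside the BMP, stored as a surrogate pair)
        String s = "A\u4E2D\uD83D\uDE00";
        System.out.println(codeUnits(s));                          // 4
        System.out.println(codePoints(s));                         // 3
        System.out.println(Integer.toHexString(s.codePointAt(2))); // 1f600
    }
}
```

The emoji is one code point but two code units, which is why `length()` and `codePointCount` disagree.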
Second, Truncating a string by bytes
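public String getSubString(String str, int length) throws Exception {
    int i;
    int n;
    // Encode the string with the "Unicode" charset (Java's alias for UTF-16: big-endian with a byte-order mark)
    byte[] bytes = str.getBytes("Unicode");
    // The first two bytes are the byte-order mark, bytes[0] = -2 (0xFE) and bytes[1] = -1 (0xFF), so start at index 2
    i = 2;
    // n counts bytes in the "display" sense: a char whose high byte is 0 counts as 1, any other char (e.g. a Chinese character) counts as 2
    n = 0;
    for (; i < bytes.length && n < length; i++) {
        if (i % 2 == 1) {
            // Odd index: low byte of a char, always counted
            n++;
        } else {
            // Even index: high byte of a char, counted only when non-zero
            if (bytes[i] != 0) {
                n++;
            }
        }
    }
    // The loop stopped right after the high byte of a char: the cut falls inside a Chinese character, so drop that half
    if (i % 2 == 1) {
        if (bytes[i - 1] != 0) {
            i = i - 1;
        } else {
            i = i + 1;
        }
    }
    // Decode from offset 0 so the byte-order mark is included and the decoder can consume it
    return new String(bytes, 0, i, "Unicode");
}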
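For comparison, here is a hedged sketch of the same truncation rule written against char values directly, without encoding to bytes. It assumes the rule the byte-based code implements: a char whose high byte is zero (value below 0x100) counts as one byte, any other char counts as two, and a character that would exceed the limit is dropped whole. The class and method names are hypothetical. Like the byte-based version, it weighs each half of a surrogate pair separately, so it may split one.

```java
// Hypothetical helper class for illustration
class SubstringByBytes {
    // Returns the longest prefix of str whose weight is at most maxBytes,
    // where chars below 0x100 weigh 1 and all other chars weigh 2
    static String substringByBytes(String str, int maxBytes) {
        int n = 0;   // weight consumed so far
        int end = 0; // exclusive end index of the prefix
        while (end < str.length()) {
            int w = str.charAt(end) < 0x100 ? 1 : 2;
            if (n + w > maxBytes) {
                break; // this char would not fit, so drop it whole
            }
            n += w;
            end++;
        }
        return str.substring(0, end);
    }

    public static void main(String[] args) {
        // U+6211, 'A', 'B', 'C', U+6C49, 'D', 'E', 'F'
        String s = "\u6211ABC\u6C49DEF";
        System.out.println(substringByBytes(s, 4)); // U+6211 followed by AB
        System.out.println(substringByBytes(s, 6)); // U+6211 followed by ABC
    }
}
```

With a limit of 6, the first Chinese character and "ABC" use 5 bytes and the second Chinese character would need 2 more, so it is dropped rather than cut in half. Working on chars this way avoids the byte-order-mark bookkeeping of the byte-based version.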