Today I happened to run into the problem of handling uncommon (rare) Chinese characters.
We know that in C++, for a string such as string a = "中国";
the length is 4 when the text is stored as GBK, that is, each Chinese character occupies two bytes.
But in C#, the length of the same string becomes 2. Someone told me this is because the storage encoding is different, and I have not fully understood it yet.
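The difference in length can be reproduced in C++ alone by storing the same two characters once as GBK bytes and once as UTF-16 code units, which is the unit a C# string counts. This is only a minimal sketch: the function names are made up for illustration, and the GBK bytes are written as escapes so the result does not depend on the encoding of the source file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "中国" as raw GBK bytes: 中 = 0xD6 0xD0, 国 = 0xB9 0xFA.
inline std::string gbkText() { return std::string("\xD6\xD0\xB9\xFA"); }

// The same text as UTF-16 code units: 中 = U+4E2D, 国 = U+56FD.
inline std::u16string utf16Text() { return std::u16string(u"\u4E2D\u56FD"); }

// std::string counts bytes; std::u16string counts 16-bit units.
inline std::size_t gbkLength() { return gbkText().size(); }
inline std::size_t utf16Length() { return utf16Text().size(); }

// Hypothetical self-check: same text, two different lengths.
inline void checkLengths() {
    assert(gbkLength() == 4);    // two bytes per character
    assert(utf16Length() == 2);  // one 16-bit unit per character
}
```

So the text itself is unchanged; only the unit being counted differs (bytes versus 16-bit units).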
As a result, the original program's check for whether a character is uncommon runs into a problem.
This is because the processing puts the string into a char[] (for example, "中国" becomes four elements), then takes the first byte of each Chinese character as the high byte for the check,
and takes the second byte as the low byte for the check.
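A byte-based check of this kind might look roughly like the sketch below. The original program's actual criterion is not shown here, so the rule is an assumption: a character is treated as common when its GBK code falls in the GB2312 Hanzi area (high byte 0xB0 to 0xF7, low byte 0xA1 to 0xFE) and as uncommon otherwise. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed rule: roughly the GB2312 Hanzi area inside GBK.
inline bool isGb2312Hanzi(std::uint8_t high, std::uint8_t low) {
    return high >= 0xB0 && high <= 0xF7 && low >= 0xA1 && low <= 0xFE;
}

// Checks the two-byte GBK character that starts at byte offset `pos`.
inline bool isUncommonAt(const std::string& gbk, std::size_t pos) {
    assert(pos + 1 < gbk.size());  // both bytes must be present
    // Cast to unsigned first: plain char may be signed, which would
    // break comparisons against values above 0x7F.
    const auto high = static_cast<std::uint8_t>(gbk[pos]);
    const auto low = static_cast<std::uint8_t>(gbk[pos + 1]);
    return !isGb2312Hanzi(high, low);
}
```

With the GBK bytes D6 D0 B9 FA for "中国", both characters fall inside the assumed range, while a code from the GBK extension area such as 81 40 does not. The check only works while the text is held as GBK bytes, which is exactly what is lost once the string is held as 16-bit characters.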
But in C#, once the string is put into a char[], "中国" is only two elements, so there is no high byte or low byte to check and no way to tell whether a character is uncommon.
Maybe I should approach the problem differently. Let me think about it some more, and I hope someone can give me some advice.
Addendum: the purpose of the function is to convert the encoding of a Chinese character from GBK to Unicode.
But I don't know whether this method can be used in B/S (browser/server) development, because it also requires me to write it with a C/S (client/server) approach.