I just came across a question:
"How do you verify whether a string is a Unicode string?"
I read it through and at first did not see how to approach it. As far as I know, Java holds char and String values in memory as UTF-16 code units.
That is, every Java String is already a Unicode string, so there is nothing to check.
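To make the UTF-16 point concrete, here is a small sketch. The class and method names (Utf16Demo, codeUnits, codePoints) are made up for illustration; only String.length() and String.codePointCount() are real JDK calls.

```java
// Hypothetical helper class, for illustration only.
final class Utf16Demo {
    // String.length() counts UTF-16 code units, not characters.
    static int codeUnits(String s) {
        return s.length();
    }

    // codePointCount() counts Unicode code points, so a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }
}
```

For the string "\uD83D\uDE00" (the single code point U+1F600 written as a surrogate pair), codeUnits returns 2 and codePoints returns 1. Any String, whatever bytes it was originally decoded from, is a sequence of UTF-16 code units once it is inside the JVM.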
So the question probably means something else. After thinking about it, I believe it is really asking whether the original bytes behind a string were encoded with a Unicode encoding (Unicode has several encoding forms, such as UTF-8 and UTF-16). There are also many other character sets, such as the common ASCII and GB2312.
Read that way, the question makes sense. But then a new problem appears: UTF-8 is backward compatible with ASCII, so how do we tell the two apart?
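The only approach I can think of is to walk through the raw bytes and inspect them: if every byte is below 0x80, the data is plain ASCII (and therefore also valid UTF-8); if some character takes more than one byte and those bytes form well-formed UTF-8 sequences, the data is most likely UTF-8.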
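The byte inspection described above could be sketched in Java roughly as follows. This is only a heuristic under my own assumptions, not a definitive detector: the class and method names (EncodingGuess, isAscii, isValidUtf8, guess) are invented for this example, and instead of checking bit patterns by hand it leans on the JDK's CharsetDecoder with CodingErrorAction.REPORT so that malformed input raises an exception.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class EncodingGuess {
    // True if every byte fits in 7 bits, i.e. no byte has the high bit set.
    static boolean isAscii(byte[] data) {
        for (byte b : data) {
            if (b < 0) { // Java bytes are signed, so 0x80..0xFF appear negative
                return false;
            }
        }
        return true;
    }

    // True if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Best-effort label: pure ASCII is reported as "ASCII" even though it is also valid UTF-8.
    static String guess(byte[] data) {
        if (isAscii(data)) {
            return "ASCII";
        }
        if (isValidUtf8(data)) {
            return "UTF-8";
        }
        return "unknown";
    }
}
```

With this sketch, the bytes of "hello" are labelled "ASCII", the UTF-8 bytes of "中文" are labelled "UTF-8", and the GB2312 bytes of "中文" (0xD6 0xD0 0xCE 0xC4) are labelled "unknown" because 0xD0 is not a valid continuation byte after 0xD6. Keep in mind that some byte sequences from other encodings can happen to be well-formed UTF-8, so a positive result only means "probably UTF-8".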
If you have a better method, please feel free to provide it.