This article was written by larrylgq. If you repost it, please credit the source: http://blog.csdn.net/larrylgq/article/details/7450256
Author: Lu guiqiang
Email: larry.lv.word@gmail.com
Previous part: http://blog.csdn.net/larrylgq/article/details/7444999
The UCS (Universal Character Set) approach is the counterpart of the character-set-independent approach.
Under the UCS approach, the language converts all text into a single internal character set before processing it.
Advantages of the UCS approach:
Text is easy to handle, because everything is processed in one character set.
Disadvantages:
Only text whose characters are included in the UCS can be processed.
The number of text conversions increases.
Java uses the UCS approach: its internal character set is Unicode, and its encoding is UTF-16.
The part of Unicode that can be expressed in 16 bits is the BMP (Basic Multilingual Plane).
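To make the "convert to an internal character set first" idea concrete, here is a minimal sketch. The class name InternalCharsetDemo and the helper decodeLatin1 are made up for illustration; only the standard String and StandardCharsets APIs are used. It decodes one ISO-8859-1 byte into a Java String, then encodes it again as UTF-8, which also shows the extra conversion step listed as a disadvantage.

```java
import java.nio.charset.StandardCharsets;

public class InternalCharsetDemo {
    // Hypothetical helper: decode external ISO-8859-1 bytes into
    // Java's internal (UTF-16) String representation.
    static String decodeLatin1(byte[] external) {
        return new String(external, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The letter é stored as a single byte in ISO-8859-1.
        byte[] latin1 = { (byte) 0xE9 };

        // First conversion: external encoding -> internal character set.
        String text = decodeLatin1(latin1);
        System.out.println(text.length());        // 1
        System.out.println((int) text.charAt(0)); // 233 (U+00E9)

        // Second conversion: internal character set -> external UTF-8.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);          // 2
    }
}
```

All processing between the two conversions works on the same internal representation, regardless of the encoding the text arrived in.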
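As a rough illustration of the BMP boundary, the sketch below checks two code points with the standard Character class. The class name BmpDemo and the helper inBmp are hypothetical; the code point U+20BB7 was picked only as an example of a Han character outside the BMP.

```java
public class BmpDemo {
    // Hypothetical helper: true if the code point fits in 16 bits (the BMP).
    static boolean inBmp(int codePoint) {
        return Character.isBmpCodePoint(codePoint);
    }

    public static void main(String[] args) {
        int common = 0x4E2D;  // a Han character inside the BMP
        int rare = 0x20BB7;   // a Han character outside the BMP

        System.out.println(inBmp(common));               // true
        System.out.println(inBmp(rare));                 // false

        // Number of char values needed to store each code point.
        System.out.println(Character.charCount(common)); // 1
        System.out.println(Character.charCount(rare));   // 2
    }
}
```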
Because Java's char type is 16 bits wide:
1. A character outside the BMP must be represented by two char values (a surrogate pair).
2. The length() method of String returns the number of char values, not the number of characters. To count characters, use the code point methods that String provides, such as codePointCount.
E.g.:
String text = "a string of Chinese characters";
int first = text.codePointAt(0); // code point of the first character
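A fuller sketch of the difference between char count and character count follows. The class name CodePointDemo, the helper countCodePoints, and the sample string are assumptions chosen for illustration; the string is written with \u escapes so it does not depend on the source file encoding. Its first character is U+20BB7, which lies outside the BMP and therefore occupies two char values. The codePoints() call assumes Java 8 or later.

```java
public class CodePointDemo {
    // Hypothetical helper: number of Unicode characters (code points) in s.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+20BB7 (as the surrogate pair D842 DFB7), then U+91CE and U+5BB6.
        String text = "\uD842\uDFB7\u91CE\u5BB6";

        System.out.println(text.length());          // 4 char values
        System.out.println(countCodePoints(text));  // 3 characters

        // codePointAt(0) combines the surrogate pair into one code point.
        System.out.println(Integer.toHexString(text.codePointAt(0))); // 20bb7

        // charAt(0) sees only the first half of the pair.
        System.out.println(Character.isHighSurrogate(text.charAt(0))); // true

        // Iterate by code point rather than by char.
        text.codePoints()
            .forEach(cp -> System.out.println("U+" + Integer.toHexString(cp).toUpperCase()));
    }
}
```

Indexing with charAt or slicing with substring can split a surrogate pair, so code that must handle characters outside the BMP is usually safer iterating by code point.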