Source: https://www.cnblogs.com/runwulingsheng/p/5106078.html
Blogger: You are the sky suddenly across the Lightning
Code point: the numeric value assigned to a character in a coded character set (such as Unicode), written with a U+ prefix; for example, the code point of the letter A is U+0041.
Code unit: the smallest storage unit used to represent a character in a given encoding; in UTF-16 it is a 16-bit value (a Java char).
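| Unicode code point | U+0041 | U+00DF | U+6771 | U+10400 |
|---|---|---|---|---|
| Glyph | A | ß | 東 | 𐐀 |
| UTF-32 code units | 00000041 | 000000DF | 00006771 | 00010400 |
| UTF-16 code units | 0041 | 00DF | 6771 | D801 DC00 |
| UTF-8 code units | 41 | C3 9F | E6 9D B1 | F0 90 90 80 |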
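The code units for the four code points in the table can be reproduced in Java. The following is only a sketch: the class name `CodeUnitTable` and its helper methods are made up for illustration, and it relies on the standard `Character.toChars` and `String.getBytes` calls.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

// Hypothetical helper class; not part of the original post.
public class CodeUnitTable {
    // UTF-32: always one 32-bit code unit, equal to the code point itself.
    static String utf32Units(int codePoint) {
        return String.format("%08X", codePoint);
    }

    // UTF-16: one or two 16-bit code units (Java chars).
    static String utf16Units(int codePoint) {
        StringJoiner out = new StringJoiner(" ");
        for (char unit : Character.toChars(codePoint)) {
            out.add(String.format("%04X", (int) unit));
        }
        return out.toString();
    }

    // UTF-8: one to four 8-bit code units (bytes).
    static String utf8Units(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        StringJoiner out = new StringJoiner(" ");
        for (byte b : bytes) {
            out.add(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] codePoints = {0x0041, 0x00DF, 0x6771, 0x10400};
        for (int cp : codePoints) {
            System.out.printf("U+%04X  UTF-32: %s  UTF-16: %s  UTF-8: %s%n",
                    cp, utf32Units(cp), utf16Units(cp), utf8Units(cp));
        }
    }
}
```

Only U+10400 needs two UTF-16 code units here; the other three fit in a single char.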
Supplementary characters are characters whose code points lie above U+FFFF; UTF-16 represents each of them with two code units (a surrogate pair).
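A short Java sketch of how a supplementary character can be recognized and split into its two code units. The class name `SupplementaryDemo` and its methods are illustrative assumptions; the `Character` methods they call are standard.

```java
// Hypothetical demo class; not part of the original post.
public class SupplementaryDemo {
    // True when the code point lies above U+FFFF and therefore
    // needs two code units in UTF-16.
    static boolean isSupplementary(int codePoint) {
        return Character.isSupplementaryCodePoint(codePoint);
    }

    // Number of UTF-16 code units (chars) needed for the code point: 1 or 2.
    static int utf16Length(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int cp = 0x10400;
        char[] units = Character.toChars(cp); // the two code units

        System.out.println(isSupplementary(cp));
        System.out.println(utf16Length(cp));
        // The first unit is a high surrogate, the second a low surrogate.
        System.out.println(Character.isHighSurrogate(units[0]));
        System.out.println(Character.isLowSurrogate(units[1]));
        // Combining the two code units gives back the original code point.
        System.out.println(Character.toCodePoint(units[0], units[1]) == cp);
    }
}
```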
Excerpt from the web:
Code point: the number assigned to a character in Unicode; each character occupies exactly one code point. For example, the code point of the character "汉" is U+6C49.
Code unit: a property of an encoding scheme; it is the smallest storage unit that the scheme produces when it encodes a character. For example, in UTF-8 the code unit is one byte, because a character is encoded as 1, 2, 3 or 4 bytes; in UTF-16 the code unit is two bytes (that is, one char), because a character is encoded as 1 or 2 chars (you cannot find a UTF-16 encoded character that is smaller than one char).
A character has exactly one code point, but it may have several code units (for example, it may be encoded as 2 chars).
These concepts are by no means academic tongue-twisters. When you want to specify a character in a uniform way, use its code point: telling your program "the character with this number in Unicode" is always better than giving code units, because with code units you have to distinguish cases, sometimes supplying one 16-bit number and sometimes two.
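The gap between the two counts is easy to see in Java, where `length()` counts code units and `codePointCount` counts code points. This is a minimal sketch; the class name `LengthDemo` and the sample string are assumptions chosen for illustration.

```java
// Hypothetical demo class; not part of the original post.
public class LengthDemo {
    // Number of UTF-16 code units (chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+6C49 (one char) followed by U+10400 (two chars).
        String s = "\u6C49\uD801\uDC00";
        System.out.println("code units:  " + codeUnits(s));
        System.out.println("code points: " + codePoints(s));
    }
}
```

The sample string holds two characters but three chars, which is why indexing by char position can land in the middle of a character.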
Example: to get the i-th code point of a string, use the following statements:
    int index = greeting.offsetByCodePoints(0, i); // index (in code units) of the i-th code point, counting from offset 0
    int cp = greeting.codePointAt(index);          // the code point at that index, as an int (its Unicode number)
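A runnable version of this lookup might look as follows. It is a sketch only: the class name `NthCodePoint`, the helper `codePointAtOffset`, and the contents of `greeting` are assumptions, since the post does not show how `greeting` is defined.

```java
// Hypothetical demo class; not part of the original post.
public class NthCodePoint {
    // Returns the i-th code point of s (0-based). Assumes i is smaller
    // than the number of code points in s.
    static int codePointAtOffset(String s, int i) {
        int index = s.offsetByCodePoints(0, i); // position in code units
        return s.codePointAt(index);
    }

    public static void main(String[] args) {
        // Assumed sample: U+10400 (two chars) followed by "Hi".
        String greeting = "\uD801\uDC00Hi";
        int count = greeting.codePointCount(0, greeting.length());
        for (int i = 0; i < count; i++) {
            System.out.printf("code point %d: U+%04X%n",
                    i, codePointAtOffset(greeting, i));
        }
        // charAt works on code units, so index 1 is only the second
        // half of the surrogate pair, not the letter H.
        System.out.printf("charAt(1): %04X%n", (int) greeting.charAt(1));
    }
}
```

For a single pass over a whole string, `greeting.codePoints()` (available since Java 8) avoids recomputing the offset for every index.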
Python: code unit, Code point Introduction