Two organizations develop universal character encoding standards: ISO (the International Organization for Standardization) and the Unicode Consortium, an association of multilingual software vendors.
The Universal Character Set (UCS) is the coding scheme developed by ISO. UCS-2 encodes each character in 2 bytes, and UCS-4 encodes each character in 4 bytes.
The Unicode Transformation Format (UTF) is a family of encoding schemes that map the characters of the Unicode character set to byte sequences according to fixed conversion rules, so that they can be stored and exchanged on a computer.
UTF-8 is a variable-length character encoding that uses 1 to 4 bytes per character. The ASCII range (0x00 to 0x7F) still takes 1 byte per character, with the same byte values as in ASCII.
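As a rough illustration of the variable length, the sketch below uses Python's built-in `str.encode` to count the bytes UTF-8 needs for a few sample characters. The sample characters and the variable names are chosen here for illustration only.

```python
# Sample characters from different ranges (chosen for illustration):
# U+0041 'A', U+00E9 (e with acute), U+4E2D (CJK), U+1F600 (emoji).
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

# Map each character to the number of bytes its UTF-8 form occupies.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in utf8_lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s): {ch.encode('utf-8').hex()}")

# ASCII text is byte-for-byte identical in ASCII and in UTF-8.
ascii_text = "Hello"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print("ASCII and UTF-8 bytes identical:", same_bytes)
```

The byte count grows with the code point value, while plain ASCII text is unchanged.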
Most characters in UTF-16 are stored in 2 bytes. When no supplementary-plane characters (code points above U+FFFF) are involved, UTF-16 and UCS-2 produce the same bytes. Supplementary-plane characters cannot be represented in UCS-2; UTF-16 encodes each of them as a 4-byte surrogate pair.
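The difference between a 2-byte character and a supplementary-plane character can be sketched in Python as follows. The helper names `utf16_code_units` and `surrogate_pair` are hypothetical and exist only for this example; the arithmetic follows the standard UTF-16 surrogate rule.

```python
def utf16_code_units(ch):
    """Return the 16-bit code units UTF-16 uses for one character."""
    data = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def surrogate_pair(code_point):
    """Compute the surrogate pair for a code point above U+FFFF by hand."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


bmp_units = utf16_code_units("\u4e2d")       # one code unit, same as UCS-2
supp_units = utf16_code_units("\U0001F600")  # two code units (surrogate pair)

print([hex(u) for u in bmp_units])
print([hex(u) for u in supp_units])
print([hex(u) for u in surrogate_pair(0x1F600)])
```

The hand-computed pair matches what the codec produces, which shows that the extra 2 bytes come from splitting the code point into two 10-bit halves.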
Note: UTF-8, UTF-16, and so on are character encodings of Unicode, not Unicode itself. Unicode assigns a number (code point) to each character; the UTF encodings define how those numbers are stored as bytes.
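To make the distinction concrete, here is a small Python sketch that takes one code point and encodes it three ways. The choice of U+4E2D as the sample character is arbitrary.

```python
ch = "\u4e2d"          # a single character, chosen arbitrarily
code_point = ord(ch)   # the Unicode code point: one number per character

# The same code point stored as bytes under three different encodings.
encoded = {
    name: ch.encode(name).hex()
    for name in ("utf-8", "utf-16-be", "utf-32-be")
}

print(f"code point: U+{code_point:04X}")
for name, hex_bytes in encoded.items():
    print(f"{name:10} -> {hex_bytes}")
```

The code point stays the same in every case; only the byte representation changes with the encoding.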
Note: the "Encoding" options in the "Save As" dialog of Notepad on Windows have the following meanings:
- ANSI is the default encoding, and its meaning depends on the system locale. On English Windows, ANSI means Windows-1252 (a superset of ASCII); on Simplified Chinese Windows, ANSI means GBK; on Traditional Chinese Windows, ANSI means Big5; on Japanese Windows, ANSI means Shift_JIS.
- Unicode means UCS-2 in little-endian byte order (UTF-16 LE in current Windows versions).
- Unicode big endian is also UCS-2, but in big-endian byte order.
- UTF-8 is self-explanatory and needs no further explanation.
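The options above can be approximated with Python codecs, as in the sketch below. It assumes that Notepad prefixes the Unicode options with a byte-order mark (BOM), which holds for classic Notepad versions but may differ in newer ones, and it uses GBK as a stand-in for "ANSI" on a Simplified Chinese system. The variable names are illustrative.

```python
import codecs

text = "A\u4e2d"  # 'A' followed by U+4E2D

# "Unicode": little-endian 16-bit units, assumed to start with the BOM FF FE.
unicode_le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# "Unicode big endian": big-endian 16-bit units, assumed BOM FE FF.
unicode_be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# "UTF-8": the utf-8-sig codec writes the BOM EF BB BF (an assumption about Notepad).
utf8_bom = text.encode("utf-8-sig")

# "ANSI": locale dependent; GBK is the assumed Simplified Chinese code page.
ansi_gbk = text.encode("gbk")

for label, data in [("Unicode", unicode_le), ("Unicode big endian", unicode_be),
                    ("UTF-8", utf8_bom), ("ANSI (GBK)", ansi_gbk)]:
    print(f"{label:20} {data.hex()}")
```

The two Unicode options contain the same 16-bit values with the bytes of each unit swapped, which is the only difference between little-endian and big-endian storage.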