One, character encoding
Character set
1) A character set assigns a numeric code to each character. Examples include ANSI/ASCII, MBCS (multi-byte character sets), and Unicode. For example, the Unicode code point of the character "汉" (Han) is U+6C49 (0x6C49).
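Encoding scheme
2) An encoding scheme defines how a character code is stored as bytes. Examples include UTF-8, UTF-16, and GB2312. Encoding schemes fall into two kinds: variable-length and fixed-length. UTF-8 is a variable-length encoding that uses one to four bytes per character (most Chinese characters take three bytes). UTF-16 is often treated as a two-byte fixed-length encoding, which holds for every character in the Basic Multilingual Plane. Characters outside that plane take two 16-bit code units (a surrogate pair), so strictly speaking UTF-16 is also variable-length.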
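To make the difference between the two schemes concrete, here is a minimal sketch of both encoders in C. The function names utf8_encode and utf16_encode are made up for this illustration and are not part of any library; the code follows the standard UTF-8 and UTF-16 bit layouts and skips everything a production encoder would also need (buffers, streams, error reporting).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8.
   Returns the number of bytes written (1 to 4), or 0 if cp is invalid. */
size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* ASCII: one byte */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* two bytes */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                    /* three bytes: most Chinese characters */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* four bytes: outside the BMP */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: encode one code point as UTF-16.
   Returns the number of 16-bit code units written (1 or 2), or 0 if invalid. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {                    /* BMP: a single code unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {                  /* outside the BMP: surrogate pair */
        cp -= 0x10000;
        out[0] = (uint16_t)(0xD800 | (cp >> 10));
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        return 2;
    }
    return 0;
}
```

With these helpers, U+6C49 becomes the three bytes E6 B1 89 in UTF-8 and the single code unit 0x6C49 in UTF-16, while U+20000, which lies outside the Basic Multilingual Plane, becomes the surrogate pair D840 DC00 in UTF-16.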
Character set and encoding scheme
3) A character set and an encoding scheme go together. "GB2312 encoding", for example, means the GB2312 character set stored with the GB2312 encoding scheme, in which every Chinese character takes two bytes (in the common EUC-CN form, ASCII characters still take one byte). "Unicode encoding" means the Unicode character set stored with one of the UTF-x encoding schemes. UTF-16 uses two bytes for most characters, while UTF-8 is variable-length by design so that it stays compatible with existing ASCII text, which is why it is widely used in Internet services.
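As an illustration of what the mix of one-byte and two-byte characters means in practice, the sketch below counts characters in a GB2312 string. It assumes the EUC-CN byte layout (lead byte 0xA1 to 0xF7, trail byte 0xA1 to 0xFE, ASCII below 0x80); the function name gb2312_char_count is invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the characters in a NUL-terminated
   GB2312 (EUC-CN) byte string. Returns (size_t)-1 on a malformed
   or truncated two-byte sequence. */
size_t gb2312_char_count(const unsigned char *s)
{
    size_t count = 0;
    assert(s != NULL);
    while (*s != 0) {
        if (*s < 0x80) {
            s += 1;                        /* ASCII: one byte */
        } else if (*s >= 0xA1 && *s <= 0xF7 &&
                   s[1] >= 0xA1 && s[1] <= 0xFE) {
            s += 2;                        /* Chinese character: two bytes */
        } else {
            return (size_t)-1;             /* not valid GB2312 */
        }
        count++;
    }
    return count;
}
```

For the byte string 41 BA BA D7 D6 (the letter "A" followed by "汉字"), the function reports three characters in five bytes, which is why the byte length of such a string cannot be used as its character count.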
Multi-byte and Unicode
1) In VC (Win32), the difference between the two roughly corresponds to variable-length versus fixed-length character codes, or more precisely to whether the encoding is something other than UTF-16 or is UTF-16.
2) Since the Windows NT kernel, Windows has used 16-bit Unicode internally (UCS-2 at first, UTF-16 from Windows 2000 onward).
3) Here "Unicode" refers only to the Unicode character set encoded as UTF-16. Everything else (UTF-8, UTF-7, GB2312, ANSI/ASCII, and so on) is classified as multi-byte. "Multi-byte" should therefore be understood as "a variable number of bytes per character", not as "many characters".
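Use in Visual Studio projects
The project's Character Set property can be set to "Use Multi-Byte Character Set" or "Use Unicode Character Set". The setting selects which version of each Windows API function the generic name maps to: the ANSI version (suffix A) or the Unicode version (suffix W).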
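The switching mechanism can be sketched in portable C. Everything named below (TextLengthA, TextLengthW, TextLength, TCHAR_SKETCH, TEXT_SKETCH, greeting_length) is hypothetical and only imitates the pattern the Windows headers use for real API functions; the one real detail assumed here is that the Unicode setting defines the UNICODE macro.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for the A and W versions of one API function. */
size_t TextLengthA(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

size_t TextLengthW(const wchar_t *s)
{
    assert(s != NULL);
    return wcslen(s);
}

/* The generic name maps to one version or the other at compile time. */
#ifdef UNICODE
typedef wchar_t TCHAR_SKETCH;
#define TEXT_SKETCH(s) L##s
#define TextLength TextLengthW
#else
typedef char TCHAR_SKETCH;
#define TEXT_SKETCH(s) s
#define TextLength TextLengthA
#endif

/* Source written against the generic names compiles under either setting. */
size_t greeting_length(void)
{
    const TCHAR_SKETCH *msg = TEXT_SKETCH("hello");
    return TextLength(msg);
}
```

Because the choice is made by the preprocessor, the same source builds as either an ANSI or a Unicode program, and no check happens at run time.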
Two, ANSI and Unicode character and string data types
1) In C, the char type represents an 8-bit ANSI character, as follows:
char c = 'a'; // occupies one byte in memory
2) In VC, wchar_t represents a 16-bit Unicode (UTF-16) code unit.
wchar_t c = L'A'; // occupies two bytes in memory
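The same distinction carries over to strings. The sketch below computes how much memory a narrow and a wide string occupy, terminator included; the helper names narrow_bytes and wide_bytes are made up for this example. Note the assumption behind the "two bytes" figure: sizeof(wchar_t) is 2 with VC, but other compilers (for example on Linux) commonly use 4, so the code asks sizeof rather than hard-coding a width.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: bytes occupied by a narrow string,
   including its terminating '\0'. */
size_t narrow_bytes(const char *s)
{
    assert(s != NULL);
    return (strlen(s) + 1) * sizeof(char);
}

/* Hypothetical helper: bytes occupied by a wide string,
   including its terminating L'\0'. */
size_t wide_bytes(const wchar_t *s)
{
    assert(s != NULL);
    return (wcslen(s) + 1) * sizeof(wchar_t);
}
```

For example, narrow_bytes("abc") is 4, while wide_bytes(L"abc") is 4 * sizeof(wchar_t), which comes to 8 bytes with VC.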