Windows core programming (character encoding)

Source: Internet
Author: User

I. Character encoding

Character set

1) A character set is a collection of characters in which each character is assigned a numeric code. Examples include ANSI/ASCII, MBCS (multibyte character sets), and Unicode. For example, the Unicode code of the character 汉 ("Han") is 0x6C49.

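Encoding scheme

2) An encoding scheme defines how a character code is stored as bytes. Examples include UTF-8, UTF-16, and GB2312. Encoding schemes fall into two kinds: variable-length and fixed-length. UTF-8 is a variable-length encoding that uses one to four bytes per character (most Chinese characters take three bytes). UTF-16 uses one 16-bit unit (two bytes) for most characters, including nearly all commonly used Chinese characters, so it is often treated as a two-byte fixed-length encoding. Strictly speaking it is also variable-length, because characters above 0xFFFF take two 16-bit units (a surrogate pair, four bytes).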
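The difference between a character code and its encoded bytes can be sketched in C. This is a minimal illustration, not a library API: the function names utf8_encode and utf16_encode are made up for this example, and both return 0 for values that are not valid Unicode characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code as UTF-8.
   Writes 1 to 4 bytes into out and returns the byte count (0 if invalid). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp <= 0x7F) {                       /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogate range: not characters */
        return 0;
    if (cp <= 0xFFFF) {                     /* three bytes: most Chinese characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: encode one Unicode code as UTF-16.
   Writes 1 or 2 16-bit units into out and returns the unit count (0 if invalid). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp <= 0xFFFF) {                     /* one unit: two bytes */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {                   /* surrogate pair: four bytes */
        cp -= 0x10000;
        out[0] = (uint16_t)(0xD800 | (cp >> 10));
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        return 2;
    }
    return 0;
}
```

With these helpers, the code 0x6C49 (汉) becomes the three bytes E6 B1 89 in UTF-8 and the single 16-bit unit 0x6C49 in UTF-16, while a code above 0xFFFF such as 0x1F600 needs two 16-bit units.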

Character set and encoding scheme

3) A character set and an encoding scheme go together. "GB2312 encoding", for example, means the GB2312 character set stored with the GB2312 encoding scheme, in which each Chinese character takes two bytes. "Unicode encoding" means the Unicode character set stored with one of the UTF-x encoding schemes. UTF-16 stores most characters in two bytes. UTF-8 was designed as a variable-length encoding so that it stays compatible with existing ANSI/ASCII text, and it is widely used in Internet services.

Multibyte and Unicode

1) In VC (Visual C++) and Win32, the difference between the two is roughly the difference between variable-length and fixed-length encodings, or more precisely between text that is not UTF-16 and text that is UTF-16.

2) Since the Windows NT kernel, Windows has used 16-bit Unicode internally (UCS-2 at first, UTF-16 since Windows 2000).

3) In this context, "Unicode" refers only to the Unicode character set in UTF-16 encoding. Everything else (UTF-8, UTF-7, GB2312, ANSI/ASCII, and so on) is classified as multibyte. "Multibyte" should therefore be understood as "a variable number of bytes per character", not "many characters".

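Use in Visual Studio projects

The Character Set project property can be set to Multi-Byte or Unicode. This setting selects which version of each Windows API function is used: the ANSI version (name ending in A) or the Unicode version (name ending in W). It works by defining or not defining the UNICODE and _UNICODE macros.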
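The macro switching can be illustrated with a portable sketch. The MY_TCHAR, MY_TEXT, my_tcslen, and text_units names below are hypothetical stand-ins invented for this example; they only imitate the pattern the Windows headers use for names such as TCHAR and TEXT, and are not the real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical MY_* names that imitate the UNICODE macro switch. */
#ifdef UNICODE
typedef wchar_t MY_TCHAR;          /* wide characters in a Unicode build */
#define MY_TEXT(s) L##s            /* turns "abc" into L"abc" */
#define my_tcslen  wcslen
#else
typedef char MY_TCHAR;             /* narrow characters in a multibyte build */
#define MY_TEXT(s) s               /* leaves "abc" unchanged */
#define my_tcslen  strlen
#endif

/* Same source line in both builds; the macro decides which function runs.
   Returns the number of MY_TCHAR units before the terminator. */
size_t text_units(const MY_TCHAR *s)
{
    assert(s != NULL);
    return my_tcslen(s);
}
```

A call such as text_units(MY_TEXT("Hello")) compiles unchanged in either configuration. Compiling with -DUNICODE (or the equivalent project setting) switches the character type and the underlying function.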

II. ANSI and Unicode character and string data types

1) In C, the char type represents an 8-bit ANSI character.

For example:

char c = 'a'; // occupies one byte in memory

2) wchar_t represents a wide character. With the Microsoft compiler on Windows it is 16 bits wide and holds one UTF-16 unit.

wchar_t wc = L'A'; // occupies two bytes in memory on Windows
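The sizes can be checked with sizeof. The sketch below assumes a C11 compiler that provides <uchar.h>; the array names and the utf16_length helper are made up for this example. Note that wchar_t is two bytes with the Microsoft compiler but is commonly four bytes on other platforms, so the sketch also shows char16_t, which is always a 16-bit unit.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>   /* char16_t (C11) */
#include <wchar.h>

/* The same one-letter string in three character types.
   Each array also holds a terminating zero element. */
static const char     narrow[] = "A";   /* 2 elements of 1 byte each */
static const char16_t wide16[] = u"A";  /* 2 elements of 2 bytes each */
static const wchar_t  wide[]   = L"A";  /* 2 elements of sizeof(wchar_t) bytes each */

/* Hypothetical helper: count 16-bit units before the terminator,
   the char16_t counterpart of strlen and wcslen. */
size_t utf16_length(const char16_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

Here sizeof(narrow) is 2 and sizeof(wide16) is 4 everywhere, while sizeof(wide) is 2 * sizeof(wchar_t), which equals 4 on Windows.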
