Windows Core Programming (3): Character Encoding in Detail

Source: Internet
Author: User

I. A detailed introduction to character encoding


1. A byte is the unit in which computers measure storage capacity; one byte is 8 bits


2. Characters: the letters, digits, text, and symbols used in computer text, such as 1, 2, 3, 4, ~, @, !, %, ^, and so on


3. The relationship between characters and bytes differs from one encoding to another

A. In ASCII, an English letter (upper or lower case) takes one byte (8 bits); in the ASCII-based double-byte encodings such as GBK, a Chinese character takes two bytes (16 bits)

B. In UTF-8, an English character takes one byte, and a Chinese character (including traditional forms) usually takes three bytes

C. In Unicode as Windows uses the term (UTF-16), an English character takes two bytes, and a commonly used Chinese character (including traditional forms) also takes two bytes

D. In the ANSI encodings, an English punctuation mark takes one byte and a Chinese punctuation mark takes two bytes
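
The byte counts listed above for UTF-8 and UTF-16 follow from the code point ranges of each encoding. A minimal sketch in portable C++ is shown here; the function names Utf8ByteCount and Utf16ByteCount are made up for this illustration and are not part of any Windows API.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to store one Unicode code point in UTF-8.
// (Hypothetical helper, for illustration only.)
std::size_t Utf8ByteCount(char32_t cp) {
    assert(cp <= 0x10FFFF);       // highest valid Unicode code point
    if (cp < 0x80)    return 1;   // ASCII range: English letters, digits
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;   // most Chinese characters fall here
    return 4;                     // supplementary planes
}

// Bytes needed to store one code point in UTF-16,
// the encoding Windows calls "Unicode".
std::size_t Utf16ByteCount(char32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4;  // 4 bytes means a surrogate pair
}
```

For example, the letter A (U+0041) needs 1 byte in UTF-8 and 2 in UTF-16, while the Chinese character U+4E2D needs 3 bytes in UTF-8 and 2 in UTF-16.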



4. Multiple character sets: each country's text encoding overlaps with the others (encoding conflicts), which produces garbled text

A. At first there was only one character set in use: ANSI's ASCII character set. It uses 7 bits to represent a character, 128 characters in total, covering English letters, digits, and punctuation. It was later extended to 8 bits per character, which can represent 256 characters, adding some special characters on top of the original 7-bit base.

B. As more countries joined in, ASCII could no longer meet the demand, so each country developed its own character set on top of ASCII. These character sets derived from the ANSI standard are customarily called ANSI character sets; their official name is MBCS (Multi-Byte Character System). Each language has its own character set.

C. Hence the Unicode character set. It was originally fixed at 16 bits (two bytes, one word) per character, which can represent 65,536 characters in total and covers the commonly used characters of the world's languages. The 16-bit encoding of Unicode is called UTF-16 (characters outside that range take two 16-bit units). Later, so that double-byte Unicode text could be transmitted correctly over existing systems that process single bytes, UTF-8 appeared, which encodes Unicode in a variable-length, MBCS-like way. UTF-8 is an encoding; the characters it encodes belong to the Unicode character set.
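
To make the variable-length idea concrete, here is a small sketch of how a single Unicode code point is laid out as UTF-8 bytes. It is plain portable C++, not a Windows API; the name EncodeUtf8 is hypothetical, and the sketch does not reject surrogate code points.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point (U+0000..U+10FFFF) as UTF-8 bytes.
// Hypothetical helper for illustration; surrogates are not rejected.
std::string EncodeUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        // 0xxxxxxx: identical to ASCII, so old single-byte code still works
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Every byte of a multi-byte sequence has its high bit set, so it can never be mistaken for an ASCII character; this is what lets UTF-8 text pass through byte-oriented systems.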


5. Windows defines a number of data types

A. wchar_t is two bytes on Windows; the types with a W in their name are based on it

B. WCHAR: a Unicode character; it is actually wchar_t

C. PWSTR: pointer to a Unicode string, wchar_t *

D. PCWSTR: pointer to a constant Unicode string, const wchar_t *

E. The corresponding multibyte types are CHAR, LPSTR, LPCSTR

F. ANSI/Unicode generic data types:

TCHAR is char under the multibyte character set and wchar_t under Unicode

PTSTR is char * under the multibyte character set and wchar_t * under Unicode

LPCTSTR is const char * under the multibyte character set and const wchar_t * under Unicode

G. A name with A is multibyte, W is Unicode (wide character), and T is generic
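
The A / W / generic naming scheme can be sketched without the Windows headers. The names DemoLengthA, DemoLengthW, and DemoLength below are hypothetical stand-ins that only imitate the pattern; the real headers apply the same idea to functions such as MessageBoxA and MessageBoxW.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical function pair named the way Windows names its APIs:
// the A version takes multibyte (char) text, the W version wide text.
std::size_t DemoLengthA(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}

std::size_t DemoLengthW(const wchar_t* s) {
    assert(s != nullptr);
    return std::wcslen(s);
}

// The generic name is just a macro that resolves to one of the two,
// depending on whether UNICODE is defined for the build.
#ifdef UNICODE
#define DemoLength DemoLengthW
#else
#define DemoLength DemoLengthA
#endif
```

Code that calls the generic name compiles against whichever version the project's character set selects, which is why the argument types must also be generic (TCHAR-based).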



7. Windows APIs for converting between multibyte character sets and Unicode

A. WideCharToMultiByte maps a Unicode string to a multibyte string

B. MultiByteToWideChar maps a multibyte string to a Unicode string


8. Using the functions above is fairly complex; you can use the following ATL conversion macros instead

The identifier USES_CONVERSION must be declared before the macros are used;

A2W: multibyte to wide character

USES_CONVERSION; CString str; const char* achar = "ABCDEFG"; wchar_t* wchar = A2W(achar); str = wchar;

W2A: wide character to multibyte

USES_CONVERSION; const wchar_t* wchar = L"ABCDEFG ah"; char* achar = W2A(wchar);

A2T: multibyte to the generic type (T follows the project's character set)

USES_CONVERSION; const char* pChar = "char to CString"; CString cTemp = A2T(pChar);

T2A: generic type to multibyte

USES_CONVERSION; CString cTemp = _T("Char to CString"); char* pChar = T2A(cTemp.GetBuffer());


9. Use the conversion macros above sparingly

A. If you use these macros in a loop, they may cause a stack overflow,

because if you look at their code you will find that they call alloca to request memory, which is allocated on the stack of the calling function and is only released when that function returns.

The default stack size with the VC toolchain is 1 MB, and calling the macros in a loop keeps allocating stack memory until the function returns.


B. The best solution is to use WideCharToMultiByte and MultiByteToWideChar directly.

It is convenient to wrap these two APIs in helper functions of your own.
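
One possible way to wrap the two APIs is sketched below. The wrapper names MultiByteToWide and WideToMultiByte are hypothetical, the code is Windows-only (hence the _WIN32 guard), and it assumes the usual two-call pattern: the first call asks for the required size, the second fills a heap-allocated std::string or std::wstring, so nothing accumulates on the stack when it is called in a loop.

```cpp
#include <cassert>
#include <climits>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Multibyte -> wide. codePage defaults to the system ANSI code page;
// pass CP_UTF8 for UTF-8 input. Returns an empty string on failure.
std::wstring MultiByteToWide(const std::string& in, UINT codePage = CP_ACP) {
    if (in.empty()) return std::wstring();
    assert(in.size() <= static_cast<size_t>(INT_MAX));
    const int inLen = static_cast<int>(in.size());
    // First call: query the required length in wchar_t units.
    const int outLen = MultiByteToWideChar(codePage, 0, in.data(), inLen,
                                           nullptr, 0);
    if (outLen <= 0) return std::wstring();
    std::wstring out(static_cast<size_t>(outLen), L'\0');
    // Second call: perform the conversion into the heap buffer.
    MultiByteToWideChar(codePage, 0, in.data(), inLen, &out[0], outLen);
    return out;
}

// Wide -> multibyte. The last two arguments stay null, which is
// required when codePage is CP_UTF8.
std::string WideToMultiByte(const std::wstring& in, UINT codePage = CP_ACP) {
    if (in.empty()) return std::string();
    assert(in.size() <= static_cast<size_t>(INT_MAX));
    const int inLen = static_cast<int>(in.size());
    const int outLen = WideCharToMultiByte(codePage, 0, in.data(), inLen,
                                           nullptr, 0, nullptr, nullptr);
    if (outLen <= 0) return std::string();
    std::string out(static_cast<size_t>(outLen), '\0');
    WideCharToMultiByte(codePage, 0, in.data(), inLen, &out[0], outLen,
                        nullptr, nullptr);
    return out;
}
#endif  // _WIN32
```

A call such as MultiByteToWide(text, CP_UTF8) can then replace A2W inside loops, at the cost of one heap allocation per call.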



10. Use TCHAR and _TEXT to accommodate both Unicode and multibyte character sets
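
How TCHAR and _TEXT let one source file serve both configurations can be imitated in a few lines. DemoTChar, DEMO_TEXT, DemoStrLen, and kGreeting are hypothetical stand-ins defined only for this sketch; in a real project you would include <tchar.h> and use TCHAR and _TEXT (or _T) themselves.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical imitation of what the Windows headers do for
// TCHAR and _TEXT: pick the character type and the literal prefix
// based on whether UNICODE is defined.
#ifdef UNICODE
typedef wchar_t DemoTChar;
#define DEMO_TEXT_IMPL(s) L##s
#else
typedef char DemoTChar;
#define DEMO_TEXT_IMPL(s) s
#endif
// Second macro level so that macro arguments are expanded first.
#define DEMO_TEXT(s) DEMO_TEXT_IMPL(s)

// Counts characters up to the terminator; works for either type.
std::size_t DemoStrLen(const DemoTChar* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// The same source line compiles in both configurations.
const DemoTChar* const kGreeting = DEMO_TEXT("hello");
```

With UNICODE defined the literal becomes L"hello" and the pointer type becomes const wchar_t *; without it, both stay in their char form.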


