First, a detailed introduction to character encoding
1. A byte is a unit of measurement for computer storage capacity.
2. Characters are the letters, digits, and symbols used in computer text, such as 1, 2, 3, 4, ~, @, !, %, ^, and so on.
3. The relationship between characters and bytes differs from one encoding to another:
A. In ASCII, an English letter (upper or lower case) takes one byte (8 bits); in the double-byte ANSI extensions such as GBK, a Chinese character takes two bytes (16 bits)
B. In UTF-8, an English character takes one byte, and a common Chinese character (simplified or traditional) takes three bytes
C. In Unicode encoding (UTF-16), an English character takes two bytes, and a common Chinese character (simplified or traditional) also takes two bytes
D. Punctuation follows the same pattern in the ANSI encodings: an English symbol takes one byte, a Chinese symbol takes two bytes
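The byte counts in item 3 can be checked with a small portable C++ sketch. EncodeUtf8 and Utf16ByteCount are hypothetical helper names introduced here for illustration; they are not part of Windows or the standard library, and the sketch makes no attempt to be a complete encoder.

```cpp
#include <cstddef>
#include <string>

// Encode one Unicode code point as UTF-8 (hypothetical helper for illustration).
// Returns an empty string for values that are not valid Unicode scalar values.
std::string EncodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                       // ASCII range: 1 byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes (most Chinese characters)
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are not characters
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {           // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Bytes the same code point needs in UTF-16: one 16-bit unit, or two
// (a surrogate pair) for code points above U+FFFF.
std::size_t Utf16ByteCount(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}
```

For example, EncodeUtf8(0x41) (the letter A) yields one byte, while EncodeUtf8(0x4E2D) (a common Chinese character) yields the three bytes E4 B8 AD; Utf16ByteCount returns 2 for both.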
4. Multiple character sets: each country encoded its own script, and the overlapping encodings (encoding conflicts) cause garbled text
A. At first there was only one character set on the Internet: ANSI's ASCII character set, which uses 7 bits to represent a character,
for a total of 128 characters, including English letters, digits, and punctuation. It was later extended to 8 bits per character,
which can represent 256 characters by adding some special characters on top of the original 7-bit set.
B. Later, as other countries joined in, ASCII could no longer meet the demand, so each country developed its own
character set on the basis of ASCII. These character sets derived from the ANSI standard are customarily called ANSI character sets;
their formal name is MBCS (Multi-Byte Character System). Each language has its own character set.
C. Hence the Unicode character set.
It originally used a fixed 16 bits (two bytes, one WORD) to represent a character, which gives 65,536 possible values, enough to include
the commonly used characters of all the world's languages (this 16-bit encoding form is called UTF-16; characters beyond the first 65,536
are represented today by a pair of 16-bit units). Later, so that double-byte Unicode could be transmitted correctly
over existing systems that process single bytes, UTF-8 appeared, which encodes Unicode in a variable-length, MBCS-like way.
UTF-8 is an encoding; it belongs to the Unicode character set.
5. Windows defines a number of data types
A. wchar_t is two bytes on Windows; types with a W in the name are of this kind
B. WCHAR: a Unicode character; it is actually wchar_t
C. PWSTR: pointer to a Unicode string, wchar_t *
D. PCWSTR: pointer to a constant Unicode string, const wchar_t *
E. The corresponding multibyte types are CHAR, LPSTR, LPCSTR
F. ANSI/Unicode generic data types:
TCHAR is char in a multibyte build and wchar_t in a Unicode build
PTSTR is char * in a multibyte build and wchar_t * in a Unicode build
LPCTSTR is const char * in a multibyte build and const wchar_t * in a Unicode build
G. A name with A is multibyte, W is Unicode (wide character), and T is generic
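How the generic T types switch between char and wchar_t can be sketched with a pair of stand-in definitions. DemoTChar, DEMO_TEXT, and DEMO_UNICODE below are hypothetical names used only for illustration; the real TCHAR and _T/_TEXT come from the Windows headers and are selected by the UNICODE and _UNICODE macros.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins that mimic how the generic T types are selected
// at compile time. Define DEMO_UNICODE to get the wide-character variant.
#ifdef DEMO_UNICODE
typedef wchar_t DemoTChar;
#define DEMO_TEXT(s) L##s   // paste an L prefix onto the literal
#else
typedef char DemoTChar;
#define DEMO_TEXT(s) s      // leave the literal as a narrow string
#endif

typedef std::basic_string<DemoTChar> DemoTString;

// The same source line compiles to a narrow or a wide string,
// depending on which branch above was taken.
inline DemoTString Greeting() {
    return DEMO_TEXT("hello");
}
```

Code written only in terms of the generic type and the text macro compiles unchanged in either mode, which is the design idea behind TCHAR.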
7. Windows APIs for converting between multibyte character sets and Unicode
A. WideCharToMultiByte maps a Unicode string to a multibyte string
B. MultiByteToWideChar maps a multibyte string to a Unicode string
8. Using the functions above is fairly complex; you can use the following ATL conversion macros instead
The identifier USES_CONVERSION must be declared before the macros are used;
A2W: multibyte to wide character
USES_CONVERSION; CString str; const char* achar = "ABCDEFG"; wchar_t* wchar = A2W(achar); str = wchar;
W2A: wide character to multibyte
USES_CONVERSION; const wchar_t* wchar = L"ABCDEFG ah"; char* achar = W2A(wchar);
A2T / T2A: T follows the project's character-set setting; A2T converts multibyte to the generic type
USES_CONVERSION; char pchar[] = "char to CString"; CString ctemp = A2T(pchar);
T2W: generic type to wide character
USES_CONVERSION; TCHAR tbuf[] = _T("char to CString"); wchar_t* pwchar = T2W(tbuf);
9. Use the conversion macros above with caution
A. Using these macros inside a loop may cause a stack overflow,
because, as their source code shows, they call _alloca to request memory, which is allocated on the stack of the calling function
and is not released until that function returns. The default stack size for the VC compiler is 1 MB, so calling a macro in a loop keeps allocating stack memory.
B. The best solution is to use WideCharToMultiByte and MultiByteToWideChar directly.
It is convenient to wrap these two APIs in helper functions of your own.
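A minimal sketch of such a wrapper follows. ToNarrow and ToWide are hypothetical helper names, not Windows APIs, and defaulting the code page to CP_ACP is an assumption of this sketch. Each helper calls the API twice: once to ask for the required size, once to convert into a heap buffer, so nothing accumulates on the stack when it is used in a loop. The block is Windows-only and is guarded by _WIN32 so the file still compiles elsewhere.

```cpp
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: wide (UTF-16) string -> multibyte string in codePage.
// Returns an empty string if the conversion fails.
std::string ToNarrow(const std::wstring& wide, UINT codePage = CP_ACP) {
    if (wide.empty()) return std::string();
    const int len = static_cast<int>(wide.size());
    // First call: a zero-sized output buffer makes the API return the byte count.
    int bytes = WideCharToMultiByte(codePage, 0, wide.data(), len,
                                    nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) return std::string();
    std::string out(static_cast<std::size_t>(bytes), '\0');
    // Second call: convert into the heap-allocated buffer.
    int written = WideCharToMultiByte(codePage, 0, wide.data(), len,
                                      &out[0], bytes, nullptr, nullptr);
    return written > 0 ? out : std::string();
}

// Hypothetical helper: multibyte string in codePage -> wide (UTF-16) string.
// Returns an empty string if the conversion fails.
std::wstring ToWide(const std::string& narrow, UINT codePage = CP_ACP) {
    if (narrow.empty()) return std::wstring();
    const int len = static_cast<int>(narrow.size());
    // First call: ask for the required number of wide characters.
    int chars = MultiByteToWideChar(codePage, 0, narrow.data(), len, nullptr, 0);
    if (chars <= 0) return std::wstring();
    std::wstring out(static_cast<std::size_t>(chars), L'\0');
    // Second call: convert into the heap-allocated buffer.
    int written = MultiByteToWideChar(codePage, 0, narrow.data(), len,
                                      &out[0], chars);
    return written > 0 ? out : std::wstring();
}
#endif  // _WIN32
```

A call such as ToNarrow(L"ABCDEFG", CP_UTF8) would then replace W2A, and ToWide("ABCDEFG", CP_UTF8) would replace A2W.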
10. Use TCHAR and _TEXT so that the code works with both the Unicode and the multibyte character sets
Windows Core Programming (3): Character Encoding in Detail