ANSI string: an English character occupies one byte and a Chinese character occupies two bytes. The string ends with a single \0 and is often used in TXT text files.
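Unicode string: each character (Chinese character or English letter) occupies 2 bytes, and the string ends with 2 consecutive \0 bytes. (On Windows, "Unicode" means UTF-16, so rare characters outside the Basic Multilingual Plane take 4 bytes.) This is the string type used by the NT operating system kernel. Older VC compilers often define it as typedef unsigned short wchar_t; which is why we often see errors such as "char * cannot be converted to unsigned short *": the unsigned short * in the message is actually a Unicode string.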
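The size difference between the two string types can be checked at compile time. This is only a small sketch: it uses the standard type char16_t as a stand-in for the 16-bit wchar_t of Windows (an assumption, so that it also compiles where wchar_t is 4 bytes), and the function names narrow_bytes and wide_bytes are made up for illustration.

```cpp
#include <cstddef>

// Narrow literal: 1 byte per English letter plus a 1-byte '\0' terminator.
constexpr std::size_t narrow_bytes() { return sizeof("AB"); }

// UTF-16 literal: 2 bytes per character plus a 2-byte terminator.
// char16_t is an assumed stand-in for the 16-bit wchar_t used on Windows.
constexpr std::size_t wide_bytes() { return sizeof(u"AB"); }
```

On common platforms, where char16_t is 2 bytes, narrow_bytes() is 3 and wide_bytes() is 6.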
UTF-8 can be seen as a compressed form of Unicode. In Unicode, the English letter A is stored as 0x0041, which wastes 50% of the space, so UTF-8 stores English characters in 1 byte. A Chinese character, however, occupies three bytes in UTF-8, which is clearly less economical than the two bytes it takes in ANSI. For this reason, Chinese web pages commonly use ANSI encoding, while foreign web pages commonly use UTF-8.
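The 1-byte and 3-byte figures come straight from the UTF-8 encoding ranges. A minimal sketch follows, where utf8_length is a hypothetical helper (not a library function) that assumes its argument is a valid Unicode code point:

```cpp
// Number of bytes UTF-8 needs for one code point.
// Hypothetical helper; assumes cp is a valid Unicode code point.
constexpr int utf8_length(char32_t cp) {
    return cp < 0x80    ? 1   // ASCII, e.g. 'A' (U+0041)
         : cp < 0x800   ? 2
         : cp < 0x10000 ? 3   // most Chinese characters, e.g. U+4E2D
                        : 4;  // characters outside the Basic Multilingual Plane
}
```

With this helper, utf8_length(U'A') is 1 and utf8_length(0x4E2D) is 3, compared with 2 bytes each in a 16-bit Unicode string.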
First, we will introduce two functions:
MultiByteToWideChar
Function: maps a character string to a Unicode (wide-character) string. The string mapped by this function does not have to come from a multi-byte character set.
Function prototype: int MultiByteToWideChar(UINT CodePage, DWORD dwFlags, LPCSTR lpMultiByteStr, int cchMultiByte, LPWSTR lpWideCharStr, int cchWideChar);
WideCharToMultiByte
Function: maps a Unicode (wide-character) string to a multi-byte string.
Function prototype: int WideCharToMultiByte(UINT CodePage, DWORD dwFlags, LPCWSTR lpWideCharStr, int cchWideChar, LPSTR lpMultiByteStr, int cchMultiByte, LPCSTR lpDefaultChar, LPBOOL lpUsedDefaultChar);
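In VC, a Unicode string is converted to UTF-8 as follows:
#include <windows.h>
const wchar_t *wcs0 = L"Hello lilei!";
// First call: pass no output buffer (NULL, 0) to get the required size in bytes.
int wcslen0 = ::WideCharToMultiByte(CP_UTF8, 0, wcs0, (int)wcslen(wcs0), NULL, 0, NULL, NULL);
char *sz0 = new char[wcslen0 + 1];
// Second call: convert into the allocated buffer.
::WideCharToMultiByte(CP_UTF8, 0, wcs0, (int)wcslen(wcs0), sz0, wcslen0, NULL, NULL);
sz0[wcslen0] = '\0'; // the length was passed explicitly, so terminate the result by hand
// Release the buffer with delete[] sz0; when it is no longer needed.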
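To see what a UTF-8 conversion produces byte by byte, here is a portable sketch that does the same kind of UTF-16 to UTF-8 encoding by hand, without the Windows API. The function name utf16_to_utf8 is made up for illustration, std::u16string stands in for a Windows wchar_t string, and replacing unpaired surrogates with U+FFFD is a choice of this sketch rather than something taken from the code above.

```cpp
#include <cassert>
#include <string>

// Hand-written UTF-16 -> UTF-8 encoder (illustrative sketch).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Surrogate pair: combine both 16-bit units into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate (assumed policy: replace it)
        }
        assert(cp <= 0x10FFFF);  // invariant: always inside the Unicode range

        if (cp < 0x80) {                       // 1 byte: English letters
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes: most Chinese characters
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, utf16_to_utf8(u"Hello lilei!") returns the same 12 ASCII bytes, while the single Chinese character U+4E2D becomes the three bytes E4 B8 AD.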
Tip: http://blog.csdn.net/kesur/article/details/5432473