The following discussion is based on how the C++ standard defines wchar_t.
The C++ standard says:
Type wchar_t is a distinct type whose values can represent distinct
codes for all members of the largest extended character set specified
among the supported locales. Type wchar_t shall have the same
size, signedness, and alignment requirements as one of the other
integral types, called its underlying type.
...
A character literal that begins with the letter L, such as L'x', is
a wide-character literal. A wide-character literal has type wchar_t.
The value of a wide-character literal containing a single c-char has
value equal to the numerical value of the encoding of the c-char in
the execution wide-character set. The value of a wide-character
literal containing multiple c-chars is implementation-defined.
According to my understanding, this means:
1. The compiler must ensure that wchar_t has the same size, signedness, and alignment requirements as some other integral type. Which integral type that is, is up to the compiler. This means that the result of wchar_t(0x8000) >> 1 depends on the compiler. In VC, wchar_t is unsigned; when the compiler option "/Zc:wchar_t-" is set, it is defined as "unsigned short". In addition, "signed wchar_t" and "unsigned wchar_t" do not exist in C++.
2. Unlike in C, wchar_t must be a built-in type. I guess this is needed for overloading and template specialization; a look at the overloads in iostream makes the reason clear. This leads to a related topic: char is neither "signed char" nor "unsigned char".
3. The text of the C++ standard does not tie wchar_t to Unicode. The standard only requires that wchar_t be able to encode, uniquely, every character in the largest character set among the locales the compiler supports. Although wchar_t holds Unicode in practice, a maverick compiler has the right to define its own "hexie-code" that is completely different from Unicode. However, this hexie-code must at least be numerically compatible with the value range of the char type. This usually means that values 0-255 of the hexie-code must be the same as values 0-255 of char.
4. As is well known, wchar_t is 16 bits on Windows and 32 bits on Linux.
5. As of VC8, the VC C runtime library does not support UTF-8. That is, setlocale(LC_CTYPE, "zh_CN.UTF-8") fails, and setlocale(LC_CTYPE, "zh_cn.65001") does not work either. Single-stepping through the call leads to getqloc.c, which contains the following code:
    // verify codepage validity
    if (!iCodePage || iCodePage == CP_UTF7 || iCodePage == CP_UTF8 ||
        !IsValidCodePage((WORD)iCodePage))
        return FALSE;
This check is new in VC8 and is not present in VC7. The difference is that VC8 already fails in setlocale, whereas VC7 fails only later, when the mbstowcs family of functions is used.
http://hi.baidu.com/bbcallen/blog/item/e2e37b1b5add59d3ac6e7549.html