wchar_t in C++

Source: Internet
Author: User

The following discussion is based on the text of the C++ standard.

From the C++ standard:
Type wchar_t is a distinct type whose values can represent distinct
codes for all members of the largest extended character set specified
among the supported locales. Type wchar_t shall have the same
size, signedness, and alignment requirements as one of the other
integral types, called its underlying type.

...

A character literal that begins with the letter L, such as L'x', is a
wide-character literal. A wide-character literal has type wchar_t.
The value of a wide-character literal containing a single c-char has
value equal to the numerical value of the encoding of the c-char in
the execution wide-character set. The value of a wide-character
literal containing multiple c-chars is implementation-defined.
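
The requirements quoted above can be probed directly. The following is a minimal sketch, assuming a C++11 compiler (for static_assert, decltype and alignof). The names WcharLayout and wchar_layout are made up for this illustration and are not part of the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>

// A wide-character literal has type wchar_t.
static_assert(std::is_same<decltype(L'x'), wchar_t>::value,
              "L'x' should have type wchar_t");

// wchar_t is a distinct type: it shares size, signedness and alignment with
// some other integral type, but it is not the same type as any of them.
// (This is expected to fail under MSVC's /Zc:wchar_t-, where wchar_t is
// only a typedef.)
static_assert(!std::is_same<wchar_t, unsigned short>::value &&
              !std::is_same<wchar_t, int>::value &&
              !std::is_same<wchar_t, unsigned int>::value &&
              !std::is_same<wchar_t, long>::value,
              "wchar_t should be a distinct type");

// Hypothetical helper type: the three properties that the standard ties to
// the underlying type.
struct WcharLayout {
    std::size_t size;       // in bytes
    std::size_t alignment;  // in bytes
    bool is_signed;
};

// Reports what this particular compiler chose for wchar_t.
WcharLayout wchar_layout() {
    WcharLayout layout = { sizeof(wchar_t), alignof(wchar_t),
                           std::numeric_limits<wchar_t>::is_signed };
    // The size of a type is always a multiple of its alignment.
    assert(layout.size % layout.alignment == 0);
    return layout;
}
```

Calling wchar_layout() on two different compilers is a quick way to see that the underlying type really is an implementation choice; the values it returns are not fixed by the standard.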

According to my understanding, this means:
1. The compiler must ensure that wchar_t has the same size, signedness, and alignment requirements as some other integral type. Which integral type that is, is decided by the compiler. This means that the result of wchar_t(0x8000) >> 1 depends on the compiler. In VC, if the compiler option /Zc:wchar_t- is set, wchar_t is simply defined as unsigned short. In addition, "signed wchar_t" and "unsigned wchar_t" do not exist in C++.

2. Unlike in C, wchar_t must be a built-in type. I guess this is required for overloading and template specialization; a look at the overloads in iostream makes this clear. This leads to a related topic: char is neither "signed char" nor "unsigned char", but a distinct third type.

3. The text of the C++ standard does not tie wchar_t to Unicode. The standard only requires that wchar_t be able to give a unique code to every character in the largest character set among all locales the compiler supports. A wchar_t value is therefore a "unique code", but not necessarily Unicode: a maverick compiler has the right to define its own encoding (call it "hexie-code") that is completely different from Unicode. However, this hexie-code must at least be numerically compatible with the range of values representable by char. In practice this usually means that values 0-255 of the hexie-code must be the same as values 0-255 of char.

4. As is well known, wchar_t is 16 bits on Windows and 32 bits on Linux.

5. As of VC8, the VC C runtime library does not support UTF-8. That is, setlocale(LC_CTYPE, "zh_CN.UTF-8") fails, and setlocale(LC_CTYPE, "zh_CN.65001") does not work either. Single-stepping through the call shows that getqloc.c contains the following code:

    // verify code page validity
    if (!iCodePage || iCodePage == CP_UTF7 || iCodePage == CP_UTF8 ||
        !IsValidCodePage((WORD)iCodePage))
        return FALSE;

This check was newly added in VC8 and is not present in VC7. The difference is that VC8 already fails in setlocale, whereas VC7 fails only later, when functions such as mbstowcs are called.
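
The five points above can be checked on any given compiler with a few small probes. This is only a sketch, assuming a C++11 compiler. All function names here (shifted_high_bit, kind, basic_char_matches, wchar_bits, locale_available) are invented for the illustration, and the results depend on the platform.

```cpp
#include <cassert>
#include <climits>
#include <clocale>
#include <cstddef>
#include <string>

// Point 1: wchar_t(0x8000) >> 1 depends on the width and signedness the
// compiler picked. An unsigned or wider-than-16-bit wchar_t yields 0x4000;
// a 16-bit signed wchar_t typically yields a negative value.
long shifted_high_bit() {
    wchar_t w = static_cast<wchar_t>(0x8000);
    return static_cast<long>(w >> 1);
}

// Point 2: char, signed char, unsigned char, wchar_t and unsigned short are
// five different types, so all five overloads can coexist.
// (Under /Zc:wchar_t- the last two would collide.)
std::string kind(char)           { return "char"; }
std::string kind(signed char)    { return "signed char"; }
std::string kind(unsigned char)  { return "unsigned char"; }
std::string kind(wchar_t)        { return "wchar_t"; }
std::string kind(unsigned short) { return "unsigned short"; }

// Point 3: on mainstream platforms a basic character has the same numeric
// value as char and as wchar_t. Treat the result as an observation, not as
// a guarantee.
bool basic_char_matches() { return L'A' == 'A'; }

// Point 4: width of wchar_t in bits on this platform.
std::size_t wchar_bits() { return sizeof(wchar_t) * CHAR_BIT; }

// Point 5: setlocale returns a null pointer when the request cannot be
// honored. The previous LC_CTYPE locale is restored before returning.
bool locale_available(const char* name) {
    assert(name != nullptr);
    const char* previous = std::setlocale(LC_CTYPE, nullptr);
    std::string saved = previous ? previous : "C";
    bool ok = std::setlocale(LC_CTYPE, name) != nullptr;
    std::setlocale(LC_CTYPE, saved.c_str());
    return ok;
}
```

On the VC8 runtime described above, locale_available("zh_CN.UTF-8") would be expected to return false; on a Linux system with that locale installed it would be expected to return true.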

http://hi.baidu.com/bbcallen/blog/item/e2e37b1b5add59d3ac6e7549.html
