Outputting Chinese Characters in C++

Source: Internet
Author: User

1. wprintf
Q: What is sizeof(wchar_t)?
A: It varies with the compiler, so avoid wchar_t when the code must be cross-platform. In VC, sizeof(wchar_t) == 2.
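Because the width of wchar_t is implementation-defined, code that depends on a particular width can check it explicitly, or use the C++11 types char16_t and char32_t instead. The following is only a small sketch; the helper name wchar_bits is hypothetical and not part of the original article.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// Width of wchar_t in bits on the current compiler:
// typically 16 with VC and 32 with GCC on Linux.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// char16_t and char32_t (C++11) are at least 16 and 32 bits wide
// on every conforming compiler, unlike wchar_t.
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "char16_t is at least 16 bits");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t is at least 32 bits");
```

Code that assumes UTF-16 in wchar_t could, for example, add static_assert(wchar_bits() == 16, "...") so that a port to another compiler fails at build time rather than at run time.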

Q: Why does calling wprintf(L"测试1234") directly print nothing in VC?
A: The locale has not been set.

setlocale(LC_ALL, "chs");
wprintf(L"%s", L"测试1234");

Or (assuming the currently active code page is Simplified Chinese):

char scp[16];
int cp = GetACP();
sprintf(scp, ".%d", cp);
setlocale(LC_ALL, scp);
wprintf(L"测试1234");
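setlocale returns a null pointer when the requested locale is not available, so the result is worth checking, especially because names such as "chs" are specific to the Microsoft CRT. The sketch below tries several candidate names in order; the helper name set_first_available_locale is hypothetical.

```cpp
#include <clocale>
#include <initializer_list>
#include <string>

// Tries each candidate name in order and returns the name reported by
// setlocale for the first one that is accepted, or an empty string if
// none is available (setlocale returns a null pointer on failure).
std::string set_first_available_locale(std::initializer_list<const char*> candidates) {
    for (const char* name : candidates) {
        if (const char* chosen = std::setlocale(LC_ALL, name)) {
            return chosen;
        }
    }
    return std::string();
}
```

A call such as set_first_available_locale({"chs", ".936", ""}) would fall back to "", which selects the user's default locale from the environment.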

2. wcout
The same applies, but set the locale with std::locale:

locale loc("chs");
wcout.imbue(loc);
wcout << L"测试1234" << endl;

 

Comment from [netsin]:
Note: wprintf is a C standard library function. (The comment goes on to say that wcout is not a standard member of C++; in fact std::wcout is declared in <iostream> by the C++ standard.) In C++, L"..." is a wide-character literal, but not necessarily Unicode; that depends on the compiler implementation.

Comment from [Qian Kun smile]:
Why does the C/C++ language leave the encoding of L"xx" to the implementation? This is clearly for the sake of generality and portability. In Bjarne's view, the C++ approach is to let programmers use any character set as the character type of a string. In addition, Unicode encoding has already gone through several versions, and it is not clear whether it can be used permanently. For more information about Unicode and how it compares with other character sets, I recommend reading "No-Nonsense XML".

The following two programs were run on the English edition of Windows XP Professional and compiled with VS2005 RTM.

// C
#include <stdio.h>
#include <locale.h>
#include <wchar.h>   /* wprintf */
int main(void)
{
    setlocale(LC_ALL, "chs");
    // setlocale(LC_ALL, "chinese-simplified");
    // setlocale(LC_ALL, "ZHI");
    // setlocale(LC_ALL, ".936");
    wprintf(L"中国");

    return 0;
}

// C++
#include <iostream>
#include <locale>
using namespace std;
int main(void)
{
    locale loc("chs");
    // locale loc("chinese-simplified");
    // locale loc("ZHI");
    // locale loc(".936");
    wcout.imbue(loc);
    std::wcout << L"中国" << endl;

    return 0;
}

Note: Do not mix setlocale and std::locale.
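Unlike setlocale, the std::locale constructor reports an unknown name by throwing std::runtime_error, so a program that should still run on machines without the "chs" locale needs to catch it. A minimal sketch, with the hypothetical helper name locale_or_classic:

```cpp
#include <locale>
#include <stdexcept>

// Returns the named locale if the C++ runtime knows it, otherwise the
// classic "C" locale. std::locale's constructor throws
// std::runtime_error for a name it does not recognize.
std::locale locale_or_classic(const char* name) {
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}
```

It could be used as std::wcout.imbue(locale_or_classic("chs")); with the classic locale the program still starts, although Chinese characters will not be converted.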

-------------------------



"VC knowledge base" code: 56 43 D6 AA ca B6 BF E2 00 // ANSI code
L "VC knowledge base" encoded in VC ++: 56 00 43 00 E5 77 C6 8B 93 5E 00 00 // (Unicode in Windows) Encoding
L "VC knowledge base" encoded in GCC (Dev-CPP4990): 56 00 43 00 D6 00 AA 00 ca 00 B6 00 BF 00 E2 00 00 00 // simply add 0 to the ANSI Encoding
L "VC knowledge base" failed to compile in GCC (Dev-CPP4992), reported illegal byte sequence

L "VC knowledge base" solution steps in Dev-CPP4992:
A. Save the file as UTF-8 encoded // UTF-8 is one of Unicode, but it is different from (Unicode in Windows)
B. Remove the BOM header: Use a binary Editor (such as Vc) to remove the first three bytes of the UTF-8 file. // Linux/Unix does not use Bom.
C. Use gcc/g ++ for compiling and running
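Step B can also be done programmatically instead of with a binary editor. The UTF-8 BOM is the three bytes EF BB BF; the sketch below removes it from text that has already been read into memory (the helper name strip_utf8_bom is hypothetical, and reading and writing the file is left out).

```cpp
#include <string>

// Removes a leading UTF-8 byte order mark (EF BB BF) if present;
// text without a BOM is returned unchanged.
std::string strip_utf8_bom(const std::string& text) {
    if (text.compare(0, 3, "\xEF\xBB\xBF") == 0) {
        return text.substr(3);
    }
    return text;
}
```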

After the above steps, in the dev-cpp4992
"VC knowledge base" encoding: 56 43 E7 9f A5 E8 af 86 E5 Ba 93 00 // UTF-8 encoding. Note that it is no longer ANSI encoding. Therefore, use printf/cout to output garbled characters.
L "VC knowledge base" encoding: 56 00 43 00 E5 77 C6 8B 93 5E 00 00 00 // (Unicode in Windows) Encoding
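The two byte sequences are different encodings of the same code points. As a check, the sketch below encodes a single code point both ways; the function names are hypothetical and only the Basic Multilingual Plane is handled. For U+77E5 (知) it yields E7 9F A5 in UTF-8 and E5 77 in UTF-16LE, matching the listings.

```cpp
#include <string>

// Encodes one code point from the Basic Multilingual Plane
// (U+0000..U+FFFF, surrogates not checked) as UTF-8 bytes.
std::string utf8_bytes(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encodes the same code point as two UTF-16LE bytes (low byte first).
std::string utf16le_bytes(char32_t cp) {
    std::string out;
    out += static_cast<char>(cp & 0xFF);
    out += static_cast<char>((cp >> 8) & 0xFF);
    return out;
}
```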

Supplement: to use wcout and wstring in MinGW32, you need to define some macros, such as:
#define _GLIBCXX_USE_WCHAR_T 1
#include <iostream>
int main(void)
{
    std::wcout << 1 << std::endl;
}
This compiles, but it does not link. Searching online turns up STLport saying that MinGW32 is at fault, and MinGW32 saying that Microsoft's C runtime is at fault.

 

 

Unicode output of printf and wprintf on the console

1. printf can only produce ANSI/MB output; it does not support writing a Unicode stream.
Example:
wchar_t test[] = L"测试1234";
printf("%s", test);   // not output correctly

2. wprintf does not provide Unicode output either.
However, it converts the wchar_t string to the SB/MB character encoding of the locale and then outputs it.
Example:
wchar_t test[] = L"测试1234";
wprintf(L"%s", test);   // outputs ??1234 or nothing

Because wprintf cannot convert L"测试" to the default ANSI encoding, you need to set the locale:
setlocale(LC_ALL, "chs");
wchar_t test[] = L"测试1234";
wprintf(L"%s", test);   // correct output
This is equivalent to printf("%ls", test);

To sum up: CRT I/O functions do not provide Unicode output.
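The conversion that wprintf performs internally can be made explicit with the standard function wcsrtombs, which also makes the failure visible: it returns (size_t)-1 when a character has no representation in the encoding of the current locale. This is a sketch under that assumption; the helper name to_multibyte is hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>    // MB_CUR_MAX
#include <cwchar>     // std::wcsrtombs, std::mbstate_t
#include <stdexcept>
#include <string>
#include <vector>

// Converts a wide string to the multibyte encoding of the current C locale.
// Throws if some character cannot be represented in that encoding, which is
// the situation in which wprintf prints '?' or nothing.
std::string to_multibyte(const std::wstring& wide) {
    // Each wide character needs at most MB_CUR_MAX bytes, plus the terminator.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wide.c_str();
    std::size_t n = std::wcsrtombs(buf.data(), &src, buf.size(), &state);
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("character not representable in the current locale");
    }
    return std::string(buf.data(), n);
}
```

Calling setlocale before to_multibyte changes which characters convert successfully, just as it does for wprintf.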


3. The Windows console has been a true Unicode console since NT4.
However, to output a Unicode string you must use the Windows API WriteConsoleW.
For example:

wchar_t test[] = L"测试1234";
DWORD ws;
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), test, (DWORD)wcslen(test), &ws, NULL);

This produces correct output without setting a locale, because it is true Unicode output and does not depend on the code page.

4. How to implement cross-platform console output
Do not use wchar_t and wprintf, because they depend on the compiler.
ICU is IBM's mature, cross-platform Unicode support library.

The following is ICU's uprintf implementation:

void uprintf(const UnicodeString &str) {
    char *buf = 0;
    int32_t len = str.length();
    int32_t bufLen = len + 16;   // may be too small if the code page needs several bytes per character
    int32_t actualLen;
    buf = new char[bufLen + 1];
    actualLen = str.extract(0, len, buf /*, bufLen*/);   // default code page conversion
    buf[actualLen] = 0;
    printf("%s", buf);
    delete[] buf;   // array form, matching new char[]
} It first converts Unicode string to local codePage, and then printf. Although it is not Unicode output, it works well across platforms. Postscript: mbstowcs (wchar_t * wcstr
, Const char * mbstr
, Size_t count
Count: the maximum number of multibyte characters to convert. refers to the number of characters in the Multi-byte string to be converted plus one from the number of characters in the currently active locale. For example, the string "ABC Zhao 123" for C locale is strlen ("ABC Zhao 123"), that is, 8 + 1. For chinese-simplified.936, count is 7 + 1.

count must be calculated in the same locale in which mbstowcs is called.
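Rather than counting characters by hand, the count can be obtained from the library in the active locale. The sketch below uses the standard mbsrtowcs, which returns the required number of wide characters when the destination is a null pointer; the helper name to_wide is hypothetical.

```cpp
#include <cstddef>
#include <cwchar>     // std::mbsrtowcs, std::mbstate_t
#include <stdexcept>
#include <string>
#include <vector>

// Converts a multibyte string to a wide string using the current C locale.
std::wstring to_wide(const char* text) {
    // First pass: a null destination asks only for the number of wide
    // characters, counted in the current locale (terminator excluded).
    std::mbstate_t state = std::mbstate_t();
    const char* src = text;
    std::size_t count = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (count == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("invalid multibyte sequence for the current locale");
    }
    // Second pass: count + 1 leaves room for the terminating L'\0'.
    std::vector<wchar_t> buf(count + 1);
    state = std::mbstate_t();
    src = text;
    std::mbsrtowcs(buf.data(), &src, buf.size(), &state);
    return std::wstring(buf.data(), count);
}
```

Because both passes run under the same locale, the count and the conversion always agree.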

A comment says:

char scp[16];
int cp = GetACP();
sprintf(scp, ".%d", cp);
setlocale(LC_ALL, scp);
wprintf(L"测试1234");

is equivalent to

setlocale(LC_ALL, "");
wprintf(L"测试1234");

Here is a comment about Unicode support that I find humorous.

Let me give you a lesson. First, the Unicode character set support described by C99 refers to the character encoding in memory. C++ should certainly support it; there is no doubt about that. The external representations of Unicode, however, such as utf7, utf8, utf8n, utf16le, utf16be and other external storage formats, are another matter. The C++ committee apparently has no reason to concern itself with these formats.

As the world's most handsome STL library, sgi-stl3.3 considers that its I/O stream library, which lies outside the STL proper, is under no obligation to implement the various character encoding conversions. In fact, doing so would force the STL I/O library to distinguish strictly between operating systems, and that inevitably ends in sacrificing the child to catch the wolf. SGI-STL therefore declares that it ignores every locale except the standard "C". The result is that if we use SGI-STL and do not provide a new implementation of the underlying c-locale interfaces for Windows, such as c_local_stub_win32.cxx, then a locale constructed from something like a goodname does not work.

Speaking of this, if you are using STL-port, stl-port4.6 is much more ambitious than SGI-STL: the child has certainly been sacrificed, and whether the wolf has been caught remains to be seen. I have not tried any examples of other locales in stl-port4.6, but I think it will perform as handsomely as Microsoft's STL. The STL library of BCB6 and CBX is exactly a version of STL-port. If the goodname method fails there, then unfortunately stlport did not catch its wolf.

If you are using MS's P.J. STL, you are lucky, because Microsoft will give you what you want. In general, the goodname code should work. But as Microsoft's motto goes: I give you what you need, but do not try to understand how I do it. Although Microsoft exposes the STL source code, I would rather never have seen it. I even suspect that Microsoft encrypted it with a 1024-bit key. Using Microsoft products is like living in New York: in hell and in heaven at once.

The main point has not been mentioned yet. Internally, the STL I/O streams use the codecvt object specified in the locale to convert between the internal stream and the external data stream; do_in and do_out perform the conversion. According to SGI-STL, the "C" locale does not consider 10646, or any other form of MBCS, when converting between internal wchar_t and external char. In fact, the C++ standard knows nothing about these things. So when you output Chinese characters, the high byte of each character is discarded, which is unpleasant. The solution is to derive from codecvt and override do_in and do_out, using the PSDK to convert between Unicode and MBCS. You can try it yourself with the code page.
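The codecvt approach described in the comment can be sketched as follows. This is not the commenter's code: instead of the Platform SDK conversion to an MBCS code page, it substitutes a hand-written UTF-8 encoder so that it stays portable. It overrides only do_out, handles only U+0000..U+FFFF, and the names Utf8OutCodecvt and to_utf8 are hypothetical.

```cpp
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Illustrative facet: writes wchar_t values as UTF-8 on output.
// Assumption: only code points U+0000..U+FFFF are handled and surrogate
// values are rejected; do_in keeps the base class behaviour.
class Utf8OutCodecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
protected:
    result do_out(state_type&, const intern_type* from, const intern_type* from_end,
                  const intern_type*& from_next, extern_type* to, extern_type* to_end,
                  extern_type*& to_next) const override {
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            unsigned long cp = static_cast<unsigned long>(*from_next);
            if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return error;
            int need = cp < 0x80 ? 1 : (cp < 0x800 ? 2 : 3);
            if (to_end - to_next < need) return partial;   // output buffer is full
            if (need == 1) {
                *to_next++ = static_cast<char>(cp);
            } else if (need == 2) {
                *to_next++ = static_cast<char>(0xC0 | (cp >> 6));
                *to_next++ = static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                *to_next++ = static_cast<char>(0xE0 | (cp >> 12));
                *to_next++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                *to_next++ = static_cast<char>(0x80 | (cp & 0x3F));
            }
            ++from_next;
        }
        return ok;
    }
    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return 0; }    // variable width
    int do_max_length() const noexcept override { return 3; }  // bytes per wchar_t
};

// Drives the facet directly, without a stream, to show what it produces.
std::string to_utf8(const std::wstring& wide) {
    Utf8OutCodecvt facet;
    std::mbstate_t state = std::mbstate_t();
    std::string bytes(wide.size() * 3, '\0');
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    std::codecvt_base::result r =
        facet.out(state, wide.data(), wide.data() + wide.size(), from_next,
                  &bytes[0], &bytes[0] + bytes.size(), to_next);
    if (r != std::codecvt_base::ok) {
        throw std::runtime_error("wide string cannot be encoded by this sketch");
    }
    bytes.resize(static_cast<std::size_t>(to_next - &bytes[0]));
    return bytes;
}
```

To use such a facet with a stream, it would be installed with std::locale(std::locale::classic(), new Utf8OutCodecvt) and imbued into a std::wofstream before the file is opened, since codecvt is applied by the file buffer.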
