Author: zyl910
The C++ standard defines a complete internationalization mechanism for the C++ standard I/O library. In practice, however, support differs widely between compilers, and characters often fail to print correctly. So I carried out an in-depth investigation.
1. Description
1.1 Test program
The following simple program uses cout, wcout, and printf to output strings. The code is:
# Include <stdio. h> # include <locale. h> # include <wchar. h >#include <string >#include <iostream> using namespace STD; const char * PSA = "a Chinese character ABC"; const wchar_t * psw = l "W Chinese Character ABC "; int main (INT argc, char * argv []) {// init. // IOs: sync_with_stdio (false); // Linux GCC. locale: Global (locale (""); // setlocale (lc_ctype, ""); // mingw GCC. wcout. imbue (locale (""); // C ++ cout <PSA; cout. clear (); cout <Endl; wcout <psw; wcout. clear (); wcout <Endl; // C printf ("\ NC: \ n"); printf ("\ t % s \ n", PSA ); printf ("\ t % ls \ n", psw); Return 0 ;}
Can you guess what this program prints?
1.2 Theoretical results
First, let's work out what the program should print according to the C++ standard.
In the main function, two lines of code initialize the locale:
locale::global(locale(""));
wcout.imbue(locale(""));
Details:
1. locale(""): calls the constructor to create a locale. The empty string has a special meaning: use the default locale of the user's environment ("The C++ Standard Library: A Tutorial and Reference", p697). For example, on a Simplified Chinese system it returns the Simplified Chinese locale.
2. locale::global(locale("")): sets the global locale of the C++ standard I/O library to the default locale of the user's environment. Note that it also sets the locale of the C standard library, with an effect similar to setlocale(LC_ALL, "") ("The C++ Standard Library: A Tutorial and Reference", p698).
3. wcout.imbue(locale("")): makes wcout use the default locale of the user's environment.
In this way, both the C standard library and the C++ standard I/O library (wcout in particular) have their locale set correctly, matching the default locale of the user's environment.
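The claim in point 2 can be checked directly. The following is a minimal sketch, not part of the original test program; the helper name global_sets_c_locale is made up for illustration. It installs a locale with locale::global and then asks the C library which locale it is currently using.

```cpp
#include <clocale>
#include <locale>
#include <string>

// Hypothetical helper: installs `loc` as the C++ global locale and reports
// whether the C library's LC_ALL locale now carries the same name.
bool global_sets_c_locale(const std::locale& loc)
{
    std::locale::global(loc);
    // Passing a null pointer queries the current C locale without changing it.
    const char* c_name = std::setlocale(LC_ALL, nullptr);
    return c_name != nullptr && loc.name() == c_name;
}
```

Calling global_sets_c_locale(std::locale("")) on a given toolchain gives a rough indication of whether the C library followed along. The name comparison is only approximate, since an implementation may format the name of a mixed-category locale differently on the two sides.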
Next, cout and wcout from the C++ standard I/O library output the narrow string and the wide string respectively:
// C++
cout << psa;
cout.clear();
cout << endl;
wcout << psw;
wcout.clear();
wcout << endl;
Details:
1. The clear member functions of cout and wcout are called to reset the error state, so that subsequent output works normally.
2. "cout << endl" and "wcout << endl" not only write a line break but also call the flush member function to commit the buffered data, so that the output of cout and wcout does not get mixed up.
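Why the clear calls matter can be shown without a console. This is a minimal sketch using an in-memory stream; the helper name write_after_failure is hypothetical, and the failure is forced with setstate to stand in for a character conversion error.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: writes "A", forces the stream into a failed state,
// optionally clears it, then writes "B" and returns everything the stream
// actually accepted.
std::string write_after_failure(bool call_clear)
{
    std::ostringstream out;
    out << "A";
    out.setstate(std::ios_base::failbit);  // simulated output error
    if (call_clear) {
        out.clear();                       // reset the error state
    }
    out << "B";                            // ignored while failbit is set
    return out.str();
}
```

Without clear, every later insertion is silently dropped. In the test program that would include the line break written by endl after a failed string.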
Finally, the printf function of the C standard library outputs the narrow string and the wide string:
// C
printf("\nC:\n");
printf("\t%s\n", psa);
printf("\t%ls\n", psw);
Therefore, the running result of the test program should be --
A Chinese Character abcw Chinese Character ABCC: a Chinese character abc w Chinese Character ABC
Note: To better differentiate the output results of the C ++ standard Io Library and the C standard library, a Tab character is added to printf.
2. Testing VC2005
VC2005 is the first compiler in the VC series with good support for the C++03 standard, so we test it first.
2.1 Debug
Compile the test program with VC2005 in the Debug configuration. The result is:
AWC: a Chinese character abc w Chinese Character ABC
Neither cout nor wcout can output the Chinese characters correctly.
C's printf outputs both the narrow string and the wide string correctly, Chinese characters included.
2.2 Release
Switch the build configuration to Release, then compile and run again. Something surprising happens. The result is:
A Chinese Character abcw Chinese Character ABCC: a Chinese character abc w Chinese Character ABC
Everything passes in the Release build: cout, wcout, and printf all output correctly.
3. Testing VC2008 and later versions of VC
Compile the test program with VC2008. The result is:
A Chinese Character abcw Chinese Character ABCC: a Chinese character abc w Chinese Character ABC
Everything passes: cout, wcout, and printf all output correctly. The Release build was tested next and also passed. It seems the VC2005 bug has been fixed.
VC2010 and VC2012 were tested as well, and both passed.
4. Testing MinGW
4.1 On Windows
Compile the test program with GCC 4.6.2 (MinGW 20120426). The result is:
A Chinese Character abcwc: a Chinese character ABC W
The narrow string is output correctly, but the wide string is not.
4.2 Modifying the code so that MinGW displays correctly
Add one line to the initialization code:
// init.
locale::global(locale(""));
setlocale(LC_CTYPE, "");    // MinGW gcc.
wcout.imbue(locale(""));
Use mingw to compile and run the command. The execution result is --
A Chinese Character abcw Chinese Character ABCC: a Chinese character abc w Chinese Character ABC
All passed, cout, wcout, and printf can be output normally. It seems that "locale: Global (locale (" ")" In mingw does not set "setlocale (lc_all," ")" and must be called manually.
Compiling the modified Code with vc2008 is also successful. One call to "setlocale (lc_all," ")" will not cause damage.
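To see what the extra setlocale call changes, the C library can be asked for its LC_CTYPE locale name before and after. This is a minimal sketch; the struct CtypeChange and the function apply_environment_ctype are names invented for illustration.

```cpp
#include <clocale>
#include <string>

// Hypothetical record of the C library's LC_CTYPE locale name before and
// after adopting the default from the environment.
struct CtypeChange {
    std::string before;
    std::string after;  // empty if setlocale rejected the request
};

CtypeChange apply_environment_ctype()
{
    CtypeChange change;
    // A null pointer only queries; copy the name before the next call,
    // because setlocale may reuse its internal buffer.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    change.before = current ? current : "";
    // An empty string selects the environment's default locale.
    const char* updated = std::setlocale(LC_CTYPE, "");
    change.after = updated ? updated : "";
    return change;
}
```

If this is called right after locale::global(locale("")) and before still reads "C", the C library was not updated by the C++ call, which would be consistent with the MinGW behavior guessed at above.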
5. Testing GCC
5.1 On Linux
Compile the test program with GCC on Linux. The result is:
A Chinese Character abcwiwabcc: a Chinese character abc w Chinese Character ABC
cout and printf both output correctly, but wcout does not.
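The garbled wcout line shows the letters I and W where the two Chinese characters should be. Assuming the two characters in the test string are 汉 (U+6C49) and 字 (U+5B57), this is consistent with each wide character being cut down to its low byte, since 0x49 and 0x57 are the ASCII codes of I and W. This is only a guess about the mechanism, not something the tests here establish. The sketch below, with the hypothetical helper low_bytes, reproduces the same text.

```cpp
#include <string>

// Hypothetical helper: keeps only the low 8 bits of every wide character,
// mimicking a wide character being written out as a single byte.
std::string low_bytes(const std::wstring& wide)
{
    std::string narrow;
    for (std::wstring::size_type i = 0; i < wide.size(); ++i) {
        const unsigned long code = static_cast<unsigned long>(wide[i]);
        narrow.push_back(static_cast<char>(code & 0xFFu));
    }
    return narrow;
}
```

For example, low_bytes(L"W\u6C49\u5B57abc") yields "WIWabc".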
5.2 Modifying the code so that it displays correctly on Linux
Add one line to the initialization code:
// init.
ios::sync_with_stdio(false);    // Linux gcc.
locale::global(locale(""));
wcout.imbue(locale(""));
Use GCC to compile and run. The execution result is --
A Chinese Character abcw Chinese Character ABCC: a Chinese character abc w Chinese Character ABC
All passed, cout, wcout, and printf can be output normally.
5.3 modify the code 2nd times so that mingw can be properly displayed
Switch back to Windows and use mingw to compile the modified Code. The execution result is --
A Chinese Character ABCC: a Chinese character ABC W
The wide string still cannot be displayed correctly.
Based on the earlier experience, add setlocale to the initialization code:
// init.
ios::sync_with_stdio(false);    // Linux gcc.
locale::global(locale(""));
setlocale(LC_CTYPE, "");    // MinGW gcc.
wcout.imbue(locale(""));
Use mingw to compile and run the command. The execution result is --
A Chinese Character abcw Chinese Character ABCC: a Chinese character abc w Chinese Character ABC
Finally all passed.
5.4 test the code modification 2nd times in Linux
In Linux, the code is successfully modified 2nd times.
The modified code is compiled by vc2008.
It seems that the effective initialization methods in VC, mingw, and Linux are finally found. Unfortunately, manual synchronization is required after "iOS: sync_with_stdio (false)" is disabled, which may cause some old code to work abnormally and this method is not very practical.
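What "synchronizing manually" could look like is sketched below. This is an illustration under my own assumptions, not code from the test program; print_in_order is a hypothetical helper. Each library's buffer is flushed before the other library writes.

```cpp
#include <cstdio>
#include <iostream>

// Hypothetical helper: writes one line through cout and one through printf,
// flushing each buffer before handing over to the other library, so the
// lines appear in program order even when synchronization is disabled.
void print_in_order(const char* text)
{
    std::cout << "C++: " << text << '\n' << std::flush;  // flush the C++ buffer
    std::printf("C:   %s\n", text);
    std::fflush(stdout);                                 // flush the C buffer
}
```

Old code that mixes cout and printf freely does none of this flushing, which is why turning synchronization off can reorder its output. Note that ios::sync_with_stdio(false) itself is meant to be called before any output is produced.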
6. Testing GCC on Mac OS X
Compile the test program with GCC on Mac OS X. Running it reports an error.
Why does such a simple program fail at run time?
Debug the program with GDB: r to run, where to display the call stack, and list to show the source code.
It turns out that the error is raised when locale("") is executed.
Isn't locale("") specified by the C++ standard? How can it fail outright?
Searching the Internet, I found that someone had looked at the GCC source code on the Mac, which states explicitly: "Currently, the generic model only supports the "C" locale."
http://stackoverflow.com/questions/1745045/stdlocale-breakage-on-macos-10-6-with-lang-en-us-utf-8
std::locale breakage on MacOS 10.6 with LANG=en_US.UTF-8
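A program can at least avoid crashing on such a platform. The following is a minimal sketch, assuming the failure surfaces as the std::runtime_error that the standard specifies when a locale name is not valid; the helper name user_locale_or_classic is made up for illustration.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: returns the user's default locale when the library
// can construct it, and the classic "C" locale otherwise.
std::locale user_locale_or_classic()
{
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        // The locale constructor throws std::runtime_error for a name
        // the implementation does not support.
        return std::locale::classic();
    }
}
```

Using locale::global(user_locale_or_classic()) in place of locale::global(locale("")) keeps the program running, although on a platform that only supports the "C" locale the Chinese characters would presumably still not be converted.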
7. Summary
Although the design in the C++ standard is sound, compilers unfortunately differ a great deal in how fully they implement it. Some platforms do not even support locale.
For cross-platform code, use the C++ standard I/O library with caution, and prefer the C standard library, which has excellent compatibility, wherever possible.
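As one illustration of staying within the C library, a wide string can be converted with wcsrtombs, which uses the C library's current LC_CTYPE locale. This is a sketch under my own assumptions rather than a method from the tests above; to_multibyte is a hypothetical helper.

```cpp
#include <cassert>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: converts a wide string to a multibyte string using the
// C library's current LC_CTYPE locale. Returns false if some character cannot
// be represented in that locale.
bool to_multibyte(const wchar_t* wide, std::string& out)
{
    assert(wide != nullptr);
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wide;
    // With a null destination, wcsrtombs only reports the required length.
    const std::size_t needed = std::wcsrtombs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1)) {
        return false;
    }
    std::vector<char> buffer(needed + 1);
    state = std::mbstate_t();
    src = wide;
    std::wcsrtombs(buffer.data(), &src, buffer.size(), &state);
    out.assign(buffer.data(), needed);
    return true;
}
```

The false return gives the caller a clear signal when the current locale cannot represent the text, instead of a stream that quietly stops producing output.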
References:
ISO/IEC 9899:1999 (C99). ISO/IEC, 1999. www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
C++ International Standard, ISO/IEC 14882, Second Edition 2003 (C++03). ISO/IEC, 2003-10-15.
"The C++ Standard Library: A Tutorial and Reference" (Chinese edition). By Nicolai M. Josuttis, translated by Hou Jie and Meng Yan. Huazhong University of Science and Technology Press, 2002-09.
std::locale breakage on MacOS 10.6 with LANG=en_US.UTF-8. http://stackoverflow.com/questions/1745045/stdlocale-breakage-on-macos-10-6-with-lang-en-us-utf-8
"[C] Cross-platform use of TCHAR: supporting tchar.h on Linux and other platforms, solving format specifier problems across platforms, and displaying multiple languages at the same time". http://www.cnblogs.com/zyl910/archive/2013/01/17/tcharall.html
Source code download:
http://files.cnblogs.com/zyl910/wchar_crtbug.rar