C++ output of Chinese characters (reprint)

Source: Internet
Author: User

C++ output of Chinese characters

1. cout


Scenario 1: In a source file compiled with VC++, define const char* str = "Chinese" (where "Chinese" stands for a Chinese string literal). Because a Chinese Windows environment uses GBK encoding, the string is saved in the source file as GBK code,
and the compiler makes str point to a read-only memory area that holds those GBK bytes.
When str is written with cout, the GBK bytes are sent to the console unchanged. Since the Chinese Windows console also uses GBK, the text displays correctly.

Scenario 2: Edit a file under Linux that contains const char* str = "Chinese". Because Linux commonly uses UTF-8 encoding, the string is saved in the source file as UTF-8.
If this source file is then opened on Windows, VC++ interprets the UTF-8 bytes as GBK, so the string is displayed as garbled text.
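
To make the difference between the two scenarios concrete, here is a minimal sketch that prints the bytes behind a narrow string. The helper name hex_bytes and the two constants are illustrative assumptions, not part of the original article; the byte values are the standard GBK and UTF-8 encodings of the two characters 中文, written as escapes so that the result does not depend on how this file is saved.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte of a narrow string as two hex digits.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// The same two characters (中文) in the two encodings discussed above.
const char kGbk[]  = "\xD6\xD0\xCE\xC4";          // GBK: 2 bytes per character
const char kUtf8[] = "\xE4\xB8\xAD\xE6\x96\x87";  // UTF-8: 3 bytes per character
```

hex_bytes(kGbk) gives "D6 D0 CE C4" and hex_bytes(kUtf8) gives "E4 B8 AD E6 96 87". cout passes whichever bytes the literal contains straight through, so the console shows readable text only if it expects the same encoding.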

2. wcout

In a source file compiled with VC++, define const wchar_t* str = L"Chinese". Because the L prefix is specified, the string is stored as Unicode code units (UCS-2), and the compiler makes str point to a read-only memory area that holds the Unicode encoding.
When str is written with wcout, wcout first converts the content of str to multibyte characters according to the current locale before sending it to the console. If no locale has been set, the classic "C" locale is used, which does not know Chinese, so nothing is displayed. (Stepping through the code in a debugger shows that the VC++ 2010 implementation outputs one character at a time, calling wctomb_s for each.)

Principle
cout and wcout are instantiations of basic_ostream for char and wchar_t, and basic_ostream delegates the actual output to basic_streambuf. For wchar_t, the stream buffer calls fputwc to output a wide character, and fputwc in turn calls wctomb_s to convert the wide character to multibyte form before writing it. wctomb_s depends on the locale. Because the default is the "C" locale, calling wctomb_s on a Chinese character fails.
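
The locale dependence can be checked directly. The following is a small sketch that uses the portable std::wctomb in place of Microsoft's wctomb_s (an assumption made so that the example is not tied to one runtime); the function name convertible_in_current_locale is hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <clocale>
#include <cstdlib>

// Returns true if the wide character can be converted to a multibyte
// sequence under the C library's current LC_CTYPE locale.
bool convertible_in_current_locale(wchar_t wc) {
    char buf[MB_LEN_MAX];
    std::wctomb(nullptr, 0);            // reset any conversion shift state
    return std::wctomb(buf, wc) != -1;  // -1 means "not representable"
}
```

Under the default "C" locale this returns true for L'A' but false for a Chinese character such as U+4E2D, which is the failure described above. After the locale is switched to one that can represent Chinese, the same call is expected to succeed.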

Solutions
Replace the default "C" locale with the locale of the current system so that functions such as wctomb_s work correctly.
Any one of the following three methods achieves this.

1. Set the global locale with the C function
setlocale(LC_ALL, "");

2. Set the global locale in C++
std::locale::global(std::locale(""));

3. Set a locale for wcout only
std::locale loc("");
std::wcout.imbue(loc);
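
One practical detail: std::locale("") throws std::runtime_error when the environment names a locale that the system does not provide. The sketch below wraps the wcout-only method with a fallback to the classic locale; both function names are illustrative assumptions, not code from the article.

```cpp
#include <cassert>
#include <iostream>
#include <locale>
#include <stdexcept>

// Build the user's preferred locale, falling back to the classic "C"
// locale if the environment names a locale that is not installed.
std::locale user_locale_or_classic() {
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Apply the locale to wcout only, leaving the global locale untouched.
void imbue_wcout_with_user_locale() {
    std::wcout.imbue(user_locale_or_classic());
}
```

Calling imbue_wcout_with_user_locale() once at program start, before the first wide output, is enough. The global C++ locale stays as it was, so other streams are not affected.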

Conclusion
Unlike in the Windows API, the wide-character (w) versions of classes and functions in C++ do not improve performance, because they all have to use a wc...to...mb conversion function to convert to an ANSI-compatible encoding and then call the standard library function. On Windows, a wide-character fputwc could instead call the Unicode version of the Windows API directly without conversion, if the library implementation chose to do so, but that has nothing to do with the C++ language itself. Because the Windows kernel is Unicode, calling the Windows API directly with a Unicode string can bring a small benefit.

The C++ designers' starting point: the language does not care which character encoding you use. On output, single-byte or multibyte characters are written directly; wide characters are first converted to multibyte characters according to the locale and then written.
Even if Unicode becomes obsolete someday (purely hypothetically), it does not matter: one only needs to define a new locale. The same is true for C.

The Windows designers' starting point: use Unicode wide characters uniformly to solve all problems.

Original: http://blog.csdn.net/gonxi/article/details/5931006

Multibyte characters and wide-character output in C++

With the iostream library of the C++ standard library, it is easy to treat the console, files, strings, and other extensible external representations as streams, but handling Chinese raises many problems. I had not used iostream much before. While trying to write something over the past few days, I found that sometimes Chinese could not be output and sometimes Chinese file names were not supported, which was a real headache. Searching the Internet turned up no solution that covers all situations. After many tests of my own, I have basically solved these problems and am writing them up as a summary, for reference by others who run into the same issues. This article also discusses Chinese output with printf and wprintf in C.

Note that my development environment is VS 2005 (so the standard library is Microsoft's implementation); I cannot guarantee the same results in other environments.
1. cout and wcout
Under the default "C" locale, cout can output Chinese directly, but wcout cannot. For wcout, its locale must be set to the local language before it can output Chinese:
wcout.imbue(locale(locale(), "", LC_CTYPE)); ①
Some people use the following statement instead, but it changes all of wcout's locale settings; for example, the number 1234 would be output as "1,234".
wcout.imbue(locale(""));
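
The effect on numbers comes from the locale's numeric facets, not from its character-conversion facets. The sketch below makes that visible without depending on which locales are installed. GroupedDigits is a hypothetical facet that stands in for the digit grouping a regional locale may bring, and ctype_only_locale mirrors statement ① using the portable std::locale::ctype where the VS 2005 code passes LC_CTYPE.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical facet that groups digits in threes, as many regional locales do.
struct GroupedDigits : std::numpunct<char> {
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Format an integer through a stream imbued with the given locale.
std::string format_with(const std::locale& loc, int value) {
    std::ostringstream out;
    out.imbue(loc);
    out << value;
    return out.str();
}

// Locale whose numeric facet groups digits (stands in for a full locale("")).
std::locale grouping_locale() {
    return std::locale(std::locale::classic(), new GroupedDigits);
}

// Classic locale with only the ctype category taken from the user's locale;
// numeric formatting stays classic. Falls back if locale("") is unavailable.
std::locale ctype_only_locale() {
    try {
        return std::locale(std::locale::classic(), std::locale(""),
                           std::locale::ctype);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}
```

format_with(grouping_locale(), 1234) yields "1,234", while format_with(ctype_only_locale(), 1234) yields "1234". This is why replacing only the ctype category leaves number output unchanged.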

2. ofstream and wofstream
Under the default "C" locale, ofstream can correctly write Chinese to a file but does not support Chinese file names, while wofstream supports Chinese file names but cannot write Chinese to the file. To resolve this, set the global locale to the local language before opening the file. Once the global locale is set to the local language, both the ofstream and wofstream problems are solved, but cout and wcout can no longer output Chinese. For cout and wcout to output Chinese, restore the global locale to its original setting afterwards, as follows:
locale loc = locale::global(locale(locale(), "", LC_CTYPE)); ②
ofstream ofs("ofs test.txt");
wofstream wofs(L"wofs test.txt");
locale::global(loc); ③
ofs << "Test" << 1234 << endl;
wofs << L"Another test" << 1234 << endl;
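
The save-and-restore pair around the file opening can be packaged as a scope guard so that the restore step cannot be forgotten. This is a sketch of one possible approach, not something the article itself does; the class name ScopedGlobalLocale is hypothetical.

```cpp
#include <cassert>
#include <locale>

// Installs a global locale for the lifetime of the object and restores
// the previous one on destruction, even if an exception is thrown.
class ScopedGlobalLocale {
public:
    explicit ScopedGlobalLocale(const std::locale& temporary)
        : previous_(std::locale::global(temporary)) {}
    ~ScopedGlobalLocale() { std::locale::global(previous_); }

    ScopedGlobalLocale(const ScopedGlobalLocale&) = delete;
    ScopedGlobalLocale& operator=(const ScopedGlobalLocale&) = delete;

private:
    std::locale previous_;  // the global locale that was active before
};
```

With this, the guard is created in a block, the files are opened inside that block, and the previous global locale comes back when the block ends. The stream objects themselves would need to be declared outside the block and opened with open() if they are to be used afterwards.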

3. printf and wprintf
Adding these two C functions makes the problem more complicated. Consider the following statements (note the case of the s):
printf("%s", "multibyte Chinese\n"); ④
printf("%S", L"Unicode Chinese\n"); ⑤
wprintf(L"%S", "multibyte Chinese\n"); ⑥
wprintf(L"%s", L"Unicode Chinese\n"); ⑦
In Microsoft's C runtime, an uppercase %S selects the string type opposite to the function's own: a wide string for printf and a multibyte string for wprintf.
By default, statements ⑤ and ⑦ cannot output Chinese; the strings in both are in Unicode form. If the following statement is executed before all output statements, setting the C global locale to the local language (this affects only the C global locale), the output is normal:
setlocale(LC_CTYPE, ""); ⑧
However, this stops cout and wcout from outputting Chinese. Once the C global locale is restored as follows, cout and wcout work again:
setlocale(LC_CTYPE, "C"); ⑨
But after the restore, printf and wprintf can no longer output Unicode text correctly (multibyte text is always output correctly). Setting the locale before every printf/wprintf call and restoring it afterwards is hardly practical. It is therefore recommended not to mix iostream with printf/wprintf. If they really must be mixed, have printf/wprintf output only multibyte strings; then setlocale() need not be called and cout and wcout are unaffected.
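
If a wide string occasionally has to go through printf/wprintf anyway, the set-and-restore step can at least be wrapped so that it is not repeated by hand. The following is a rough sketch under that assumption; the class name ScopedCtype is hypothetical, and the article's advice not to mix the two families still applies.

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Temporarily switches the C library's LC_CTYPE locale and restores the
// previous setting on destruction.
class ScopedCtype {
public:
    explicit ScopedCtype(const char* name) {
        // setlocale(..., nullptr) only queries; copy the result because the
        // returned buffer may be overwritten by a later setlocale call.
        if (const char* current = std::setlocale(LC_CTYPE, nullptr)) {
            previous_ = current;
        }
        active_ = std::setlocale(LC_CTYPE, name) != nullptr;
    }
    ~ScopedCtype() {
        if (!previous_.empty()) std::setlocale(LC_CTYPE, previous_.c_str());
    }

    // False if the requested locale was not available.
    bool active() const { return active_; }

    ScopedCtype(const ScopedCtype&) = delete;
    ScopedCtype& operator=(const ScopedCtype&) = delete;

private:
    std::string previous_;
    bool active_ = false;
};
```

A block such as { ScopedCtype guard(""); wprintf(...); } then plays the role of statements ⑧ and ⑨ around a single call, with the earlier locale restored automatically when the block ends.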

Summary
In short, outputting Chinese with iostream and printf/wprintf takes some care. The main points are:
To use wcout, first set its locale to the local language with statement ①.
To use ofstream or wofstream, use statement ② before opening the file to set the global locale to the local language and save the initial global locale; then, after opening the file, use statement ③ to restore the global locale to its initial value.
Do not mix iostream and printf/wprintf. If you must mix them, use printf/wprintf only to output multibyte strings.
When using printf/wprintf alone, set the C global locale with statement ⑧ if you need to output Unicode strings. If you output only multibyte strings, no setting is needed.

Finally, a few words from the reposter (webmaster):
A program generally does not use both kinds of string; it uses either multibyte strings or wide strings. So the problem is actually quite simple, not as complicated as the author makes it. Even when conversion is occasionally needed, there are dedicated facilities for it (for example, a multibyte-character program that uses COM components, which require wide strings, can rely on _bstr_t or CString).
