[C/C++] Wide characters and console programs

Source: Internet
Author: User

When I started with C/C++, the character type I used was char. After I moved on to Win32 programming I switched to wchar_t, and I kept using wchar_t when writing console programs. However, wide characters in a console program can cause all sorts of strange problems, mostly in output. Below I share my experience with this.

 

First, look at this code:

#include <stdio.h>
#include <wchar.h>

int main() {
    wprintf(L"%ls", L"Blog");  /* %ls: portable conversion for a wchar_t string */
    return 0;
}

 

wprintf outputs wide-character strings, and nothing here looks wrong. Yet the program prints three question marks (in the original article the string is three Chinese characters, rendered here as "Blog"). This is the most typical problem people hit with wprintf. The fix is to add a call to _wsetlocale:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main() {
    _wsetlocale(LC_ALL, L"chs");  /* MSVC; "chs" = Chinese (Simplified) */
    wprintf(L"%ls", L"Blog");
    return 0;
}

 

_wsetlocale is the wide-character version of setlocale. The two functions differ only in their return type and in whether the second parameter is a wide-character string; their effect is the same.

 

To explain this code we have to start with the console itself. Any scenario that processes characters uses a character set, and the console is a character-based environment, so it needs one too. The character set the console uses is called a code page. Each code page corresponds to a natural language and defines how that language's characters map to binary codes. For example, the code page for English is 437 and the code page for Simplified Chinese is 936. A console window has only one active code page at a time, so characters from different languages cannot appear in the same console window unless they exist in both code pages with the same binary code. You can change the code page of the current console window with the chcp command (for example, chcp 936).

 

A code page is a multi-byte character set, so the console does not accept Unicode directly. Wide characters written straight to the console will not display correctly; they must first be converted into multi-byte characters. wprintf does perform this conversion internally: if you step into wprintf in the debugger, you eventually reach wcstombs_s.

 

The problem lies in this conversion. The conversion function needs to know which code page to target when it maps the binary code of a wide character to the binary code of a code-page character. If the chosen code page does not match the console's active code page, the text will not display correctly. The first program fails because no suitable code page was selected. The second program sets the locale to Chinese (China), which tells the conversion function to convert wide characters into multi-byte characters of code page 936. That matches the console's active code page, so the output is correct.

 

Here is a short introduction to _wsetlocale. This function sets the locale (regional culture) used by the C runtime library. The locale affects how numbers, currency, times, and other values are displayed, and it also determines the code page. The first parameter selects which aspect of the locale to set: LC_COLLATE, LC_CTYPE, LC_MONETARY, LC_NUMERIC, LC_TIME, or LC_ALL for all of them. For example, with LC_NUMERIC the C runtime displays numbers in the style of the specified locale. LC_CTYPE is the category that controls which code page the conversion functions use.

 

The second parameter names the locale as a string in a fixed format; see the MSDN documentation for details. In most cases the abbreviation of the language or region is enough, for example "chs". If you pass an empty string "", the code page is chosen from the operating system's current regional settings. So if the operating system's region is "Chinese (China)", _wsetlocale(LC_ALL, L"") also selects the correct code page.

 

By default the C runtime library uses a locale named "C", which is language-neutral and the same everywhere. Its associated code page covers only ASCII. At program startup the C runtime behaves as if setlocale(LC_ALL, "C") had been called, so by default the strings passed to wprintf cannot contain Chinese characters.

 

 

That is how C outputs wide characters. Next, let's look at how C++ handles wide-character output. _wsetlocale only affects the C runtime library; cout and wcout are not affected by it. For cout and wcout, call their imbue method:

std::wcout.imbue(std::locale("chs", std::locale::all));

The two parameters of this locale constructor have the same meaning as the parameters of _wsetlocale, but in reverse order.

 

Like wprintf, wcout first converts a wide-character string into a multi-byte string before output. The difference is what happens when the code page does not support a character: wprintf outputs a question mark, whereas wcout outputs nothing, sets badbit and failbit, and all subsequent output fails. Personally I think wcout's behavior does not suit every situation, so wprintf is the more general choice.

 

Given all of this, we need to be very careful with input and output when writing console programs, to make sure the output is correct.

(The first version of this article recommended using the multi-byte character set instead of the Unicode character set for console programs. That advice was clearly inappropriate, so I have deleted that section.)

 

 

 

Finally, a comment on one method of converting a char* string to a wchar_t* string. The code is as follows:

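#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    wostringstream outstrstream;
    outstrstream << "Blog";             // a narrow (char*) string inserted into a wide stream
    wstring wstr = outstrstream.str();  // the supposedly converted wide string
    wcout << wstr << endl;
}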

 

The idea is to write the char* string into a wostringstream object and then call its str method to get the converted string. This relies on an assumption: that wostringstream automatically converts a char* string into a wchar_t* string. Note that this code never calls wcout.imbue to set a locale, yet it still outputs Chinese correctly.

 

This code compiles and runs correctly. However, if you try to get the length of the converted string, something goes wrong:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    wostringstream outstrstream;
    outstrstream << "Blog";
    wstring wstr = outstrstream.str();
    wcout << wstr.length() << endl;
}

 

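This program outputs 6 instead of 3 (the original string is three Chinese characters, which take six bytes in code page 936). Besides the length, the characters returned by the at method are not the characters of the original string either. In fact, almost every operation on this string gives a wrong result.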

 

Why does this happen? Inspecting the internal data of the outstrstream object reveals the answer. Here is the memory after executing outstrstream << "Blog":

[Figure: debugger memory view of the outstrstream object]

 

 

The data in the red box belongs to the outstrstream object. Now let's look at how the wide-character and multi-byte versions of the "Blog" string actually appear in memory:

int main() {
    const char* pstr = "Blog";       // multi-byte string
    const wchar_t* pwstr = L"Blog";  // wide-character string
    // Inspect pstr and pwstr in the debugger's memory window.
}

 

 

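[Figures: memory views of the wchar_t* string pwstr and the char* string pstr]

Of the two figures, the first shows the wchar_t* string and the second shows the char* string. Comparing them with the data in outstrstream, you can see that the string in outstrstream is still the multi-byte string, except that each byte has been widened to two bytes, that is, one wchar_t per byte. It is not a wide-character string at all. That is also why it prints Chinese correctly even without calling wcout.imbue: when wcout writes it, each wchar_t is narrowed back to its original byte, and the console interprets those bytes in its active code page, 936.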

 

That's all for now. The above is my personal understanding; if there are mistakes or omissions, corrections are welcome.
