We know that CLabel's strings use UTF-8 encoding.
Under Xcode, writing CLabel *p = ...; p->setText("汉字") displays the Chinese characters without any problem,
but in VS2013 the same code shows garbled characters.
Many people ask about this in the group, and I answer: please use UTF-8 encoding.
They reply: but my file is already in UTF-8 format. Well, it is not that simple: the encoding of the file and the encoding of the string literals inside it are two different concepts.
VS2013 has a preprocessor directive for this:
#pragma execution_character_set("utf-8")
With it, p->setText("汉字") and p->setText("中文") work safely in VS2013.
Since it is just a preprocessor directive, it is usually best to put this line right after the #include group, as in the sketch below.
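For reference, here is a minimal sketch of where the pragma goes. It assumes a Qt-style label class with a setText method standing in for the CLabel of the opening example; the class name, header, and function name are illustrative only, and the source file is assumed to be saved as UTF-8 with a BOM.

// Sketch only: QLabel is used here as a stand-in for the label class above.
#include <QLabel>

#pragma execution_character_set("utf-8")   // tells VC the string literals below are UTF-8

void showChinese(QLabel *p)                // hypothetical helper, for illustration
{
    p->setText("汉字");                    // displays correctly in VS2013 with the pragma
    p->setText("中文");
}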
Chinese characters in C++ source strings under VC2010
To figure this out, you first need to understand encodings. But encodings are a big, messy topic, and there is no way to cover them fully here.
Let's start from a concrete example: what are the Unicode code points, the UTF-8 encoding, and the GBK encoding of "中文"?
First go to this site and look up the Unicode code points and UTF-8 encoding of "中文": http://www.mytju.com/classcode/tools/encode_utf8.asp
The Unicode code points are (decimal): 中 (20013), 文 (25991). The corresponding UTF-8 encodings are (hexadecimal): 中 (E4 B8 AD), 文 (E6 96 87).
Then go to the following site and look up the GBK encoding of "中文": http://www.mytju.com/classcode/tools/encode_gb2312.asp
The GBK encodings are (hexadecimal): 中 (D6 D0), 文 (CE C4).
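As a quick cross-check (not part of the original experiment), a compiler with C++11 u8 literals can print these UTF-8 bytes directly. A minimal sketch, assuming a pre-C++20 compiler where u8"..." is still a plain char array (so not VC2010 itself):

#include <cstdio>

int main()
{
    const char* u = u8"中文";                        // literal forced to UTF-8 regardless of file encoding
    for (int i = 0; u[i] != '\0'; ++i)
        std::printf("%02X ", (unsigned char)u[i]);   // prints E4 B8 AD E6 96 87
    return 0;
}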
Now we know the exact UTF-8 and GBK encodings of "中文". Let's see what VC2010 does with them.
1. First, look at UTF-8 source without a BOM (utf8_no_bom.cpp)
// utf8_no_bom.cpp: saved as UTF-8 without a BOM
// (compiling it produces warning C4819: the file contains characters that
//  cannot be represented in the current code page (936))
#include <stdio.h>

int main()
{
    const char* str = "中文";
    // sizeof(str) is the size of the pointer (4 on a 32-bit build),
    // so only the first 4 bytes of the literal are printed.
    for (int i = 0; i < sizeof(str); ++i) {
        printf("0x%x ", str[i] & 0xff);
    }
    return 0;
}
// Output: 0xe4 0xb8 0xad 0xe6
The output is 0xe4 0xb8 0xad 0xe6, which is the beginning of the UTF-8 encoding of "中文". It looks right.
But don't celebrate yet: VC emits a warning while compiling: utf8_no_bom.cpp : warning C4819: The file contains characters that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss.
The subtext is: this source file contains characters that GBK cannot represent, so please save it in Unicode. In other words, VC did not treat utf8_no_bom.cpp as UTF-8 at all; it simply treated it as GBK.
So why was the output still correct?
Because VC reads utf8_no_bom.cpp as GBK, and at compile time it also converts it to the local code page, which is GBK as well, so the bytes pass through unchanged. The UTF-8 byte sequence of "中文" (E4 B8 AD E6 96 87) is therefore kept in the program, but VC treats those bytes as whatever other GBK characters they happen to spell; VC no longer knows that they stand for "中文".
However, while treating the file as GBK (it is really UTF-8 without a BOM), VC found byte sequences that are not valid GBK, which is why the C4819 warning appeared.
2. Now look at how UTF-8 with a BOM is handled (utf8_with_bom.cpp)
// utf8_with_bom.cpp: saved as UTF-8 with a BOM
#include <stdio.h>

int main()
{
    const char* str = "中文";
    // again, sizeof(str) == 4 on a 32-bit build, so 4 bytes are printed
    for (int i = 0; i < sizeof(str); ++i) {
        printf("0x%x ", str[i] & 0xff);
    }
    return 0;
}
// Output: 0xd6 0xd0 0xce 0xc4
There is no compiler warning this time, but the output is puzzling: 0xd6 0xd0 0xce 0xc4.
The source file clearly stores the string as UTF-8 (0xe4 0xb8 0xad ...); how did it become 0xd6 0xd0 0xce 0xc4, which is the GBK encoding?
This is a "favor" VC does behind your back: it quietly converts the UTF-8 source into GBK before processing it.
Why would VC do such a silly thing?
For compatibility with older VC versions: earlier VC compilers could not handle UTF-8 and processed everything in the local encoding.
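The effect of that conversion can be reproduced by hand with the Win32 conversion APIs, using the byte values looked up earlier. This is only an illustration of the UTF-8 to UTF-16 to GBK round trip, not what the compiler literally calls:

// Sketch: convert the UTF-8 bytes of "中文" to GBK (code page 936). Win32 only.
#include <windows.h>
#include <stdio.h>

int main()
{
    const char utf8[] = "\xe4\xb8\xad\xe6\x96\x87";   // UTF-8 bytes of "中文"
    wchar_t wide[8] = {0};
    char gbk[8] = {0};

    // UTF-8 -> UTF-16
    int wlen = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, 8);
    // UTF-16 -> GBK (code page 936)
    WideCharToMultiByte(936, 0, wide, wlen, gbk, 8, NULL, NULL);

    for (int i = 0; gbk[i] != '\0'; ++i)
        printf("0x%x ", (unsigned char)gbk[i]);       // prints 0xd6 0xd0 0xce 0xc4
    return 0;
}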
3. Now look at how real GBK source is handled (gbk.cpp)
// gbk.cpp: saved in GBK (the local code page, 936)
#include <stdio.h>

int main()
{
    const char* str = "中文";
    for (int i = 0; i < sizeof(str); ++i) {
        printf("0x%x ", str[i] & 0xff);
    }
    return 0;
}
// Output: 0xd6 0xd0 0xce 0xc4
No compilation warnings, and the output matches the source: 0xd6 0xd0 0xce 0xc4.
Because the source file is GBK, cl converts GBK to GBK at compile time, which leaves the string unchanged.
But these days many people do not want to use GBK (it is only used in China and cannot represent the characters of the rest of the world).
At this point we can briefly summarize:
The VC editor and the VC compiler are two different things; the editor supporting UTF-8 does not mean the compiler supports UTF-8 too.
The VC editor has supported UTF-8 with a BOM since (roughly) VC2008; UTF-8 without a BOM is still not supported, because it conflicts with the local encoding.
The VC compiler only supports UTF-8 from VC2010 onward (and the way it is supported is not elegant).
4. Now see how VC2010 handles UTF-8 with a BOM plus the pragma (utf8_with_bom_2010.cpp)
VC2010 added UTF-8 support in the compiler via #pragma execution_character_set("utf-8"); see:
http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/2f328917-4e99-40be-adfa-35cc17c9cdec
// utf8_with_bom_2010.cpp: saved as UTF-8 with a BOM, compiled with VC2010
#pragma execution_character_set("utf-8")   // this line is the key!
#include <stdio.h>

int main()
{
    const char* str = "中文";
    for (int i = 0; i < sizeof(str); ++i) {
        printf("0x%x ", str[i] & 0xff);
    }
    return 0;
}
// Output: 0xe4 0xb8 0xad 0xe6
No compilation warnings, and the output matches the bytes in the source file: 0xe4 0xb8 0xad 0xe6.
UTF-8 source, UTF-8 output. Perfect!
Why does the VC editor support UTF-8 only with a BOM (the point noted in the summary above)? The cause:
Microsoft uses a BOM with UTF-8 because it allows UTF-8 files to be told apart from ASCII/locally encoded files. The BOM byte sequence carries no real meaning in UTF-8 itself; it only has meaning in UTF-16 or UTF-32, where it marks the byte order.
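For completeness, the UTF-8 BOM is the three bytes EF BB BF at the start of the file. A small sketch of how one could check for it (the file name is just the example name used above):

// Sketch: detect a UTF-8 BOM (EF BB BF) at the start of a file.
#include <stdio.h>

int has_utf8_bom(const char* path)
{
    unsigned char b[3] = {0};
    FILE* f = fopen(path, "rb");
    if (!f) return 0;
    size_t n = fread(b, 1, 3, f);
    fclose(f);
    return n == 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF;
}

int main()
{
    printf("BOM: %d\n", has_utf8_bom("utf8_with_bom.cpp"));
    return 0;
}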
Reprinted from: https://m.oschina.net/blog/161676
http://www.zhihu.com/question/20167122
The encoding of a file and the encoding of the string literals inside that file are two different concepts.