In C++ programming, encoding matters in two places: the editor and the compiler. On the editor side, the usual problem is garbled text, for example Chinese comments that display incorrectly or cannot be saved. The fix is to save the file as Unicode (UTF-8).

On the compiler side, what you can express depends on which C++ standard the compiler supports. Before C++11, a string literal could only be one of two kinds: a multibyte (MBCS) string, such as const char* p = "abc haha", or a wide string, such as const wchar_t* p = L"abc haha" (UCS-2 on Windows, where wchar_t is 16 bits). The prefix tells the compiler which kind of string you mean. C++11 added literals for UTF-8, UTF-16, and UCS-4: const char* p = u8"abc haha" is UTF-8 (since C++20 the element type is char8_t), const char16_t* p = u"abc haha" is UTF-16 (on Windows, effectively the same as L"XXXX"), and const char32_t* p = U"abc haha" is UCS-4.

Keep compile time and run time apart here. Before C++11 we could not tell the compiler that a constant string was in UTF-8 format, but at run time a program could still work with any encoding (MBCS, UTF-8, UCS-2, UCS-4), because in memory they are all just byte streams. In addition, C++11 added facets in the <codecvt> header (deprecated since C++17) for transcoding among UTF-8, UCS-2/UTF-16, and UCS-4:
std::codecvt_utf8 | Converts between UTF-8 and UCS-2/UCS-4
std::codecvt_utf16 | Converts between UTF-16 and UCS-2/UCS-4
std::codecvt_utf8_utf16 | Converts between UTF-8 and UTF-16
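To make the literal prefixes and the conversion facets concrete, here is a minimal sketch, not a definitive implementation. The helper names (code_units, utf8_to_utf16, utf16_to_utf8, utf8_to_utf32) are made up for this illustration; only std::wstring_convert and the codecvt facets come from the standard library. It assumes a compiler that still ships <codecvt>; newer toolchains may emit deprecation warnings for it.

```cpp
#include <codecvt>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// "abc" followed by U+4F60 U+597D, spelled with universal character names
// so the result does not depend on how this source file is saved.
static_assert(code_units(u8"abc\u4F60\u597D") == 9, "UTF-8: 3 + 2*3 bytes");
static_assert(code_units(u"abc\u4F60\u597D") == 5, "UTF-16: one unit per BMP character");
static_assert(code_units(U"abc\u4F60\u597D") == 5, "UCS-4: one unit per character");
// A character outside the BMP needs a surrogate pair in UTF-16.
static_assert(code_units(u"\U0001F600") == 2, "UTF-16 surrogate pair");

// UTF-8 bytes -> UTF-16 code units.
// wstring_convert throws std::range_error on malformed input.
inline std::u16string utf8_to_utf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}

// UTF-16 code units -> UTF-8 bytes.
inline std::string utf16_to_utf8(const std::u16string& utf16) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(utf16);
}

// UTF-8 bytes -> UCS-4 code points.
inline std::u32string utf8_to_utf32(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(utf8);
}
```

The static_asserts show the compile-time side: the prefix alone decides how many code units the same text occupies. The three functions show the run-time side: the same bytes can be transcoded whether or not the compiler knew their encoding when the literal was written.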
In cross-platform C++ development, we constantly have to decide which encoding to use by default, and we soon find that Windows, with its UCS-2 approach, is the odd one out among platforms. There are generally two ways to deal with this.

The first is to standardize on UTF-8 everywhere. This is a bit of a hassle on Windows, because the Windows API takes UCS-2 (UTF-16 in current versions), so every string has to be converted from UTF-8 before it is passed to the Windows API.

The second is to hide the string type behind macros: on Windows the string-related macros are all defined as UCS-2, and on other platforms they are all defined as UTF-8. This approach demands a clear head when writing code, because the same source uses a different encoding on each platform.

I was always curious why Windows did not simply use UTF-8 like the other platforms. The answer is timing: the NT kernel adopted UCS-2 in 1989, and UTF-8 was not invented until 1992.

http://www.cppblog.com/weiym/archive/2015/07/25/211370.html
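The macro-based approach described above could look roughly like the following sketch. All of the names here (native_char, native_string, NATIVE_TEXT, make_greeting) are hypothetical and chosen only for illustration. It also assumes that the narrow execution character set on non-Windows platforms is UTF-8, which is the usual default for GCC and Clang but is not guaranteed by the standard.

```cpp
#include <string>

#if defined(_WIN32)
  // Windows: wchar_t is 16 bits, matching what the wide ("W") API expects.
  typedef wchar_t native_char;
  #define NATIVE_TEXT(s) L##s
#else
  // Other platforms: plain char, assumed here to hold UTF-8.
  typedef char native_char;
  #define NATIVE_TEXT(s) s
#endif

// Hypothetical string type whose code unit depends on the platform.
typedef std::basic_string<native_char> native_string;

// The same source line produces wide text on Windows and narrow text elsewhere.
inline native_string make_greeting(const native_string& name) {
    return NATIVE_TEXT("hello, ") + name;
}
```

A call such as make_greeting(NATIVE_TEXT("abc")) compiles unchanged on every platform, but the bytes it produces differ, which is exactly why this approach calls for care: any code that counts bytes, writes to files, or talks to the network must not assume one particular code unit size.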
In short: since C++11, source code can express UTF-8 and UCS-4 strings directly. Windows uses Unicode internally in the form of UCS-2 because the NT kernel dates from 1989, while UTF-8 was only invented in 1992.