#include <iostream>
#include <string>
#include <locale>
#include <codecvt>
#include <fstream>

int main(int argc, char *argv[])
{
    std::wstring str = L"123, who am I? I love the Diaoyu Islands!";
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;

    // Wide string -> UTF-8 encoded narrow string
    std::string narrowStr = conv.to_bytes(str);
    {
        std::ofstream ofs("C:\\Test.txt");
        ofs << narrowStr;
    }

    // UTF-8 encoded narrow string -> wide string
    std::wstring wideStr = conv.from_bytes(narrowStr);
    {
        std::locale::global(std::locale("chinese-simplified"));
        std::wofstream ofs(L"C:\\testw.txt");
        ofs << wideStr;
    }
}
http://zh.cppreference.com/w/cpp/locale/codecvt_utf8
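The round trip through std::wstring_convert can be checked at the byte level. The following is only a minimal sketch: the helper names to_utf8, from_utf8 and demo are made up for illustration, and it assumes a toolchain that still ships <codecvt> (the header is deprecated since C++17).

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: encode a wide string as UTF-8 bytes.
std::string to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.to_bytes(wide);
}

// Hypothetical helper: decode UTF-8 bytes back into a wide string.
std::wstring from_utf8(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.from_bytes(bytes);
}

// Small self-check: plain ASCII text survives the round trip unchanged.
void demo() {
    const std::wstring wide = L"123";
    assert(from_utf8(to_utf8(wide)) == wide);
}
```

A character such as U+4E2D takes three bytes in UTF-8 (E4 B8 AD), so to_utf8 returns a string that is longer than the wide input for non-ASCII text.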
I used to think that the standard library's wofstream could only write MBCS-encoded text files. After working with codecvt today, I realized I was completely wrong.
After a morning of study I arrived at the following results (based on both MSDN and my own experiments). Note: the code and explanations below apply to VS2010.
First, the <codecvt> header (I could not find it in GCC). This is the MSDN explanation: http://msdn.microsoft.com/zh-cn/library/ee292114.aspx
It contains three class templates, codecvt_utf8, codecvt_utf8_utf16 and codecvt_utf16, plus an enumeration type, codecvt_mode.
codecvt is a class used for converting between text encodings; the codecvt_utfX classes derive from it and implement the individual conversions.
codecvt is used together with locale to write and read UTF-8 and UTF-16 encoded text files.
For example, UTF-8:
#include <iostream>
#include <codecvt>
#include <fstream>
#include <string>
#include <locale>
#include <cstdlib>

int main(void)
{
    using namespace std;

    // Locale whose codecvt facet converts between wchar_t and UTF-8
    auto locUtf8 = locale(locale(""), new codecvt_utf8<wchar_t>);

    wofstream wfo(L"Hello.txt");
    wfo.imbue(locUtf8);
    wfo << L"This is Utf-8 encoded text file!";
    wfo.close();

    wifstream wfi(L"Hello.txt");
    wstring wstr;
    wfi.imbue(locUtf8);
    wfi >> wstr;    // operator>> reads a single whitespace-delimited token

    wcout.imbue(locale(""));
    wcout << wstr << endl;
    system("PAUSE");
}
static auto locUtf8 = locale(locale(""), new codecvt_utf8<wchar_t>); instantiates a static locale that uses codecvt_utf8. codecvt_utf8<wchar_t> means that wchar_t is converted to and from UTF-8 on output and input. As for the new: unlike an ordinary buffer, you do not delete it yourself, because the locale manages the facet's lifetime. wfo.imbue(locUtf8) means that wfo uses the locUtf8 encoding when writing the file; wfi.imbue(locUtf8) works the same way for reading. Be careful not to use locale::global, because that would also affect wcout. After the program runs, Hello.txt is generated. To check its encoding, open it in Notepad and choose Save As; the encoding preselected in the dialog is the file's encoding.
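To check the result without Notepad, the bytes on disk can be read back directly. This is a rough sketch under several assumptions: the helper names and the file name used in demo are hypothetical, the classic "C" locale is used as the base instead of locale(""), the path is a narrow string (the wchar_t overload used above is specific to Visual C++), and the streams are opened in binary mode.

```cpp
#include <cassert>
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: write text to path as UTF-8 through an imbued wofstream.
void write_utf8_file(const std::string& path, const std::wstring& text) {
    // The locale owns the facet created with new, so there is no delete here.
    std::locale utf8(std::locale::classic(), new std::codecvt_utf8<wchar_t>);
    std::wofstream out;
    out.imbue(utf8);  // affects this stream only, not wcout
    out.open(path.c_str(), std::ios::binary);
    out << text;
}   // the stream is flushed and closed when out goes out of scope

// Hypothetical helper: read the first line of a UTF-8 file as a wide string.
std::wstring read_utf8_line(const std::string& path) {
    std::locale utf8(std::locale::classic(), new std::codecvt_utf8<wchar_t>);
    std::wifstream in;
    in.imbue(utf8);
    in.open(path.c_str(), std::ios::binary);
    std::wstring line;
    std::getline(in, line);
    return line;
}

// Hypothetical helper: return the file content as raw, undecoded bytes.
std::string read_raw_bytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Small self-check: ASCII text is stored byte for byte in UTF-8.
void demo() {
    write_utf8_file("demo_utf8.txt", L"Hello");
    assert(read_raw_bytes("demo_utf8.txt") == "Hello");
    assert(read_utf8_line("demo_utf8.txt") == L"Hello");
}
```

Unlike operator>>, std::getline reads up to the end of the line, so text containing spaces comes back in one piece.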
UTF-8 is a somewhat special encoding in that it can be recognized without a BOM. If you simply change utf8 to utf16 in the example above, the program still runs and produces output, but Notepad shows garbled text: UTF-16 needs a BOM header to identify the encoding. As follows:
#include <iostream>
#include <codecvt>
#include <fstream>
#include <string>
#include <locale>
#include <cstdlib>

int main(void)
{
    using namespace std;

    // generate_header makes the facet write a BOM at the start of the output
    auto locUtf16 = locale(locale(""), new codecvt_utf16<wchar_t, 1114111UL, generate_header>);

    wofstream wfo(L"Hello.txt");
    wfo.imbue(locUtf16);
    wfo << L"This is Utf-16 encoded text file!";
    wfo.close();

    wifstream wfi(L"Hello.txt");
    wstring wstr;
    wfi.imbue(locUtf16);
    wfi >> wstr;    // operator>> reads a single whitespace-delimited token

    wcout.imbue(locale(""));
    wcout << wstr << endl;
    system("PAUSE");
}
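The effect of generate_header can be seen by inspecting the first two bytes of the file. The sketch below is illustrative only: utf16_file_bytes, starts_with_be_bom and the file name in demo are hypothetical names. It again assumes the classic locale as base and a narrow path, and it opens the file in binary mode so that the platform applies no newline translation to the encoded bytes.

```cpp
#include <cassert>
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: write text as UTF-16 with the given mode,
// then return the raw bytes that ended up in the file.
template <std::codecvt_mode Mode>
std::string utf16_file_bytes(const std::string& path, const std::wstring& text) {
    {
        std::wofstream out;
        out.imbue(std::locale(std::locale::classic(),
                              new std::codecvt_utf16<wchar_t, 0x10ffff, Mode>));
        out.open(path.c_str(), std::ios::binary);
        out << text;
    }   // the stream is flushed and closed here
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// True if the bytes start with the big endian UTF-16 BOM (FE FF).
bool starts_with_be_bom(const std::string& bytes) {
    return bytes.size() >= 2 && bytes[0] == '\xFE' && bytes[1] == '\xFF';
}

// Small self-check: with generate_header the file begins with a BOM.
void demo() {
    assert(starts_with_be_bom(
        utf16_file_bytes<std::generate_header>("demo_utf16.txt", L"A")));
}
```

With a mode of 0 the same text is written as the bytes 00 41 with no BOM in front, which is why an editor has nothing to identify the encoding from.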
The difference is in this expression:
new codecvt_utf16<wchar_t, 0x10ffff, generate_header> (0x10ffff is the same value as the 1114111UL in the code above). Look at the definition of codecvt_utf16:
template<
    class Elem,                          // the character type to convert, here wchar_t
    unsigned long Maxcode = 0x10ffff,    // maximum character code; I am not sure what it is for, so I copy the default
    codecvt_mode Mode = (codecvt_mode)0  // a value of the codecvt_mode enum type
>
class codecvt_utf16 : public codecvt<Elem, char, StateType>
Now look at codecvt_mode:
enum codecvt_mode {
    consume_header = 4,   // automatically detect the BOM header when reading a file. It had no effect when I tried it, though; I may be using it incorrectly (no text was read in, only a newline character)
    generate_header = 2,  // automatically write the BOM header when writing a file
    little_endian = 1     // use little endian encoding (the default is big endian; see an encyclopedia for details)
};
Multiple values are combined with | (bitwise OR), and the result is cast to the codecvt_mode type.
For example, little endian encoding plus automatic BOM header output is declared as follows:
new codecvt_utf16<wchar_t, 0x10ffff, codecvt_mode(generate_header | little_endian)>
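The combined mode can also be tried in memory with std::wstring_convert, which avoids file handling altogether. This is a sketch only: kLeWithBom, to_utf16le_with_bom and demo are hypothetical names, and the use of wstring_convert in place of a file stream is a substitution made here for brevity.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Mode built as described above: flags combined with |, then cast back.
constexpr std::codecvt_mode kLeWithBom =
    static_cast<std::codecvt_mode>(std::generate_header | std::little_endian);

// Hypothetical helper: encode a wide string as little endian UTF-16 with a BOM.
std::string to_utf16le_with_bom(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf16<wchar_t, 0x10ffff, kLeWithBom>> conv;
    return conv.to_bytes(wide);
}

// Small self-check: the output starts with the little endian BOM (FF FE).
void demo() {
    const std::string bytes = to_utf16le_with_bom(L"A");
    assert(bytes.size() >= 2 && bytes[0] == '\xFF' && bytes[1] == '\xFE');
}
```

For the single character L"A" the result is the four bytes FF FE 41 00: the little endian BOM followed by the code unit with its low byte first.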
I recommend using the default big endian encoding; with little endian, line breaks in the output show up as a box. Note: a file written as big endian cannot be read as little endian, and vice versa. consume_header should make automatic detection possible, but it did not work in my tests.
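The byte-order mismatch is easy to reproduce on an in-memory buffer. The sketch below uses hypothetical names (kBigEndian, decode_utf16, demo) and std::wstring_convert instead of wifstream. It shows how consume_header is meant to behave on a standard-conforming implementation; it does not reproduce or explain the file-based failure described above.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Mode 0: big endian, no BOM handling (the default of codecvt_utf16).
constexpr std::codecvt_mode kBigEndian = static_cast<std::codecvt_mode>(0);

// Hypothetical helper: decode UTF-16 bytes using the given mode.
template <std::codecvt_mode Mode>
std::wstring decode_utf16(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf16<wchar_t, 0x10ffff, Mode>> conv;
    return conv.from_bytes(bytes);
}

// Small self-check: big endian bytes decoded as big endian give back 'A'.
void demo() {
    const std::string big_endian_a("\x00\x41", 2);
    assert(decode_utf16<kBigEndian>(big_endian_a) == L"A");
}
```

Decoding the same two bytes 00 41 with little_endian yields U+4100 instead of 'A', which is the mismatch described in the note. With consume_header, the bytes FF FE 41 00 decode to "A" because the BOM selects the byte order.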
C++ STL: handling UTF-8 with std::wstring_convert