Source: http://dozb.blogchina.com/1655050.html
Translated by: dozb, Nicole
Original Author: Taka Muraoka
Original Source: http://www.codeproject.com/vcpp/stl/upgradingstlappstounicode.asp
Introduction
I recently upgraded a large program to use Unicode instead of single-byte characters. Except for a few legacy modules, I had faithfully used the t-functions and wrapped my string and character constants in _T() macros. As we all know, code written this way can be safely converted to Unicode: all I had to do was define UNICODE and _UNICODE, and I prayed that everything would work as I wished.
Oh, my God, how wrong I was :((
So I wrote this article as therapy for two weeks of painful work, and to spare others what I have already suffered. Alas...
Basics
In theory, writing code that can be compiled with either single- or double-byte characters is straightforward. I was going to write a section on that here, but Chris Maunder has already done it. The techniques he describes are widely known, and being familiar with them will help in understanding the rest of this article.
Wide file I/O
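There are wide versions of the stream classes, and it is easy to define T-style macros (tofstream, tstringstream and so on) that select the narrow or the wide class depending on how the program is compiled.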
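A minimal sketch of what such T-style macros might look like, assuming the MSVC convention that defining _UNICODE selects the wide build. The names tostream, tofstream and tstringstream are the ones used in this article; tstring is a hypothetical extra that follows the same pattern.

```cpp
#include <fstream>
#include <ostream>
#include <sstream>
#include <string>

// Assumption: _UNICODE is the build switch, as in MSVC projects.
#ifdef _UNICODE
    #define tostream      std::wostream
    #define tofstream     std::wofstream
    #define tstringstream std::wstringstream
    #define tstring       std::wstring       // hypothetical extra
#else
    #define tostream      std::ostream
    #define tofstream     std::ofstream
    #define tstringstream std::stringstream
    #define tstring       std::string        // hypothetical extra
#endif // _UNICODE
```

Code written against these names compiles unchanged in either build; only the character type behind them differs.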
You would use them like this:
tofstream testFile( "test.txt" ) ;
testFile << _T("ABC") ;
You would expect this code to generate a 3-byte file when compiled with single-byte characters and a 6-byte file when compiled with double-byte characters. But you would be wrong. The file is 3 bytes long either way.
What's wrong?
This is dictated by the Standard C++ specification. When a wide stream writes to a file, it must convert double-byte characters to single-byte ones. In the preceding example, the wide string L"ABC" (6 bytes long) is converted into a narrow string (3 bytes) before being written to the file. Worse still, how the conversion is done is up to the library implementation (implementation-dependent).
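The narrowing is easy to observe directly. The following is a minimal sketch, assuming the default "C" locale is in effect; the helper names fileSizeInBytes and writeWideAbc are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: number of bytes in a file (0 if it cannot be opened).
std::size_t fileSizeInBytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    return bytes.size();
}

// Writes the wide string L"ABC" through a wide file stream and reports
// how many bytes actually reached the disk.
std::size_t writeWideAbc(const std::string& path) {
    {
        std::wofstream out(path.c_str());
        out << L"ABC";
    }   // the stream is flushed and closed here
    return fileSizeInBytes(path);
}
```

Although each wchar_t occupies at least two bytes in memory, writeWideAbc reports one byte per character for this ASCII text, because the stream narrows the data on its way to the file.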
I couldn't find a definitive explanation of why things are this way. My best guess is that a file is, by definition, treated as a stream of single-byte characters, and allowing data to be written two bytes at a time would break that abstraction. Whether that is right or not, it causes serious problems. For example, you cannot write binary data to a wofstream, because the class will try to narrow it before output.
This hit me particularly hard because I have a large number of functions that look like this:
void outputStuff( tostream& os )
{
    // output stuff to the stream
    os << ....
}
If you pass in a tstringstream object, there is no problem (that is, it streams out wide characters). If you pass in a tofstream, however, you get weird results (because everything is narrowed).
Wide file I/O: Solution
Single-stepping through the STL in the debugger showed that wofstream calls a std::codecvt object to narrow the output data just before it is written to the file. std::codecvt objects are responsible for converting strings from one character set to another. The C++ Standard requires two of them to be provided: 1. one that converts chars to chars (i.e. effectively does nothing), and 2. one that converts wchar_ts to chars. The latter is the one causing all the headaches.
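The two standard facets can also be inspected without a debugger. This is a minimal sketch, assuming the classic "C" locale; the typedefs and the function names narrowFacetIsNoOp, wideFacetIsNoOp and narrowedByteCount are hypothetical names chosen for the illustration.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>
#include <vector>

typedef std::codecvt<char, char, std::mbstate_t>    NarrowCodecvt;
typedef std::codecvt<wchar_t, char, std::mbstate_t> WideCodecvt;

// The char-to-char facet reports that it never converts anything.
bool narrowFacetIsNoOp() {
    return std::use_facet<NarrowCodecvt>(std::locale::classic()).always_noconv();
}

// The wchar_t-to-char facet is the one that narrows the data.
bool wideFacetIsNoOp() {
    return std::use_facet<WideCodecvt>(std::locale::classic()).always_noconv();
}

// Runs the same conversion a wide file stream performs and returns the
// number of bytes produced (0 if the facet reports a failure).
std::size_t narrowedByteCount(const std::wstring& text) {
    const WideCodecvt& facet = std::use_facet<WideCodecvt>(std::locale::classic());
    std::mbstate_t state = std::mbstate_t();
    std::vector<char> buffer(text.size() * facet.max_length() + 1);
    const wchar_t* fromNext = 0;
    char* toNext = 0;
    const std::codecvt_base::result r =
        facet.out(state, text.data(), text.data() + text.size(), fromNext,
                  &buffer[0], &buffer[0] + buffer.size(), toNext);
    if (r != std::codecvt_base::ok) return 0;
    return static_cast<std::size_t>(toNext - &buffer[0]);
}
```

For plain ASCII text such as L"ABC", the wide facet produces one byte per character, which matches the 3-byte file seen earlier.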
Solution: write a new class derived from codecvt that converts wchar_ts to wchar_ts (i.e. does nothing) and attach it to the wofstream object. When wofstream tries to convert the data it is writing out, it will call our new codecvt object, which does nothing, and the data is written out unchanged.
A search on Google Groups turned up some code written by P. J. Plauger (author of the STL library that ships with MSVC), which I had to adapt to get around compilation problems with STLport 4.5.3. This is the final version:
#include <locale>

// nb: MSVC6+Stlport can't handle "std::"
// appearing in the NullCodecvtBase typedef.
using std::codecvt ;
typedef codecvt < wchar_t , char , mbstate_t > NullCodecvtBase ;

class NullCodecvt : public NullCodecvtBase
{
public:
    typedef wchar_t _E ;
    typedef char _To ;
    typedef mbstate_t _St ;

    explicit NullCodecvt( size_t _R=0 ) : NullCodecvtBase(_R) { }

protected:
    // Reading: report that no conversion is needed.
    virtual result do_in( _St& _State ,
                          const _To* _F1 , const _To* _L1 , const _To*& _Mid1 ,
                          _E* _F2 , _E* _L2 , _E*& _Mid2 ) const
    { return noconv ; }

    // Writing: likewise, so the data is written out unchanged.
    virtual result do_out( _St& _State ,
                           const _E* _F1 , const _E* _L1 , const _E*& _Mid1 ,
                           _To* _F2 , _To* _L2 , _To*& _Mid2 ) const
    { return noconv ; }

    virtual result do_unshift( _St& _State ,
                               _To* _F2 , _To* _L2 , _To*& _Mid2 ) const
    { return noconv ; }

    virtual int do_length( _St& _State ,
                           const _To* _F1 , const _To* _L1 , size_t _N2 ) const _THROW0()
    { return (_N2 < (size_t)(_L1 - _F1)) ? _N2 : _L1 - _F1 ; }

    virtual bool do_always_noconv() const _THROW0() { return true ; }
    virtual int do_max_length() const _THROW0() { return 2 ; }
    virtual int do_encoding() const _THROW0() { return 2 ; }
} ;
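A codecvt facet is attached to a stream through the stream's locale. The following is a minimal sketch of that binding step, written for C++11 or later rather than MSVC6. Note that it is a variation on NullCodecvt, not the same technique: the hypothetical RawWideCodecvt copies the bytes of each wchar_t explicitly in do_out instead of returning noconv, on the assumption that this is less dependent on how a particular library treats unconverted wide data. The helper writeRawWide is also hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical facet: passes every wchar_t through as its raw bytes.
class RawWideCodecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
public:
    explicit RawWideCodecvt(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}

protected:
    // Writing: each wide character becomes sizeof(wchar_t) bytes, unchanged.
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* fromEnd,
                  const wchar_t*& fromNext, char* to, char* toEnd,
                  char*& toNext) const override {
        fromNext = from;
        toNext = to;
        while (fromNext != fromEnd) {
            if (static_cast<std::size_t>(toEnd - toNext) < sizeof(wchar_t))
                return partial;            // output buffer is full
            std::memcpy(toNext, fromNext, sizeof(wchar_t));
            ++fromNext;
            toNext += sizeof(wchar_t);
        }
        return ok;
    }

    // Reading: the reverse, sizeof(wchar_t) bytes per wide character.
    result do_in(std::mbstate_t&, const char* from, const char* fromEnd,
                 const char*& fromNext, wchar_t* to, wchar_t* toEnd,
                 wchar_t*& toNext) const override {
        fromNext = from;
        toNext = to;
        while (toNext != toEnd &&
               static_cast<std::size_t>(fromEnd - fromNext) >= sizeof(wchar_t)) {
            std::memcpy(toNext, fromNext, sizeof(wchar_t));
            fromNext += sizeof(wchar_t);
            ++toNext;
        }
        return fromNext == fromEnd ? ok : partial;
    }

    // No shift state to undo, so nothing is written.
    result do_unshift(std::mbstate_t&, char* to, char*, char*& toNext) const override {
        toNext = to;
        return noconv;
    }

    int do_length(std::mbstate_t&, const char* from, const char* fromEnd,
                  std::size_t max) const override {
        const std::size_t whole =
            static_cast<std::size_t>(fromEnd - from) / sizeof(wchar_t);
        return static_cast<int>((whole < max ? whole : max) * sizeof(wchar_t));
    }

    bool do_always_noconv() const noexcept override { return false; }
    int do_max_length() const noexcept override { return static_cast<int>(sizeof(wchar_t)); }
    int do_encoding() const noexcept override { return static_cast<int>(sizeof(wchar_t)); }
};

// Hypothetical helper: writes text through the facet and returns the file size.
std::size_t writeRawWide(const std::string& path, const std::wstring& text) {
    {
        std::wofstream out;
        // The locale takes ownership of the facet (refs == 0).
        out.imbue(std::locale(std::locale::classic(), new RawWideCodecvt));
        out.open(path.c_str(), std::ios::binary);   // imbue before opening
        out << text;
    }
    std::ifstream in(path.c_str(), std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    return bytes.size();
}
```

With the facet imbued before the file is opened, the file holds sizeof(wchar_t) bytes per character in the platform's native byte order, rather than one narrowed byte per character.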
For the full text, see:
http://dozb.blogchina.com/1655050.html