[Translation] How to upgrade STL-based applications to support Unicode

Source: http://dozb.blogchina.com/1655050.html

Translated by: dozb, Nicole

Original Author: Taka Muraoka

Original Source: http://www.codeproject.com/vcpp/stl/upgradingstlappstounicode.asp

Introduction

I recently upgraded a large program from single-byte characters to Unicode. Except for a few legacy modules, I had faithfully used the t-functions and wrapped my string and character constants in the _T() macro. As everyone knows, this makes the conversion to Unicode safe, so all I had to do was define UNICODE and _UNICODE and pray that everything would work as I hoped.

Oh my God, how wrong I was :((

So I wrote this article as therapy for two weeks of painful work, and to spare others the suffering I have already been through. Sigh...

Basics

In theory, writing code that can be compiled with either single-byte or double-byte characters is straightforward. I was going to write a section on it here, but Chris Maunder has already done it. The techniques he describes are widely known, and understanding them will help in following this article.

Wide file I/O

Wide versions of the usual stream classes are provided, and it is easy to define t-style macros (tofstream, tstringstream, tostream and so on) that select between the narrow and wide versions.
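The macro definitions themselves are not reproduced in this text. A minimal sketch of what they could look like, assuming the macro names tofstream, tstringstream and tostream that appear later in this article. The tstring macro, the _T() fallback for compilers without <tchar.h>, and the describeAnswer() helper are hypothetical additions made only so the sketch compiles on its own:

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Assumed macro names: tofstream, tstringstream and tostream are the ones
// this article uses later; tstring is added for illustration only.
#ifdef _UNICODE
    #define tofstream     std::wofstream
    #define tstringstream std::wstringstream
    #define tostream      std::wostream
    #define tstring       std::wstring
#else
    #define tofstream     std::ofstream
    #define tstringstream std::stringstream
    #define tostream      std::ostream
    #define tstring       std::string
#endif // _UNICODE

// Fallback for compilers that do not ship <tchar.h> (an assumption of this
// sketch; with MSVC you would normally include <tchar.h> instead).
#ifndef _T
    #ifdef _UNICODE
        #define _T(x) L##x
    #else
        #define _T(x) x
    #endif
#endif

// Hypothetical helper: formats a value through whichever string stream
// the build configuration selected.
inline tstring describeAnswer( int value )
{
    tstringstream buf ;
    buf << _T("answer=") << value ;
    return buf.str() ;
}
```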

You will use them like this:

tofstream testFile( "test.txt" ) ;
testFile << _T("ABC") ;

You would expect this code to produce a 3-byte file when compiled with single-byte characters and a 6-byte file when compiled with double-byte characters. But you would be wrong: the file is 3 bytes in both cases.
What's wrong?

The reason lies in the C++ standard, which requires wide streams to convert double-byte characters to single-byte characters when writing to a file. In the example above, the wide string L"ABC" (6 bytes long) is converted to a narrow string (3 bytes) before it is written to the file. Worse still, how the conversion is done is left to the library implementation (implementation-dependent).

I could not find a definitive explanation of why things are this way. My best guess is that a file is by definition a stream of single-byte characters, and allowing data to be written two bytes at a time would break that abstraction. Right or wrong, this causes serious problems. For example, you cannot write binary data to a wofstream, because the class tries to narrow it before writing it out.

This was a particular problem for me because I had a large number of functions written like this:

void outputStuff( tostream& os )
{
    // output stuff to the stream
    os << ....
}

If you pass in a tstringstream object there is no problem (it outputs wide characters), but if you pass in a tofstream you get strange results (because everything is narrowed).
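The difference is easy to reproduce. A minimal sketch, assuming a wide build (std::wostream is written out explicitly instead of the tostream macro) and the default "C" locale. The helper names and the file name passed in by the caller are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Stand-in for the article's outputStuff(), fixed to the wide build.
void outputStuff( std::wostream& os )
{
    os << L"ABC" ;
}

// Number of bytes held in memory when the target is a wide string stream.
std::size_t bytesInStringStream()
{
    std::wostringstream buf ;
    outputStuff( buf ) ;
    return buf.str().size() * sizeof(wchar_t) ;
}

// Raw bytes that end up on disk when the target is a wide file stream.
// The file is created relative to the current directory.
std::string bytesInFile( const char* fileName )
{
    {
        std::wofstream out( fileName ) ;
        assert( out.is_open() ) ;
        outputStuff( out ) ;
    } // leaving the scope closes and flushes the stream

    std::ifstream in( fileName , std::ios::binary ) ;
    return std::string( std::istreambuf_iterator<char>( in ) ,
                        std::istreambuf_iterator<char>() ) ;
}
```

The string stream keeps three wchar_t values (6 bytes where wchar_t is two bytes wide), while with the common standard library implementations the file holds only the three narrow bytes "ABC".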

Wide file I/O: Solution

Stepping through the STL in the debugger showed that wofstream calls a std::codecvt object to narrow the output data just before it is written to the file. std::codecvt objects are responsible for converting strings from one character set to another, and C++ requires two to be provided as standard: 1. one that converts chars to chars (that is, effectively does nothing), and 2. one that converts wchar_ts to chars. The latter is the one that caused me so many headaches.
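The two standard facets can be inspected directly, without a file stream. A minimal sketch, assuming the classic "C" locale; the typedef and function names are hypothetical. It asks each facet whether it is a no-op, then calls the wide-to-narrow facet by hand, which is roughly the step a wofstream performs before writing:

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>

typedef std::codecvt< wchar_t , char , std::mbstate_t > WideToNarrow ;
typedef std::codecvt< char , char , std::mbstate_t >    NarrowToNarrow ;

// True if the char-to-char facet promises never to change the data.
bool narrowFacetIsNoOp()
{
    return std::use_facet< NarrowToNarrow >( std::locale::classic() ).always_noconv() ;
}

// True if the wchar_t-to-char facet promises never to change the data.
bool wideFacetIsNoOp()
{
    return std::use_facet< WideToNarrow >( std::locale::classic() ).always_noconv() ;
}

// Narrows a wide string with the standard facet of the classic locale.
// Returns an empty string if the facet does not report success.
std::string narrowWithCodecvt( const std::wstring& wide )
{
    if ( wide.empty() )
        return std::string() ;

    const WideToNarrow& cvt =
        std::use_facet< WideToNarrow >( std::locale::classic() ) ;

    // Reserve the worst-case number of output bytes per input character.
    const int maxLen = cvt.max_length() > 0 ? cvt.max_length() : 1 ;
    std::string narrow( wide.size() * static_cast<std::size_t>( maxLen ) , '\0' ) ;

    std::mbstate_t state = std::mbstate_t() ;
    const wchar_t* fromNext = 0 ;
    char* toNext = 0 ;
    const WideToNarrow::result res = cvt.out(
        state ,
        wide.data() , wide.data() + wide.size() , fromNext ,
        &narrow[0] , &narrow[0] + narrow.size() , toNext ) ;

    if ( res != WideToNarrow::ok )
        return std::string() ;

    // Keep only the bytes the facet actually produced.
    narrow.resize( static_cast<std::size_t>( toNext - &narrow[0] ) ) ;
    return narrow ;
}
```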

The solution: write a new class derived from codecvt that converts wchar_ts to wchar_ts (that is, does nothing) and attach it to the wofstream object. When the wofstream tries to convert the data it is writing out, it calls our new codecvt object, which does nothing, and the data is written out unchanged.

A search on Google Groups turned up some code written by P. J. Plauger (author of the STL library that ships with MSVC), which I adapted to get around compilation problems with STLport 4.5.3. This is the final version:

#include <locale>

// nb: MSVC6+Stlport can't handle "std::"
// appearing in the NullCodecvtBase typedef.
using std::codecvt ;
typedef codecvt < wchar_t , char , mbstate_t > NullCodecvtBase ;

class NullCodecvt
    : public NullCodecvtBase
{
public:
    typedef wchar_t _E ;
    typedef char _To ;
    typedef mbstate_t _St ;

    explicit NullCodecvt( size_t _R=0 ) : NullCodecvtBase(_R) { }

protected:
    // reading: report that no conversion is needed
    virtual result do_in( _St& _State ,
                   const _To* _F1 , const _To* _L1 , const _To*& _Mid1 ,
                   _E* _F2 , _E* _L2 , _E*& _Mid2
                   ) const
    {
        return noconv ;
    }
    // writing: report that no conversion is needed
    virtual result do_out( _St& _State ,
                   const _E* _F1 , const _E* _L1 , const _E*& _Mid1 ,
                   _To* _F2 , _To* _L2 , _To*& _Mid2
                   ) const
    {
        return noconv ;
    }
    virtual result do_unshift( _St& _State ,
             _To* _F2 , _To* _L2 , _To*& _Mid2 ) const
    {
        return noconv ;
    }
    virtual int do_length( _St& _State , const _To* _F1 ,
            const _To* _L1 , size_t _N2 ) const _THROW0()
    {
        return (_N2 < (size_t)(_L1 - _F1)) ? _N2 : _L1 - _F1 ;
    }
    // tells the stream that the data can be written out as-is
    virtual bool do_always_noconv() const _THROW0()
    {
        return true ;
    }
    virtual int do_max_length() const _THROW0()
    {
        return 2 ;
    }
    virtual int do_encoding() const _THROW0()
    {
        return 2 ;
    }
} ;
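To attach such a facet to a stream, the usual route is to build a std::locale that contains it and imbue() the stream with that locale before anything is written. A minimal sketch in C++11, using a cut-down stand-in facet so the block is self-contained. PassThroughCodecvt, imbuePassThrough() and claimsNoConversion() are hypothetical names, not part of the article's code. Whether a file stream then writes wchar_t data through unchanged depends on the library implementation; the sketch only checks that the stream's locale hands out the replacement facet:

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>    // std::mbstate_t
#include <fstream>
#include <locale>

typedef std::codecvt< wchar_t , char , std::mbstate_t > WideCodecvtBase ;

// Hypothetical cut-down stand-in for a do-nothing codecvt facet.
class PassThroughCodecvt : public WideCodecvtBase
{
public:
    explicit PassThroughCodecvt( std::size_t refs = 0 ) : WideCodecvtBase( refs ) { }

protected:
    // Writing: consume nothing, produce nothing, report "no conversion".
    result do_out( std::mbstate_t& ,
                   const wchar_t* from , const wchar_t* , const wchar_t*& fromNext ,
                   char* to , char* , char*& toNext ) const override
    {
        fromNext = from ;
        toNext = to ;
        return noconv ;
    }
    // Reading: same idea in the other direction.
    result do_in( std::mbstate_t& ,
                  const char* from , const char* , const char*& fromNext ,
                  wchar_t* to , wchar_t* , wchar_t*& toNext ) const override
    {
        fromNext = from ;
        toNext = to ;
        return noconv ;
    }
    // Tells the stream that it may skip the conversion step entirely.
    bool do_always_noconv() const noexcept override
    {
        return true ;
    }
} ;

// Replaces the wide-to-narrow facet in the stream's locale.
// Call this before anything is written to the stream.
void imbuePassThrough( std::wofstream& file )
{
    file.imbue( std::locale( file.getloc() , new PassThroughCodecvt ) ) ;
}

// Reports whether the stream's current locale claims no conversion is needed.
bool claimsNoConversion( const std::wofstream& file )
{
    return std::use_facet< WideCodecvtBase >( file.getloc() ).always_noconv() ;
}
```

Because the facet is created with a reference count argument of 0, the locale takes ownership of it and deletes it when the last locale holding it goes away, so the caller never deletes it.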

For the full text, see:

http://dozb.blogchina.com/1655050.html
