wxWidgets Source Analysis (9): wxString

Source: Internet
Author: User
Tags: wxWidgets

Contents

    • wxString
      • wxString support for Chinese characters
      • Converting between wxString and general-purpose strings
      • Character Set conversions

wxString

wxString support for Chinese characters

The two Chinese characters 中 and 文 are encoded as follows:

Character   GBK     Location code   UTF-8      UTF-16
中          D6 D0   54 48           E4 B8 AD   4E 2D
文          CE C4   46 36           E6 96 87   65 87
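The relationship between the UTF-8 column and the UTF-16 column of the table can be checked by hand. The following is a minimal sketch in standard C++, not wxWidgets code: the function name decode_utf8 is made up for illustration, and it assumes well-formed input without validating it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode a UTF-8 byte string into Unicode code points.
// Illustrative sketch: assumes well-formed input, performs no validation.
std::vector<std::uint32_t> decode_utf8(const std::string& bytes)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp;
        std::size_t extra; // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }

        assert(i + extra < bytes.size()); // sequence must not be truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            cp = (cp << 6) | (cont & 0x3F); // each continuation byte adds 6 bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Feeding it the bytes 41 42 E4 B8 AD E6 96 87 yields the code points 41, 42, 4E2D and 6587, which are the values that appear in the UTF-16 column.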

Default byte encoding of the string "AB中文" on different operating systems:

Windows (Simplified Chinese locale, default GBK):  41 42 d6 d0 ce c4
Linux (default UTF-8):                             41 42 e4 b8 ad e6 96 87

Windows

On Windows, wxString stores its data as UTF-16. The default file encoding on a Simplified Chinese Windows system, however, is GBK. With this in mind, wxString converts narrow strings from GBK to UTF-16 when a value is assigned, which requires the source file itself to be saved as GBK. Otherwise the text is displayed garbled. The alternative is to pass a wxConv converter object (for example wxConvUTF8) to tell wxString which encoding the input uses.

With a GBK-encoded source file, the test shows that only the string constructed without a converter is displayed correctly.

With a UTF-8-encoded source file, the test shows that only the string constructed with the converter is displayed correctly; the others are garbled.

Linux Unicode

wxWidgets 3.0 was compiled with the "--enable-unicode" option. With this option, wxString stores its data internally as wide characters, which means UTF-32 on Linux. (Note: even if --enable-unicode is not specified, wxWidgets still stores data as Unicode, using UTF-16 on Windows and UTF-32 on Linux and Mac.)

Test: wxString("AB中文", wxConvUTF8)    Output: 41 42 4E2D 6587
Test: wxString("AB中文")                Output: empty
Test: _("AB中文")                       Output: empty

(The source file is UTF-8. If the source file is not UTF-8, all three produce wrong output.)

As this shows, source files on Linux are UTF-8, and after conversion wxString holds the data as UTF-32. To display Chinese on Linux in this configuration, use wxConvUTF8 for the conversion.

Linux UTF-8

wxWidgets 3.0 was compiled with the "--enable-utf8 --enable-utf8only" options, so wxString uses UTF-8 directly as its internal encoding. With --enable-utf8only added, no conversion between character encodings is performed, which improves efficiency considerably and makes the code much easier to write:

Test: wxString("AB中文", wxConvUTF8)    Output: 41 42 4E2D 6587
Test: wxString("AB中文")                Output: 41 42 4E2D 6587
Test: _("AB中文")                       Output: 41 42 4E2D 6587

(The source file is UTF-8. If the source file is not UTF-8, all three produce wrong output.)

Note that all three forms above produce correct output.

Summary

When writing wxWidgets programs, it is recommended to keep all user-interface text in the source code in English. The program then compiles and displays correctly whether the source file is encoded as GBK or as UTF-8.

If you need to support more than one language, use the wxLocale solution provided by wxWidgets.

Converting between wxString and general-purpose strings

Creating wxString objects

Besides the wxString constructors, a wxString object can be created in several other ways:

Static method / constructor      Description

wxString::FromAscii()            Creates a wxString object from an ASCII string. If the string contains a byte greater than or equal to 0x80, wxString reports an error, for example:
                                 wxString str = wxString::FromAscii("ABC中文");
                                 In a debug build an alert message appears at run time, and the string is displayed garbled.

wxString::FromUTF8()             Creates a wxString object from a UTF-8 string, for example:
                                 wxString test = wxString::FromUTF8("\xF0\x90\x8C\x80");

Constructor + wxMBConv object    The wxMBConv argument defaults to wxConvLibc.
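
The 0x80 rule for FromAscii() can be checked before the call. Below is a minimal sketch in standard C++; is_pure_ascii is a hypothetical helper, not part of wxWidgets, and it only illustrates the condition described in the table.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns true if every byte is below 0x80,
// i.e. the string is plain 7-bit ASCII.
bool is_pure_ascii(const std::string& s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        // Cast to unsigned char: plain char may be signed, so bytes
        // >= 0x80 would otherwise compare as negative values.
        if (static_cast<unsigned char>(s[i]) >= 0x80)
            return false;
    }
    return true;
}
```

A string such as "ABC" passes the check, while "ABC中文" stored as UTF-8 or GBK does not, because every byte of a Chinese character in those encodings is 0x80 or above.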
Converting a wxString object to other data types

wxString provides several ways to convert its data to the type you need.

c_str()

Prototype:

wxCStrData wxString::c_str() const;

This function converts the wxString object to const char* or const wchar_t* data. It is used as follows:

const char *p = str.c_str();        // the declared type on the left makes c_str() yield `const char*`
sprintf(abStr, "sprintf = %s\n", (const char *)str.c_str());    // an explicit cast is required here

Tracing the code shows that wxCStrData performs these automatic conversions by overloading the conversion operators:

    inline const wchar_t* AsWChar() const;
    inline const char* AsChar() const;
    const unsigned char* AsUnsignedChar() const
        { return (const unsigned char *) AsChar(); }

    // overloaded conversion operators
    operator const wchar_t*() const { return AsWChar(); }
    operator const char*() const { return AsChar(); }
    operator const unsigned char*() const { return AsUnsignedChar(); }
    operator const void*() const { return AsChar(); }
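
The conversion-operator technique can be tried in isolation. The sketch below is standard C++ only: CStrProxy is a hypothetical stand-in for wxCStrData that stores both representations up front, whereas the real class converts on demand.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>
#include <string>

// Hypothetical stand-in for wxCStrData: a proxy object that converts
// implicitly to whichever pointer type the caller asks for.
class CStrProxy
{
public:
    CStrProxy(const std::string& narrow, const std::wstring& wide)
        : m_narrow(narrow), m_wide(wide) {}

    const char* AsChar() const { return m_narrow.c_str(); }
    const wchar_t* AsWChar() const { return m_wide.c_str(); }

    // Overloaded conversion operators: the target type selects the overload.
    operator const char*() const { return AsChar(); }
    operator const wchar_t*() const { return AsWChar(); }

private:
    std::string m_narrow;
    std::wstring m_wide;
};
```

With a declared target type, as in `const char* p = proxy;`, the compiler picks the matching operator. A variadic function such as sprintf has no declared parameter type to convert to, which is why the explicit cast is needed there.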

The data is produced by wxCStrData::AsChar(), which is implemented by calling wxString's own AsChar() or mb_str():

inline const char* wxCStrData::AsChar() const
{
#if wxUSE_UNICODE && !wxUSE_UTF8_LOCALE_ONLY
    const char * const p = m_str->AsChar(wxConvLibc);
    if ( !p )
        return "";
#else // !wxUSE_UNICODE || wxUSE_UTF8_LOCALE_ONLY
    const char * const p = m_str->mb_str();
#endif // wxUSE_UNICODE && !wxUSE_UTF8_LOCALE_ONLY

    return p + m_offset;
}

mb_str()

Prototype:

const wxCharBuffer  mb_str (const wxMBConv &conv=wxConvLibc) const;

Usage is similar to c_str(). Because mb_str() returns a buffer object by value, it is safest to keep that buffer in a named variable for as long as the pointer is used:

const wxScopedCharBuffer buf = str.mb_str();
const char *p = buf.data();

The implementation is shown below: mb_str() calls AsCharBuf(), which in turn calls AsChar() to perform the conversion (AsChar() is analysed in detail later):

const wxScopedCharBuffer mb_str(const wxMBConv& conv = wxConvLibc) const
{
    return AsCharBuf(conv);
}

wxScopedCharBuffer AsCharBuf(const wxMBConv& conv) const
{
    ...
    if ( !AsChar(conv) )
    ...
}

Next, look at the implementation of wxScopedCharBuffer. Like wxCStrData, it should provide a conversion operator, and tracing the code confirms this:

typedef wxScopedCharTypeBuffer<char> wxScopedCharBuffer;

// Following wxScopedCharTypeBuffer shows that it also overloads a conversion
// operator, converting to the type given by the template parameter.
// For wxScopedCharBuffer that type is const char*.
template <typename T>
class wxScopedCharTypeBuffer
{
public:
    typedef T CharType;

    const CharType *data() const { return  m_data->Get(); }
    operator const CharType *() const { return data(); }
};
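
To see how a templated buffer with a conversion operator behaves, here is a reduced sketch in standard C++. ScopedBuffer and make_buffer are hypothetical names. The sketch owns a plain copy of the data, while the real wxScopedCharTypeBuffer goes through an m_data pointer, as the excerpt shows.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical miniature of wxScopedCharTypeBuffer<T>: owns a copy of the
// characters and converts implicitly to const T*.
template <typename T>
class ScopedBuffer
{
public:
    typedef T CharType;

    explicit ScopedBuffer(const std::basic_string<T>& s) : m_data(s) {}

    const CharType* data() const { return m_data.c_str(); }
    std::size_t length() const { return m_data.size(); }
    operator const CharType*() const { return data(); }

private:
    std::basic_string<T> m_data;
};

typedef ScopedBuffer<char> ScopedCharBuffer;

// Stand-in for a function like mb_str(): returns the buffer by value.
ScopedCharBuffer make_buffer(const std::string& s)
{
    return ScopedCharBuffer(s);
}
```

Writing `const ScopedCharBuffer buf = make_buffer("abc"); const char* p = buf;` keeps the characters alive for as long as buf exists. Writing `const char* p = make_buffer("abc");` would leave p pointing into a temporary that is destroyed at the end of the statement.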
utf8_str()

This function returns a UTF-8 string. As the code shows, it performs the conversion by calling mb_str(wxMBConvUTF8()), so the output string is UTF-8 encoded.

const wxScopedCharBuffer utf8_str() const { return mb_str(wxMBConvUTF8()); }
ToStdString()

Converts a wxString object to a std::string object. As the code shows, it too ultimately works by calling mb_str().

std::string ToStdString() const
{
    wxScopedCharBuffer buf(mb_str());
    return std::string(buf.data(), buf.length());
}
wxString::AsChar, the key conversion function

The following code analysis applies to Unicode mode:

const char *wxString::AsChar(const wxMBConv& conv) const
{
#if wxUSE_UNICODE_UTF8
    ...
#else // wxUSE_UNICODE_WCHAR
    // Call c_str() to get the wchar_t* data; wxString stores wchar_t
    // internally, so this step accesses the internal buffer directly
    const wchar_t * const strWC = m_impl.c_str();
    const size_t lenWC = m_impl.length();
#endif // wxUSE_UNICODE_UTF8/wxUSE_UNICODE_WCHAR

    // Call conv.FromWChar to convert the Unicode characters to the
    // multibyte string of the current system
    const size_t lenMB = conv.FromWChar(NULL, 0, strWC, lenWC);
    if ( lenMB == wxCONV_FAILED )
        return NULL;

    if ( !m_convertedToChar.m_str || lenMB != m_convertedToChar.m_len )
    {
        if ( !const_cast<wxString *>(this)->m_convertedToChar.Extend(lenMB) )
            return NULL;
    }

    m_convertedToChar.m_str[lenMB] = '\0';
    if ( conv.FromWChar(m_convertedToChar.m_str, lenMB,
                        strWC, lenWC) == wxCONV_FAILED )
        return NULL;

    return m_convertedToChar.m_str;
}

AsChar() calls wxMBConv::FromWChar to convert the string. Tracing further, wxMBConv::FromWChar goes on to call wxMBConv::WC2MB to carry out the conversion. Another article in this series describes the implementation of that function.
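
AsChar() uses a two-pass pattern: it first asks the converter for the required length with a NULL destination, then converts into a buffer of that size. The same pattern can be sketched with the standard C function wcsrtombs. This is an illustration under stated assumptions, not the wxWidgets implementation, and wide_to_multibyte is a hypothetical name.

```cpp
#include <cassert>
#include <cwchar>
#include <string>

// Convert a wide string to the multibyte encoding of the current C locale.
// Returns false if some character cannot be represented in that encoding.
bool wide_to_multibyte(const std::wstring& wide, std::string& out)
{
    // Pass 1: with a null destination, wcsrtombs only measures the length.
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wide.c_str();
    const std::size_t lenMB = std::wcsrtombs(nullptr, &src, 0, &state);
    if (lenMB == static_cast<std::size_t>(-1))
        return false;

    // Pass 2: convert into a buffer of exactly that size (+1 for the NUL).
    std::string buf(lenMB + 1, '\0');
    state = std::mbstate_t();
    src = wide.c_str();
    if (std::wcsrtombs(&buf[0], &src, buf.size(), &state)
            == static_cast<std::size_t>(-1))
        return false;

    buf.resize(lenMB); // drop the trailing NUL from the std::string
    out = buf;
    return true;
}
```

Measuring first avoids guessing a worst-case buffer size, at the cost of walking the input twice. AsChar() additionally caches the buffer in m_convertedToChar, which this sketch does not attempt.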

This is how a wxString is converted to a local (multibyte) string.

Character Set conversions

Typical calling code looks like this (wxWidgets-3.0.2):

wxString    testStr("abc中文测试");

Now let's look at how wxString handles this internally.

First, the wxString constructor is called:

// string.h L1241
wxString(const char *psz)
    : m_impl(ImplStr(psz)) {}

The constructor calls ImplStr to initialize wxString's internal member m_impl. ImplStr is defined as follows:

static wxScopedWCharBuffer ImplStr(const char* str,
                                   const wxMBConv& conv = wxConvLibc)
    { return ConvertStr(str, npos, conv).data; }

Note the default value of the second parameter, wxConvLibc. Here is where it is declared; its implementation is covered later:

// strconv.h L563
#define WX_DECLARE_GLOBAL_CONV(klass, name)                 \
    extern WXDLLIMPEXP_DATA_BASE(klass*) name##Ptr;         \
    extern WXDLLIMPEXP_BASE klass* wxGet_##name##Ptr();     \
    inline klass& wxGet_##name()                            \
    {                                                       \
        if ( !name##Ptr )                                   \
            name##Ptr = wxGet_##name##Ptr();                \
        return *name##Ptr;                                  \
    }

// The WX_DECLARE_GLOBAL_CONV macro predeclares a converter object. It actually
// declares a function and a pointer; wxGet_wxConvLibc returns the object behind
// this pointer, which guarantees a single global instance.
WX_DECLARE_GLOBAL_CONV(wxMBConv, wxConvLibc)
#define wxConvLibc wxGet_wxConvLibc()

ImplStr then calls ConvertStr to perform the conversion. The function is implemented as follows:

// string.cpp L385
#if wxUSE_UNICODE_WCHAR
wxString::SubstrBufFromMB wxString::ConvertStr(const char *psz, size_t nLength,
                                               const wxMBConv& conv)
{
    // ...
    // call conv.cMB2WC to perform the conversion
    size_t wcLen;
    wxScopedWCharBuffer wcBuf(conv.cMB2WC(psz, nLength, &wcLen));
    // ...
}

ConvertStr in turn calls wxMBConv::cMB2WC to convert the character data. Tracing the source, cMB2WC calls wxMBConv::ToWChar; the first call, made with a NULL output buffer, returns the required length:

const wxWCharBuffer
wxMBConv::cMB2WC(const char *inBuff, size_t inLen, size_t *outLen) const
{
    const size_t dstLen = ToWChar(NULL, 0, inBuff, inLen);
    //...

The source of wxMBConv::ToWChar shows that it goes on to call the MB2WC method to convert the characters. MB2WC is a virtual interface, so next we look at how it is implemented.

Back to wxConvLibc: let's see how it is implemented.

// strconv.cpp L3417
#define WX_DEFINE_GLOBAL_CONV2(klass, impl_klass, name, ctor_args)      \
    WXDLLIMPEXP_DATA_BASE(klass*) name##Ptr = NULL;                     \
    WXDLLIMPEXP_BASE klass* wxGet_##name##Ptr()                         \
    {                                                                   \
        static impl_klass name##Obj ctor_args;                          \
        return &name##Obj;                                              \
    }                                                                   \
    /* this ensures that all global converter objects are created */    \
    /* by the time static initialization is done, i.e. before any */    \
    /* thread is launched: */                                           \
    static klass* gs_##name##instance = wxGet_##name##Ptr()

// strconv.cpp L3437
#ifdef __WINDOWS__
    WX_DEFINE_GLOBAL_CONV2(wxMBConv, wxMBConv_win32, wxConvLibc, wxEMPTY_PARAMETER_VALUE);
#elif 0 // defined(__WXOSX__)
    WX_DEFINE_GLOBAL_CONV2(wxMBConv, wxMBConv_cf, wxConvLibc, (wxFONTENCODING_UTF8));
#else
    WX_DEFINE_GLOBAL_CONV2(wxMBConv, wxMBConvLibc, wxConvLibc, wxEMPTY_PARAMETER_VALUE);
#endif

Expanding the macros above shows that wxConvLibc is implemented by wxMBConv_win32 on Windows and by wxMBConvLibc on other platforms.
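
Stripped of the export decorations, the declare/define macro pair boils down to a lazily initialised global accessor. The sketch below shows that shape in standard C++. Converter, DefaultConv and the related names are hypothetical and stand in for wxMBConv and wxConvLibc.

```cpp
#include <cassert>

// Hypothetical converter type standing in for wxMBConv.
struct Converter { int id; };

// Definition side: a function-local static object, created on first call.
Converter* GetDefaultConvPtr()
{
    static Converter obj = { 42 };
    return &obj;
}

// Global pointer that caches the object's address after first use.
Converter* DefaultConvPtr = nullptr;

// Declaration side: accessor that fills in the pointer on demand.
inline Converter& GetDefaultConv()
{
    if (!DefaultConvPtr)
        DefaultConvPtr = GetDefaultConvPtr();
    return *DefaultConvPtr;
}

// Lets callers write `DefaultConv` as if it were a plain global object.
#define DefaultConv GetDefaultConv()
```

Every use of DefaultConv goes through the accessor, so the object is valid even if it is used before other static initialisers have run, and every caller sees the same instance.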

For wxMBConv_win32 on the Windows platform, see its implementation in strconv.cpp. It mainly implements the two interfaces MB2WC and WC2MB, whose behaviour is described in the wxMBConv class interface definition:

    1. MB2WC: converts from a multibyte encoding to Unicode, for example GBK to UTF-16. The implementation calls the Win32 API function MultiByteToWideChar to perform the character set conversion.
    2. WC2MB: the reverse conversion, from Unicode to a multibyte encoding. The implementation calls WideCharToMultiByte to perform the character set conversion.

On UNIX platforms the conversion is done by calling mbstowcs() and wcstombs(). Both are standard C library functions, and their behaviour is closely tied to the current system locale: the manual states clearly that both functions depend on the LC_CTYPE category. For the details of the implementation, refer to the locale-related documentation.
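
The LC_CTYPE dependency can be observed directly with the C library. The following is a minimal sketch in standard C++. convert_in_locale is a hypothetical helper, and the locale names that are available vary from system to system.

```cpp
#include <cassert>
#include <clocale>
#include <cstdlib>
#include <cstring>
#include <string>

// Convert a multibyte string to wide characters under a given LC_CTYPE.
// Returns false if the locale is unavailable or the bytes are invalid in it.
bool convert_in_locale(const char* locale_name, const char* mb, std::wstring& out)
{
    // Remember the current LC_CTYPE so it can be restored afterwards.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    const std::string saved = current ? current : "C";

    if (!std::setlocale(LC_CTYPE, locale_name))
        return false; // locale not installed on this system

    // One wide character per input byte is always enough room.
    std::wstring buf(std::strlen(mb) + 1, L'\0');
    const std::size_t n = std::mbstowcs(&buf[0], mb, buf.size());

    std::setlocale(LC_CTYPE, saved.c_str());
    if (n == static_cast<std::size_t>(-1))
        return false; // invalid byte sequence for this locale

    buf.resize(n);
    out = buf;
    return true;
}
```

The same bytes can convert differently, or fail, depending on the locale passed in. For instance, the bytes E4 B8 AD should decode to a single wide character only under a UTF-8 locale such as "en_US.UTF-8", assuming that locale is installed. This is the behaviour wxMBConvLibc inherits from mbstowcs().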
