Qt Chinese garbled-text problem (more precisely: the same byte sequence is interpreted under different encodings)

Article Source: http://blog.csdn.net/brave_heart_lxl/article/details/7186631

The following is dbzhang's explanation of why Chinese text comes out garbled in Qt. I found it well worth reading:

First of all, let it be said that this is not a question of whether QString supports Chinese. Most people who run into trouble are not hitting a problem in QString itself; they have simply not handed their string to QString correctly.


It is a very simple matter. When we write "我是汉字" ("I am Chinese characters") in source code, it is a traditional narrow string of type char. All we need is some way to tell QString which encoding those four Chinese characters use. The trouble is that many users have little idea what encoding their own source code is in, and so the problems begin.

A simple Qt program

The following small program will probably look familiar. Quite a few Chinese users seem to try writing code like this:

    #include <QtGui/QApplication>
    #include <QtGui/QLabel>

    int main(int argc, char **argv)
    {
        QApplication app(argc, argv);
        QString a = "我是汉字";
        QLabel label(a);
        label.show();
        return app.exec();
    }

Edit, save, compile, run: everything goes smoothly, but the result is:

    Most users see (garbling 1):     ÎÒÊÇºº×Ö
    Other users see (garbling 2):    ƈ ' 是汉å-

Unexpectedly, the Chinese text does not show up in the interface; what appears instead is a run of unrecognizable characters. So one starts searching the web, posting on forums, or complaining.

Eventually one is told that one of the following two statements will solve the problem:

    QTextCodec::setCodecForCStrings(QTextCodec::codecForName("GB2312"));
    QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"));

Try the two statements one at a time and the problem is indeed solved (for most users the first one works, for the others the second). So why is that?

When the two kinds of garbling appear

Everyone probably has something to say on this question. Before going further, let us list the cases in which each of the two garblings appears:

We consider only the three most commonly used compilers (cl from Microsoft Visual Studio, g++ from MinGW, and g++ on Linux), with the source file saved in each of three encodings: GBK, UTF-8 without a BOM, and UTF-8 with a BOM.

    Encoding of the source code    Compiler     Result (garbling 1 or 2)
    GBK                            cl           1    *
    GBK                            mingw-g++    1    *
    GBK                            g++          1
    UTF-8 (without BOM)            cl           2
    UTF-8 (without BOM)            mingw-g++    2
    UTF-8 (without BOM)            g++          2    *
    UTF-8 (with BOM)               cl           1
    UTF-8 (with BOM)               mingw-g++    2
    UTF-8 (with BOM)               g++          Compilation failed

Saving the source file in 3 different encodings and compiling it with 3 different compilers gives 9 combinations. Setting aside the one that does not compile, each of the two garblings accounts for half of the remaining cases.

From this we can see that the garbling is not inherently tied to the operating system. However, on Windows we generally use GBK, and on Linux we generally use UTF-8 without a BOM. If we consider only the cases marked with *, we could also say that which garbling appears depends on the system.

Why is QString garbled?

Is it really QString that is garbled? We might ask ourselves whether we are blaming the wrong thing.

Before proceeding, let us pin down a few concepts:

Concept 0:
    • "我是汉字" is a C string literal, that is, a narrow string of type char. The example above could equally be written as

    const char * str = "我是汉字";
    QString a = str;

or

    char str[] = "我是汉字";
    QString a = str;

and so on.

Concept 1:
    • The source file has an encoding, but a plain text file does not record which encoding it uses

This is the root of the problem. You may want to try an experiment: save the earlier source code in GBK encoding, then look at it in a hex editor. The text between the quotation marks is the 8 bytes CE D2 CA C7 BA BA D7 D6.

Now copy the file to a Traditional Chinese edition of Windows. What does it look like when opened in Notepad?

    ...
    QString a = "扂岆犖趼";
    QLabel label(a);
    label.show();
    ...

And on a European or American edition of Windows, opened in Notepad?

    ...
    QString a = "ÎÒÊÇºº×Ö";
    QLabel label(a);
    label.show();
    ...

It is the same file, without any modification. Yet those 8 bytes CE D2 CA C7 BA BA D7 D6 appear as completely different text to mainland Chinese users with GBK, to users in Hong Kong and Macao with BIG5, and to Europeans with Latin-1.
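To repeat the experiment without a hex editor, a small helper can print the bytes of a string. This is only a sketch using the C++ standard library; the name toHex is made up for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: render every byte of `bytes` as two lowercase hex
// digits separated by single spaces, much like a hex editor would.
std::string toHex(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned value = static_cast<unsigned char>(bytes[i]);
        std::snprintf(buf, sizeof buf, "%02x", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Passing the 8 bytes quoted above, written as "\xce\xd2\xca\xc7\xba\xba\xd7\xd6", yields "ce d2 ca c7 ba ba d7 d6". The bytes never change; only the reader's choice of encoding does.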

Concept 2:
    • As we all know, 'A' is equivalent to '\x41'.

When the source file is saved as GBK,

    const char * str = "我是汉字";

is equivalent to

    const char * str = "\xce\xd2\xca\xc7\xba\xba\xd7\xd6";

When the source file is saved as UTF-8, it is equivalent to

    const char * str = "\xe6\x88\x91\xe6\x98\xaf\xe6\xb1\x89\xe5\xad\x97";

Note: this statement is not entirely accurate. For example, when the file is saved as UTF-8 with a BOM and compiled with cl, the Chinese characters are UTF-8 in the source file, but the compiled program stores the corresponding GBK bytes.
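One practical consequence: a literal written entirely with byte escapes does not depend on how the source file is saved, because every byte is spelled out in plain ASCII. The sketch below simply wraps the two escaped literals from above; the function names are made up for illustration.

```cpp
#include <string>

// Hypothetical helpers returning the two escaped literals quoted in the text.
// Every byte is written out explicitly, so the result is the same whether
// this file is saved as GBK, UTF-8, or anything else ASCII-compatible.
inline std::string gbkBytes() {
    return "\xce\xd2\xca\xc7\xba\xba\xd7\xd6";  // 4 characters, 2 bytes each
}

inline std::string utf8Bytes() {
    return "\xe6\x88\x91\xe6\x98\xaf\xe6\xb1\x89\xe5\xad\x97";  // 4 characters, 3 bytes each
}

// 'A' and '\x41' denote the same char value on ASCII-based systems.
inline bool letterAMatchesEscape() {
    return 'A' == '\x41';
}
```

The same four Chinese characters therefore occupy 8 bytes in one encoding and 12 in the other, which is why a narrow string alone cannot say what text it holds.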

Concept 3:
    • QString uses Unicode internally.

Because QString uses Unicode internally, it can hold the GBK characters "我是汉字", the BIG5 characters "扂岆犖趼", and the Latin-1 characters "ÎÒÊÇºº×Ö" alike.

The question is: how should the 8 bytes "\xce\xd2\xca\xc7\xba\xba\xd7\xd6" in the source code be converted to Unicode and stored in a QString? As GBK, as BIG5, as Latin-1, or in some other way?

If you do not tell it, it assumes Latin-1 by default, so the Unicode code points of the 8 characters "ÎÒÊÇºº×Ö" are what ends up in the QString. In the end, 8 Latin characters appear where you expected to see 4 Chinese characters: the so-called garbled text.
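Why does the Latin-1 default give exactly 8 characters? In Latin-1 every byte value is, by definition, the Unicode code point with the same number. The following sketch imitates that conversion with the standard library only. It is not Qt's actual implementation, and the name decodeLatin1 is made up.

```cpp
#include <string>

// Hypothetical stand-in for a Latin-1 decoder: each input byte becomes the
// Unicode code point with the same numeric value (0x00 to 0xFF).
std::u32string decodeLatin1(const std::string& bytes) {
    std::u32string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char32_t>(b));
    }
    return out;
}
```

Fed the 8 GBK bytes, this returns 8 code points, U+00CE U+00D2 U+00CA U+00C7 U+00BA U+00BA U+00D7 U+00D6, which are precisely the characters ÎÒÊÇºº×Ö. Nothing is lost or corrupted; the bytes were just read under the wrong assumption.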

How QString works

    const char * str = "我是汉字";
    QString a = str;

It really is a very simple matter: when converting from a narrow char* string to a Unicode QString, you need to tell QString which encoding the char* bytes are in. GBK, BIG5, Latin-1?

Ideally, when you pass a char* to QString, you tell QString its encoding at the same time:

With static member functions like the following, QString knows which encoding to use for the C string:

    QString QString::fromAscii(const char * str, int size = -1)
    QString QString::fromLatin1(const char * str, int size = -1)
    QString QString::fromLocal8Bit(const char * str, int size = -1)
    QString QString::fromUtf8(const char * str, int size = -1)

But these few member functions alone are far from enough to meet everyone's needs. For example, on Simplified Chinese Windows, local8Bit means GBK; what if you have a char string in BIG5 or Latin-2?

Then use the powerful QTextCodec. A QTextCodec object knows which encoding it is responsible for, so when you hand it a char string, it can convert it to Unicode correctly.

    QString QTextCodec::toUnicode(const char * chars) const
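To give a feel for what "knowing the encoding" buys you, here is a minimal standard-library sketch of the kind of work a UTF-8 codec does. It is a simplified stand-in, not QTextCodec's code: the name decodeUtf8 is made up, and the sketch skips the overlong-form and surrogate checks a real decoder needs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified UTF-8 decoder. Returns the code points decoded so
// far and stops at the first malformed or truncated sequence.
std::u32string decodeUtf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else return out;                        // not a valid lead byte
        if (i + extra >= bytes.size()) return out;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return out;     // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Given the 12 UTF-8 bytes "\xe6\x88\x91\xe6\x98\xaf\xe6\xb1\x89\xe5\xad\x97", this produces the 4 code points U+6211 U+662F U+6C49 U+5B57, that is, 我是汉字. Given the 8 GBK bytes instead, it produces nothing, because CE D2 is not a valid UTF-8 sequence. The right codec recovers the text; the wrong one cannot.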

But calling it this way is too much trouble. I just want to write

    QString a = str;

or

    QString a(str);

What then?

In that case there is no way to tell QString directly which encoding str is in; it has to be done some other way. This is where the statements from the beginning come in:

    QTextCodec::setCodecForCStrings(QTextCodec::codecForName("GBK"));
    QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"));

They set the encoding that QString uses by default for char* strings. As for which one to choose: in general, use GBK if the source file is GBK, and UTF-8 if the source file is UTF-8. There is one exception: if the file is saved as UTF-8 with a BOM and compiled with Microsoft's cl compiler, the right choice is still GBK.

In summary, the main reasons the garbled text appears are:

QString uses Unicode internally, so it can hold the GBK characters "我是汉字", the BIG5 characters "扂岆犖趼", and the Latin-1 characters "ÎÒÊÇºº×Ö" alike.

When converting from a narrow char* string to a Unicode QString, you need to tell QString which encoding the char* bytes are in. GBK, BIG5, Latin-1?

If you do not tell it, it assumes Latin-1 by default, so the Unicode code points of the 8 characters "ÎÒÊÇºº×Ö" are what ends up in the QString. In the end, 8 Latin characters appear where you expected to see 4 Chinese characters, and the so-called garbled text has appeared.

Many guides online suggest putting the following directly in main.cpp:

    QTextCodec *codec = QTextCodec::codecForName("UTF-8");

    QTextCodec::setCodecForTr(codec);

    QTextCodec::setCodecForLocale(codec);
    QTextCodec::setCodecForCStrings(codec);

In fact this is problematic in some cases: the program may read a Chinese path from the system, or launch an external program located under a Chinese path, and this goes wrong if the system encoding is GB2312.

The reason is that the Chinese path is converted to and from QString as UTF-8, while the system itself encodes Chinese paths in GB2312, so an external program under a Chinese path cannot be launched.

The following approach avoids the problem above:

    QTextCodec *codec = QTextCodec::codecForName("UTF-8");

    QTextCodec::setCodecForTr(codec);

    QTextCodec::setCodecForLocale(QTextCodec::codecForLocale());
    QTextCodec::setCodecForCStrings(QTextCodec::codecForLocale());

This way, every string exchanged with the outside world uses the local encoding.
