Qt: Solving the Garbled Chinese Text Problem (2)

Last Update:2018-08-05 Source: Internet

Author: User

First of all, to be clear: QString has no problem supporting Chinese. When people run into trouble, the fault usually lies not with QString itself but with the fact that the string they wanted was never correctly assigned to the QString.

The problem is simple. A string such as "我是汉字" is a traditional narrow string of type char, and all we need is some way of telling QString which encoding these four Chinese characters use. The trouble is that many users have little idea which encoding they are currently using, and so the problems begin.

A simple Qt program

The following small program will probably look familiar to everyone. It seems that a considerable number of Chinese users have tried writing this code:

#include <QtGui/QApplication>
#include <QtGui/QLabel>

int main(int argc, char **argv)
{
    QApplication app(argc, argv);
    QString a = "我是汉字";   // narrow char literal; its bytes depend on how the file is saved
    QLabel label(a);
    label.show();
    return app.exec();
}

Write the code, save, compile, and run. Everything goes smoothly, but the result is:

Most users see	Other users see
ÎÒÊÇºº×Ö	æˆ‘æ˜¯æ±‰å­—

Unexpectedly, the Chinese text does not appear in the interface. Instead there are characters nobody can read. So the search begins: search engines, forum posts, complaints.

In the end, someone explains that one of the following statements will solve the problem:

QTextCodec::setCodecForCStrings(QTextCodec::codecForName("GB2312"));
QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"));

Try each of the two statements and one of them will fix it (the first for most users, the second for the rest). So why is that?

When do the two kinds of garbled text appear?

Everyone probably has an opinion on this question. Before continuing, let's list the situations in which each of the two kinds of garbled text appears.

We list only the three most commonly used compilers (cl from Microsoft Visual Studio, g++ from MinGW, and g++ on Linux), with the source file saved in each of three encodings: GBK, UTF-8 without a BOM, and UTF-8 with a BOM. Result 1 is the first kind of garbled text shown above and result 2 is the second.

Encoding of source code	Compiler	Result
GBK	cl	1 *
GBK	mingw-g++	1 *
GBK	g++	1
UTF-8 (no BOM)	cl	2
UTF-8 (no BOM)	mingw-g++	2
UTF-8 (no BOM)	g++	2 *
UTF-8 (with BOM)	cl	1
UTF-8 (with BOM)	mingw-g++	2
UTF-8 (with BOM)	g++	Compilation failed

Saving the source file in 3 different encodings and compiling it with 3 different compilers gives 9 combinations. Setting aside the one that does not compile, the two kinds of garbled text each account for half of the rest.

From this we can also see that the garbled text has no inherent connection to the operating system. In practice, though, Windows generally uses GBK and Linux generally uses UTF-8 without a BOM. If we consider only the cases marked with *, we could also say that the two kinds of garbled text are system-related.

Why does QString produce garbled text?

Is QString really the one producing garbled text? We should ask ourselves whether we are blaming the wrong party.

Before going on, let's pin down a few concepts.

Concept 0: "我是汉字" is a C string, that is, a narrow string of type char. The example above can be written as

const char *str = "我是汉字";
QString a = str;

or

char str[] = "我是汉字";
QString a = str;

and so on.

Concept 1: A source file has an encoding, but a plain text file does not record which encoding it uses.

This is the root of the problem. Try an experiment: save the earlier source code as GBK, and a hex editor will show that the bytes between the quotation marks are CE D2 CA C7 BA BA D7 D6, 8 bytes in total.
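The same check can be done without a hex editor. The sketch below, in standard C++ with no Qt dependency, prints the bytes of a narrow string as hex. The names hex_bytes and gbk_literal are made up for this illustration, and the literal is written with hex escapes so that the result does not depend on how the example file itself is saved.

```cpp
#include <cstdio>
#include <string>

// Return the bytes of a narrow C string as space-separated uppercase hex.
// (hex_bytes is a hypothetical helper name chosen for this sketch.)
std::string hex_bytes(const char *s)
{
    std::string out;
    char buf[4];
    const unsigned char *p = reinterpret_cast<const unsigned char *>(s);
    for (; *p != 0; ++p) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(*p));
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}

// The 8 bytes a GBK-saved source file holds between the quotation marks,
// spelled out as escapes.
const char gbk_literal[] = "\xce\xd2\xca\xc7\xba\xba\xd7\xd6";
```

Calling hex_bytes(gbk_literal) yields the 8 byte values listed above. Passing an ordinary literal instead shows whatever bytes the compiler actually stored for it.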

Now copy the file to a Traditional Chinese (Big5) Windows system and open it in Notepad:

...
    QString a = "扂岆犖趼";
    QLabel label(a);
    label.show();
...

Then move it to a Windows system used in Europe or America (Latin-1) and open it in Notepad again:

...
    QString a = "ÎÒÊÇºº×Ö";
    QLabel label(a);
    label.show();
...

It is the same file, with no changes at all. Yet the 8 bytes CE D2 CA C7 BA BA D7 D6 look like completely different text to mainland users with GBK, to users in Hong Kong and Macao with Big5, and to Europeans with Latin-1.

Concept 2: As everyone knows, 'A' and '\x41' are equivalent.

Under GBK encoding,

const char *str = "我是汉字";

is equivalent to

const char * str = "\xce\xd2\xca\xc7\xba\xba\xd7\xd6";

When encoded with UTF-8, it is equivalent to

const char * str = "\xe6\x88\x91\xe6\x98\xaf\xe6\xb1\x89\xe5\xad\x97";
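As a check on the UTF-8 escape sequence, the bytes can be derived from the Unicode code points of the four characters (U+6211, U+662F, U+6C49, U+5B57). The sketch below is plain C++ and only illustrates the encoding rule. The names append_utf8 and utf8_of_sample are made up for this example, and the encoder does no validation of its input.

```cpp
#include <string>

// Append the UTF-8 encoding of one code point (up to U+10FFFF) to out.
// No validation is done, e.g. surrogate values are not rejected.
// (append_utf8 is a hypothetical helper name chosen for this sketch.)
void append_utf8(std::string &out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Encode the four sample characters from their code points.
std::string utf8_of_sample()
{
    const char32_t sample[] = {0x6211, 0x662F, 0x6C49, 0x5B57};
    std::string out;
    for (char32_t cp : sample)
        append_utf8(out, cp);
    return out;
}
```

Each of the four characters lies between U+0800 and U+FFFF, so each takes three bytes, which is why the UTF-8 form is 12 bytes long while the GBK form is 8.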

Note: this is not entirely accurate. For example, when the file is saved as UTF-8 with a BOM and compiled with cl, the Chinese characters in the source file are UTF-8 encoded, but the compiled program stores the corresponding GBK bytes.

Concept 3: QString uses Unicode internally.

Because QString uses Unicode internally, it can hold the GBK characters "我是汉字", the Big5 characters "扂岆犖趼", and the Latin-1 characters "ÎÒÊÇºº×Ö" equally well.

The question is how the 8 bytes "\xce\xd2\xca\xc7\xba\xba\xd7\xd6" in the source code should be converted to Unicode and stored in the QString. As GBK, as Big5, as Latin-1, or in some other way?
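To make one of these candidate interpretations concrete, here is a minimal sketch in standard C++ of what decoding as Latin-1 means: every byte becomes the Unicode code point with the same numeric value. The function name decode_latin1 is made up for this illustration and this is not Qt's implementation.

```cpp
#include <string>

// Latin-1 decoding: each byte maps to the code point with the same value,
// so n bytes always produce n characters.
// (decode_latin1 is a hypothetical helper name chosen for this sketch.)
std::u32string decode_latin1(const std::string &bytes)
{
    std::u32string out;
    for (unsigned char b : bytes)
        out += static_cast<char32_t>(b);
    return out;
}
```

Applied to the 8 GBK bytes, this gives the 8 code points U+00CE, U+00D2, U+00CA, U+00C7, U+00BA, U+00BA, U+00D7, U+00D6, which is the text "ÎÒÊÇºº×Ö" rather than 4 Chinese characters.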

If you don't tell it, it defaults to Latin-1, so the Unicode code points of the 8 characters "ÎÒÊÇºº×Ö" are what get stored in the QString. In the end, 8 Latin characters appear where you expected to see 4 Chinese characters: the so-called garbled text.

How QString works

const char *str = "我是汉字";
QString a = str;

It is really a very simple matter: when converting from a narrow char* string to a Unicode QString, you need to tell QString which encoding the bytes of the char* use: GBK, Big5, Latin-1, or something else.

Ideally, you tell QString the encoding at the moment you pass the char* to it.

With functions such as the following, QString's member functions know which encoding to use for the C string:

QString QString::fromAscii(const char *str, int size = -1)
QString QString::fromLatin1(const char *str, int size = -1)
QString QString::fromLocal8Bit(const char *str, int size = -1)
QString QString::fromUtf8(const char *str, int size = -1)
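What "knowing the encoding" buys you can be shown without Qt. The sketch below is a hand-written UTF-8 decoder in standard C++. It is only an illustration of the idea behind a function like fromUtf8, not Qt's implementation. The name decode_utf8 is made up for this example, and the code assumes well-formed input and performs no validation.

```cpp
#include <cstddef>
#include <string>

// Decode well-formed UTF-8 into Unicode code points.
// Assumption: the input is valid UTF-8; malformed input is not detected.
// (decode_utf8 is a hypothetical helper name chosen for this sketch.)
std::u32string decode_utf8(const std::string &bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        int extra;      // number of continuation bytes that follow
        char32_t cp;    // code point being assembled
        if (lead < 0x80)      { extra = 0; cp = lead; }
        else if (lead < 0xE0) { extra = 1; cp = lead & 0x1F; }
        else if (lead < 0xF0) { extra = 2; cp = lead & 0x0F; }
        else                  { extra = 3; cp = lead & 0x07; }
        ++i;
        // Each continuation byte contributes its low 6 bits.
        for (int k = 0; k < extra && i < bytes.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        out += cp;
    }
    return out;
}
```

Given the 12 UTF-8 bytes of the sample string, this decoder produces 4 code points (U+6211, U+662F, U+6C49, U+5B57), whereas reading the same bytes as Latin-1 would produce 12 unrelated characters.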

But these member functions alone are far from enough for everyone's needs. For example, on Simplified Chinese Windows the local 8-bit encoding is GBK, so what do you do with a char string that is Big5 or Latin-2?

That is where the powerful QTextCodec comes in. A QTextCodec object knows the encoding it is responsible for, so when you hand it a char string it can convert it to Unicode correctly.

QString QTextCodec::toUnicode(const char *chars) const

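But calling it this way is too much trouble. I just want to write

QString a = str;

or

QString a(str);

What can be done about that?

Written this way, there is no means of telling QString directly which encoding str uses, so it has to be done another way. That is what the statements mentioned at the beginning are for: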

QTextCodec::setCodecForCStrings(QTextCodec::codecForName("GBK"));
QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"));

These set the encoding that QString uses by default for C strings. Which one should you pick? In general, if the source file is GBK, use GBK, and if the source file is UTF-8, use UTF-8. There is one exception: if the file is saved as UTF-8 with a BOM and compiled with Microsoft's cl compiler, GBK is still the right choice.

In summary, the main reasons for the garbled text are:

QString uses Unicode internally, so it can hold the GBK characters "我是汉字", the Big5 characters "扂岆犖趼", and the Latin-1 characters "ÎÒÊÇºº×Ö" equally well.

When converting from a narrow char* string to a Unicode QString, you need to tell QString which encoding the bytes of the char* use: GBK, Big5, Latin-1, or something else.

If you don't tell it, it defaults to Latin-1, so the Unicode code points of the 8 characters "ÎÒÊÇºº×Ö" are what get stored in the QString. In the end, 8 Latin characters appear where you expected to see 4 Chinese characters, and the so-called garbled text has appeared.

Many solutions found online set the codecs directly in main.cpp:

QTextCodec *codec = QTextCodec::codecForName("UTF-8");

QTextCodec::setCodecForTr(codec);

QTextCodec::setCodecForLocale(codec);
QTextCodec::setCodecForCStrings(codec);

In fact this also causes problems in some cases, because the program may read a Chinese path from the system or launch an external program located under a Chinese path, and that goes wrong when the system encoding is GB2312.

The Chinese path is stored into the QString as UTF-8, while the system decodes the path using its own GB2312, so an external program under a Chinese path cannot be launched.

The problem can be solved as follows:

QTextCodec *codec = QTextCodec::codecForName("UTF-8");

QTextCodec::setCodecForTr(codec);

QTextCodec::setCodecForLocale(QTextCodec::codecForLocale());
QTextCodec::setCodecForCStrings(QTextCodec::codecForLocale());

This way, the local encoding is used to encode and decode strings that come from or go to the outside.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
