MySQL Chinese garbled some of the solutions

Source: Internet
Author: User
Tags character set comparison range table name

If you also encounter this problem, let us not talk about the reason, in the PC with the cmd (or MySQL installation version of the installation of the command line client, or the work of the SECURECRT) to try the effect. Go to the MySQL environment and start from scratch. Suppose your client code is GBK or UTF8 (so not too rigorous, how can assume, but generally if not moved after installation, CMD is GBK code, MySQL installed command line client did not pretend not to remember, CRT see session Options inside the encoding settings, generally will be set to UTF8), to execute some statements:

1. Set the encoding client, connect, return the result of the character set, first set into Latin1,

2. Then perform the following look at the individual characters is not like this,

If your character_set_client, character_set_connection, character_set_results are not latin1, you can do so by setting them individually into latin1, For example, set Character_set_client, the other two, make sure that all three are latin1 (the first SQL statement actually does this),

3. The creation of a separate database db_latin1, of course, is very simple, testing, the creation of the database to set the encoding of the latin1,

4. Create a table tab_latin1 below it, the character set is also set to Latin1, the character is not set here, the database level has been set, here only a name field is created,

5. Insert some Chinese characters into the table, first of all, the machine's cmd code is GBK, the view method is the right key property-> option, look at the current code page can know,

6. View the results

See, the normal display of Chinese ~ ~ ~

OK, it's all here. You don't want to know, "Why do I have to do that?", of course, to look down is not. Image above:

We know that MySQL is a client-server software, each operation is a client to send a request to the service side, and then may return some results, the insertion between the characters through a series of transformations. The client that we edit first has an encoding, for example, the PC-side command line defaults to gbk,pc. New text file defaults to ANSI, commonly used text editors such as notepad++, we may set the default encoding for UTF8, which means editing on the editor, What you see in itself is an encoding.

1. After the client is edited, first translates into the corresponding character set, i.e. the character set of the character_set_client variable indicated above;

2. Send the request to the database service, send the process, convert to connection corresponding connection character set, namely character_set_connection variable corresponding character set;

3. Stored in a database into a database-stored character set, possibly at the server level (Character_set_server), at the database level (character_set_database), or at the table level and column level (see below);

4. The database receives the request, executes the query to obtain the result, transforms again to the results corresponding character set, namely Character_set_results variable refers, the result returns to the client;

5. As a result, it is encoded according to the results character set, so it is important that we let this result show the client tool what kind of coding it supports, which determines how it decodes the results. If this result is UTF8 code, return to a client, but this client only ANSI code, which of course can not display normal, such as it returned to SECURECRT, the result is not normal, but the CRT supports a number of encodings, we manually adjust it to UTF8 code, then it shows normal again, So strictly speaking this step is not, but with the client conditions, after all, when we know the client to adjust to normal coding or to support the conversion of results encoding, this step does not exist.

In the 3rd step above, from the connection character set code into the database storage used by the encoding, in several cases, in general when we install MySQL, especially when the 32-bit installation version, the middle has a choice of coding steps, most will choose UTF8 code, The system may then set a series of character set variables to UTF8, such as Character_set_server, Character_set_connection, Character_set_database, and so on. In other words, this character_set_server variable is set up in advance of your MySQL service, and we can call it a server-level code, so we have to create the database before we build it, we know we can explicitly specify the encoding when we create the database. For example, when I first created, I explicitly specified the use of the latin1 character set, or not, if not specified, it will take the server-level character set, that is, Character_set_server, in the same way when creating a table, you can not specify the encoding, do not specify, the use of database-level coding, Level character_set_database, more in the same way when you create a column field in a table, you can specify the encoding, and the table-level character set is used without encoding, so there is an inheritance relationship here:

Character_set_server => character_set_database => character set in table (without this variable) => character set column (no this variable)

MySQL creation table can be refined to these four levels, not each layer must be specified, the default use of the upper level of the character set (character proofing rules are the same, collation, later).

So is it possible that Character_set_server did not specify, if not specified anywhere, especially in the non-installation version, if forgotten, MySQL at compile time by default latin1, in order to deal with this situation, especially in the non-installation version in the configuration of MySQL, You often need to manually configure the MySQL profile Mysql.ini, which is probably one of these:

The character set that is used by default in the configuration file, so that if Character_set_server is specified by default, it is adopted so that other levels are not specified in succession.

Other, Character_set_filesystem: Converts the characters on the operating system to this character set, that is, convert the character_set_client to Character_set_filesystem, and the default is binary without conversion. Character_set_system: This variable is always utf8, for the character set of the storage system metacharacters, such as table name, column name, user name, and so on, Character_set_dir: Obviously a variable indicating a directory, open this directory, Inside is a variety of MySQL's XML format files for encoding character sets. The above three values can be ignored when solving garbled problems.

Well, the conversion process and the meaning of the variables are clear, it is necessary to figure out which character set code can be converted between, can be converted in a certain range of code can also be converted characters, not garbled or even damaged. Damage will no longer be able to correctly display, even if the settings are correct, restore is not back. Of course, about the conversion between characters a lot of character sets there are so many, any two can be converted between a try, can not be enumerated, you may refer to this article: http://www.imcjd.com/?p=1324, it is often used in the conversion of the character to make some transformation comparison and testing.

In which, you can see that the exact matching conversion is definitely no problem, such as,gbk->gbk,utf8->utf8,latin1->latin1; conversion to single-byte encoding latin1 also no problem, such as Gbk->latin1, utf8->latin1; single byte Encoding (LATIN1) to other in some areas of the encoding may be the conversion of the incomplete, such as LATIN1->GBK (very special Chinese), or encoding length changes, such as Latin1->utf8, to become 2, 3 byte count.

The following refers to a paragraph in another article (HTTP://HI.BAIDU.COM/CUTTINGER/ITEM/F4E79726A60AB450C28D59DA).

"Latin1 is a very common character set, which is a single-byte encoding, backward-compatible ASCII, whose encoding range is 0x00-0xff,0x00-0x7f between full and ASCII, and 0x80-0x9f is a control character, Between 0xa0-0xff is a literal symbol. Obviously, Latin1 covers all the single-byte, so you can save any string in the latin1 character set without worrying that the content does not conform to the Latin1 coding specification. --GBK and UTF8 are multi-byte encodings and do not have this feature.

MySQL users often take advantage of this full coverage feature of Latin1 to save other types of strings, Gbk,utf8,big5, and so on, in latin1 columns. There is no data loss in the save process, as long as it is taken out, it is a valid Gbk/utf8/big string. If you save the GBK string in the UTF8 column, the content in the GBK string that does not conform to the UTF8 encoded format is discarded, the saved content cannot be removed as it is, and the data is actually corrupted.

In summary, if we see that the character set of a field is latin1, then he may save any encoded string, and the character set of a field is UTF8 or GBK, then he should save UTF8 or GBK-unless the user of the database is wrong. 】

I did not delve into the details of UTF8, GBK coding, most likely inaccurate, only a simple ASCII encoding (-_-), but I can understand the global situation. From the above, Latin1 's single-byte encoding is useful, and other encodings can be converted to it to turn back without losing content. The so-called Single-byte code is next to each, I understand is that, for example, Christmas arrived, you want to send her sister a box of apples, for the creation of romance, shops to provide two kinds of packaging, one by number, that is, a single Apple packaging into a box, to a packaging one, so, After all the boxes have been torn down, the sister can be restored to a whole and a box of intact apples, the second is by weight, each box weight 22, 32, 62, so in the packaging, if the weight of 32 of course can be fully put into a box, but if not enough or more, not Mian to cut the apple, Or add another part of the Apple to the box, so that no matter how the girl opened the box, she will get a box of broken apples, because you are in accordance with this way of packaging, has destroyed the integrity of a single apple, now restore does not come back ~ Our character set coding conversion is doing this repackaging work, latin1 just like a single apple wrapper, and UTF8 is like the second way.

And the exact match is that you go to buy a box of apples, all the apples in the box are exactly 22, or 32 or 62, so the weight of the package will be exactly allocated, and still the complete apple.

So plainly, two possible ways are:

1. All variables are set to Latin1 (set names latin1), so that even if we use a variety of editing client coding (GBK or UTF8), the correct results can be obtained;

2. All settings into GBK or gb2312 (GB code, only for Simplified Chinese), using a full match;

3. For the intermediate conversion process, such as GBK input, the character_set_client, character_set_connection as the latin1,character_set_database set as gb2312, When the table is set up for Gb2312,character_set_results can also be defined as gb2312, of course, this is only chicken, in essence, or LATIN1,GBK to Latin1 and then turn gb2312 only for the simplified.

Finally, about character set proofing rules, just a little. In our MySQL character set, MySQL will automatically give a corresponding proofing rules, such as setting CharSet for UTF8, the default collation is the utf8_general_ci,gb2312 character set corresponding Gb2312_chinese_ci, MySQL command to view all proofing rules is show collation, to see a corresponding character set of the proofing code is collation like ' utf8% '.

Character Set proofreading is a sort of using the current character set, contrast, even if the same character set, in different regions are different ways of comparison, so there is proofreading such a word, such as utf8_general_ci, this CI is case insensitive, That is, the case is not sensitive to use it when proofreading, query a field value matching, the case records will appear, of course, there are other rules, utf8 print out a big tuo, not fine study ~

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.