A little understanding of MySQL Chinese garbled

Last Update:2015-01-13 Source: Internet

Author: User

Tags mysql create mysql create table

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

When we tinker with MySQL, we will inevitably encounter this problem: insert Chinese characters garbled, although this is the operation of the first to match the environment, but in their own machine play on the time, must know a one or two bar, otherwise how elegant bragging B.

If you also encounter this problem, let's not talk about the reason, in the PC's own cmd (or MySQL installation after the installation of the command line client, or the work of the SECURECRT) try the effect. Go to the MySQL environment and start from scratch. Suppose your client code is GBK or UTF8 (so too not rigorous, how can assume, but generally speaking if not moved after installation, CMD is GBK code, MySQL installed command line client did not pretend not to remember, CRT see session The options inside the encoding settings, usually also set to UTF8), to execute some statements:

1. Set the encoding client, connect, return the result of the character set, first set to Latin1,

2. Then perform the following look at the individual characters is not the case,

If your character_set_client, character_set_connection, character_set_results are not latin1, you can do this and set them individually to latin1, such as setting Character_set_client, like the other two, make sure that all three are latin1 (the first SQL statement actually does this),

3. Create a separate database db_latin1, of course, is very simple, testing, the creation of the database is set to encode the Latin1,

4. Create a table below it tab_latin1, the character set also sets to Latin1, here does not set the character also line, the database level has already set, here only creates one name field,

5. Insert some Chinese characters into the table, first of all, the native CMD encoding is GBK, the view method is the right-click Property----the option to see the current code page to know,

6. View the results

See, the normal display of Chinese ~ ~ ~

OK, it's all here. You don't want to know, "Why do I set that up?" and of course I have to look down. ：

We know that MySQL is the client-server software, each time the client sends a request to the service, and then may return some results, the characters inserted between them undergo a series of transformations. First for us to edit the client itself has a code, such as the PC side of the command line default is gbk,pc notepad new text file by default is ANSI, commonly used text editor such as notepad++, we may set the default encoding for UTF8, that is, edit on the editor, What you see is a code in itself.

1. After the client is edited, it is first converted to a client-corresponding character set, i.e. the character set indicated by the character_set_client variable printed above;

2. Send the request to the database service and convert it to the connection corresponding connection character set, i.e. the character_set_connection variable corresponding character set;

3. Stored in the database, converted to the database storage character set, may be the server level (Character_set_server), database level (character_set_database) or table level and column level (in this context);

4. The database receives the request, executes the query to obtain the result, transforms again to the results corresponding character set, namely the character_set_results variable refers, this result returns to the client;

5. As a result, it is encoded in accordance with the results character set, and it is important that we let the client-side tool that this result shows what encoding it supports, which determines how it decodes the results. If the result is UTF8 code, returned to a client, but this client is only ANSI code, it certainly can not show normal, such as it returned to SECURECRT, the result is not normal, but the CRT supports a variety of encodings, we manually tune it to UTF8 encoding, then it shows normal, So strictly speaking this step is not, just with the client conditions, after all, when we know after the client adjusted to normal code or originally support the conversion of results encoding, this step does not exist.

In the 3rd step above, from the connection character set encoding to the use of database storage encoding, there are several cases, generally when we install MySQL, especially when the 32-bit installation version, there is a choice between the encoding step, most will choose UTF8 encoding, Then the system may set a series of character set variables to be UTF8, such as Character_set_server, Character_set_connection, Character_set_database, and so on. That is, the Character_set_server variable is set up in advance of your MySQL service, and we can call it server-level encoding, so we have to create the database before we build the table, and when we create the database we know we can explicitly specify the encoding, For example, at the beginning of my creation when I explicitly specify the latin1 character set, or can not be specified, if not specified, it will take the server-level character set, that is, Character_set_server, the same time when creating a table, can also not specify the encoding, if not specified, the use of database-level encoding, level Character_set_database, more so. You can also specify the encoding when you create a column field in a table, and the table-level character set is used when you do not specify the encoding, so there is an inheritance relationship here:

Character_set_server = Character_set_database = Character set in table (without this variable) = Character Set column (without this variable)

MySQL CREATE table can be refined to these four levels, not each layer must be specified, the default use of the previous level of the character set (character proofing rules are the same, collation, explained later).

Then there is no possibility character_set_server not specified, if not specified anywhere, especially in the non-installation version, if you forget, MySQL at compile time by default latin1, in order to deal with this situation, especially in the non-installed version of the configuration of MySQL, It is often necessary to manually configure the MySQL configuration file Mysql.ini, which is about one of the following:

The character set is adopted by default in the configuration file, so if Character_set_server is specified, it will be adopted by default so that no other hierarchy is specified and then inherited.

Other, Character_set_filesystem: convert the characters on the operating system to this character set, that is, convert character_set_client to Character_set_filesystem, the default is binary does not convert, Character_set_system: This variable is always UTF8, the character set for storing system metacharacters, such as table name, column name, user name, and so on, Character_set_dir: It is obviously a variable that indicates a directory, open this directory, inside The various XML format files used to encode the character set are placed in MySQL. The above three values can be ignored when solving garbled problems.

OK, the conversion process and the meaning of each variable is clear, it is necessary to figure out which character set encoding can be converted, can be converted in a certain encoding range of characters can be converted, not garbled or even corrupt. Damage will never be displayed correctly, even if the settings are correct, restore is not back. Of course, there are many conversions between the characters, there are so many character sets, any of the two can be converted to try, can not be enumerated, reference to this article: http://www.imcjd.com/?p=1324, it is used for the conversion of the character of a number of conversion comparison and testing.

Among them, it can be learned that the exact match of the conversion is certainly not a problem, such as,gbk->gbk,utf8->utf8,latin1->latin1; conversion to single-byte encoding latin1 is no problem, such as Gbk->latin1, utf8->latin1; single-byte encoding (LATIN1) to other in some areas of the encoding may appear to be incomplete conversion, such as LATIN1->GBK (very special Chinese), or the length of the encoding change, such as Latin1->utf8, changed to 2, The number of bytes equal to 3.

The following paragraph is referenced in another article (HTTP://HI.BAIDU.COM/CUTTINGER/ITEM/F4E79726A60AB450C28D59DA).

"Latin1 is a very common character set, this character set is single byte encoded, backwards compatible with ASCII, its encoding range is 0x00-0xff,0x00-0x7f between full and ASCII, 0x80-0x9f is the control character, 0xa0-0xff is a text symbol. Obviously, Latin1 covers all of the single bytes, so you can save any string in the latin1 character set without worrying about the content not conforming to the LATIN1 encoding specification. --GBK and UTF8 are multibyte-encoded, without this feature.

MySQL users often use this full-coverage feature of Latin1 to keep other types of strings, GBK,UTF8,BIG5, etc., in the Latin1 column. There is no data loss during the save process, and as long as it is taken out, it is a valid Gbk/utf8/big string. If the GBK string is saved in the UTF8 column, the content in the GBK string that does not conform to the UTF8 encoded format is discarded, the contents of the save cannot be removed as is, and the data is actually compromised.

In conclusion, if we see that the character set of a field is latin1, then he may save any encoded string, and if the character set of a field is UTF8 or GBK, then he should save it either UTF8 or GBK-unless the user of the database is wrong. 】

I did not delve into the details of UTF8, GBK coding, most likely not accurate, only know the simple ASCII encoding (-_-), but can understand a global situation. From the above, Latin1 's single-byte encoding is useful, and other encodings can be converted to it and transferred back without losing content. The so-called single-byte encoding is next to each other, I understand is, such as Christmas arrived, you want to send sister a box of apples, for the manufacture of romantic, shop provides two kinds of packaging, one is by the number, that is, a single Apple packaging into a box, to a packaging one, so that, sister after all the boxes finished The whole can be restored to a complete and a box of intact apples, the second is by weight, each box weighs 22, 32, 62, so in the packaging, if just weight 32 of course can be completely put into a box, but if not enough or more, Mian not to cut apples, Or add another part of the Apple to the box, so that the girl no matter how to open the box, will get a box of incomplete apples, because you in accordance with this packaging method, has destroyed the integrity of the individual Apple, now restore not back ~ Our character set encoding conversion is doing this kind of repackaging work, latin1 just like a single apple wrapper, and UTF8 is like the second way.

And the exact match is that you go to buy a box of apples, all the apples in the box have been either 22 or 32 or 62, so when they are packed by weight, they are still the complete apple.

So plainly, two possible ways are:

1. All variables are set to Latin1 (set names latin1;), so that even if we use a variety of editing client coding (GBK or UTF8), we can finally get the correct result;

2. All set to GBK or gb2312 (GB code, only for Simplified Chinese), using an exact match;

3. For the intermediate conversion process, such as GBK input, character_set_client, character_set_connection as latin1,character_set_database as gb2312, Set the table when the character set is Gb2312,character_set_results can also be set as gb2312, of course, this is only chicken ribs, in essence, or LATIN1,GBK to Latin1 and then turn gb2312 only apply to the simplified.

Finally, about character set proofing rules, just a little bit more. When we set up MySQL character sets, MySQL will automatically give a corresponding proofing rules, such as setting CharSet to UTF8, the default collation is utf8_general_ci,gb2312 character set corresponds to Gb2312_chinese_ci, MySQL command to see all the proofing rules is show collation, to see a corresponding character set of the proof is show collation like ' utf8% '.

Character Set proofreading is a sort of the use of the current character set, contrast, even if the same character set, in different regions is also a different way of comparison, it is only to proofread such a saying, such as Utf8_general_ci, this CI is case insensitive, That is, the case is not sensitive, use it to proofread, query a field value matching, the case of records will appear, of course, there are other rules, utf8 print out a big lump, not fine research ~

Reference article:

Http://www.blogjava.net/wldandan/archive/2007/09/04/142669.html

Http://hi.baidu.com/cuttinger/item/f4e79726a60ab450c28d59da

http://www.imcjd.com/?p=1324

Http://www.360doc.com/content/11/0303/01/2588264_97631236.shtml

http://blog.csdn.net/my_mao/article/details/30498313

This article is from: Linux Learning Network

A little understanding of MySQL Chinese garbled

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More