Fixing garbled Chinese characters in MySQL on Windows Server

Source: Internet
Author: User

When we tinker with MySQL on our own, sooner or later we run into this problem: Chinese characters come out garbled after being inserted. Setting up the environment is really a job for operations, but when we play with MySQL on our own machine it pays to understand a little of how it works. Otherwise, how can you show off gracefully?

If you have run into this problem, let's set the cause aside for now and try something first, in cmd on your PC (or in the Command Line Client installed with MySQL, or in SecureCRT if you use it at work). Enter the mysql environment and start from scratch. Assume your client encoding is gbk or utf8. (This is not rigorous; how can we just assume? But generally, if nothing was changed after installation, cmd uses gbk, and the Command Line Client installed with MySQL runs in the same kind of console window. If you use SecureCRT, check the encoding under Session Options; it is usually set to utf8.) Then execute a few statements:

1. First, set the client, connection, and results character sets to latin1 (for example with set names latin1;).

2. Check the character set variables to confirm the values (for example with show variables like 'character_set%';).

If character_set_client, character_set_connection, and character_set_results are not all latin1, set each one to latin1 individually, starting with character_set_client; the other two work the same way. Make sure all three are latin1 (the statement in step 1 actually does this already).

3. Create a separate database, db_latin1, for testing, and set its character set to latin1 when creating it.

4. Create a table tab_latin1 in it. The table could be given the latin1 character set explicitly, but none is specified here because the database level has already been set. Only one field, name, is created.

5. Insert some Chinese characters into the table. Note that the local cmd encoding is gbk: right-click the title bar and choose Properties > Options to view the current code page.

6. View the result.

Take a look: the Chinese characters are displayed normally ~~~

Okay, don't you want to know why setting it up like this works?

We know that MySQL is client-server software. For each operation the client sends a request to the server, and some results may come back. Along the way the inserted characters go through a series of conversions. The client used for editing has its own encoding: the command line on a PC is gbk by default, a new text file in Notepad is ANSI by default, and in common text editors such as Notepad++ we may set the default encoding to utf8. In other words, what you see while editing is really text in one particular encoding.

1. After the text is edited on the client, the server first treats it as being in the client character set, that is, the character set given by the character_set_client variable printed above;

2. As the request is sent to the database service, the text is converted to the connection character set, given by the character_set_connection variable;

3. The text is then converted to the character set used for storage, which may come from character_set_server, character_set_database, or a table-level or column-level setting;

4. When the database executes a query, it converts the result once more, this time to the results character set given by the character_set_results variable, and returns it to the client;

5. The result arrives encoded in the results character set. Whether the client tool that displays it supports that encoding also matters, because that determines how the tool decodes the result. If a utf8-encoded result is returned to a client that only handles ANSI, it certainly cannot be displayed normally. For example, a result returned to SecureCRT may be displayed as garbage, but SecureCRT supports several encodings, and if we manually switch it to utf8 the result is displayed normally again. So this step is not strictly a conversion; it depends on the client, and once the client is switched to a suitable encoding, or already supports the results encoding, it stops being an issue.
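
The five steps above can be traced with a small Python sketch. It is only a model under stated assumptions, not how MySQL is implemented: the function name simulate_round_trip and its parameters are made up for illustration, Python codecs stand in for MySQL character sets, Python's latin-1 codec (which accepts every byte value) approximates MySQL's latin1, and lossy conversions are modeled with errors="replace".

```python
def simulate_round_trip(text, terminal, client, connection, column, results):
    """Trace one string through the conversion steps (illustrative model only)."""
    # The terminal turns what you type into bytes using its own encoding.
    raw = text.encode(terminal)
    # Step 1: the server reads those bytes as character_set_client.
    chars = raw.decode(client, errors="replace")
    # Step 2: convert to character_set_connection.
    wire = chars.encode(connection, errors="replace")
    chars = wire.decode(connection, errors="replace")
    # Step 3: convert to the column character set; these bytes are what is stored.
    stored = chars.encode(column, errors="replace")
    # Step 4: read the stored value and convert it to character_set_results.
    returned = stored.decode(column, errors="replace").encode(results, errors="replace")
    # Step 5: the terminal decodes the returned bytes with its own encoding.
    return returned.decode(terminal, errors="replace")


# Everything latin1, gbk terminal: the bytes pass through every step untouched.
all_latin1 = simulate_round_trip("中文", "gbk", "latin-1", "latin-1", "latin-1", "latin-1")

# Everything gbk, gbk terminal: an exact match at every step.
all_gbk = simulate_round_trip("中文", "gbk", "gbk", "gbk", "gbk", "gbk")

# gbk terminal, but the server is told the client sends utf8: the text is damaged.
mismatch = simulate_round_trip("中文", "gbk", "utf-8", "utf-8", "utf-8", "utf-8")
```

In this model all_latin1 and all_gbk come back as the original text, while mismatch does not, which mirrors the experiment at the start of the article.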

Step 3 above, where the connection character set is converted to the encoding used for storage, covers several cases. When we install MySQL, especially with the 32-bit installer version, there is a step in the middle for choosing an encoding, and most people choose utf8. The installer may then set a series of character set variables to utf8, such as character_set_server, character_set_connection, and character_set_database. In other words, character_set_server is already set before the MySQL service starts; we can call it the server-level encoding.

Before creating a table we must first create a database, and when creating a database the encoding can be specified explicitly. For example, when I created the database at the beginning, I specified latin1 explicitly, but it can also be left out. If it is not specified, the database uses the server-level character set, character_set_server. Similarly, a table can be created with or without an encoding; if none is specified, the database-level encoding, character_set_database, is used. Likewise, an encoding can be specified for each column when it is created; if none is specified, the table-level character set is used. So there is an inheritance relationship here:

character_set_server => character_set_database => table character set (no such variable) => column character set (no such variable)

The character set of a MySQL table can be specified at any of these four levels, and not all of them have to be given. When a level is not specified, the character set of the level above it is used (the same goes for the collation, which is described later).
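
The inheritance chain can be modeled in a few lines of Python. This is a sketch under assumptions: effective_charset and COMPILED_DEFAULT are hypothetical names, and the function only models which level wins. In MySQL itself the inherited value is fixed when each database, table, or column is created.

```python
COMPILED_DEFAULT = "latin1"  # compile-time fallback described in the text (assumption of this model)


def effective_charset(server=None, database=None, table=None, column=None):
    """Return the character set a column ends up with (illustrative model only)."""
    # Walk from the most specific level to the least specific one.
    for level in (column, table, database, server):
        if level is not None:
            return level
    # Nothing was specified anywhere.
    return COMPILED_DEFAULT


# Database created as latin1, table and column left unspecified.
inherited = effective_charset(server="utf8", database="latin1")

# An explicit column character set wins over every level above it.
explicit = effective_charset(server="utf8", database="latin1", column="gbk")
```

Here inherited is "latin1", as with the tab_latin1 table created earlier, and explicit is "gbk".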

Is it possible that character_set_server is not specified either? Yes. If it is not set anywhere, which is easy to forget with the non-installer version, MySQL falls back to the default chosen at compile time, which is latin1 in versions before MySQL 8.0. To cope with this, with the non-installer version you usually set the default character set by hand in the MySQL configuration file, my.ini (the character-set-server option in the [mysqld] section). If character_set_server is specified there, it is used as the default, and the lower levels inherit from it in turn.

As for the others: character_set_filesystem is the character set used for file names in the operating system, that is, file names are converted from character_set_client to character_set_filesystem; its default value is binary, which means no conversion. character_set_system is always utf8; it is the character set used to store system metadata such as table names, column names, and user names. character_sets_dir is the directory holding the XML files that define MySQL's character sets. These three can be ignored when solving the garbled-character problem.

Well, the conversion process and the meaning of each variable are now clear. The next thing to find out is which character sets can be converted into which. A conversion may only work for characters within a certain encoding range; outside it, the text is garbled or even damaged. Once damaged, it can never be displayed correctly again: even if the settings are fixed afterwards, it cannot be restored. There are of course a great many possible conversions; with so many character sets, any two can be tried against each other, and they cannot all be listed here. You can refer to this article, which compares and tests conversions between commonly used character sets: http://www.imcjd.com/?p=1324

It can be seen that an exact match is certainly no problem, for example gbk -> gbk, utf8 -> utf8, latin1 -> latin1. Passing multi-byte data through single-byte latin1 is also fine, such as gbk -> latin1 or utf8 -> latin1, as long as the bytes are carried through unchanged rather than translated character by character. In the other direction, going from single-byte latin1 to a multi-byte encoding, the conversion may be incomplete in some encoding ranges, as with latin1 -> gbk (certain special Chinese characters), or the number of bytes per character may change, as with latin1 -> utf8 (a Chinese character takes 2 bytes in gbk and 3 in utf8).
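
The byte counts mentioned above are easy to check in Python. This is a minimal sketch: byte_lengths and can_convert are hypothetical helper names, and Python codecs stand in for the MySQL character sets of the same name.

```python
def byte_lengths(ch, encodings=("gbk", "utf-8")):
    """Number of bytes one character occupies in each encoding."""
    return {enc: len(ch.encode(enc)) for enc in encodings}


def can_convert(text, target):
    """True if every character of text exists in the target encoding."""
    try:
        text.encode(target)
        return True
    except UnicodeEncodeError:
        return False


# A Chinese character: 2 bytes in gbk, 3 bytes in utf-8.
chinese_lengths = byte_lengths("中")

# A latin1 character grows when converted to utf-8: 1 byte becomes 2.
accent_lengths = byte_lengths("é", ("latin-1", "utf-8"))

# latin1 contains no Chinese characters, so a true character-by-character
# conversion is impossible; only carrying the raw bytes through works.
chinese_fits_latin1 = can_convert("中文", "latin-1")
```

The last line is why the latin1 trick depends on every step being latin1: the bytes are never translated, only passed along.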

The following quotes a passage from another article (http://hi.baidu.com/cuttinger/item/f4e79726a60ab450c28d59da):

[Latin1 is a common character set. It is single-byte encoded and backward compatible with ASCII. Its encoding range is 0x00-0xFF: 0x00-0x7F matches ASCII, 0x80-0x9F are control characters, and 0xA0-0xFF are text symbols. Obviously, Latin1 covers every single-byte value. Therefore any string can be stored in the latin1 character set without worrying that part of it will be discarded for not conforming to the latin1 encoding. (gbk and utf8, by contrast, are multi-byte encodings.)

MySQL users often exploit this full-coverage property of Latin1 to store other kinds of strings, such as gbk, utf8, and big5, in latin1 columns. No data is lost when saving. As long as the data is retrieved as is, it is still a legal gbk/utf8/big5 string. If you save a gbk string in a utf8 column, however, the parts of the gbk string that do not conform to the utf8 encoding format are discarded. The saved content cannot be retrieved as it was, and the data is actually damaged.

To sum up, if a field's character set is latin1, it may hold strings in any encoding, whereas if a field's character set is utf8 or gbk, its contents should be utf8 or gbk, unless the database user has made a mistake.]
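
The claim in the quoted passage can be reproduced with Python codecs. This is a sketch under assumptions: stored_and_retrieved is a hypothetical helper, Python's latin-1 codec stands in for a latin1 column, and errors="replace" stands in for whatever the server does with invalid input.

```python
def stored_and_retrieved(raw, column_encoding):
    """Store raw bytes in a column of the given encoding and read them back."""
    # Byte sequences that are invalid in the column encoding get replaced,
    # which models the discarded content described in the quoted passage.
    as_text = raw.decode(column_encoding, errors="replace")
    return as_text.encode(column_encoding, errors="replace")


gbk_bytes = "中文".encode("gbk")
utf8_bytes = "中文".encode("utf-8")

# A latin1 column returns both byte strings exactly as they went in.
latin1_keeps_gbk = stored_and_retrieved(gbk_bytes, "latin-1") == gbk_bytes
latin1_keeps_utf8 = stored_and_retrieved(utf8_bytes, "latin-1") == utf8_bytes

# A utf8 column cannot represent the gbk bytes, so they come back changed.
utf8_keeps_gbk = stored_and_retrieved(gbk_bytes, "utf-8") == gbk_bytes
```

The first two flags are True and the last is False: the latin1 column is a transparent byte container, while the utf8 column damages data that is not valid utf8.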

I have not studied the details of the utf8 and gbk encodings thoroughly, so I may well be inaccurate; I only know simple ASCII encoding (-_-), but I can grasp the overall picture. From the above, latin1's single-byte encoding is very useful: other encodings can be turned into it and back again without losing content. Single-byte encoding means handling the bytes one at a time.

Here is how I understand it. Say Christmas is coming and, to be romantic, you want to send a girl a box of apples, and the shop offers two ways of packaging. One is by count: each apple goes into its own small box. After all the small boxes are unpacked, the girl can put them back together into a complete box of intact apples. The other is by weight: each small box is limited to a fixed weight, say 2, 3, or 6. If an apple happens to weigh exactly the limit, it of course fills a box by itself, but if it weighs less or more, you have to cut it up or top the box up with pieces of other apples. Packed this way, the girl gets broken apples no matter how she opens the boxes, because the integrity of the individual apples was destroyed during packing and cannot be restored ~ Character set conversion is this kind of repackaging: latin1 is like packing apples one by one, while utf8 is like the second method.

The exact match mentioned earlier is like buying a box in which every apple weighs exactly 2, or 3, or 6: packing by weight then divides evenly, and the result is still whole apples.

To put it bluntly, there are two feasible methods, plus a third that is of little use:

1. Set all the variables to latin1 (set names latin1;). This way the correct result is obtained even if the editing clients use different encodings (gbk or utf8);

2. Set everything to gbk or gb2312 (the national standard encoding, which only covers simplified Chinese), so that every step is an exact match;

3. Mix encodings in the intermediate conversion steps. For example, with gbk input, set character_set_client and character_set_connection to latin1, character_set_database to gb2312, and character_set_results to gb2312 as well. This is a "chicken rib", hardly worth the trouble: in essence it still relies on latin1, and because the data ends up as gb2312, it only works for simplified Chinese characters.

Finally, a few words about collations, the comparison rules of a character set. When we set a character set in MySQL, MySQL automatically supplies a corresponding default collation. For example, if the charset is utf8, the default collation is utf8_general_ci, and the gb2312 character set corresponds to gb2312_chinese_ci. To list all collations, run show collation; to list the collations of one character set, run show collation like 'utf8%';

A collation is the set of rules for sorting and comparing strings in a given character set. Even for the same character set, the rules differ from region to region. Take utf8_general_ci: ci stands for case insensitive, so when a field value is matched under this collation, records that differ from it only in case are returned as well. There are of course other rules; for utf8 alone a large batch is printed out, and they are worth a careful look ~
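
The effect of the ci suffix can be imitated in Python. This is only a rough sketch: equal_ci and equal_bin are hypothetical names, str.casefold() approximates case-insensitive matching, and real collations such as utf8_general_ci apply further rules (for instance for accented letters) that are not modeled here. The byte-wise comparison is in the spirit of a binary collation such as utf8_bin.

```python
def equal_ci(a, b):
    """Case-insensitive comparison, roughly what a _ci collation does for letters."""
    return a.casefold() == b.casefold()


def equal_bin(a, b):
    """Byte-for-byte comparison, in the spirit of a binary collation."""
    return a.encode("utf-8") == b.encode("utf-8")


rows = ["Apple", "apple", "APPLE", "pear"]

# Under the case-insensitive rule, all three spellings match "apple".
matches_ci = [r for r in rows if equal_ci(r, "apple")]

# Under the byte-wise rule, only the exact spelling matches.
matches_bin = [r for r in rows if equal_bin(r, "apple")]
```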
