MySQL Garbled Text Problems: The Ultimate Guide

Last Update:2017-01-19 Source: Internet

Author: User

Tags mysql client

MySQL lets you set a character set at many levels: from the client to the connection to the result set, and from the server to the table to the column. This flexibility is powerful, but it is also an easy source of problems. If you do not understand the mechanism, garbled text appears very easily.

To help everyone run into garbled text as rarely as possible at work, I have combined other people's forum posts with my own understanding and practice into the detailed analysis and summary below.

The basics of character sets and garbled text are not covered in detail here (please look them up yourself), but one question needs special emphasis: how is garbled text actually produced?
Many people are probably vague on this, or have never thought it through. The usual understanding is simply that a "character encoding" mismatch leads to garbled text, without really asking why it does.
The answer is actually very simple: "conversion causes garbled text"!
Judged by this principle, the various situations become simple:

1) Data transmission does not cause garbled text
2) Data storage does not cause garbled text
3) Data input and output (including display) may cause garbled text
4) Data receiving and sending may cause garbled text

In more detail: "conversion causes garbled text" means that data originally in character set A is interpreted as character set B, rather than being correctly converted from character set A to character set B.
For example, in the MySQL character processing flowchart below, the MySQL client actually sends 2 GBK characters (4 bytes), but character_set_connection is set to UTF8. The MySQL server therefore parses the 4 bytes of GBK data as UTF8 and gets 1 Chinese character + 1 byte, which produces garbled text.

(If character_set_connection is set to GBK, the MySQL server parses the received data as GBK, gets the two correct Chinese characters, and then converts them to their UTF8 encoding. No garbled text is produced.)
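The difference between interpreting and converting can be reproduced outside MySQL. The following is a minimal Python sketch, not MySQL's own code; the sample text "你好" and the variable names are assumptions chosen only for illustration.

```python
# Two Chinese characters encoded as GBK: 2 bytes each, 4 bytes in total.
text = "你好"
gbk_bytes = text.encode("gbk")

# Wrong: the GBK bytes are *interpreted* as UTF-8 without any conversion.
# errors="replace" substitutes U+FFFD for byte sequences that are invalid UTF-8.
misread = gbk_bytes.decode("utf-8", errors="replace")

# Right: the bytes are decoded with their real character set (GBK)
# and only then *converted* to UTF-8.
converted = gbk_bytes.decode("gbk").encode("utf-8")

print(len(gbk_bytes))             # number of bytes sent
print(repr(misread))              # garbled text
print(converted.decode("utf-8"))  # the original two characters
```

The bytes on the wire are identical in both cases; only the character set used to read them differs.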

"MySQL's character processing mechanism"

The detailed processing mechanism is shown below:

Let's walk through the process from inserting data to reading it back, to see how the character set changes along the way.
"Insert Process"
1. The client uses its own encoding (character_set_client) to receive input from the user;
2. The client converts the user's input into the connection encoding (character_set_connection) =====> first conversion
3. The client sends the converted data to the server =====> transmission does not cause encoding conversion
4. The server receives the data from the client, checks the character set of the target column, and converts the data to it =====> second conversion
5. The server stores the data (for example, on disk) =====> storage does not cause encoding conversion
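The five insert steps can be sketched as a small Python function. This is a simplified model of the steps as listed above, not MySQL's implementation; the function name `simulate_insert` and its parameters are hypothetical, and Python's `latin1` codec is assumed to be a close enough stand-in for MySQL's latin1 for the bytes used here.

```python
def simulate_insert(raw: bytes, client_cs: str, connection_cs: str, column_cs: str) -> bytes:
    """Follow the five insert steps and return the bytes that end up stored."""
    # Step 1: the input bytes are read as character_set_client.
    text = raw.decode(client_cs)
    # Step 2: first conversion, into character_set_connection.
    on_wire = text.encode(connection_cs)
    # Step 3: transmission leaves the bytes untouched.
    received = on_wire
    # Step 4: second conversion, into the column's character set.
    stored = received.decode(connection_cs).encode(column_cs)
    # Step 5: storage leaves the bytes untouched.
    return stored


typed = "你".encode("utf-8")  # e4 bd a0, as typed in a UTF-8 editor

all_utf8 = simulate_insert(typed, "utf-8", "utf-8", "utf-8")
latin1_session = simulate_insert(typed, "latin1", "latin1", "utf-8")

print(all_utf8.hex())        # e4bda0
print(latin1_session.hex())  # c3a4c2bdc2a0
```

With every setting at UTF8 nothing is converted and the 3 bytes are stored as typed. With a latin1 session and a UTF8 column, each input byte is treated as its own latin1 character and grows to 2 bytes in the column.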

"Read Process"
The earlier SQL statement processing is omitted; we start from the data read.
1. The server reads the data from storage (such as disk) =====> storage does not cause encoding conversion, so reading from storage does not require one either
2. The server determines the character set in which the current connection returns results (character_set_results) and converts the data it read into that character set =====> first reverse conversion, the counterpart of the second forward conversion
3. The server sends the data to the client =====> transmission does not cause encoding conversion
4. The client receives the server's data and converts it according to the client's character set (character_set_client) =====> second reverse conversion, the counterpart of the first forward conversion
5. The client displays the data =====> this is when you see the garbled text
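The read path can be modelled the same way. This is a rough Python sketch under stated assumptions: the function name `simulate_read` is hypothetical, steps 4 and 5 are collapsed into a single step in which the terminal interprets the received bytes with its own encoding, and the stored bytes are the six bytes c3a4 c2bd c2a0 discussed later in this article.

```python
def simulate_read(stored: bytes, column_cs: str, results_cs: str, terminal_cs: str) -> str:
    """Follow the read steps and return the text the user finally sees."""
    # Step 1: reading from storage does not change the bytes.
    text = stored.decode(column_cs)
    # Step 2: the server converts the data to character_set_results.
    on_wire = text.encode(results_cs, errors="replace")
    # Step 3: transmission leaves the bytes untouched.
    # Steps 4-5 (simplified): the terminal interprets the bytes it receives.
    return on_wire.decode(terminal_cs, errors="replace")


stored = bytes.fromhex("c3a4c2bdc2a0")

latin1_view = simulate_read(stored, "utf-8", "latin1", "utf-8")
utf8_view = simulate_read(stored, "utf-8", "utf-8", "utf-8")

print(latin1_view)      # 你
print(repr(utf8_view))  # 'ä½\xa0'
```

The same stored bytes display correctly through a latin1 result set and as garbled text through a UTF8 one, because the latin1 conversion happens to undo the conversion made at insert time.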

With this process in mind, we can easily locate where garbled text may arise and which character set setting produced it.
Ideally, none of the places in the process that can perform character conversion actually needs to convert anything, so no garbled text is generated.

With the theory above in hand, let's work through a packet capture of a garbled-text case to deepen our understanding. Please think about the questions along the way and see whether you have really understood.

The test statement inserts the Chinese character "你" ("you"), whose UTF8 encoding is "0xe4 0xbd 0xa0".

1. latin1: sent packet

Question 1: Why are the client and the connection set to latin1, yet the correct UTF8 encoding is what finally gets sent?

2. latin1: received packet

Question 2: Why is the correct UTF8 encoding received?

3. latin1: display is not garbled

Question 3: Why does latin1 display the correct UTF8 character?

4. UTF8: received packet

Question 4: Why is the received data not UTF8, even though the connection character set is the same as the database character set? (Compare with the latin1 received packet.)

5. UTF8: displayed result

Question 5: Why is the display garbled, even though the connection character set and the database character set are the same?

Well, do you have answers to all of the questions above? If not, I believe the following diagram can help:

Character change diagram for this packet capture case:

Appendix: MySQL character encoding tips
"View Character Set Settings"

mysql> show variables like '%char%';
+--------------------------+-----------------------------------------------------+
| Variable_name            | Description                                         |
+--------------------------+-----------------------------------------------------+
| character_set_client     | Client character set                                |
| character_set_connection | Current connection character set                    |
| character_set_database   | Database character set                              |
| character_set_filesystem | File system character set; do not modify (binary)   |
| character_set_results    | Result set character set                            |
| character_set_server     | Server default character set, used when the         |
|                          | database, table, or column does not set one         |
| character_set_system     | Fixed at utf8                                       |
+--------------------------+-----------------------------------------------------+

"Modify character Set Settings"
The server's configuration is set by the DBA when the server is built, and changing it afterwards is not recommended.
The SET NAMES utf8 command sets character_set_client, character_set_connection, and character_set_results at the same time.
Setting all of these to UTF8 is recommended.
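For reference, the MySQL manual documents SET NAMES as equivalent to setting the three session variables individually. The small Python helper below only spells out that equivalence as SQL text; the function name `set_names_equivalent` is hypothetical, and the sketch does not connect to a server.

```python
from typing import List


def set_names_equivalent(charset: str) -> List[str]:
    """Return the three SET statements that SET NAMES <charset> stands for."""
    variables = (
        "character_set_client",
        "character_set_results",
        "character_set_connection",
    )
    return [f"SET {name} = {charset};" for name in variables]


for statement in set_names_equivalent("utf8"):
    print(statement)
```

Running a single SET NAMES statement right after connecting is usually simpler than issuing the three statements separately.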

"Question Answer"

Question 1: Why are the client and the connection set to latin1, yet the correct UTF8 encoding is what finally gets sent?
The client is set to latin1, but my statement was written in Notepad++ and saved as UTF8.
A Chinese character in UTF8 takes 3 bytes, while latin1 is parsed one byte at a time. The conversion leaves the binary content unchanged, but the MySQL client actually believes I entered 3 latin1 characters.
If the client encoding were set to the 2-byte GBK instead, the conversion would go wrong: the 3 UTF8 bytes would be converted into 1 GBK character (which may or may not look garbled) plus one Western European character (a byte below 128 is an English character; a byte above 128 is another Western European character).
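The contrast between the latin1 and GBK cases can be checked with a short Python sketch. It uses Python's own codecs as a stand-in for the client's behaviour, which is an assumption; in particular, Python replaces the leftover byte with U+FFFD rather than showing a Western European character.

```python
utf8_bytes = "你".encode("utf-8")  # 3 bytes: e4 bd a0

# latin1 maps every byte to exactly one character, so nothing is lost:
# encoding the 3 "characters" back to latin1 returns the original bytes.
as_latin1 = utf8_bytes.decode("latin1")
round_trip = as_latin1.encode("latin1")

# GBK reads two bytes per Chinese character, so the first two bytes are
# consumed as one unrelated character and the third byte is left over.
as_gbk = utf8_bytes.decode("gbk", errors="replace")

print(len(as_latin1), round_trip == utf8_bytes)
print(repr(as_gbk))
```

This is why a latin1 session can carry UTF8 bytes through unchanged while a GBK session cannot.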

Question 2: Why is the correct UTF8 encoding received?
Because the MySQL server converts the data from the column encoding (UTF8) to latin1. The data stored in the column is not really the UTF8 bytes "0xe4 0xbd 0xa0" for "你" ("you"); it is "c3a4 c2bd c2a0" (6 bytes). The MySQL server converts UTF8 c3a4 to latin1 0xe4, c2bd to 0xbd, and c2a0 to 0xa0.
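The byte-for-byte mapping in this answer can be verified with a few lines of Python. This is only an illustration using Python's codecs, assuming they agree with MySQL's UTF8-to-latin1 conversion for these three characters.

```python
stored = bytes.fromhex("c3a4c2bdc2a0")  # the 6 stored bytes quoted above

# Each stored UTF-8 character maps to exactly one latin1 byte.
mapping = [
    (ch.encode("utf-8").hex(), ch.encode("latin1").hex())
    for ch in stored.decode("utf-8")
]
for utf8_hex, latin1_hex in mapping:
    print(utf8_hex, "->", latin1_hex)

# Converting the whole column value gives the bytes sent to the client.
returned = stored.decode("utf-8").encode("latin1")
print(returned.hex())  # e4bda0
```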

Question 3: Why does latin1 display the correct UTF8 character?
Because the MySQL client receives the "0xe4 0xbd 0xa0" converted by the MySQL server, treats it as 3 latin1 characters, and passes it on to the terminal (SecureCRT in my case).
SecureCRT then interprets these three latin1 bytes as UTF8, so the Chinese character "你" ("you") is displayed.

Question 4: Why is the received data not UTF8, even though the connection character set is the same as the database character set? (Compare with the latin1 received packet.)
The character sets are the same, so no encoding conversion is needed anywhere in the process, and the stored "c3a4 c2bd c2a0" is returned to the client as is.

Question 5: Why is the display garbled, even though the connection character set and the database character set are the same?
See Question 4: the client passes the received data straight to the terminal. The terminal treats it as UTF8 characters (c3a4 is "ä", c2bd is "½", and c2a0 is a non-breaking space, so two characters are visible), finds the corresponding glyphs, and displays them. We cannot make sense of them, so we call them garbled, but there is nothing wrong with displaying them this way. If no real character can be found, a question mark or the character set's default symbol may be displayed instead.
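To see exactly which characters the terminal ends up with, the returned bytes can be decoded and named with Python's standard `unicodedata` module. This is a sketch of the terminal's interpretation only, assuming a terminal that decodes its input as UTF8.

```python
import unicodedata

returned = bytes.fromhex("c3a4c2bdc2a0")  # bytes passed to the terminal unchanged
shown = returned.decode("utf-8")

# Look up the Unicode name of every character the terminal will draw.
names = [unicodedata.name(ch) for ch in shown]
for ch, name in zip(shown, names):
    print(f"U+{ord(ch):04X} {name}")

# The last character is a space, so only two glyphs are actually visible.
visible = [ch for ch in shown if not ch.isspace()]
print(len(shown), len(visible))
```

All three are legitimate characters, which is why the terminal shows them without a question mark or placeholder symbol.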

That concludes this roundup of MySQL garbled-text problems. I hope it helps you solve yours. Thank you for reading.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
