Ultimate Guide to MySQL Garbled Text

Source: Internet
Author: User

MySQL lets you set a character set at many levels: on the client, the connection, and the result set, and on the server, the database, the table, and the column. This is flexible and powerful, but it is also error-prone: if you do not understand the mechanism, garbled characters appear easily.

To help everyone run into fewer garbled characters at work, or none at all, I have analyzed and summarized in detail the posts of other forum users together with my own understanding and practice. For further details, you can refer to the official website.

Basic knowledge about character sets and garbled characters is not covered here (please look it up yourself), but one question needs special emphasis: how are garbled characters produced?
I believe many of you are confused by this question, or have never thought about it. You know that an incorrect "character encoding" leads to garbled characters, but you may never have considered why it does.
The answer is actually very simple: "Conversion causes garbled characters"!
Based on this principle, the various situations become simple:

1) Data transmission does not cause garbled characters.
2) Data storage does not cause garbled characters.
3) Data input and output (including display) may cause garbled characters.
4) Receiving and sending data may cause garbled characters, because the data may be converted at either end.

In more detail: "conversion causes garbled characters" means that data in character set A is parsed as if it were character set B, instead of being correctly converted to character set B.
For example, in the following MySQL character processing flowchart, the mysql client actually sends two gbk characters (4 bytes), but the session character sets (character_set_client and character_set_connection) are set to utf8. The MySQL server then parses the 4 bytes of gbk data as utf8 and obtains 1 Chinese character + 1 byte, which produces garbled characters.

(If the session character sets are set to gbk, the MySQL server parses the data as gbk, gets the two correct Chinese characters, and then converts them to the utf8 encoding of those two characters, which does not produce garbled characters.)
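
The difference between "parsed as" and "converted to" can be sketched with Python's built-in codecs. This is only an illustration under assumptions: the sample text "你好" is not from the original example, Python's codecs stand in for the MySQL server, and the exact garbled result depends on which bytes are sent.

```python
# Two Chinese characters, used purely as sample text (an assumption of this sketch).
text = "你好"

# What a gbk client actually sends: 2 bytes per character, 4 bytes in total.
gbk_bytes = text.encode("gbk")

# Wrong: the receiver parses the gbk bytes as if they were utf8.
# errors="replace" substitutes U+FFFD for byte sequences that are not valid utf8.
misparsed = gbk_bytes.decode("utf-8", errors="replace")

# Right: the receiver parses the bytes as gbk, then converts them to utf8.
converted = gbk_bytes.decode("gbk").encode("utf-8")

print(len(gbk_bytes), gbk_bytes.hex(" "))
print(repr(misparsed))
print(converted.hex(" "), converted.decode("utf-8"))
```

The same four bytes give garbage in the first case and the original text in the second; nothing about the bytes themselves changed, only the character set they were assumed to be in.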

[Character Processing Mechanism of MySQL]

[Figure: flowchart of the detailed character processing mechanism]



We can simulate the whole process from inserting data to reading it back, to see how the character set changes along the way.
[Insert Process]
1. The client receives user input; character_set_client declares which encoding those bytes are in.
2. The client sends the statement to the server ====> transmission does not convert the encoding.
3. The server interprets the statement as character_set_client and converts it to the connection encoding (character_set_connection) ====> the first conversion.
4. Before storing the data, the server checks the character set of the target column and converts the data to it ====> the second conversion.
5. The server stores the data (for example, on disk) ====> storage does not convert the encoding.
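
The two conversions on the insert path can be sketched as a small function. The helper name `simulate_insert` and the sample character "é" are hypothetical; Python's `latin-1` codec is used as a stand-in for MySQL's latin1, which is based on cp1252 and differs for some bytes in the 0x80-0x9f range.

```python
def simulate_insert(typed_bytes, client_cs, connection_cs, column_cs):
    """Hypothetical helper: returns the bytes a column would end up storing."""
    # The bytes are interpreted according to character_set_client ...
    text = typed_bytes.decode(client_cs)
    # ... converted to character_set_connection (first conversion) ...
    on_connection = text.encode(connection_cs)
    # ... and converted again to the column character set (second conversion).
    return on_connection.decode(connection_cs).encode(column_cs)


# A utf8 terminal sends "é" as the two bytes c3 a9.
typed = "é".encode("utf-8")

# Session declared as latin1, column is utf8: each byte becomes its own character.
mismatched = simulate_insert(typed, "latin-1", "latin-1", "utf-8")

# Session declared as utf8, column is utf8: nothing changes.
matched = simulate_insert(typed, "utf-8", "utf-8", "utf-8")

print(typed.hex(" "), "->", mismatched.hex(" "))  # c3 a9 -> c3 83 c2 a9
print(typed.hex(" "), "->", matched.hex(" "))     # c3 a9 -> c3 a9
```

In the mismatched case no step fails; the data is simply stored as two characters instead of one, because the declared client character set did not match what the terminal actually sent.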

[Read Process]
Skip the SQL statement processing described above and start from the point where the data is read.
1. The server reads data from storage (for example, a disk) ====> storage does not convert the encoding, so reading from it requires no conversion either.
2. The server converts the data it has read from the column character set to the character set required by the result set (character_set_results) ====> the first reverse conversion, corresponding to the second forward conversion.
3. The server sends the data to the client ====> transmission does not convert the encoding.
4. The client receives the data from the server and interprets it according to its own character set (character_set_client) ====> the second reverse conversion, corresponding to the first forward conversion.
5. The client displays the data ====> this is where you see garbled characters, if there are any.
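
The read path can be sketched in the same way. The helper `simulate_read`, the stored bytes, and the utf8 terminal are assumptions made for illustration; the codecs only approximate what the server and terminal do.

```python
def simulate_read(stored_bytes, column_cs, results_cs, terminal_cs):
    """Hypothetical helper: returns (bytes sent to the client, text shown)."""
    # Conversion from the column character set to character_set_results.
    sent = stored_bytes.decode(column_cs).encode(results_cs)
    # The terminal interprets whatever bytes it is handed in its own encoding.
    shown = sent.decode(terminal_cs, errors="replace")
    return sent, shown


# Assumed starting point: a utf8 column holding an "é" that was inserted through
# a latin1 session from a utf8 terminal, so it was stored as c3 83 c2 a9.
stored = bytes.fromhex("c383c2a9")

# Reading through the same latin1 session reverses the earlier conversion.
sent_latin1, shown_latin1 = simulate_read(stored, "utf-8", "latin-1", "utf-8")

# Reading through a utf8 session returns the stored bytes unchanged.
sent_utf8, shown_utf8 = simulate_read(stored, "utf-8", "utf-8", "utf-8")

print(sent_latin1.hex(" "), shown_latin1)  # c3 a9 é
print(sent_utf8.hex(" "), shown_utf8)      # c3 83 c2 a9 Ã©
```

The session that looks misconfigured displays the character correctly, while the session that matches the column shows what is really stored. This is why the garbling only becomes visible at the display step.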

With this process in mind, we can easily locate the possible causes of garbled characters and determine which character set setting produces them.
Ideally, every step that could involve a character conversion uses the same character set, so that nothing actually needs to be converted and no garbled characters appear.
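
One way to use this when troubleshooting is to list the steps at which the configured character sets differ. The function `conversion_points` below is a hypothetical helper, not a MySQL feature. It only compares the configured names; it cannot know which encoding the terminal or editor actually uses, which still has to be checked by hand.

```python
def conversion_points(client, connection, column, results):
    """Hypothetical helper: list the steps whose character sets differ."""
    steps = [
        ("client -> connection", client, connection),
        ("connection -> column", connection, column),
        ("column -> results", column, results),
        ("results -> client", results, client),
    ]
    return [f"{name}: {src} -> {dst}" for name, src, dst in steps if src != dst]


# All four settings agree: no conversion anywhere.
print(conversion_points("utf8", "utf8", "utf8", "utf8"))
# []

# latin1 session against a utf8 column: two conversions to inspect.
for line in conversion_points("latin1", "latin1", "utf8", "latin1"):
    print(line)
# connection -> column: latin1 -> utf8
# column -> results: utf8 -> latin1
```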

With the theory above, we now look at a packet capture of a garbled-character case to deepen our understanding. Please think about each question and check whether you really understand it.

Environment:
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | latin1 |
| character_set_connection | latin1 |
| character_set_database   | utf8   |
| character_set_filesystem | binary |
| character_set_results    | latin1 |
| character_set_server     | utf8   |
+--------------------------+--------+

The test statement inserts the Chinese character "你" ("you"), whose utf8 encoding is "0xE4 0xBD 0xA0".

1. latin1 sending packet


Question 1: Why is latin1 set for both the client and the connection, yet the message that is finally sent is correct utf8?

2. latin1 receiving packet


Question 2: Why is the received data correct utf8?

3. latin1 display (no garbled characters)


Question 3: Why does latin1 display the correct utf8 character?

4. utf8 receiving packet


Question 4: Why, when the connection character set is the same as the database character set, is the received data not correct utf8? (Compare with the packet received under latin1.)

5. utf8 display


Question 5: Why, when the connection character set is the same as the database character set, is the display garbled?

Do you have answers to all of the questions above? If not, I believe the figure below will help you:

Character variation diagram of this packet capture case:
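
The byte changes in this packet capture case can also be traced with a short script. It is a sketch under assumptions: Python's codecs approximate the server's conversions (`latin-1` stands in for MySQL's cp1252-based latin1, which agrees for the bytes used here), and the terminal is assumed to decode everything as utf8.

```python
# "你" as typed in a utf8 editor or terminal: e4 bd a0.
typed = "你".encode("utf-8")

# Insert through the latin1 session: the three bytes are taken as three
# latin1 characters and converted to the utf8 column encoding.
stored = typed.decode("latin-1").encode("utf-8")

# Read through the latin1 session: the column data is converted back to latin1.
received_latin1 = stored.decode("utf-8").encode("latin-1")

# Read through a utf8 session: no conversion, the stored bytes come back as-is.
received_utf8 = stored

print("sent:           ", typed.hex(" "))            # e4 bd a0
print("stored:         ", stored.hex(" "))           # c3 a4 c2 bd c2 a0
print("received latin1:", received_latin1.hex(" "))  # e4 bd a0
print("received utf8:  ", received_utf8.hex(" "))    # c3 a4 c2 bd c2 a0

# The terminal decodes whatever it receives as utf8.
print("shown (latin1): ", received_latin1.decode("utf-8"))      # 你
print("shown (utf8):   ", repr(received_utf8.decode("utf-8")))  # 'ä½\xa0'
```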


Appendix: MySQL character encoding tips
[View Character Set settings]

mysql> show variables like '%char%';
+--------------------------+----------------------------------------------------------+
| Variable_name            | Description                                              |
+--------------------------+----------------------------------------------------------+
| character_set_client     | Client character set                                     |
| character_set_connection | Current connection character set                         |
| character_set_database   | Database character set                                   |
| character_set_filesystem | File system character set; do not modify it, use binary. |
| character_set_results    | Character set of the returned result set                 |
| character_set_server     | Default server character set; used when no database,     |
|                          | table, or column character set is specified.             |
| character_set_system     | utf8                                                     |
+--------------------------+----------------------------------------------------------+

[Modify Character Set settings]
The server configuration is set by the DBA when the server is built; changing it afterwards is not recommended.
Use the SET NAMES utf8 command to set character_set_client, character_set_connection, and character_set_results at the same time.
We recommend setting all of these to utf8.

[Answer]

Question 1: Why is latin1 set for both the client and the connection, yet the message that is finally sent is correct utf8?
The client is set to latin1, but my statement was written in Notepad++ and saved as utf8.
The Chinese character is 3 bytes in utf8, while latin1 is parsed one byte at a time. A conversion does take place, but it does not change the binary content; the mysql client simply believes that I entered three latin1 characters.
If the client encoding were the two-byte gbk, the conversion would produce garbled output: the three utf8 bytes would be converted to one gbk character (which may or may not be a meaningful character) plus one Western European character (byte values below 128 are English; values above 128 are other Western European characters).
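
Why latin1 leaves the bytes alone while a multi-byte character set does not can be checked with Python's codecs. This is a sketch under assumptions: Python's `latin-1` and `gbk` codecs stand in for MySQL's implementations, and `errors="replace"` is just one possible way of handling bytes that cannot be converted.

```python
# latin1 assigns a character to every byte value, so decoding and re-encoding
# returns the original bytes for any input.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes

utf8_bytes = "你".encode("utf-8")  # e4 bd a0

# Parsed as latin1, the three bytes become three separate characters.
as_latin1 = utf8_bytes.decode("latin-1")
print(len(as_latin1))  # 3

# gbk pairs bytes up, so the first two bytes form one character and the
# third byte is left over and cannot be decoded on its own.
as_gbk = utf8_bytes.decode("gbk", errors="replace")
print(repr(as_gbk))

# Converting back no longer yields the original bytes: information was lost.
round_trip = as_gbk.encode("gbk", errors="replace")
print(round_trip.hex(" "), "vs", utf8_bytes.hex(" "))
```

The latin1 round trip is lossless by construction, which is why the wrongly declared latin1 session in this case still delivers intact utf8 bytes.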

Question 2: Why is the received data correct utf8?
Because the MySQL server converts the data from the column encoding (utf8) to latin1. The data stored in the column is not the "0xe4 0xbd 0xa0" that corresponds to "你" ("you") in utf8,
but the 6 bytes "c3a4 c2bd c2a0" that show up in the later capture. The MySQL server converts utf8 c3a4 to latin1 0xe4, c2bd to 0xbd, and c2a0 to 0xa0.

Question 3: Why does latin1 display the correct utf8 character?
Because the mysql client receives the converted "0xe4 0xbd 0xa0" from the MySQL server, treats it as three latin1 characters, and passes it to the terminal (in my case, SecureCRT).
SecureCRT treats these three latin1 bytes as utf8, so the Chinese character "你" ("you") is displayed.

Question 4: Why, when the connection character set is the same as the database character set, is the received data not correct utf8? (Compare with the packet received under latin1.)
When the character sets are the same, no encoding conversion is needed anywhere in the process, so the stored "c3a4 c2bd c2a0" is returned to the client directly.

Question 5: Why, when the connection character set is the same as the database character set, is the display garbled?
See question 4: after the client receives the data, it passes the data straight to the terminal for display. The terminal treats "c3a4 c2bd c2a0" as utf8 characters (ä, ½, and a non-breaking space, so only two are visible), looks them up, and displays them. They mean nothing to us, so we call the output garbled, but strictly speaking the characters are displayed without error. If a character cannot be found, a question mark or the default symbol specified by the character set may be displayed instead.

That concludes this collection of MySQL garbled-character issues. I hope it helps you solve your own MySQL garbled-character problems. Thank you for reading.
