MySQL Character Encoding System (II): Data Transmission Encoding
The character encoding system of MySQL can be divided into two parts: how the database server manages the encoding of character data when it stores data tables, and how data transmitted between the client and the database server is encoded. MySQL Character Encoding System (I): Data Storage Encoding covers the first part. This article covers data transmission encoding.
MySQL clients fall into two types: the official client written in C, the mysql command-line program, and clients that programmers write using JDBC and other connector APIs. Only the first type is discussed here.
Windows Client
The mysql command-line program handles character encoding differently on Windows and on Linux. The client's character encoding conversion logic on Windows is as follows:
The server holds three character set variables (character_set_client, character_set_connection, and character_set_results), while the client holds charset_info.
When the client starts connecting to the server, it sets charset_info to the encoding specified by its configuration parameters and notifies the server to set the three character set variables to the same encoding.
Data transmission process
On Windows, the mysql program reads the console through the Unicode console read API. The original string that the program obtains from the console is therefore UTF-16 encoded, so the "operating system encoding" here should be regarded as UTF-16 rather than the Windows GBK code page.
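The Windows conversion steps can be sketched in Python. This is a simplified model and not the client's actual C code: the function names are hypothetical, and gbk is assumed as the value of charset_info.

```python
def windows_console_to_wire(console_utf16: bytes, charset_info: str) -> bytes:
    """Model of the input path: UTF-16 console text -> charset_info bytes."""
    text = console_utf16.decode("utf-16-le")
    return text.encode(charset_info)


def windows_wire_to_console(result_bytes: bytes, charset_info: str) -> bytes:
    """Model of the output path: charset_info bytes -> UTF-16 console text."""
    return result_bytes.decode(charset_info).encode("utf-16-le")


# What the Unicode console API would hand to the client for the character "范".
console_input = "范".encode("utf-16-le")

# Assumption: charset_info is gbk.
wire_bytes = windows_console_to_wire(console_input, "gbk")
print(wire_bytes.hex(" "))  # b7 b6
```

Because both directions go through charset_info, the bytes on the wire match what the server was told to expect whenever charset_info and the three server variables agree.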
Linux Client
The character encoding conversion logic of the mysql client on Linux is as follows:
Unlike on Windows, the client does not convert strings read from the terminal's standard input from the operating system encoding to the charset_info encoding, and it does not convert result strings in the charset_info encoding to the operating system encoding before writing them to the terminal. In other words, the mysql program on Linux ignores charset_info for terminal input and output. As a result, the data transmission process of the Linux client has fewer steps than that of the Windows client.
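A minimal Python model of the Linux input path makes the difference visible. The function name is hypothetical, and a UTF-8 terminal is assumed.

```python
def linux_terminal_to_wire(terminal_bytes: bytes, charset_info: str) -> bytes:
    """Model of the Linux client: terminal bytes are sent unchanged.

    charset_info is accepted only to show that it plays no part here.
    """
    return terminal_bytes


# Assumption: the terminal is UTF-8, while charset_info is set to gbk.
typed = "范".encode("utf-8")
sent = linux_terminal_to_wire(typed, charset_info="gbk")
print(sent.hex(" "))  # e8 8c 83
```

The server receives UTF-8 bytes even though it has been told to expect gbk, which is the mismatch behind the trap described next.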
Garbled trap Simulation
This behavior of the mysql program on Linux makes one trap easy to fall into: after Chinese data is inserted into the database through the mysql client on Linux, queries from that client show no garbled text, but querying the same table from a correctly configured mysql client on Windows returns garbled characters.
We can simulate the above situation as follows:
Create a table that contains just one GBK string column and one UTF8 string column. On Linux, start mysql, connect to the database server, and change the server's three character set variables from the default UTF8 to GBK. Insert Chinese data and select it immediately. The result displays normally.
However, when the Windows mysql client runs the same query, the result is garbled.
Garbled Analysis
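Combined with the data transmission processes described earlier, the cause becomes clear. At this point the data table holds the string as UTF8 data, while the server treats the client side as GBK.
When querying on the same Linux client, the server's conversion is reversed on the way out, and the terminal receives the same bytes it originally sent, so the text looks correct.
When querying on a Windows client, the client interprets the returned bytes according to charset_info before writing them to the console, so the text is garbled.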
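The whole trap can be reproduced with plain byte conversions in Python. This is a sketch under stated assumptions: the Linux terminal is taken to be UTF-8, only the UTF8 column is modeled, and the sample text "范围" is hypothetical (it is chosen because its six UTF-8 bytes pair up into three valid GBK characters, so the conversion is reversible).

```python
TERMINAL = "utf-8"  # assumed encoding of the Linux terminal
SESSION = "gbk"     # the three server variables after the change to GBK
COLUMN = "utf-8"    # the UTF8 column of the test table

typed = "范围"      # hypothetical sample text

# INSERT from Linux: terminal bytes go to the server unchanged,
# and the server converts them from "GBK" to the column encoding.
sent = typed.encode(TERMINAL)
stored = sent.decode(SESSION).encode(COLUMN)

# SELECT: the server converts the column data back to character_set_results.
returned = stored.decode(COLUMN).encode(SESSION)

# The Linux terminal shows the raw bytes as UTF-8.
linux_view = returned.decode(TERMINAL)

# The Windows client (charset_info = gbk) decodes them as GBK for the console.
windows_view = returned.decode(SESSION)

print(linux_view == typed)    # True
print(windows_view == typed)  # False
```

The two wrong interpretations cancel out on Linux, which is why the problem stays hidden until a correctly configured client reads the same rows.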
Garbled "Repair"
If the Windows client also needs to display the correct result, it has to be configured incorrectly on purpose.
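One way to read this, assumed here rather than taken from the article, is that the Windows client must interpret the returned bytes as UTF-8 while the server still converts results to GBK. The Python sketch below models only that idea. The function name and the sample text are hypothetical, and no real mysql option is implied.

```python
def windows_view(result_bytes: bytes, charset_info: str) -> str:
    """Model: the Windows client decodes result bytes using charset_info."""
    return result_bytes.decode(charset_info, errors="replace")


typed = "范围"  # hypothetical sample text

# Bytes the server returns in the trap scenario: the UTF-8 terminal bytes were
# stored via a GBK-to-UTF8 conversion and converted back to GBK on the way out.
stored = typed.encode("utf-8").decode("gbk").encode("utf-8")
returned = stored.decode("utf-8").encode("gbk")

print(windows_view(returned, "gbk") == typed)    # False: the "correct" setting
print(windows_view(returned, "utf-8") == typed)  # True: the deliberate mismatch
```

This only hides the problem: the stored data is still wrong for every client that is configured correctly.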
Chinese characters in a MySQL table are garbled. Which encoding should I choose?
A database can use many character sets. I personally suggest using utf8 (UTF-8) for the database character set wherever possible so that your data can be migrated smoothly, because utf8 is currently the character set best suited to conversion between different character sets. Even though you may not be able to view the database content correctly in the command-line tool, I still strongly recommend utf8 as the default character set.
If you want to use gb2312 encoding, I suggest using latin1 as the default character set of the data table instead of gb2312 or gbk, so that you can insert Chinese data directly in the command-line tool and have it displayed directly. If you are worried about query sorting and similar issues, you can use the binary attribute.
What mainly affects programming is the client character set and the database character set (there is also a server character set, whose purpose I am not sure of). The common database operations are saving data and reading data. In this process, whether text becomes garbled seems to have little to do with the database character set. We only need to ensure that the character set selected when writing is the same as the one selected when reading, that is, that the client character set is the same for both operations.
Executing a query in the mysql client generally works like this: you enter an SQL statement at the client prompt and press Enter, and the terminal displays the query result. Only when the terminal and the three MySQL system variables specify the correct character sets is a correct SQL statement sent to the server, a correct result returned by the server, and that result displayed correctly on the terminal.
The three MySQL System variables are:
1. character_set_client: the terminal character set. It tells the server the encoding of the SQL statements that the client submits.
2. character_set_connection: the connection character set. It is the encoding into which the server converts the SQL statements it receives.
3. character_set_results: the character set of the returned result set. The server converts the result set to this encoding before returning it.
Run show variables like 'char%' in the mysql terminal to view the values of these variables. The three variables are usually set to the same character set, and the set names [charset name] command changes all three at once. In general, as long as you set a character set that can represent your data and that matches the encoding your terminal actually uses, query results display correctly on the terminal.
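The effect of set names on the session can be modeled with a small Python dictionary. This is an illustration of which variables change, not MySQL code: the set_names function and the starting values are hypothetical.

```python
SESSION_VARIABLES = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def set_names(session: dict, charset: str) -> dict:
    """Return a copy of the session with the three variables set to charset."""
    updated = dict(session)
    for name in SESSION_VARIABLES:
        updated[name] = charset
    return updated


# Assumed starting state; character_set_server is included only to show
# that set names leaves it untouched.
session = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8",
    "character_set_results": "utf8",
    "character_set_server": "utf8",
}

after = set_names(session, "gbk")
for name, value in after.items():
    print(f"{name} = {value}")
```

Setting the three variables individually to different values is also possible, which is what makes deliberate mismatches between input and output encodings achievable.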
For example, the table t1 is UTF-8 encoded. The field c1 in the table inherits this encoding. The table is created as follows:
mysql> create table t1 (c1 text not null) character set utf8;
The test character is the Chinese character "范" (fan). Its gbk encoding is B7 B6 and its utf8 encoding is E8 8C 83.
Insert the data with the following SQL statement:
mysql> insert into t1 values ('范');
A) If the terminal is set to utf8 and set names utf8 is executed, the UTF-8 encoding of "范" is written to the database. MySQL does not need to perform any encoding conversion in this process. To verify the data written to the database, run select hex(c1) from t1 to obtain its hexadecimal encoding.
B) If the terminal is set to utf8 and set names gbk is executed, the binary data written by the insert is E9 91 BC, which is the UTF-8 encoding of the Chinese character "鑼" (gong). The "范" entered at the terminal is UTF-8 encoded, but the server believes the terminal sends gbk, so it performs a gbk-to-utf8 conversion when inserting into t1, and the result is of course incorrect.
C) If the terminal is set to gbk and set names gbk is executed, the UTF-8 encoding of "范" is still written to t1 after the insert completes. During the insertion, the terminal sends the gbk encoding of "范", B7 B6, and the server has been told that the terminal sends ...
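The three cases can be compared side by side with a short Python sketch. The helper name insert_hex is hypothetical, and the model covers only the client-to-column conversion. In case B one byte is left over after the gbk pairing; Python replaces it with a substitute character here, which may differ from what MySQL does, so no expected output is given for that case.

```python
def insert_hex(terminal_encoding: str, names_charset: str,
               column_charset: str = "utf8", char: str = "范") -> str:
    """Model an insert: terminal bytes are read as names_charset,
    then converted to the column character set."""
    sent = char.encode(terminal_encoding)
    text = sent.decode(names_charset, errors="replace")
    return text.encode(column_charset).hex(" ")


case_a = insert_hex("utf8", "utf8")  # terminal utf8, set names utf8
case_b = insert_hex("utf8", "gbk")   # terminal utf8, set names gbk
case_c = insert_hex("gbk", "gbk")    # terminal gbk, set names gbk

print("A:", case_a)  # A: e8 8c 83
print("B:", case_b)
print("C:", case_c)  # C: e8 8c 83
```

Cases A and C store the same correct bytes because the declared character set matches what the terminal really sends. Only case B, where the two disagree, stores wrong data.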
How to view the character set of a custom MySQL database
If you manage the database with Navicat Premium, right-click the database name to view the character set.
If you use phpMyAdmin, click the database name and the character set is displayed on the right.