MySQL character encoding system (II) -- Data Transmission Encoding
MySQL's character encoding system can be divided into two parts: how the database server encodes character data when it stores tables, and how the data transmitted between the client and the server is encoded. The previous article, MySQL character encoding system (I) -- Data Storage Encoding, covered storage encoding; this article covers data transmission encoding.
MySQL clients fall into two types: the official command-line client, the mysql program, written in C; and the clients that application developers write using JDBC and other Connector APIs. This article discusses the first type.
Windows Client
The mysql program handles character encoding differently on Windows and on Linux. On Windows, the client's encoding conversion logic is as follows.
The server holds three character set variables, character_set_client, character_set_connection and character_set_results, while the client holds a single variable, charset_info.
When the client connects to the server, it sets charset_info to the encoding specified in its configuration parameters and tells the server to set the three character set variables to the same encoding.
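The state described above can be modelled in a small sketch. The Session class, the connect function and the method names are hypothetical illustrations, not MySQL APIs, and the model simplifies the connection handshake by treating it as an implicit SET NAMES. It reflects two documented behaviours: SET NAMES sets the three server variables, and the mysql client's charset command changes the client's character set and then issues SET NAMES.

```python
from dataclasses import dataclass, field

# Hypothetical model of the encoding state; not a MySQL API.
SERVER_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


@dataclass
class Session:
    charset_info: str                            # client-side variable
    server: dict = field(default_factory=dict)   # the three server variables

    def set_names(self, charset: str) -> None:
        # SET NAMES changes only the three server-side variables.
        for name in SERVER_VARS:
            self.server[name] = charset

    def charset_command(self, charset: str) -> None:
        # The mysql client's `charset` command changes charset_info,
        # then issues SET NAMES for the server side.
        self.charset_info = charset
        self.set_names(charset)


def connect(configured_charset: str) -> Session:
    # Simplification: charset_info comes from the client configuration
    # and the server is told to use the same character set.
    session = Session(charset_info=configured_charset)
    session.set_names(configured_charset)
    return session


session = connect("gbk")
```

After connect("gbk"), all four variables agree; they only diverge when SET NAMES is issued on its own.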
The data then travels as follows:
1. The client reads a line of command text from standard input in the console; the text is in the operating system encoding.
2. The client transcodes the command from the system encoding to the encoding set in its charset_info variable and sends the text to the server.
3. The server interprets the received text as character_set_client, which is normally the same as charset_info.
4. The server transcodes the command text to character_set_connection.
5. The server executes the command and produces results.
6. The server transcodes the results to character_set_results and sends them to the client.
7. The client interprets the received results as charset_info, which is normally the same as character_set_results.
8. The client transcodes the results to the operating system encoding and writes them to standard output in the console.
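The data path above can be simulated with Python's codecs as a minimal sketch. The function names are hypothetical, Python's "gbk" and "utf-8" codecs stand in for MySQL's gbk and utf8 character sets, and the "execution" step simply echoes the statement text back as its result.

```python
# Minimal simulation of the client/server data path; function names are hypothetical.
# Python codec names ("gbk", "utf-8") stand in for MySQL character sets.


def transcode(data: bytes, src: str, dst: str) -> bytes:
    # Interpret bytes as src, re-encode them as dst.
    return data.decode(src).encode(dst)


def client_send(console_text: str, charset_info: str) -> bytes:
    # Steps 1-2: console text (already Unicode) -> charset_info bytes.
    return console_text.encode(charset_info)


def server_handle(data: bytes, cs_client: str, cs_connection: str,
                  cs_results: str) -> bytes:
    # Steps 3-4: read as character_set_client, convert to character_set_connection.
    statement = transcode(data, cs_client, cs_connection)
    # Step 5: "execute" the statement; here the result simply echoes it.
    result = statement
    # Step 6: convert the result to character_set_results before sending.
    return transcode(result, cs_connection, cs_results)


def client_receive(data: bytes, charset_info: str) -> str:
    # Steps 7-8: read as charset_info, hand Unicode text back to the console.
    return data.decode(charset_info)


def round_trip(text: str, charset: str) -> str:
    # After a normal connect, all four variables hold the same character set.
    sent = client_send(text, charset)
    reply = server_handle(sent, charset, charset, charset)
    return client_receive(reply, charset)
```

As long as each side labels the bytes with the encoding they are really in, the text survives the round trip, even when character_set_connection differs from the other variables.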
On Windows, the mysql program reads the console through the Unicode console read API, so the raw string it obtains from the console is actually UTF-16 encoded. The "operating system encoding" above is therefore not the Windows GBK code page but should be regarded as UTF-16.
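To make the distinction concrete, this sketch shows one character, 中 (U+4E2D), in the three encodings involved, and the UTF-16 to charset_info conversion the Windows client performs. Little-endian UTF-16 is an assumption matching the Windows wchar_t layout on x86; the transcode helper is illustrative.

```python
# One character in the encodings that matter for the Windows client.
ch = "中"  # U+4E2D

utf16_bytes = ch.encode("utf-16-le")  # what the Unicode console API delivers (little-endian)
gbk_bytes = ch.encode("gbk")          # GBK, code page 936 on Simplified Chinese Windows
utf8_bytes = ch.encode("utf-8")       # the Linux terminal default


def transcode(data: bytes, src: str, dst: str) -> bytes:
    # Illustrative helper: decode from src, re-encode to dst.
    return data.decode(src).encode(dst)


# The client converts console text from UTF-16 to charset_info before sending.
to_gbk = transcode(utf16_bytes, "utf-16-le", "gbk")
to_utf8 = transcode(utf16_bytes, "utf-16-le", "utf-8")
```

The same character is two bytes in UTF-16 and GBK but three in UTF-8, so bytes passed along under the wrong label cannot be decoded back to the original text.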
Linux Client
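On Linux, the client's encoding conversion logic is simpler: the client ignores charset_info and does no transcoding of its own. It sends the bytes read from the terminal straight to the server and writes the result bytes straight back to the terminal; the server-side steps are the same as on Windows.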
However, when data inserted from a Linux client is then queried with the Windows MySQL client, the results are garbled.
Analyzing the Garbled Output
Tracing the data through the transmission process described above shows where the problem lies:
The Linux client reads a line of UTF-8 command text (the Linux default) from the terminal, ignores charset_info, and sends the bytes to the server unchanged. Because charset gbk was run beforehand, all three server variables are gbk, so the server treats the received text as GBK. Since character_set_connection and the first column of the table are also GBK, the server stores the string in the table without any transcoding. The table now holds a UTF-8 byte string that the server believes is GBK. When the same Linux client queries it:
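The insert path can be traced in a short sketch. The server_convert helper is hypothetical: it models the behaviour described in this article, where the server passes bytes through untouched when the source and target character sets are the same. The sample text 中文 is an assumed example.

```python
def server_convert(data: bytes, src: str, dst: str) -> bytes:
    # Hypothetical model: no conversion when the character sets match;
    # otherwise decode as src and re-encode as dst.
    if src == dst:
        return data
    return data.decode(src).encode(dst)


typed = "中文"                 # assumed example input
sent = typed.encode("utf-8")   # Linux client: terminal bytes are sent as-is

# All three server variables are gbk after `charset gbk`; the column is gbk too.
in_connection = server_convert(sent, "gbk", "gbk")   # client -> connection
stored = server_convert(in_connection, "gbk", "gbk")  # connection -> column

# What the column holds if its bytes are read as GBK, as the server assumes.
as_gbk = stored.decode("gbk", errors="replace")
```

The stored bytes are byte-for-byte the UTF-8 encoding of the input, even though the column and every server variable say GBK.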
The string in the table is sent to the client without any transcoding, because character_set_results is also gbk. The client ignores charset_info and writes the received bytes straight to the terminal's standard output. Those bytes are really UTF-8, which is what the terminal expects, so the text is displayed correctly. When the same data is queried from a Windows client:
The string in the table, which is really UTF-8, is again sent to the client without any transcoding, because character_set_results is gbk. The client assumes the result is in its charset_info encoding (GBK at this point), transcodes it from GBK to UTF-16, and writes it with the Unicode console write API. Because the bytes are actually UTF-8, the output is garbled.
A "Repair" for the Garbled Output
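The query side can be sketched with the same hypothetical server_convert model (redefined so the example stands alone). It shows why the Linux terminal displays the text correctly while the Windows console does not. The stored bytes are assumed to be the UTF-8 encoding of 中文, as produced by the insert described above, and errors="replace" is only a stand-in for however the real client handles invalid sequences.

```python
def server_convert(data: bytes, src: str, dst: str) -> bytes:
    # Hypothetical model: bytes pass through unchanged when charsets match.
    if src == dst:
        return data
    return data.decode(src).encode(dst)


stored = "中文".encode("utf-8")   # assumed column contents (really UTF-8, labelled gbk)

# Column is gbk and character_set_results is gbk: no transcoding.
reply = server_convert(stored, "gbk", "gbk")

# Linux client: ignores charset_info and writes the bytes to a UTF-8 terminal.
linux_display = reply.decode("utf-8")

# Windows client: treats the reply as charset_info (gbk) before converting
# to UTF-16 for the console, so the UTF-8 bytes are misread as GBK.
windows_display = reply.decode("gbk", errors="replace")
```

The server's bytes are identical in both cases; only the label the receiving client applies to them differs.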
For the Windows client to display the correct results as well, it has to be deliberately misconfigured:
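Run charset utf8, which sets both the client's charset_info and the three server variables to utf8. Then run set names gbk, which sets only the three server variables back to gbk and leaves charset_info as utf8. A SELECT now returns results that no longer look garbled: the server still sends the stored bytes without transcoding, but the client now reads them as UTF-8, which they really are, and converts them correctly to UTF-16 for the console.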
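The repaired configuration can be checked with the same kind of sketch. After charset utf8 followed by set names gbk, charset_info is utf8 while the three server variables are gbk; server_convert is again the hypothetical pass-through model, and 中文 is an assumed example.

```python
def server_convert(data: bytes, src: str, dst: str) -> bytes:
    # Hypothetical model: bytes pass through unchanged when charsets match.
    if src == dst:
        return data
    return data.decode(src).encode(dst)


charset_info = "utf-8"    # set by `charset utf8`
server_charset = "gbk"    # restored by `set names gbk` (all three variables)

stored = "中文".encode("utf-8")   # assumed column contents from the Linux insert

# Query: column (gbk) -> character_set_results (gbk), no transcoding.
reply = server_convert(stored, "gbk", server_charset)
# The Windows client now reads the reply as UTF-8, so the UTF-16 conversion is correct.
windows_display = reply.decode(charset_info)

# Inserting from Windows in this state: console text -> charset_info bytes,
# which the server stores untouched, matching what the Linux client stored.
windows_sent = "中文".encode(charset_info)
in_connection = server_convert(windows_sent, server_charset, server_charset)
stored_from_windows = server_convert(in_connection, server_charset, "gbk")
```

The two mislabellings cancel out: the table still contains UTF-8 bytes in a column declared GBK, which is why this counts as a "repair" rather than a fix.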