MySQL character encoding system (ii)--Data transmission code

Source: Internet
Author: User

The MySQL character encoding system can be divided into two parts: one is about how the database server itself manages the encoding of character data when it stores data tables, and the other is about how the client and the database server transmit data. Previous MySQL character encoding system (a)--data storage coding discussed data storage coding, this chapter discusses the transmission coding.

The MySQL client can be divided into two types: one is the official Client--mysql command program written in C, and the other is a client written by a common programmer using the connector API such as JDBC. Only the first one is discussed here.

Windows Client

The MySQL command program in Windows and Linux systems about character encoding processing part is not equivalent, is the Windows system's client character encoding conversion logic:


Three of these character variables exist on the server and Charset_info exist on the client.
When the client initiates a connection to the server, the client sets Charset_info to the specified encoding according to the configuration parameters and notifies the server to have the server set the three character variables to the same encoding.

Data transfer process
    1. The client reads a line of command text from the console standard input, which is encoded as the operating system code;
    2. The client encodes the command from the system code transcoding to the client charset_info variable;
    3. The client sends the command text to the server;
    4. The server decodes the received text into Character_set_client encoding, which is usually consistent with the client charset_info;
    5. The server to transcode the command text to Character_set_connection;
    6. The server executes the command, producing the result;
    7. The result transcoding to Character_set_results sent to the client;
    8. The client decodes the received result into charset_info encoding, which is usually consistent with character_set_results;
    9. The client transcode the results to the operating system code and outputs to the console standard output.

Because the MySQL program on the Windows platform uses the Unicode console read API when reading the console, the original string that the program gets from the console is actually UTF16 encoded, so the "operating system code" here is not the usual GBK for Windows, And should be seen as UTF16.

Linux Client

Is the MySQL client program character encoding conversion logic in a Linux system:


It differs from the Windows version in that it does not cast the operating system encoding string from the terminal standard input to charset_info encoding, nor does it cast the Charset_info encoded result string output to the terminal into the operating system encoding. In other words, the MySQL program of the Linux platform ignores the charset_info variable at this time. Of course, the data transfer process for the Linux client is a few steps less than the Windows client.

Garbled Trap Simulation

According to this feature of the Linux platform MySQL program, it is easy to create a possible trap: in the Linux system through the MySQL client to the database after inserting Chinese data, the query results are not garbled, But from the configuration of the correct Windows platform MySQL client query the same table gets garbled.
It is possible to simulate the above situation:
Create a table that contains only one GBK string field and UTF8 string field. Linux starts MySQL connection to the database server and modifies the server's three character variables from the default UTF8 to GBK. Insert Chinese data into database, select immediately, result no exception:


However, when using the Windows MySQL client query, the result is garbled:


Garbled analysis

In combination with the previous data transfer process, you can know where the problem lies:

    1. The client reads a line of UTF8 encoding (linux default) command text from the terminal, ignores the charset_info variable, and sends the resulting to the server directly;
    2. Server because of the prior command CharSet GBK three character variables are set to GBK, so the server thought the text received is GBK encoding;
    3. Next the server will save the text string directly into the data table without any transcoding, because the first field of the data table is also GBK.
Up to this point, a UTF8 string is stored in the data table, while the server is GBK when queried by the same Linux client:
    1. The string in the table is sent directly to the client without any transcoding, because Character_set_results is also GBK;
    2. The client receives the query result because ignores the charset_info and directly does not pass the transcoding output to the terminal standard output;
    3. The data obtained by the terminal is actually UTF8 encoded, so the output is normal.
When you query a Windows client:
    1. The string in the table (UTF8) is sent directly to the client without any transcoding, because Character_set_results is also GBK;
    2. The client receives the query result and considers it to be charset_info encoded (GBK at this time);
    3. The client converts the query result from Charset_info to UTF16 and then calls the Unicode Console Write API output to see garbled characters.
Garbled "fix"

If the Windows client also wants to see the correct results, it should be intentionally misconfigured:

    1. Executes the command charset UTF8, which sets Charset_info and three server character to UTF8;
    2. Execute the command set names GBK, which will only set the three server character to GBK;
    3. Now select, the result looks no longer garbled.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.