Character Set related issues in mysql data verification

Source: Internet
Author: User

Scenario:
Master database DB: utf8 Character Set
Slave database DB: gbk character set

Requirements:
Verify that the master and slave data are consistent and repair

Verification process:
Set the master database connection to utf8, set the slave database connection to gbk for separate queries, and compare the returned result sets by record by field.

Display result:
If the characters are the same, the data verification results are different.

Cause analysis:
For the master database, because the character set for the connection is UTF8, the returned Chinese characters are encoded in UTF8 format; for the slave database, they are in GBK format, in the program, the strcasecmp function is used to compare strings. Obviously, different character sets are encoded. The same character set has different binary values, so the results are certainly not equal.

Further analysis:
In this case, which character set should be used to establish a connection? GBK or UTF8. In fact, it is OK to select any character set. As long as the character set for accessing the master database is consistent with that for the slave database, the only difference is that if the character set selected is different from the character set of the client, the characters may not be properly displayed, that is, the characters are garbled.

Let's take the character set of the client as an example to describe three situations in detail: [the client here can be considered as SecureCRT]
Note: The Green Box indicates the DB character set, the yellow box indicates the connection character set, and the orange box indicates the client.
First case:

 

In this case, the master database returns the GBK encoding of the characters, and the slave database returns the UTF8 encoding of the characters. Therefore, errors may occur when comparing fields.

Case 2:

 

The connection to the master database remains unchanged, and the slave database connection is changed from UTF8 to GBK. Therefore, when the database returns the data, it converts the DB character set to GBK and returns it to the client, the same characters are all expressed in GBK encoding. Therefore, the binary values are equal and the verification result is correct.

Case 3:

 

The connection to the master database and the slave database is UTF8. Therefore, for the master database, the character encoding returned to the client is changed from GBK to UTF8. At this time, both the master database and the slave database are UTF-8 encoded, and the verification result is correct. However, because the client is essentially in GBK encoding mode, the returned Chinese characters are garbled, but the correctness of the verification results is not affected.

Fix:

Since the same character set as the Master/Slave database for access does not affect the correctness of the verification results, how can this problem be solved? Because the UTF8 encoding range is greater than the GBK encoding range, if you use GBK to connect to the UTF8 encoding DB, some characters may not be expressed by GBK.

The second case shows that the master database is GBK, the slave database is UTF8, And the GBK is used to access UTF8. Assume that some characters are lost when UTF8 is converted to GBK. At this time, the master database and slave database are inconsistent because some characters cannot be expressed by GBK. Assume that the repair statement is as follows:

Update t set c1 = master_value where c1 = slave_value and id =?

T indicates the table name, id indicates a row of the primary key, master_value indicates the value of the c1 column of the primary database, and slave_value indicates the value of the c1 column of the standby database. At this time, slave_value is lost because UTF8 is converted to GBK. Therefore, statement execution affects zero rows of records and cannot be repaired.

 

Conclusion:

When the client accesses two character set libraries for data verification, the connection uses a character set with a larger range. For example, the commonly used Character Set representation range is as follows:

Latin <gb2312 <gbk <utf8

 

Appendix: character set encoding conversion process when the mysql client communicates with the server

Related parameters:

-Character_set_client: character set used by the client source data

-Character_set_connection: Connection layer Character Set

-Character_set_results: Query Result Character Set

-Character_set_database: Default Character Set of the currently selected Database

-Character_set_system: System metadata (field name, etc.) Character Set

1. client request Server

1) convert the client character set to the connection Character Set

2) convert the connection character set to the character set inside the DB

 

2. The server returns the result to the client.

1) convert the DB internal character set to the connection Character Set

2) convert the connection character set to the character_set_results character set.

 

3. set the character set command: set names character encoding

Specifies the character set for communications between the client and the server, including requests and responses.

Set names 'X' is equivalent:

SET character_set_client = x;

SET character_set_results = x;

SET character_set_connection = x;

Figure:

 

Source Network

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.