MySQL multi-byte character set causes master-slave data inconsistency problem

Source: Internet
Author: User
Tags prepare

Problem arises
There has been a legacy problem on the line, and recently the DBA has raised it, so followed by the following code, the following simple analysis, the problem is described as follows:
In a master-slave environment, the insertion of data in a table on master can result in inconsistent master-slave data, and by trial and error, the conditions for this situation are determined as follows:

    • CHARACTER_SET_SERVER=GBK is set on master
    • The prepared statement is used in the application and the Usecursorfetch=true is set
    • The Int field in a table on master is inserted as a character

For example, there is a table on master with the following structure:

CREATE TABLE T(ID int auto_increment primay key, count int)engine=InnoDB ;

The following actions are performed:

Conn= drivermanager. ();
pstmt = Conn. ( "INSERT into Test (?)" pstmt. Setstring (1, "1" pstmt. Execute

MySQL> Select * from T;
+----+-------+
| ID | count |
+----+-------+
| 1 | 1 |
+----+-------+

MySQL> Select * from T;
+----+-------+
| ID | count |
+----+-------+
| 1 | |
+----+-------+

As can be seen from the above information, the data on the master-slave inconsistent, look at the Binlog on the master, we will find the following information:

Insert into t(count) values(0x31)

The value ' 1 ' in the Binlog is converted to 16 binary 0x31.

Problem analysis
Binlog in some cases will be converted to 16 binary storage, this may be a lot of people did not notice, converted into 16 into the system has caused the Master-slave data inconsistency, this makes people more difficult to accept, the following our analysis from these two aspects:
    • Why does MySQL convert binlog into 16, and in those conditions it turns into 16-binary?
    • Why does the data appear inconsistent after converting to 16?


First, let's analyze the first question and follow the test on the above conditions, and we quickly find out that MySQL has two conditions for converting strings in Binlog into 16 binary storage:

    • The client uses prepared statement
    • The code passed by the client is a multibyte-character set encoding.


When using prepared statement, the client needs to parse out the internal given parameters before executing, and because of the problem of escaping, it is necessary to replace the characters within the parameters such as \0,\n and so on. While for multibyte character set encoding strings (such as: gbk,gb2312), the escape character ' \ ' may appear in the second byte of the character, such as the string "?", possibly the actual encoding is two characters '? \ ' and ' 0 ', if replaced by the principle of substitution, There is a good chance that the contents of the original string will be destroyed, so MySQL for this escape character ' \ ' may appear in the second byte of the character set, gave an identity to explain (Escape_with_backslash_is_dangerous=1), when processing, When this character set is encountered, the string is directly converted into a 16-binary process, which avoids escaping problems.
The specific code can see the Append_query_string function in the log_event.cc file, the key code is as follows:

If (Csinfo-Escape_with_backslash_is_dangerous)
Ptr=Str_to_hex(Ptr, From-Ptr(), From-Length());
Else
{
*ptr++= Ptr+= Escape_string_for_mysql (csinfo, Ptr,0,from->ptr (), from->length *ptr++= ' \ ' } /span>

  
my_charset_big5_chinese_ci
My_charset_big5_bin
My_charset_cp932_japanese_ci
My_charset_cp932_bin
My_charset_gbk_chinese_ci
My_charset_gbk_bin
my_charset_sjis_japanese_ci
My_charset_sjis_bin

After analyzing the first question, and then looking at the second question, why is the data on the master and slave inconsistent when the binlog is turned into 16 binary?
MySQL, the determination of the parameter type in the parsing of the SQL statement, because prepare statement first given the SQL after setting the value of the form, so in parsing SQL, MySQL still do not know the specific values, constructs the Item_param object, and then through the settings, The following parameter ' 1 ' is set to the Str_value member of Item_param, and there is no prepared statement information for Salve,binlog transmission, so on the slave, it is performed in a general way, Slave found that value is 0x31, so we construct a Item_hex_string object to save, so we just need to look at the next two item Save_in_field method, we can find out the specific reason, first look at the processing on master:
Field->store (Str_value.ptr (), Str_value.length (), Str_value.charset ());
The final treatment is as follows
Get_int (CS, from, Len, &rnd, uint_max32, Int_min32, int_max32);
Our field type is int, and the actual given value is string, which needs to be converted at the time of processing, and the key code for the conversion is as follows:

For (Ul= 0 ;Str<End9&& (ch= (uchar) (*str - ' 0 ' )) < 10; Str++)
Span class= "PLN" > {
Ul= ul * 10 + } /span>

From the above code we can know that the master conversion method is similar to the Atoi function in C, for example, the string ' 1234 ' will be converted to 1234, if the middle of non-numeric characters, the latter part will be truncated, so, the data on Master is the string ' 1 ' converted value 1
Now look at the treatment on the slave:
The method of processing conversions on slave is mainly in the following code:
Nr= (ULONGLONG) val_int ();
The process of the function val_int is as follows:
The field type is checked first, and if it is string, it is treated as a string, and if it is not a string, it is handled as follows

Char *End=(Char*)Str_value.Ptr()+Str_value.Length(),
*Ptr=End-Min(Str_value.Length(),sizeof(Longlong));
Ulonglong value=0 for (; ptr != end ;++)
Value= (value << 8) + (ulonglong) ( Uchar *ptr

Slave by forcing each byte in value to be converted, so if value is ' 1234 ', the value passed by coercion will be:

(Uchar)' 1 ' << 24 + (Uchar)' 2 ' << 16 + (Uchar)' 3 ' << 8 + ( uchar) ' 4 '
= 31 << 24 + 32 << 16+ 33 << 8 +

This will result in inconsistent data on the master-slave.

Summarize
The main reason for this problem is that MySQL's two different Item handling of the string-to-integer method is inconsistent, item_param through a similar atoi-like form, the number of characters directly through-' 0 ' to the integer type, and Item_hex_ string is the result of forced memory conversion, both of which are reasonable, but the two sides are not unified, resulting in replication error, MySQL from 5.1 to the current MySQL5.5 version of the problem (MySQL5.6 has not been tested, there should be the problem). Workaround:
    1. Service side use UTF8 character set encoding (due to the previous GBK, change to UTF8 will appear garbled and so many problems)
    2. Change application does not insert int field for non-int data
    3. Change app does not use prepare statement
    4. Binlog format is formatted as row

Note: This issue has been reported to MARIADB and is identified as a bug that will be fixed in subsequent releases.

MySQL multi-byte character set causes master-slave data inconsistency problem

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.