Analysis of problems caused by MySQL proofreading rules

Source: Internet
Author: User

The problem is as follows:
A test table. The character set uses latin1.
Select to_id from test where to_id = 'cn elephant _ Wang ';
+ --------------- +
| To_id |
+ --------------- +
| CN Tao _ Tao |
| CN xiang_wang |
+ --------------- +
2 rows in SET (0.00 Sec)

The data of CN elephant King was retrieved.

This is obviously not allowed.

view their encoding:
(root @ im_offlog1a :) [test]> select hex ('cn Tao _ tao ');
+ ---------------- +
| hex ('cn Tao _ tao ') |
+ -------------- +
| 636eccd55fccd5 |
+ ---------------- +
1 row in SET (0.00 Sec)
(root @ im_offlog1a :) [test]> select hex ('cn elephant _ Wang ');
+ ---------------- +
| hex ('cn elephant _ Wang ') |
+ -------------- +
| 636ecff35fcdf5 |
+ ---------------- +
1 row in SET (0.00 Sec)
Encoding is indeed different, but why does MySQL think these two records are the same?
at the beginning, we located the problem caused by collation.
show variables view
| collation_connection | latin1_swedish_ci
| collation_database | latin1_swedish_ci
| collation_server | latin1_swedish_ci

Manually change these parameters to latin1_bin, and the result is the same. This is really strange.
Here we will first explain the naming rules for MySQL collation:
They start with their relevant Character Set names, usually include a language name, and end with _ Ci (Case Insensitive), _ CS (case sensitive), or _ Bin (Binary)
For example, the Latin1 character set has the following correction rules:
Proofread Rule Meaning
Latin1_german1_ci Germany din-1
Latin1_swedish_ci Sweden/Finland
Latin1_danish_ci Denmark/Norway
Latin1_german2_ci Germany din-2
Latin1_bin complies with Latin1 encoding binary
Latin1_general_ci multi-language (Western Europe)
Latin1_general_cs multi-language (Western Europe ISO), case sensitive
Latin?spanish_ci modern Spain

Finally, we recreate the table and manually specify the table-level collation as latin1_bin.
This problem is solved.

The problem arises again. Why does it not take effect when I manually test latinrelease bin?
MySQL used to select the table Character Set and proofreading rules as follows:
If character set X and collate y are specified, character set X and collate y are used.
If the character set X is specified but the Collate y is not specified, the character set X and character set X are used as the default proofreading rules.
Otherwise, the server Character Set and server verification rules are used.
The character set is specified during table creation, so it always adopts the corresponding default proofreading rules.

Of course, we do not need to recreate the table. We only need to alter table db_allot convert to Character Set Latin1 collate latin1_bin to convert the table.

In addition, we recommend that collation use the bin-type proofreading rules corresponding to the character set to avoid errors.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.