MySQL collation method

Source: Internet
Author: User

The problem is as follows:
A test table. The character set uses latin1.

Select to_id from test where to_id = 'cn elephant _ Wang ';
+ --------------- +
| To_id |
+ --------------- +
| Cn Tao _ Tao |
| Cn xiang_wang |
+ --------------- +
2 rows in set (0.00 sec)

The data of cn elephant King was retrieved.
This is obviously not allowed.

View their encoding:

(Root @ im_offlog1a) [test]> select hex ('cn Tao _ tao ');
+ ---------------- +
| Hex ('cn Tao _ tao') |
+ ---------------- +
| 636ECCD55FCCD5 |
+ ---------------- +
1 row in set (0.00 sec)

(Root @ im_offlog1a) [test]> select hex ('cn elephant _ Wang ');
+ ---------------- +
| Hex ('cn elephant _ Wang ') |
+ ---------------- +
| 636ECFF35FCDF5 |
+ ---------------- +
1 row in set (0.00 sec)

The encoding is indeed different, but why does mysql think the two records are the same?

At the beginning, we located the problem caused by collation.

View show variables
| Collation_connection | latin1_swedish_ci
| Collation_database | latin1_swedish_ci
| Collation_server | latin1_swedish_ci


Manually change these parameters to latin1_bin, and the result is the same. This is really strange.

Here we will first explain the naming rules for mysql collation:
They start with their relevant Character Set names, usually include a language name, and end with _ ci (Case Insensitive), _ cs (case sensitive), or _ bin (Binary)

For example, the latin1 character set has the following correction rules:

Proofread Rule Meaning
Latin1_german1_ci Germany DIN-1
Latin1_swedish_ci Sweden/Finland
Latin1_danish_ci Denmark/Norway
Latin1_german2_ci Germany DIN-2
Latin1_bin complies with latin1 encoding binary
Latin1_general_ci multi-language (Western Europe)
Latin1_general_cs multi-language (Western Europe ISO), case sensitive
Latin?spanish_ci modern Spain


Finally, we recreate the table and manually specify the table-level collation as latin1_bin.
This problem is solved.


The problem arises again. Why does it not take effect when I manually test latinrelease bin?

MySQL used to select the table Character Set and proofreading rules as follows:
If character set x and collate y are specified, character set x and collate y are used.
If the character set x is specified but the collate y is not specified, the character set x and character set x are used as the default proofreading rules.
Otherwise, the server Character Set and server verification rules are used.

The character set is specified during table creation, so it always adopts the corresponding default proofreading rules.

Of course, we do not need TO recreate the table. We only need TO alter table db_allot convert to character set latin1 COLLATE latin1_bin to convert the table.

In addition, we recommend that collation use the bin-type checking rules of the character set as much as possible to avoid errors.


Let's talk about my own experiences.

I think character set latin1 collate latin1_bin is an improvement of the old version of VARCHAR BINARY, but the new version uses character set to set the character set first, and then uses the character set name to add _ bin to set the checking rule to BINARY, this ensures that the Chinese Query is correct.
I tested it again and changed the attribute of this field without BINARY.
Alter table 'comment _ content_000001 'change 'thread' VARCHAR (50) DEFAULT NULL
Then we can see that the table structure is indeed changed to 'thread' varchar (50) default NULL, that is, without the character set latin1 collate latin1_bin. It can be seen that character set latin1 collate latin1_bin is an improved version of varch.

In addition, I also read more convenient practices. I don't need to modify the field attributes one by one, as long as the table-level collation is latin1_bin.
Test:
Alter table comment_content_000001 convert to character set latin1 COLLATE latin1_bin
After,

Export the table structure


Create table comment_content_000001 (
Content_id int (11) not null auto_increment,
Thread varchar (50) collate latin1_bin default NULL,
Uname varchar (100) collate latin1_bin default NULL,
Nick varchar (100) collate latin1_bin default NULL,
Uid int (11) unsigned default NULL,
Content text collate latin1_bin,
Post_time datetime default NULL,
Post_ip int (10) unsigned default NULL,
'Status' enum ('unaudit', 'normal', 'deleted') collate latin1_bin not null default 'unaudit ',
Primary key (content_id)
) ENGINE = MyISAM default charset = latin1 COLLATE = latin1_bin;

Even if the original collate of each field is not set, it is all collate latin1_bin.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.