Understanding of character set and collation in MySQL

Source: Internet
Author: User

When creating tables in MySQL, you inevitably run into the concepts of character set and collation, which I did not understand well before.

Let's take a look at character set and collation.

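Character set: a set of characters together with the way they are encoded.

The UTF-8, GB2312, and GB18030 that we often see are character sets independent of one another. Each defines its own mapping from characters to bytes; UTF-8 in particular is an encoding of Unicode.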

So how should we understand the difference between Unicode and UTF-8 or GB2312?
For example, suppose there is an apple in front of you. It is called "apple" in English and "苹果" in Chinese.
The apple itself, as a concept, corresponds to Unicode, while UTF-8 and GB2312 can be thought of as different languages' names for it; in essence, they all describe the same thing.
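As a small illustration of this analogy, the following Python sketch encodes one Unicode character under several encodings. Python is used only because it runs without a database server, and the variable names are illustrative. The character is the same in every case; only the bytes that represent it differ.

```python
# One abstract character: the Unicode code point U+4E2D ("中").
ch = "\u4e2d"

# The same character written down in three different "languages" (encodings).
utf8_bytes = ch.encode("utf-8")        # 3 bytes: e4 b8 ad
gb2312_bytes = ch.encode("gb2312")     # 2 bytes: d6 d0
gb18030_bytes = ch.encode("gb18030")   # 2 bytes: d6 d0 (GB18030 extends GB2312)

for name, data in [("utf-8", utf8_bytes),
                   ("gb2312", gb2312_bytes),
                   ("gb18030", gb18030_bytes)]:
    print(f"{name:8} {data.hex()}")

# Decoding each byte sequence with its own encoding recovers the same character.
assert utf8_bytes.decode("utf-8") == ch
assert gb2312_bytes.decode("gb2312") == ch
assert gb18030_bytes.decode("gb18030") == ch
```

Decoding bytes with the wrong encoding either fails or yields a different character, which is why a database needs to know the character set of each column.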

Collation: the method of comparison.

It specifies how data is sorted and the rules for comparing strings. (This is somewhat abstract and is explained in more detail below.)

The relationship between character set and collation

Software internationalization is the trend, so Unicode is the best choice for internationalization. Of course, for better performance it is sometimes preferable to use latin1.

MySQL has two character sets that support Unicode:

1. ucs2: uses 16 bits (2 bytes) to represent each Unicode character.

2. utf8: uses 1 to 3 bytes to represent each Unicode character.

Which character set to choose depends on the situation. With utf8, a Latin character requires only one byte, so when the user data is mostly English and other Latin text, utf8 saves storage space. It is said that SQL Server uses UCS-2, though I am not sure that is true.
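The storage trade-off can be checked with a short Python sketch. It only measures encoded byte lengths and ignores any per-row or per-column overhead a database adds. The function name `storage_bytes` is hypothetical, and UTF-16 big-endian stands in for UCS-2 on the assumption that all characters lie in the Basic Multilingual Plane, where the two agree.

```python
def storage_bytes(text: str) -> dict:
    """Return the encoded size of `text` in bytes under two encodings."""
    return {
        "utf8": len(text.encode("utf-8")),
        # For BMP characters, UTF-16 (big-endian, no BOM) matches UCS-2:
        # every character takes exactly 2 bytes.
        "ucs2": len(text.encode("utf-16-be")),
    }

print(storage_bytes("collation"))  # {'utf8': 9, 'ucs2': 18}
print(storage_bytes("字符集"))      # {'utf8': 9, 'ucs2': 6}
```

Latin text is half the size in utf8, while the three Chinese characters are smaller in the fixed two-byte encoding.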

Each character set has a number of collations. To list them, enter the following in the MySQL console:

    mysql> SHOW COLLATION;

The result lists each collation together with the character set it belongs to.

Collation names follow one of two patterns:

1. <character set>_<language/other>_<ci/cs>

2. <character set>_bin

For example:

utf8_danish_ci

ci is short for case insensitive and cs is short for case sensitive; the suffix specifies whether comparisons distinguish upper and lower case. A _bin collation compares strings by their binary values.
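The two naming patterns can be made concrete with a small parser. This is only a sketch under the assumption that a name follows one of the two patterns listed above; the function `parse_collation` and its result keys are hypothetical, and names that follow other conventions are not handled.

```python
def parse_collation(name: str) -> dict:
    """Split a collation name into character set, language/other part, and suffix."""
    # Everything before the first underscore is the character set.
    charset, _, rest = name.partition("_")
    if rest == "bin":
        # Pattern 2: <character set>_bin
        return {"charset": charset, "variant": None, "suffix": "bin"}
    # Pattern 1: <character set>_<language/other>_<ci/cs>
    variant, _, suffix = rest.rpartition("_")
    return {"charset": charset, "variant": variant, "suffix": suffix}

print(parse_collation("utf8_danish_ci"))
print(parse_collation("latin1_general_cs"))
print(parse_collation("utf8_bin"))
```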

Strangely, none of the collations for the utf8 character set is cs.

So what is the difference between utf8_general_ci, utf8_unicode_ci, and utf8_danish_ci? Why does each of them exist?

Collations of the same character set differ in sorting accuracy, in how strings are compared (the same two characters may compare differently in different languages), and in performance.

For example:

utf8_general_ci sorts less accurately than utf8_unicode_ci, although for English users there should be no difference. In exchange, its performance (sorting and comparison speed) is slightly better than that of utf8_unicode_ci. For example, the former does not support the German equivalence

ß = ss

and treats ß as equal to s instead.

Compared with utf8_unicode_ci, utf8_danish_ci adds special sorting rules for Danish.
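To see how comparison rules decide whether two strings are "equal", here is a rough Python analogy. It uses Python's Unicode case folding, not MySQL's collation algorithms, so it only illustrates the idea; real collations may apply further rules, such as language-specific ordering. The helper names `equal_ci` and `equal_bin` are hypothetical.

```python
def equal_ci(a: str, b: str) -> bool:
    """Case-insensitive comparison via Unicode case folding (folds ß to ss)."""
    return a.casefold() == b.casefold()

def equal_bin(a: str, b: str) -> bool:
    """Binary comparison: the encoded bytes must match exactly."""
    return a.encode("utf-8") == b.encode("utf-8")

print(equal_ci("Straße", "STRASSE"))     # True: ß folds to ss, case is ignored
print(equal_bin("Straße", "STRASSE"))    # False: the bytes differ
print(equal_ci("Whistler", "whistler"))  # True
print(equal_bin("Whistler", "whistler")) # False
```

The same pair of strings is equal under one rule and different under another, which is exactly the choice a collation makes for a column.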

Additional notes:

1. When the character set of a table is latin1, a column of type nvarchar automatically uses the utf8 character set.

The database character set, table character set, and column character set override one another level by level, much as a subclass in object-oriented programming inherits from its parent class and overrides the parent's methods.

2. How to make a comparison case sensitive under a ci collation:

    mysql> SELECT * FROM pet;
    +----------+-------+---------+------+------------+-------+
    | name     | owner | species | sex  | birth      | death |
    +----------+-------+---------+------+------------+-------+
    | Whistler | Gwen  | bird    | NULL | 1997-12-09 | NULL  |
    | whistler | Gwen  | bird    | NULL | 1988-09-25 | NULL  |
    +----------+-------+---------+------+------------+-------+
    2 rows in set (0.00 sec)

    mysql> SELECT * FROM pet WHERE name = 'whistler';
    +----------+-------+---------+------+------------+-------+
    | name     | owner | species | sex  | birth      | death |
    +----------+-------+---------+------+------------+-------+
    | Whistler | Gwen  | bird    | NULL | 1997-12-09 | NULL  |
    | whistler | Gwen  | bird    | NULL | 1988-09-25 | NULL  |
    +----------+-------+---------+------+------------+-------+
    2 rows in set (0.00 sec)

    mysql> SELECT * FROM pet WHERE BINARY name = 'whistler';
    +----------+-------+---------+------+------------+-------+
    | name     | owner | species | sex  | birth      | death |
    +----------+-------+---------+------+------------+-------+
    | whistler | Gwen  | bird    | NULL | 1988-09-25 | NULL  |
    +----------+-------+---------+------+------------+-------+
    1 row in set (0.00 sec)

    mysql> SELECT * FROM pet WHERE name = BINARY 'whistler';
    +----------+-------+---------+------+------------+-------+
    | name     | owner | species | sex  | birth      | death |
    +----------+-------+---------+------+------------+-------+
    | whistler | Gwen  | bird    | NULL | 1988-09-25 | NULL  |
    +----------+-------+---------+------+------------+-------+
    1 row in set (0.00 sec)

The recommended form is

    mysql> SELECT * FROM pet WHERE name = BINARY 'whistler';

because it keeps the index on the column usable, whereas

    mysql> SELECT * FROM pet WHERE BINARY name = 'whistler';

prevents the index from being used.

Original link: http://www.360doc.com/content/11/0303/01/2588264_97631236.shtml

