Differences between utf8_unicode_ci and utf8_general_ci in MySQL

Source: Internet
Author: User
Summary of the differences between utf8_unicode_ci and utf8_general_ci in MySQL

I. Official documentation
The following excerpt from the MySQL 5.1 Chinese manual describes utf8_unicode_ci and utf8_general_ci:

Currently, utf8_unicode_ci supports only part of the Unicode Collation Algorithm. Some characters are not yet supported, and combining marks are not fully supported. This mainly affects Vietnamese and some minority languages in Russia, such as Udmurt, Tatar, Bashkir, and Mari.

The main feature of utf8_unicode_ci is its support for expansions, that is, cases where one letter is considered equal to a combination of other letters. For example, in German and some other languages, 'ß' is equal to 'ss'.

utf8_general_ci is a legacy collation that does not support expansions; it can only compare characters one by one. This means that comparisons under utf8_general_ci are relatively fast, but less accurate than comparisons under utf8_unicode_ci.

For example, the following comparisons are equal under both utf8_general_ci and utf8_unicode_ci:
Ä = A
Ö = O
Ü = U

The difference between the two collations is that this equation is true under utf8_general_ci:
ß = s

whereas under utf8_unicode_ci this equation is true:
ß = ss
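A small Python model can illustrate the comparisons above. This is a hypothetical sketch, not MySQL's implementation, and the function names are invented for illustration. Accent- and case-insensitivity are approximated with Unicode NFD decomposition plus lower-casing. Only 'ß' gets special treatment: the general_ci-style key maps it to one weight ('s'), and the unicode_ci-style key expands it to two ('ss').

```python
import unicodedata

# Hypothetical model of the two collations; it is NOT MySQL's weight table
# and only handles the characters discussed above.

def _fold(ch: str) -> str:
    # Decompose (e.g. 'Ä' -> 'A' + combining diaeresis), keep the base
    # character and lower-case it: a rough stand-in for "_ci" matching.
    return unicodedata.normalize("NFD", ch)[0].lower()

def collation_key(s: str, expand: bool) -> str:
    parts = []
    for ch in s:
        if ch == "ß":
            # unicode_ci-style expansion ('ss') vs general_ci-style
            # one-to-one mapping ('s')
            parts.append("ss" if expand else "s")
        else:
            parts.append(_fold(ch))
    return "".join(parts)

def equal_general_ci(a: str, b: str) -> bool:
    return collation_key(a, expand=False) == collation_key(b, expand=False)

def equal_unicode_ci(a: str, b: str) -> bool:
    return collation_key(a, expand=True) == collation_key(b, expand=True)

if __name__ == "__main__":
    for a, b in [("Ä", "A"), ("ß", "s"), ("ß", "ss")]:
        print(a, b, equal_general_ci(a, b), equal_unicode_ci(a, b))
    # Ä A True True
    # ß s True False
    # ß ss False True
```

In this model, "Straße" and "STRASSE" compare equal only under the unicode_ci-style key, which is the kind of difference the manual describes.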

Language-specific utf8 collations are implemented only when sorting with utf8_unicode_ci does not work well for a language. For example, utf8_unicode_ci works well for German and French, so there is no need to create special utf8 collations for these two languages.

utf8_general_ci is also satisfactory for German and French, except that 'ß' is equal to 's' rather than to 'ss'. If your application can accept this, you should use utf8_general_ci because it is faster. Otherwise, use utf8_unicode_ci because it is more accurate.

If you want to use GB2312 encoding, we recommend using latin1 as the table's default character set. That way, Chinese text can be inserted and displayed directly in the command-line client. Do not use the gb2312 or gbk character sets. If you are worried about query sorting and similar issues, you can add the BINARY attribute to the column, for example:
CREATE TABLE my_table (name VARCHAR(20) BINARY NOT NULL DEFAULT '') ENGINE=MyISAM DEFAULT CHARSET=latin1;
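A Python sketch can show why the latin1 approach works and why the BINARY attribute matters. This is a hypothetical model, not MySQL code, and the function names and the two sample byte sequences are chosen for illustration. Python's "latin-1" codec stands in for MySQL's latin1, which is cp1252-based but likewise stores every byte value. Because latin1 maps every byte to a character, GB2312 bytes survive a round trip unchanged. However, a case-insensitive latin1 comparison, approximated here with str.lower(), can treat two different Chinese characters as equal.

```python
# Hypothetical sketch: Python's "latin-1" codec stands in for MySQL's latin1
# (MySQL's latin1 is cp1252-based, but it also stores any byte value as-is).

def stored_view(text: str) -> str:
    """The latin1 'characters' a column sees when GB2312 bytes are stored."""
    return text.encode("gb2312").decode("latin-1")

def read_back(stored: str) -> str:
    """Recover the original bytes and decode them as GB2312."""
    return stored.encode("latin-1").decode("gb2312")

def ci_equal(x: str, y: str) -> bool:
    # Rough model of a case-insensitive latin1 comparison.
    return stored_view(x).lower() == stored_view(y).lower()

def binary_equal(x: str, y: str) -> bool:
    # BINARY compares the stored bytes exactly.
    return x.encode("gb2312") == y.encode("gb2312")

# Two different GB2312 characters whose bytes differ only in 0xC0 vs 0xE0,
# which latin1 reads as 'À' and 'à'.
a = b"\xc0\xa1".decode("gb2312")
b = b"\xe0\xa1".decode("gb2312")

if __name__ == "__main__":
    print(read_back(stored_view("中文")))  # 中文
    print(a == b, ci_equal(a, b), binary_equal(a, b))  # False True False
```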

II. Brief summary
utf8_unicode_ci and utf8_general_ci show no substantial difference for Chinese and English.
utf8_general_ci is fast but slightly less accurate.
utf8_unicode_ci is accurate, but its comparisons are relatively slow.

If your application involves German, French, or Russian, use utf8_unicode_ci. Otherwise, utf8_general_ci is generally sufficient, and I have not run into any problems with it so far.

III. Summary

1. Language-specific utf8 collations are implemented only when utf8_unicode_ci sorts a language poorly. For example, utf8_unicode_ci works well for German and French, so there is no need to create special utf8 collations for these two languages.
2. utf8_general_ci is also satisfactory for German and French, except that 'ß' equals 's' rather than 'ss'. If your application can accept this, you should use utf8_general_ci because it is faster. Otherwise, use utf8_unicode_ci because it is more accurate.

In short, utf8_unicode_ci is more accurate and utf8_general_ci is faster. In most cases, utf8_general_ci is accurate enough for our needs. After reading the source code of many programs, I found that most of them also use utf8_general_ci, so you can use utf8_general_ci when creating a database.

IV. How to use UTF-8 in MySQL 5.0
Add the following parameters to my.cnf:

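[mysqld]
init_connect='SET NAMES utf8'
default-character-set=utf8
default-collation=utf8_general_ci

(The default-character-set and default-collation server options were removed in MySQL 5.5. On 5.5 and later, use character-set-server=utf8 and collation-server=utf8_general_ci instead.)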

Then run SHOW VARIABLES in the mysql client to check the result:
mysql> SHOW VARIABLES;
The relevant rows should read:
character_set_client | utf8
character_set_connection | utf8
character_set_database | utf8
character_set_results | utf8
character_set_server | utf8
character_set_system | utf8

collation_connection | utf8_general_ci
collation_database | utf8_general_ci
collation_server | utf8_general_ci
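If you want to check this output programmatically, a small Python helper could compare the rows against the values listed above. This is a hypothetical sketch: the function names and the EXPECTED table are assumptions, and the parser accepts both the "name | value" lines shown here and the mysql client's "| name | value |" table rows. When reading the output, keep in mind that init_connect is not executed for accounts with the SUPER privilege, such as root. When you connect as root, the client-related rows may therefore not reflect the SET NAMES utf8 from my.cnf.

```python
from typing import Dict

# Hypothetical helper: expected values are taken from the listing above.
EXPECTED = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8",
    "character_set_database": "utf8",
    "character_set_results": "utf8",
    "character_set_server": "utf8",
    "character_set_system": "utf8",
    "collation_connection": "utf8_general_ci",
    "collation_database": "utf8_general_ci",
    "collation_server": "utf8_general_ci",
}

def parse_variables(output: str) -> Dict[str, str]:
    """Parse 'name | value' or '| name | value |' lines into a dict."""
    rows = {}
    for line in output.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0]:  # skips "+----+" borders and blanks
            rows[cells[0]] = cells[1]
    return rows

def mismatches(output: str) -> Dict[str, str]:
    """Map each expected variable that is missing or different to its value."""
    rows = parse_variables(output)
    return {name: rows.get(name, "<missing>")
            for name, want in EXPECTED.items() if rows.get(name) != want}
```

For example, mismatches() on pasted output where character_set_connection still shows latin1 would return {"character_set_connection": "latin1"}.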

In my opinion, utf8_general_ci is accurate enough for database use and is faster than utf8_unicode_ci, so you can use it with confidence.


Appendix 1: How to upgrade old data
This example upgrades data from the latin1 character set to utf8. Original table: old_table (DEFAULT CHARSET=latin1); new table: new_table (DEFAULT CHARSET=utf8).
Step 1: Export the old data
mysqldump --default-character-set=latin1 -hlocalhost -uroot -B my_db --tables old_table > old.sql
Step 2: Convert the encoding (in a Unix/Linux environment)
iconv -f gb2312 -t UTF-8 -c old.sql > new.sql
Here, we assume that the original data is GB2312. You can also omit the -f parameter. In that case, iconv assumes the input is in the current locale's character encoding rather than detecting the original character set automatically:
iconv -t UTF-8 -c old.sql > new.sql
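Where iconv is not available, the same conversion can be sketched in Python. This is a hypothetical helper (the name convert_dump is invented), and it reads the whole dump into memory, which suits moderately sized files. If the old data might contain characters outside GB2312, passing "gbk" as src_encoding is probably safer, since GBK is a superset of GB2312.

```python
# Hypothetical Python counterpart of: iconv -f gb2312 -t UTF-8 -c old.sql > new.sql
# errors="ignore" silently drops undecodable input, roughly like iconv's -c.

def convert_dump(src_path: str, dst_path: str, src_encoding: str = "gb2312") -> None:
    with open(src_path, "rb") as src:
        text = src.read().decode(src_encoding, errors="ignore")
    # newline="" writes line endings exactly as they were read
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)

# Usage: convert_dump("old.sql", "new.sql")
```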
Step 3: Import
Edit new.sql, add the statement "SET NAMES utf8;" before the INSERT/UPDATE statements begin, and save the file. Then import it:
mysql -hlocalhost -uroot my_db < new.sql
Success!
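The edit in Step 3 can also be scripted. The helper below is a hypothetical Python sketch (add_set_names is an invented name) that inserts the statement before the first line starting with INSERT or UPDATE. Placing it there rather than on the first line is likely deliberate. Dumps produced with --default-character-set=latin1 normally begin with a /*!40101 SET NAMES latin1 */; header, which would run after a SET NAMES utf8 placed at the very top and override it.

```python
import re

# Hypothetical helper for Step 3: put "SET NAMES utf8;" right before the first
# line that starts an INSERT or UPDATE statement, i.e. after the dump header.
STATEMENT_START = re.compile(r"^\s*(INSERT|UPDATE)\b", re.IGNORECASE)

def add_set_names(sql: str, charset: str = "utf8") -> str:
    statement = f"SET NAMES {charset};\n"
    lines = sql.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if STATEMENT_START.match(line):
            lines.insert(i, statement)
            return "".join(lines)
    # No INSERT/UPDATE found: fall back to putting it at the top.
    return statement + sql

if __name__ == "__main__":
    dump = ("/*!40101 SET NAMES latin1 */;\n"
            "CREATE TABLE old_table (name VARCHAR(20));\n"
            "INSERT INTO old_table VALUES ('x');\n")
    print(add_set_names(dump), end="")
```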

Appendix 2: MySQL clients that can display utf8 data
1.) MySQL-Front: reportedly, MySQL AB ordered this project to stop. For whatever reason, many cracked versions can still be downloaded in China (this does not mean I recommend using a cracked version :-P).
2.) Navicat: another very good MySQL client. A Chinese version has just been released, and I was invited to try it. Overall it is good, but it is paid software.
3.) phpMyAdmin: an open-source PHP project, and a very good one.
4.) Linux terminal: set the terminal's character set to UTF-8, connect to MySQL, and run SET NAMES utf8; you can then read and write utf8 data.
