The difference between the utf8_unicode_ci and utf8_general_ci collations in MySQL

Last Update:2014-08-14 Source: Internet

Author: User


I had long been confused by the two collations utf8_unicode_ci and utf8_general_ci. Today I checked the manual and things became somewhat clearer. What difference the two make for Chinese text, however, is still unclear to me.

The following is an excerpt from the MySQL 5.1 Chinese manual on utf8_unicode_ci and utf8_general_ci:

Currently, the utf8_unicode_ci collation only partially supports the Unicode Collation Algorithm. Some characters are not supported yet, and combining marks are not fully supported. This mainly affects Vietnamese and some minority languages of Russia, such as Udmurt, Tatar, Bashkir and Mari.

The most significant feature of utf8_unicode_ci is its support for expansions, that is, cases where one character compares as equal to a combination of other characters. For example, in German and some other languages 'ß' is equal to 'ss'.

utf8_general_ci is a legacy collation that does not support expansions. It can only make one-to-one comparisons between characters. This means that comparisons with utf8_general_ci are faster, but slightly less accurate, than comparisons with utf8_unicode_ci.

For example, the following equalities hold in both utf8_general_ci and utf8_unicode_ci:

Ä = A

Ö = O

Ü = U

The difference between the two collations is that the following is true for utf8_general_ci:

ß = s

whereas the following is true for utf8_unicode_ci:

ß = ss
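
The two behaviours can be imitated outside MySQL. The following Python sketch is only a toy model, not MySQL's implementation: the mapping tables and the function names (general_key, unicode_key, equal_general, equal_unicode) are made up for illustration and cover just the four characters discussed above. It shows why a one-to-one mapping can equate ß with a single letter only, while a mapping with expansions can equate it with two.

```python
# Toy model of the two comparison styles; NOT MySQL's actual collation code.
# Both tables are illustrative assumptions covering only the characters above.
ONE_TO_ONE = {"ä": "a", "ö": "o", "ü": "u", "ß": "s"}       # utf8_general_ci style
WITH_EXPANSION = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}  # utf8_unicode_ci style


def general_key(text: str) -> str:
    """Map each character to exactly one character (no expansions)."""
    return "".join(ONE_TO_ONE.get(ch, ch) for ch in text.lower())


def unicode_key(text: str) -> str:
    """Map each character to one or more characters (expansions allowed)."""
    return "".join(WITH_EXPANSION.get(ch, ch) for ch in text.lower())


def equal_general(a: str, b: str) -> bool:
    """Compare two strings the one-to-one way."""
    return general_key(a) == general_key(b)


def equal_unicode(a: str, b: str) -> bool:
    """Compare two strings with expansions applied."""
    return unicode_key(a) == unicode_key(b)


if __name__ == "__main__":
    print(equal_general("Ä", "A"), equal_unicode("Ä", "A"))
    print(equal_general("ß", "s"), equal_general("ß", "ss"))
    print(equal_unicode("ß", "ss"), equal_unicode("ß", "s"))
```

In this model, a search for 'Strasse' would match 'Straße' only under the expansion-style comparison, which is the practical consequence of the rule for German text.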

Language-specific collations for the utf8 character set are implemented only when sorting with utf8_unicode_ci does not work well for a language. For example, utf8_unicode_ci works very well for German and French, so there is no need to create special utf8 collations for these two languages.

utf8_general_ci is also satisfactory for German and French, except that 'ß' is equal to 's' and not to 'ss'. If your application can accept this, you should use utf8_general_ci because it is faster. Otherwise, use utf8_unicode_ci because it is more accurate.

If you want to store gb2312-encoded data, it is recommended that you use latin1, rather than a character set such as gb2312 or gbk, as the default character set of the table. That way you can insert Chinese data directly from the command-line tool and it is displayed directly as well. If you are worried about sorting problems in queries, you can add the BINARY attribute to the column, for example:

-- The column length was lost in the source; 255 is only a placeholder.
CREATE TABLE my_table (name VARCHAR(255) BINARY NOT NULL DEFAULT '') ENGINE=MyISAM DEFAULT CHARSET=latin1;
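
Why the latin1 approach works can be seen without a database: a single-byte character set such as latin1 passes every byte through unchanged, so gb2312 bytes survive storage even though the server does not know they are Chinese. The Python sketch below illustrates this, using Python's latin-1 codec as a stand-in for MySQL's latin1 (an assumption; the two are not identical for every byte value). The sample string is arbitrary.

```python
# Sketch: gb2312 bytes survive a round trip through a single-byte charset.
# Python's "latin-1" codec stands in for MySQL's latin1 (an assumption).
text = "中文"
raw = text.encode("gb2312")            # bytes a gb2312 client would send

as_latin1 = raw.decode("latin-1")      # how a latin1 column interprets them
stored = as_latin1.encode("latin-1")   # bytes handed back to the client

# The bytes are unchanged, so a gb2312 terminal displays them correctly.
recovered = stored.decode("gb2312")

if __name__ == "__main__":
    # Two Chinese characters are seen as four latin1 characters.
    print(len(text), len(as_latin1), recovered == text)
```

One consequence, visible in the lengths above, is that the server treats each Chinese character as two latin1 characters, so column lengths and comparisons operate on bytes rather than on Chinese characters.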

Appendix 1: Upgrading old data

As an example, take a table whose original character set is latin1 and upgrade it to the utf8 character set. Original table: old_table (DEFAULT CHARSET=latin1); new table: new_table (DEFAULT CHARSET=utf8).

Step 1: Export the old data

mysqldump --default-character-set=latin1 -hlocalhost -uroot -B my_db --tables old_table > old.sql

Step 2: Convert the encoding (in a Unix/Linux-like environment)

iconv -t utf-8 -f gb2312 -c old.sql > new.sql

Alternatively, omit the -f option. iconv does not detect the source encoding; it then assumes the encoding of the current locale, so this only works when the locale is set to gb2312:

iconv -t utf-8 -c old.sql > new.sql

Here it is assumed that the original data is gb2312-encoded.
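
Where iconv is not available, the same conversion can be sketched in Python. This assumes, as above, that the dump contains gb2312 bytes. The function names convert_dump and convert_file are hypothetical, and errors="ignore" is used as a rough counterpart of iconv's -c option, which drops characters that cannot be converted.

```python
def convert_dump(data: bytes, source_encoding: str = "gb2312") -> bytes:
    """Re-encode dump bytes as UTF-8, silently dropping undecodable bytes."""
    return data.decode(source_encoding, errors="ignore").encode("utf-8")


def convert_file(src_path: str, dst_path: str) -> None:
    """File-based wrapper, e.g. convert_file("old.sql", "new.sql")."""
    with open(src_path, "rb") as src:
        converted = convert_dump(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

Silently dropping bytes hides data loss, so for a real migration it may be safer to decode with errors="strict" first and inspect any failure before falling back to ignoring errors.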

Step 3: Import

Edit new.sql and add the SQL statement "SET NAMES utf8;" before the first INSERT/UPDATE statement, then save the file.

mysql -hlocalhost -uroot my_db < new.sql
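
The manual edit of the dump file can also be scripted. The sketch below is one possible way to do it, using a hypothetical helper add_set_names; it assumes that each statement in the dump starts at the beginning of a line.

```python
def add_set_names(sql: str, charset: str = "utf8") -> str:
    """Insert 'SET NAMES <charset>;' before the first INSERT/UPDATE line."""
    statement = f"SET NAMES {charset};\n"
    lines = sql.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.lstrip().upper().startswith(("INSERT", "UPDATE")):
            lines.insert(i, statement)
            break
    else:
        # No INSERT/UPDATE found: put the statement at the top instead.
        lines.insert(0, statement)
    return "".join(lines)
```

The result would be written back to the file that is imported, so that the server interprets the following statements as utf8.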

Done!

Appendix 2: MySQL clients that support viewing utf8 data

1.) MySQL-Front. It is said that this project was shut down by MySQL AB; I do not know why. Many cracked versions can still be downloaded (which does not mean I recommend using a cracked version :-P).

2.) Navicat, another very good MySQL client. The Chinese version has just come out and I was invited to try it. Overall it is good, but it is paid software.

3.) phpMyAdmin, an open-source PHP project, very good.

4.) A Linux terminal. Set the terminal's character set to UTF-8 and, after connecting to MySQL, execute SET NAMES utf8; you can then read and write utf8 data as well.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
