The difference between the utf8_unicode_ci and utf8_general_ci collations in MySQL

Source: Internet
Author: User

I had long been confused by the two collations utf8_unicode_ci and utf8_general_ci. Today I checked the manual and now have a rough idea. What difference the choice between utf8_unicode_ci and utf8_general_ci makes for Chinese text, however, is still unclear to me.

The following is an excerpt from the MySQL 5.1 Chinese manual on utf8_unicode_ci and utf8_general_ci:

Currently, the utf8_unicode_ci collation only partially supports the Unicode Collation Algorithm. Some characters are not yet supported, and combining marks are not fully supported. This mainly affects Vietnamese and some minority languages in Russia, such as Udmurt, Tatar, Bashkir and Mari.

The most significant feature of utf8_unicode_ci is that it supports expansions, that is, cases where one character compares as equal to a combination of other characters. For example, in German and some other languages 'ß' is equal to 'SS'.

utf8_general_ci is a legacy collation that does not support expansions. It can only make one-to-one comparisons between characters. This means that comparisons with utf8_general_ci are faster, but slightly less accurate, than comparisons with utf8_unicode_ci.

For example, the following equalities hold in both utf8_general_ci and utf8_unicode_ci:

Ä = A

Ö = O

Ü = U

One difference between the two collations is that for utf8_general_ci the following equality holds:

ß = S

For utf8_unicode_ci, however, the following equality holds:

ß = SS
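The one-to-one versus expansion behavior described above can be illustrated with a small Python model. This is only a sketch, not MySQL's implementation: the mapping table and the function names (general_ci_key, unicode_ci_key and so on) are hypothetical and cover just the characters used in the examples.

```python
# Toy model of the two comparison styles. The table below is hypothetical
# and covers only the characters used in the examples above.
ONE_TO_ONE = {"Ä": "A", "Ö": "O", "Ü": "U", "ä": "A", "ö": "O", "ü": "U"}


def general_ci_key(text):
    """Map every character to exactly one key character (no expansions)."""
    key = []
    for ch in text:
        if ch == "ß":
            key.append("S")  # one-to-one: ß is weighted like a single S
        else:
            key.append(ONE_TO_ONE.get(ch, ch.upper()[:1]))
    return "".join(key)


def unicode_ci_key(text):
    """Like general_ci_key, but ß expands to the two-character key SS."""
    key = []
    for ch in text:
        if ch == "ß":
            key.append("SS")  # expansion: one character, two key characters
        else:
            key.append(ONE_TO_ONE.get(ch, ch.upper()[:1]))
    return "".join(key)


def general_ci_equal(a, b):
    return general_ci_key(a) == general_ci_key(b)


def unicode_ci_equal(a, b):
    return unicode_ci_key(a) == unicode_ci_key(b)


if __name__ == "__main__":
    for left, right in [("Ä", "A"), ("ß", "S"), ("ß", "SS")]:
        print(left, right, general_ci_equal(left, right), unicode_ci_equal(left, right))
```

In this model the two key functions differ only in how they treat 'ß', which mirrors the single difference the manual excerpt points out.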

Language-specific collations for the utf8 character set are implemented only when ordering with utf8_unicode_ci does not work well for a language. For example, utf8_unicode_ci works very well for German and French, so there is no need to create special utf8 collations for these two languages.

utf8_general_ci is also satisfactory for German and French, except that 'ß' is equal to 'S' and not to 'SS'. If your application can accept this, you should use utf8_general_ci because it is faster. Otherwise, use utf8_unicode_ci because it is more accurate.

If you want to use gb2312 encoding, it is recommended that you use latin1 as the default character set for the table, instead of character sets such as gb2312 or gbk. That way you can insert Chinese data directly from the command-line tool and have it displayed directly. If you are worried about problems such as sorting in queries, you can use the BINARY attribute, for example:

-- The column length (20) is an assumed value; the original length was lost.
CREATE TABLE my_table (name VARCHAR(20) BINARY NOT NULL DEFAULT '') ENGINE=MyISAM DEFAULT CHARSET latin1;
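Why a binary comparison matters when GB2312 bytes are stored in a latin1 column can be sketched in Python. This is a rough model under stated assumptions: latin1_ci_equal only approximates a case-insensitive latin1 collation by folding case and is not MySQL's actual weight table, and both function names are hypothetical. The two byte pairs were chosen so that each byte of the first is the latin1 uppercase form of the matching byte in the second.

```python
def latin1_ci_equal(a: bytes, b: bytes) -> bool:
    """Rough model of a case-insensitive latin1 comparison (an assumption,
    not MySQL's actual weight table): decode as latin1 and fold case."""
    return a.decode("latin-1").lower() == b.decode("latin-1").lower()


def binary_equal(a: bytes, b: bytes) -> bool:
    """Model of a binary comparison: the raw bytes must match exactly."""
    return a == b


# Two different GB2312 byte pairs. Read as latin1, the first is the
# uppercase form of the second, byte for byte.
first = bytes([0xC4, 0xC0])
second = bytes([0xE4, 0xE0])

if __name__ == "__main__":
    print(first.decode("gb2312"), second.decode("gb2312"))
    print("case-insensitive latin1:", latin1_ci_equal(first, second))
    print("binary:", binary_equal(first, second))
```

In this model two distinct Chinese characters compare as equal once the bytes are treated as case-insensitive latin1 text, while the binary comparison keeps them apart.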

Appendix 1: Upgrading old data

As an example, data in the latin1 character set is upgraded to the utf8 character set. Original table: old_table (DEFAULT CHARSET=latin1); new table: new_table (DEFAULT CHARSET=utf8).

Step One: Export the old data

mysqldump --default-character-set=latin1 -hlocalhost -uroot -B my_db --tables old_table > old.sql

Step Two: Convert the encoding (on Unix/Linux or a similar environment)

iconv -t utf-8 -f gb2312 -c old.sql > new.sql

Alternatively, you can omit the -f parameter, in which case iconv takes the source encoding from the current locale instead of detecting it from the data:

iconv -t utf-8 -c old.sql > new.sql

Here it is assumed that the original data is gb2312-encoded.
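For readers without iconv, the conversion step can be approximated in Python. This is a minimal sketch assuming the dump really contains GB2312 bytes; the function names gb2312_to_utf8 and convert_file are hypothetical, and errors="ignore" is used as a rough stand-in for iconv's -c flag, which drops characters that cannot be converted.

```python
def gb2312_to_utf8(data: bytes) -> bytes:
    """Re-encode GB2312 bytes as UTF-8, dropping bytes that cannot be
    decoded (roughly what the -c flag asks iconv to do)."""
    text = data.decode("gb2312", errors="ignore")
    return text.encode("utf-8")


def convert_file(src_path: str, dst_path: str) -> None:
    """Convert a whole dump file; both paths are supplied by the caller."""
    with open(src_path, "rb") as src:
        converted = gb2312_to_utf8(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)


if __name__ == "__main__":
    sample = "INSERT INTO old_table VALUES ('中文');".encode("gb2312")
    print(gb2312_to_utf8(sample).decode("utf-8"))
```

The files are opened in binary mode on purpose, so that Python does not apply the platform's default text encoding before the explicit GB2312 decode.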

Step Three: Import

Edit new.sql and add the SQL statement "SET NAMES utf8;" before the first INSERT/UPDATE statement, then save the file.

mysql -hlocalhost -uroot my_db < new.sql
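Adding the SET NAMES statement to the dump file by hand can also be scripted. The following Python sketch is one possible way to do it, not part of the original procedure; the function name add_set_names is hypothetical, and it assumes each INSERT/UPDATE statement starts at the beginning of a line, as in a typical mysqldump file.

```python
def add_set_names(sql: str, statement: str = "SET NAMES utf8;") -> str:
    """Insert `statement` on its own line before the first INSERT/UPDATE line.

    If the dump contains no such line, the statement is placed at the top.
    """
    lines = sql.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.lstrip().upper().startswith(("INSERT", "UPDATE")):
            lines.insert(index, statement + "\n")
            break
    else:
        # No INSERT/UPDATE found: fall back to the start of the file.
        lines.insert(0, statement + "\n")
    return "".join(lines)


if __name__ == "__main__":
    dump = "CREATE TABLE t (a INT);\nINSERT INTO t VALUES (1);\n"
    print(add_set_names(dump))
```

Only the first matching line is touched, so later INSERT statements in the dump are left as they are.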

Done!!

Appendix 2: MySQL clients that support viewing the utf8 character set

1.) MySQL-Front. It is said that this project was suspended by MySQL AB, I do not know why. Many cracked versions can still be downloaded (which does not mean I recommend using a cracked version :-P).

2.) Navicat, another very good MySQL client. The Chinese version has just come out and I was invited to try it. Overall it is good, but it is paid software.

3.) phpMyAdmin, an open-source PHP project, very good.

4.) Linux terminal tools. Set the terminal's character set to utf8 and, after connecting to MySQL, execute SET NAMES utf8; you can then read and write utf8 data as well.
