I had long been confused by the two collations utf8_unicode_ci and utf8_general_ci; today I checked the manual and it cleared things up somewhat. What difference the choice between utf8_unicode_ci and utf8_general_ci makes for Chinese text, however, is still unclear to me.
The following is an excerpt from the Chinese edition of the MySQL 5.1 manual on utf8_unicode_ci and utf8_general_ci:
Currently, the utf8_unicode_ci collation only partially supports the Unicode Collation Algorithm. Some characters are still not supported, and combining marks are not fully supported either. This mainly affects Vietnamese and some minority languages of Russia, such as Udmurt, Tatar, Bashkir, and Mari.
The main feature of utf8_unicode_ci is its support for expansions, that is, cases where one character compares as equal to a combination of other characters. For example, in German and some other languages 'ß' is equal to 'ss'.
utf8_general_ci is a legacy collation that does not support expansions. It can only make one-to-one comparisons between characters. This means that comparisons with utf8_general_ci are faster, but slightly less accurate, than comparisons with utf8_unicode_ci.
For example, the following equalities hold in both utf8_general_ci and utf8_unicode_ci:
Ä = A
Ö = O
Ü = U
The difference between the two collations is that for utf8_general_ci the following equality holds:
ß = s
whereas for utf8_unicode_ci the following equality holds:
ß = ss
Language-specific collations for the utf8 character set are implemented only when sorting with utf8_unicode_ci does not work well for a language. For example, utf8_unicode_ci works very well for German and French, so there is no need to create special utf8 collations for these two languages.
utf8_general_ci is also satisfactory for German and French, except that 'ß' is equal to 's' and not to 'ss'. If your application can accept this, you should use utf8_general_ci because it is faster. Otherwise, use utf8_unicode_ci because it is more accurate.
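The difference between one-to-one comparison and expansion can be sketched with a toy model in Python. This is only an illustration of the idea, not how MySQL computes its weights: the function names `general_ci_key` and `unicode_ci_key` are made up for this example, accents are stripped with Unicode decomposition, and the only special case handled is 'ß'.

```python
import unicodedata


def _strip_accents(ch):
    """Return ch without combining marks, e.g. 'Ä' becomes 'A'."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def general_ci_key(text):
    """Toy model of a one-to-one collation: every character maps to
    exactly one comparison unit, so 'ß' can only equal a single 's'."""
    out = []
    for ch in text:
        if ch == "ß":
            out.append("S")  # one character in, one unit out
        else:
            out.append(_strip_accents(ch).upper()[:1])
    return "".join(out)


def unicode_ci_key(text):
    """Toy model of a collation with expansions: one character may map
    to several comparison units, so 'ß' equals 'ss'."""
    out = []
    for ch in text:
        if ch == "ß":
            out.append("SS")  # expansion: one character, two units
        else:
            out.append(_strip_accents(ch).upper())
    return "".join(out)


# Both models agree on accented vowels.
print(general_ci_key("Ä") == general_ci_key("A"))    # True
print(unicode_ci_key("Ä") == unicode_ci_key("A"))    # True

# They differ on 'ß'.
print(general_ci_key("ß") == general_ci_key("s"))    # True
print(unicode_ci_key("ß") == unicode_ci_key("ss"))   # True
```

The one-to-one model has less work to do per character, which matches the manual's remark that utf8_general_ci is the faster of the two.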
If you want to store gb2312-encoded data, it is recommended that you use latin1 as the default character set for the table, rather than a character set such as gb2312 or gbk. That way you can insert Chinese data directly from the command-line tool and it is displayed correctly. If you are worried about sorting problems in queries, you can add the BINARY attribute to the column, for example:
CREATE TABLE my_table (name VARCHAR() BINARY NOT NULL DEFAULT '') ENGINE=MyISAM DEFAULT CHARSET latin1;
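Why this trick works can be shown with a small Python sketch. It assumes that a latin1 column simply treats every byte as one character; Python's `latin-1` codec is used here as a stand-in for that behaviour, and the helper names `store_as_latin1` and `read_back` are made up for this example.

```python
def store_as_latin1(gb2312_bytes: bytes) -> str:
    """What the server 'sees': each byte becomes one latin1 character."""
    return gb2312_bytes.decode("latin-1")


def read_back(latin1_text: str) -> bytes:
    """What the client gets back: the same bytes, unchanged."""
    return latin1_text.encode("latin-1")


original = "中文"
raw = original.encode("gb2312")      # 2 Chinese characters, 4 bytes
stored = store_as_latin1(raw)

# The bytes survive the round trip, so a gb2312 terminal shows them correctly.
print(read_back(stored).decode("gb2312") == original)  # True

# But the server counts 4 "characters", not 2.
print(len(original), len(stored))                      # 2 4
```

Because the server only ever sees opaque bytes, lengths are counted in bytes and comparisons are effectively byte-wise, which is why the BINARY attribute is suggested for sorting.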
Appendix 1: Upgrading old data
As an example, take upgrading data whose original character set is latin1 to the utf8 character set. Original table: old_table (DEFAULT CHARSET=latin1); new table: new_table (DEFAULT CHARSET=utf8).
Step one: export the old data
mysqldump --default-character-set=latin1 -hlocalhost -uroot -B my_db --tables old_table > old.sql
Step two: convert the encoding (in a Unix/Linux-like environment)
iconv -t utf-8 -f gb2312 -c old.sql > new.sql
Alternatively, you can omit the -f option. Note that iconv does not detect the encoding automatically in that case; it assumes the input is in the encoding of the current locale:
iconv -t utf-8 -c old.sql > new.sql
Here it is assumed that the original data is gb2312-encoded.
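For environments without iconv, the conversion step can be approximated in Python. This is a minimal sketch, assuming the dump really contains gb2312 bytes; the function names are made up for this example, and `errors="ignore"` is only a rough equivalent of iconv's -c option (silently dropping bytes that cannot be decoded).

```python
def gb2312_to_utf8(raw: bytes, source_encoding: str = "gb2312") -> bytes:
    """Re-encode raw bytes from source_encoding to UTF-8.

    Undecodable bytes are dropped, roughly like `iconv -c`.
    """
    text = raw.decode(source_encoding, errors="ignore")
    return text.encode("utf-8")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read src_path as gb2312 and write it to dst_path as UTF-8."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(gb2312_to_utf8(raw))


# ASCII (the SQL keywords in the dump) passes through unchanged;
# only the Chinese text is re-encoded.
sample = "INSERT INTO old_table VALUES ('中文');".encode("gb2312")
print(gb2312_to_utf8(sample).decode("utf-8"))
```

With the file names used above, the call would be `convert_file("old.sql", "new.sql")`. The whole file is read into memory, so for very large dumps iconv is still the better tool.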
Step three: import
Edit new.sql and add the SQL statement "SET NAMES utf8;" before the first INSERT/UPDATE statement, then save.
mysql -hlocalhost -uroot my_db < new.sql
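If you would rather not edit the dump by hand, adding the statement can be scripted. The following is a small sketch under the assumption that each INSERT/UPDATE statement starts on its own line; `add_set_names` is a made-up helper name, not part of any MySQL tool.

```python
def add_set_names(sql_text: str, charset: str = "utf8") -> str:
    """Insert 'SET NAMES <charset>;' before the first INSERT/UPDATE line.

    If no such line is found, the statement is put at the top instead.
    """
    lines = sql_text.splitlines(keepends=True)
    statement = f"SET NAMES {charset};\n"
    for i, line in enumerate(lines):
        if line.lstrip().upper().startswith(("INSERT", "UPDATE")):
            lines.insert(i, statement)
            break
    else:
        # No INSERT/UPDATE found: fall back to the start of the file.
        lines.insert(0, statement)
    return "".join(lines)


dump = "CREATE TABLE t (a INT);\nINSERT INTO t VALUES (1);\n"
print(add_set_names(dump), end="")
# CREATE TABLE t (a INT);
# SET NAMES utf8;
# INSERT INTO t VALUES (1);
```

The result should be written back as UTF-8 so that it matches the converted dump.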
Done!
Appendix 2: MySQL clients that support viewing the utf8 character set
1.) MySQL-Front. It is said that this project has been halted by MySQL AB, I do not know why. There are still many cracked versions available for download (which does not mean I recommend using a cracked version :-P).
2.) Navicat, another very good MySQL client. The Chinese version has just come out and I was invited to try it; overall it is good, but it is paid software.
3.) phpMyAdmin, an open-source PHP project, very good.
4.) A Linux terminal. Set the terminal's character set to UTF-8, connect to MySQL, and execute SET NAMES utf8; after that you can also read and write utf8 data.