Collation rules in MySQL UTF8

Last Update:2017-02-28 Source: Internet

Author: User

Tags character set mysql client


What is the difference between utf8_general_ci and utf8_unicode_ci in MySQL? Programming languages usually handle Chinese characters as Unicode to avoid garbled text, so why do most MySQL installations use utf8_general_ci rather than utf8_unicode_ci?

I. Official documentation

The following is excerpted from the description of utf8_unicode_ci and utf8_general_ci in the MySQL 5.1 Chinese manual:

Currently, the utf8_unicode_ci collation only partially supports the Unicode Collation Algorithm. Some characters are not yet supported, and combining marks are not fully supported. This mainly affects Vietnamese and some minority languages in Russia, such as Udmurt, Tatar, Bashkir, and Mari.

The most significant feature of utf8_unicode_ci is that it supports expansions, that is, cases where one character compares as equal to a combination of other characters. For example, in German and some other languages 'ß' is equal to 'ss'.

utf8_general_ci is a legacy collation that does not support expansions. It can only make one-to-one comparisons between characters. This means that comparisons with utf8_general_ci are faster, but slightly less accurate, than comparisons with utf8_unicode_ci.

For example, the following equalities hold in both utf8_general_ci and utf8_unicode_ci:

    ä = A
    ö = O
    ü = U

One difference between the two collations is that the following holds for utf8_general_ci:

    ß = s

whereas the following holds for utf8_unicode_ci:

    ß = ss

Language-specific collations for the utf8 character set are implemented only when sorting with utf8_unicode_ci does not work well for a language. For example, utf8_unicode_ci works well for German and French, so there is no need to create special utf8 collations for these two languages.

utf8_general_ci is also satisfactory for German and French, except that 'ß' is equal to 's' and not to 'ss'. If your application can accept this, use utf8_general_ci because it is faster. Otherwise, use utf8_unicode_ci because it is more accurate.
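The difference between one-to-one comparison and expansions can be sketched in a few lines of Python. This is only a toy model, not MySQL's implementation: the two weight tables and the function names below are hypothetical and cover only the characters mentioned in the excerpt.

```python
# Toy weight tables, hand-picked for illustration only (an assumption,
# not MySQL's real collation data, which is far larger).
GENERAL_WEIGHTS = {"ä": "A", "ö": "O", "ü": "U", "ß": "S"}   # one character -> one weight
UNICODE_WEIGHTS = {"ä": "A", "ö": "O", "ü": "U", "ß": "SS"}  # expansions are allowed


def sort_key(text, weights):
    """Map every character through the table; unknown characters are upper-cased."""
    return "".join(weights.get(ch, ch.upper()) for ch in text)


def equal_general_ci(a, b):
    """Compare the way a one-to-one, case-insensitive collation would."""
    return sort_key(a, GENERAL_WEIGHTS) == sort_key(b, GENERAL_WEIGHTS)


def equal_unicode_ci(a, b):
    """Compare with a table that lets one character expand to several weights."""
    return sort_key(a, UNICODE_WEIGHTS) == sort_key(b, UNICODE_WEIGHTS)


print(equal_general_ci("ä", "A"), equal_unicode_ci("ä", "A"))    # equal in both models
print(equal_general_ci("ß", "s"), equal_general_ci("ß", "ss"))   # one-to-one: ß matches s only
print(equal_unicode_ci("ß", "ss"), equal_unicode_ci("ß", "s"))   # expansion: ß matches ss only
print(equal_unicode_ci("Straße", "strasse"))                     # whole words follow the same rule
```

In this model the only thing that separates the two behaviours is whether a table entry may be longer than one character, which is also why the one-to-one variant is cheaper to evaluate.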
If you want to store gb2312-encoded data, it is recommended that you use latin1, rather than a character set such as gb2312 or GBK, as the default character set of the table. Chinese text can then be inserted and displayed directly in the command-line tool. If you are concerned about sorting and similar queries, you can add the BINARY attribute to the column, for example (n is the column length you need):

    CREATE TABLE my_table (name VARCHAR(n) BINARY NOT NULL DEFAULT '') ENGINE=MyISAM DEFAULT CHARSET latin1;

II. Brief summary

For Chinese and English there is no substantial difference between utf8_unicode_ci and utf8_general_ci. utf8_general_ci compares faster but is slightly less accurate. utf8_unicode_ci is more accurate but slightly slower.

If your application handles German, French, or Russian, be sure to use utf8_unicode_ci. Otherwise utf8_general_ci is generally enough, and I have not run into any problems with it so far.

III. Detailed summary

1. Language-specific collations for the utf8 character set are implemented only when sorting with utf8_unicode_ci does not work well for a language. For example, utf8_unicode_ci works well for German and French, so there is no need to create special utf8 collations for these two languages.
2. utf8_general_ci is also satisfactory for German and French, except that 'ß' is equal to 's' and not to 'ss'. If your application can accept this, use utf8_general_ci because it is faster. Otherwise, use utf8_unicode_ci because it is more accurate.

In a word: utf8_unicode_ci is more accurate, and utf8_general_ci is faster.
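To see why the latin1 workaround recommended earlier in this part can display Chinese text, and what it costs, here is a small Python sketch. It assumes Python's "latin-1" codec is a close enough stand-in for MySQL's latin1 (both map every byte to exactly one character), and the helper names are hypothetical.

```python
def store_as_latin1(gb_bytes):
    """Model a latin1 column: every incoming byte becomes exactly one 'character'."""
    return gb_bytes.decode("latin-1")


def read_back(stored):
    """Return the original bytes unchanged, as a latin1 connection would."""
    return stored.encode("latin-1")


original = "中文"
gb_bytes = original.encode("gb2312")       # what a gb2312 terminal sends
stored = store_as_latin1(gb_bytes)         # what the server believes it holds
round_trip = read_back(stored).decode("gb2312")

print(round_trip == original)              # the text survives the round trip
print(len(original), len(stored))          # but the server counts bytes, not characters
```

The bytes pass through untouched, so the terminal shows the right text, but the server sees four single-byte characters instead of two Chinese characters. Lengths and comparisons therefore operate on bytes, which is one reason the BINARY attribute is suggested for sorting.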
The accuracy of utf8_general_ci is usually sufficient. After reading the source code of many programs, I found that most of them also use utf8_general_ci, so it is a reasonable default choice for a new database.

IV. How to use UTF8 in MySQL 5.0

Add the following parameters to my.cnf:

    [mysqld]
    init_connect='SET NAMES utf8'
    default-character-set=utf8
    default-collation=utf8_general_ci

These option names apply to MySQL 5.0. In MySQL 5.5 and later the corresponding server options are character-set-server and collation-server.

Then run the query:

    mysql> show variables;

The relevant output is:

    character_set_client     | utf8
    character_set_connection | utf8
    character_set_database   | utf8
    character_set_results    | utf8
    character_set_server     | utf8
    character_set_system     | utf8
    collation_connection     | utf8_general_ci
    collation_database       | utf8_general_ci
    collation_server         | utf8_general_ci

In my opinion, utf8_general_ci is accurate enough for database use and has a speed advantage over utf8_unicode_ci, so you can adopt it with confidence.

Appendix 1: Upgrading old data

This example upgrades data whose original character set is latin1 to the utf8 character set. Original table: old_table (DEFAULT CHARSET=latin1). New table: new_table (DEFAULT CHARSET=utf8).

Step 1: Export the old data

    mysqldump --default-character-set=latin1 -hlocalhost -uroot -B my_db --tables old_table > old.sql

Step 2: Convert the encoding (in a Unix/Linux-like environment)

    iconv -t utf-8 -f gb2312 -c old.sql > new.sql

Here the original data is assumed to be gb2312-encoded. You can also omit the -f parameter:

    iconv -t utf-8 -c old.sql > new.sql

In that case iconv does not detect the encoding of the file. It assumes the input uses the encoding of the current locale, so this form is only correct when the locale matches the data.

Step 3: Import

Edit new.sql, add the SQL statement "SET NAMES utf8;" before the INSERT/UPDATE statements, and save. Then import the file:

    mysql -hlocalhost -uroot my_db < new.sql

Done!

Appendix 2: MySQL clients that support viewing the UTF8 character set

1. MySQL-Front. It is said that this project was stopped by MySQL AB, and I do not know why. Many cracked versions are available for download (which does not mean I recommend using a cracked version :-P).
2. Navicat. Another very good MySQL client. The Chinese version has just come out and I was invited to try it. Overall it is good, but it is paid software.
3. phpMyAdmin. An open-source PHP project, very good.
4. The Linux terminal. Set the terminal's character set to UTF8, connect to MySQL, and run SET NAMES UTF8; you can then read and write UTF8 data.
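Where iconv is not available, the conversion step of Appendix 1 can be approximated in Python. The following is a minimal sketch, assuming the dump is gb2312-encoded as in the appendix. The function name convert_dump is hypothetical, and errors="ignore" is used as a rough counterpart of iconv's -c flag, which drops bytes that cannot be converted.

```python
from pathlib import Path


def convert_dump(src, dst, src_encoding="gb2312"):
    """Re-encode a dump file to UTF-8, silently dropping undecodable bytes."""
    raw = Path(src).read_bytes()
    text = raw.decode(src_encoding, errors="ignore")
    Path(dst).write_bytes(text.encode("utf-8"))
    # Return both sizes so the caller can sanity-check the conversion.
    return len(raw), len(text)


if __name__ == "__main__":
    # Build a tiny gb2312-encoded sample so the sketch runs on its own.
    Path("old.sql").write_bytes("INSERT INTO old_table VALUES ('中文');\n".encode("gb2312"))
    print(convert_dump("old.sql", "new.sql"))
```

As with iconv -c, dropped bytes are lost without warning, so it is worth comparing a few converted rows with the originals before importing new.sql.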

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
