The difference between utf8_general_ci, utf8_general_cs and utf8_bin

Source: Internet
Author: User

After using MySQL for a long time, I realized I still did not know exactly how utf8_bin and utf8_general_ci differ.
ci stands for case insensitive: when characters are compared, a and A are treated as the same.
bin stands for binary: a and A are treated as different.
For example, suppose you run:
SELECT * FROM `table` WHERE txt = 'a';
With utf8_bin this does not find the row where txt = 'A', but with utf8_general_ci it does.
utf8_general_ci is case-insensitive; this is the one to use for usernames and email addresses at registration.
utf8_general_cs is case-sensitive; using it for usernames and email addresses has bad consequences (for example, "Admin" and "admin" count as two different accounts). Stock MySQL builds do not appear to ship a utf8_general_cs collation, so check the output of SHOW COLLATION before relying on it.
utf8_bin: strings are compared by the binary value of each character. It is case-sensitive, and binary content can be stored.
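
The difference between a _ci and a _bin comparison can be mimicked outside the database. The following is only a rough Python analogy, not MySQL's implementation: equal_ci and equal_bin are hypothetical helpers, and str.casefold stands in for the collation's case folding.

```python
def equal_ci(a: str, b: str) -> bool:
    """Case-insensitive comparison, in the spirit of a *_ci collation."""
    return a.casefold() == b.casefold()


def equal_bin(a: str, b: str) -> bool:
    """Byte-wise comparison of the UTF-8 encodings, in the spirit of utf8_bin."""
    return a.encode("utf-8") == b.encode("utf-8")


# Stand-in for the txt column of the example query.
rows = ["a", "A", "b"]

ci_matches = [r for r in rows if equal_ci(r, "a")]
bin_matches = [r for r in rows if equal_bin(r, "a")]

print(ci_matches)   # ['a', 'A']
print(bin_matches)  # ['a']
```

With the case-insensitive comparison both 'a' and 'A' match, which is what you want when checking whether a username is already taken.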

I. Official documentation description
The following is an excerpt from the MySQL 5.1 Chinese manual on utf8_unicode_ci and utf8_general_ci:

Currently, the utf8_unicode_ci collation only partially supports the Unicode Collation Algorithm. Some characters are not yet supported, and combining marks are not fully supported. This mainly affects Vietnamese and some minority languages of Russia, such as Udmurt, Tatar, Bashkir and Mari.

The main feature of utf8_unicode_ci is its support for expansions, that is, cases where one character compares as equal to a combination of other characters. For example, in German and some other languages 'ß' is equal to 'ss'.

utf8_general_ci is a legacy collation that does not support expansions. It can only make one-to-one comparisons between characters. This means that comparisons with utf8_general_ci are faster, but slightly less accurate, than comparisons with utf8_unicode_ci.

For example, the following equalities hold in both utf8_general_ci and utf8_unicode_ci:
ä = A
ö = O
ü = U

The difference between the two collations is that for utf8_general_ci the following equality holds:
ß = s

whereas for utf8_unicode_ci the following equality holds:
ß = ss

Language-specific collations for the utf8 character set are implemented only when sorting with utf8_unicode_ci does not work well for a language. For example, utf8_unicode_ci works very well for German and French, so there is no need to create special utf8 collations for these two languages.

utf8_general_ci also works for German and French, except that 'ß' is equal to 's' and not to 'ss'. If your application can accept this, you should use utf8_general_ci because it is faster. Otherwise, use utf8_unicode_ci because it is more accurate.
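
The one-to-one versus expansion distinction described in the excerpt can be imitated in Python. This is only an analogy under stated assumptions: key_general and key_unicode are hypothetical helpers built on unicodedata and str.casefold, and they do not reproduce MySQL's actual collation tables.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (ä -> a)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def key_general(text: str) -> str:
    """At most one output character per input character, so ß can only map to s."""
    special = {"ß": "s"}  # assumption: hand-written stand-in for the general_ci mapping
    return "".join(special.get(ch, strip_accents(ch).lower()[:1]) for ch in text)


def key_unicode(text: str) -> str:
    """Allows expansions: casefold() turns ß into ss."""
    return strip_accents(text).casefold()


def equal(a: str, b: str, key) -> bool:
    return key(a) == key(b)


# Equalities that hold under both rules.
for a, b in [("ä", "A"), ("ö", "O"), ("ü", "U")]:
    assert equal(a, b, key_general) and equal(a, b, key_unicode)

print(equal("ß", "s", key_general), equal("ß", "ss", key_general))  # True False
print(equal("ß", "s", key_unicode), equal("ß", "ss", key_unicode))  # False True
```

The per-character mapping is cheaper to evaluate, which matches the speed versus accuracy trade-off the manual describes.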


If you want to use gb2312 encoding, it is recommended to use latin1 as the default character set of the table instead of gb2312 or GBK. That way you can insert Chinese data directly from the command-line tool and it is displayed directly. If you are worried about sorting or query problems, you can add the BINARY attribute to the column, for example (the VARCHAR length is missing in the source, so 20 is only a placeholder):

CREATE TABLE my_table (name VARCHAR(20) BINARY NOT NULL DEFAULT '') ENGINE=MyISAM DEFAULT CHARSET latin1;
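
A short Python sketch of why the latin1 trick works, assuming the client sends gb2312 bytes untouched: latin1 assigns a character to every byte value, so arbitrary bytes survive a round trip. Python's latin-1 codec stands in for MySQL's latin1 here, which is an approximation, and the sample text is made up for illustration.

```python
text = "中文"

# Bytes a gb2312 client would send to the server.
gb_bytes = text.encode("gb2312")

# How a latin1 column "sees" those bytes: one character per byte.
stored = gb_bytes.decode("latin-1")

# Reading the column back yields the original bytes unchanged.
restored_bytes = stored.encode("latin-1")
restored_text = restored_bytes.decode("gb2312")

print(len(text), len(stored))  # 2 4
print(restored_text == text)   # True
```

The price of the trick is visible in the lengths: the database counts four characters where the user typed two, so length limits and comparisons operate on bytes rather than on Chinese characters.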

II. Brief summary
For Chinese and English text there is no real difference between utf8_unicode_ci and utf8_general_ci.
utf8_general_ci compares faster, but is slightly less accurate.
utf8_unicode_ci is more accurate, but compares slightly slower.

If your application handles German, French, or Russian, be sure to use utf8_unicode_ci. For general use utf8_general_ci is enough; so far I have not run into any problems with it.

III. Detailed summary

1. A language-specific utf8 collation is implemented only when sorting with utf8_unicode_ci does not work well for that language. For example, utf8_unicode_ci works very well for German and French, so there is no need to create special utf8 collations for these two languages.
2. utf8_general_ci also works for German and French, except that 'ß' equals 's' rather than 'ss'. If your application can accept this, you should use utf8_general_ci because it is faster. Otherwise, use utf8_unicode_ci because it is more accurate.

To sum up the above in one sentence: utf8_unicode_ci is more accurate, and utf8_general_ci is faster. The accuracy of utf8_general_ci is usually sufficient. Having read the source code of many programs, I found that most of them also use utf8_general_ci, so choosing utf8_general_ci for a new database is generally fine.

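IV. How to use utf8 in MySQL 5.0
Add the following parameters to my.cnf (these option names apply to MySQL 5.0; from MySQL 5.5 on, character-set-server and collation-server replace default-character-set and default-collation in the [mysqld] section):

[mysqld]
init_connect='SET NAMES utf8'
default-character-set=utf8
default-collation=utf8_general_ci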

Run SHOW VARIABLES; at the mysql> prompt. The relevant entries are:

character_set_client | utf8
character_set_connection | utf8
character_set_database | utf8
character_set_results | utf8
character_set_server | utf8
character_set_system | utf8

collation_connection | utf8_general_ci
collation_database | utf8_general_ci
collation_server | utf8_general_ci
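
If you want to check these settings from a script, a comparison like the following may help. It is a minimal sketch: find_mismatches is a hypothetical helper, the expected values are copied from the listing above, and the observed dictionary is made-up sample data rather than output from a real server (in practice you would fill it from the rows returned by SHOW VARIABLES).

```python
# Values the article expects after the my.cnf change.
EXPECTED = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8",
    "character_set_database": "utf8",
    "character_set_results": "utf8",
    "character_set_server": "utf8",
    "character_set_system": "utf8",
    "collation_connection": "utf8_general_ci",
    "collation_database": "utf8_general_ci",
    "collation_server": "utf8_general_ci",
}


def find_mismatches(variables: dict) -> dict:
    """Return {name: (expected, actual)} for every setting that differs."""
    mismatches = {}
    for name, expected in EXPECTED.items():
        actual = variables.get(name)
        if actual is None or actual.lower() != expected:
            mismatches[name] = (expected, actual)
    return mismatches


# Hypothetical server where the client character set was left at latin1.
observed = dict(EXPECTED, character_set_client="latin1")
print(find_mismatches(observed))
# {'character_set_client': ('utf8', 'latin1')}
```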

In my opinion, utf8_general_ci is accurate enough for database use and is faster than utf8_unicode_ci, so it can be used with confidence.


Attachment 1: Upgrading old data
As an example, the original character set latin1 is upgraded to utf8. Original table: old_table (DEFAULT CHARSET=latin1), new table: new_table (DEFAULT CHARSET=utf8).
Step one: export the old data

mysqldump --default-character-set=latin1 -hlocalhost -uroot -B my_db --tables old_table > old.sql


Step two: convert the encoding (on Unix/Linux or a similar environment)

iconv -t utf-8 -f gb2312 -c old.sql > new.sql

Alternatively, you can omit the -f parameter. iconv then assumes the input is in the encoding of the current locale; it does not detect the original character set by itself.

iconv -t utf-8 -c old.sql > new.sql

Here it is assumed that the original data is gb2312 encoded.
Step three: import
Edit new.sql and add one SQL statement before the INSERT/UPDATE statements begin: "SET NAMES utf8;", then save the file.

mysql -hlocalhost -uroot my_db < new.sql
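
The encoding conversion in step two can also be sketched in Python, for example when iconv is not available. This is a minimal sketch under assumptions: convert_encoding is a hypothetical helper, and errors="ignore" is used as a rough counterpart of iconv's -c flag, since both drop input that cannot be converted.

```python
def convert_encoding(data: bytes, source: str = "gb2312", target: str = "utf-8") -> bytes:
    """Re-encode bytes from the source encoding to the target encoding.

    Bytes that cannot be decoded are silently dropped.
    """
    return data.decode(source, errors="ignore").encode(target)


# Made-up sample standing in for the contents of the dump file.
sample = "INSERT INTO old_table VALUES ('中文');".encode("gb2312")
converted = convert_encoding(sample)

print(converted.decode("utf-8"))
# INSERT INTO old_table VALUES ('中文');
```

For a real dump you would read the exported file in binary mode, pass its contents through the function, and write the result to the file that step three imports.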


Done!!

Attachment 2: MySQL clients that can display utf8 data
1.) MySQL-Front: reportedly this project was suspended by MySQL AB, I do not know why. Many cracked versions can still be downloaded (which does not mean I recommend using a cracked version :-P).
2.) Navicat: another very good MySQL client. The Chinese version has just come out and I was invited to try it. Overall it is good, but it is paid software.
3.) phpMyAdmin: an open source PHP project, very good.
4.) Linux terminal: set the terminal's character set to UTF-8 and, after connecting to MySQL, run SET NAMES utf8; you can then read and write utf8 data as well.

Http://www.jb51.net/article/48775.htm
