The following excerpt from the Chinese edition of the MySQL 5.1 manual describes utf8_unicode_ci and utf8_general_ci:
Currently, the utf8_unicode_ci collation only partially supports the Unicode Collation Algorithm. Some characters are not supported, and combining marks are not fully supported either. This mainly affects Vietnamese and some minority languages in Russia, such as Udmurt, Tatar, Bashkir, and Mari.
The main feature of utf8_unicode_ci is its support for expansions, that is, cases where one character compares as equal to a combination of other characters. For example, in German and some other languages, 'ß' is equal to 'ss'.
utf8_general_ci is a legacy collation that does not support expansions. It can only compare characters one to one. This means that comparisons under utf8_general_ci are faster, but slightly less accurate, than comparisons under utf8_unicode_ci.
For example, the following equalities hold under both utf8_general_ci and utf8_unicode_ci:
Ä = A
Ö = O
Ü = U
One difference between the two collations is that the following equality holds under utf8_general_ci:
ß = s
Whereas the following equality holds under utf8_unicode_ci:
ß = ss
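The difference between one-to-one comparison and expansion can be illustrated with a small Python sketch. This is a toy model only, not how MySQL implements its collations: the two lookup tables cover just the characters mentioned above, and the names GENERAL_MAP, UNICODE_MAP, sort_key, equal_general, and equal_unicode are hypothetical and chosen for illustration.

```python
# Toy model only: these tables cover a handful of characters and are
# NOT MySQL's real collation weight tables.

# One-to-one mapping, in the spirit of utf8_general_ci:
# every character maps to exactly one comparison character.
GENERAL_MAP = {"ä": "a", "ö": "o", "ü": "u", "ß": "s"}

# Mapping with an expansion, in the spirit of utf8_unicode_ci:
# a single character may map to several comparison characters.
UNICODE_MAP = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}


def sort_key(text, table):
    """Build a case-insensitive comparison key using the given table."""
    return "".join(table.get(ch, ch) for ch in text.lower())


def equal_general(a, b):
    """Compare two strings with the one-to-one toy table."""
    return sort_key(a, GENERAL_MAP) == sort_key(b, GENERAL_MAP)


def equal_unicode(a, b):
    """Compare two strings with the toy table that includes an expansion."""
    return sort_key(a, UNICODE_MAP) == sort_key(b, UNICODE_MAP)
```

Under this toy model, equal_general("ß", "s") and equal_unicode("ß", "ss") are both true, while equal_general("ß", "ss") and equal_unicode("ß", "s") are false. The umlaut equalities hold with either table.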
MySQL implements language-specific collations for the utf8 character set only when sorting with utf8_unicode_ci does not work well for a language. For example, utf8_unicode_ci works well for German and French, so there is no need to create special utf8 collations for these two languages.
utf8_general_ci is also satisfactory for German and French, except that 'ß' is equal to 's' rather than 'ss'. If your application can accept this, you should use utf8_general_ci because it is faster. Otherwise, use utf8_unicode_ci because it is more accurate.