Summary of knowledge about collation and character set in SQL Server and MySQL

Last Update: 2018-08-17 Source: Internet

Author: User



This article summarizes what you need to know about collations and character sets in SQL Server and MySQL.

Character sets and collations

A character set is a collection of character encodings for different languages, such as the UTF-8, GBK, and GB2312 character sets; different character sets encode characters according to different rules.
A collation is a set of rules for sorting and comparing the characters of a particular character set; in other words, it subdivides a character set along the sorting dimension.
Collations depend on character sets: a character set can have multiple collations, but a collation belongs to exactly one character set.
For example, Chinese characters can be sorted by pinyin, by the stroke count of the surname, and so on.
English has no pinyin or surname strokes, but it can be sorted case-sensitively, case-insensitively, and so on.
Other languages have their own specific collations as well.
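To make the "one character set, several collations" idea concrete, here is a minimal MySQL sketch that sorts the same utf8 data under two collations of that one character set. The temporary table demo_names is hypothetical, and the sketch assumes a UTF-8 client.

```sql
-- Make string literals utf8 for this session.
SET NAMES utf8;

-- Hypothetical demo table; every row uses the utf8 character set.
CREATE TEMPORARY TABLE demo_names (name VARCHAR(10) CHARACTER SET utf8);
INSERT INTO demo_names (name) VALUES ('b'), ('B'), ('a'), ('A');

-- Case-insensitive collation: 'a'/'A' sort together before 'b'/'B'
-- (the order inside each equal pair is not guaranteed).
SELECT name FROM demo_names ORDER BY name COLLATE utf8_general_ci;

-- Binary collation: sorted by encoded value, so A, B, a, b.
SELECT name FROM demo_names ORDER BY name COLLATE utf8_bin;
```

The stored bytes are identical in both queries; only the collation, and therefore the order, changes. On MySQL 8.0, utf8 is an alias for utf8mb3, so the server may report utf8mb3 names and deprecation warnings.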

In SQL Server, a database with any character set can store characters from any language.
A database with a Latin character set can still store Chinese, and a database with a Chinese character set can still store Mongolian (as long as the operating system itself supports it).
This is because, whatever the character set of a database (or column), which in SQL Server is really its collation, you can use nvarchar (or nchar) columns, and nvarchar (or nchar) can store any Unicode character.
Which collations are available is determined by the sorting rules that each character set supports.
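A minimal SQL Server sketch of the nvarchar point, assuming any scratch database and a hypothetical temp table #demo_text: the same Chinese string is written to a varchar column and to an nvarchar column, both with a Latin collation.

```sql
-- varchar is tied to the collation's code page (1252 here); nvarchar stores Unicode.
CREATE TABLE #demo_text (
    v  VARCHAR(20)  COLLATE SQL_Latin1_General_CP1_CI_AS,
    nv NVARCHAR(20) COLLATE SQL_Latin1_General_CP1_CI_AS
);

-- The N prefix makes the literals Unicode.
INSERT INTO #demo_text (v, nv) VALUES (N'中文', N'中文');

-- v cannot represent these characters in code page 1252 and typically comes back as '??';
-- nv keeps '中文' even though its collation is a Latin one.
SELECT v, nv FROM #demo_text;

DROP TABLE #demo_text;
```

For nvarchar columns the collation only governs comparison and sorting, not which characters can be stored.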

Character sets and collations in SQL Server
A collation specifies the sorting (comparison) rules for stored data; in SQL Server it also carries the character set information (the code page used for non-Unicode data).
Therefore, in SQL Server you do not need to care about the character set, only the collation: SQL Server lets you specify a collation, but not a character set directly.
For example, when creating a database you specify only a collation, and that collation determines the character set.
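A short SQL Server sketch, assuming a hypothetical database name TestDb: the database is created with only a collation, and the code page (character set) implied by that collation is then read back.

```sql
-- Hypothetical database; no character set is given, only a collation.
CREATE DATABASE TestDb COLLATE Chinese_PRC_CI_AS;

-- The collation determines the code page used for varchar/char data.
SELECT DATABASEPROPERTYEX('TestDb', 'Collation')          AS db_collation,
       COLLATIONPROPERTY('Chinese_PRC_CI_AS', 'CodePage') AS code_page;  -- 936 (GBK)

-- List the Chinese_PRC family of collations with their descriptions.
SELECT name, description
FROM sys.fn_helpcollations()
WHERE name LIKE 'Chinese_PRC%';
```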

Character sets and collations in MySQL

As mentioned above, a collation depends on a character set: a character set can have multiple collations, but a collation belongs to exactly one character set.
MySQL maintains a fixed correspondence between each collation and its character set.
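The correspondence can be listed from the server itself. A sketch using standard MySQL statements; the filter values are only examples (MySQL 5.7 reports utf8, MySQL 8.0 reports utf8mb3).

```sql
-- Character sets whose names start with utf8, together with their default collation.
SHOW CHARACTER SET LIKE 'utf8%';

-- Each collation with the single character set it belongs to.
SELECT COLLATION_NAME, CHARACTER_SET_NAME, IS_DEFAULT
FROM information_schema.COLLATIONS
WHERE CHARACTER_SET_NAME IN ('utf8', 'utf8mb3', 'latin1')
ORDER BY CHARACTER_SET_NAME, COLLATION_NAME;
```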

MySQL's CREATE DATABASE syntax lets you specify both a character set and a collation.
If the specified collation belongs to the specified character set, the statement succeeds; if it does not, MySQL reports an error.

For example, in the following statement the collation utf8_bin belongs to the utf8 character set, so the statement executes without problems:
CREATE DATABASE Test_database2 CHARSET utf8 COLLATE utf8_bin;
In the next statement, however, the collation latin1_bin does not belong to the utf8 character set, so the statement fails with an error:
CREATE DATABASE Test_database2 CHARSET utf8 COLLATE latin1_bin;
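A hedged sketch for checking the outcome. After the valid statement, the database defaults can be read back. The mismatched statement is expected to fail with an error along the lines of COLLATION 'latin1_bin' is not valid for CHARACTER SET 'utf8' (MySQL 8.0 names the character set utf8mb3). A mismatch can also be detected before running any DDL.

```sql
-- Read back the defaults of the database created by the valid statement.
SHOW CREATE DATABASE Test_database2;

SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'Test_database2';

-- Pre-check a charset/collation pair: 0 rows means the pair is invalid,
-- because latin1_bin belongs to latin1, not utf8.
SELECT COLLATION_NAME
FROM information_schema.COLLATIONS
WHERE COLLATION_NAME = 'latin1_bin' AND CHARACTER_SET_NAME = 'utf8';
```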

Those are the basics of using character sets and collations in SQL Server and MySQL. Next are the differences between the commonly used collations.

Characteristics of the common collations *_general_ci, *_general_cs, and *_bin

These three kinds of collations are common in many character sets; the examples below use the commonly used utf8 character set.
utf8_general_ci is case-insensitive: ci is short for case insensitive.
A *_general_cs collation would be case-sensitive (cs is short for case sensitive), but MySQL provides no utf8_general_cs collation (only a few character sets, such as latin1, have a *_general_cs collation), so utf8_bin is used in its place.
utf8_bin compares each character of a string by its binary encoding, so it is case-sensitive.
Both are case-sensitive, so what is the difference between utf8_general_cs and utf8_bin?
cs stands for case sensitive; bin stands for binary, meaning characters are compared by their binary codes.
A *_general_cs collation distinguishes case, but it still does not distinguish some Western European and Latin accented characters, for example ä = a. When ä and a must be treated as different, only utf8_bin will do.
utf8_bin operates on the binary encodings of characters, and any two different encodings are different characters, so under utf8_bin, ä <> a.

Under utf8_general_ci: A = a and ä = a.

Under utf8_bin: A <> a and ä <> a.
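The two rules above can be checked directly in MySQL. A minimal sketch, assuming the client sends UTF-8 text (so that 'ä' arrives as the intended character); the column aliases are arbitrary.

```sql
-- Make string literals utf8 so the utf8_* collations below are valid for them.
SET NAMES utf8;

SELECT 'A' = 'a' COLLATE utf8_general_ci AS ci_case,     -- 1
       'ä' = 'a' COLLATE utf8_general_ci AS ci_accent,   -- 1
       'A' = 'a' COLLATE utf8_bin        AS bin_case,    -- 0
       'ä' = 'a' COLLATE utf8_bin        AS bin_accent;  -- 0
```

An explicit COLLATE on one operand takes precedence over the default collation of the other, so each comparison uses the named collation.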

So if you need case-sensitive comparisons and have no special requirements, simply use utf8_bin (MySQL has no utf8_general_cs, whereas SQL Server does offer case-sensitive collations such as Latin1_General_CS_AS).
Character sets and collations behave similarly in SQL Server.
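A comparable SQL Server sketch (IIF requires SQL Server 2012 or later). One naming difference is worth noting: SQL Server collation names state case sensitivity (CI/CS) and accent sensitivity (AI/AS) separately, and BIN2 compares code points.

```sql
-- N'' literals are Unicode; each comparison uses the explicitly named collation.
SELECT IIF(N'A' = N'a' COLLATE Latin1_General_CI_AI, 1, 0) AS ci_ai_case,    -- 1
       IIF(N'ä' = N'a' COLLATE Latin1_General_CI_AI, 1, 0) AS ci_ai_accent,  -- 1
       IIF(N'ä' = N'a' COLLATE Latin1_General_CI_AS, 1, 0) AS ci_as_accent,  -- 0
       IIF(N'A' = N'a' COLLATE Latin1_General_CS_AS, 1, 0) AS cs_as_case,    -- 0
       IIF(N'A' = N'a' COLLATE Latin1_General_BIN2,  1, 0) AS bin2_case;     -- 0
```

So a CI_AI collation behaves roughly like MySQL's utf8_general_ci for these cases, and BIN2 like utf8_bin.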

That's all.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
