A summary of collations and character sets in SQL Server and MySQL
Character sets and collations
A character set is a collection of characters together with the encoding used for them, for example UTF-8, GBK, and GB2312; different character sets encode characters using different rules.
A collation is a specific set of sorting and comparison rules defined on a particular character set; it subdivides the character set along the sorting dimension.
Collations depend on character sets: a character set can have multiple collations, but a collation can be based on only one character set.
For example, Chinese characters can be sorted by pinyin (pronunciation), by surname stroke count, and so on.
English has no such thing as "pinyin" or "surname strokes", but its collations can be divided into case-sensitive, case-insensitive, and so on.
Other languages also have their own specific collations.
In SQL Server, a database with any character set can store characters of any language.
It is not the case that a database with a Latin character set cannot store Chinese, or that a database with a Chinese character set cannot store Mongolian (as long as the operating system itself supports the language).
In SQL Server, whatever the character set (which is really the collation) of the database or column, you can use nvarchar (or nchar), and nvarchar (or nchar) can store any Unicode character.
As for collations, each character set defines the collations that it supports.
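As a minimal sketch of the nvarchar point above, the following T-SQL stores text in several languages in one column regardless of the database collation. The table name greeting_demo and its columns are hypothetical, chosen only for illustration.

```sql
-- Hypothetical demo table: an NVARCHAR column holds Unicode text,
-- independent of the collation chosen for the database.
CREATE TABLE dbo.greeting_demo (
    id       INT IDENTITY(1,1) PRIMARY KEY,
    greeting NVARCHAR(50) NOT NULL
);

-- The N prefix marks the literals as Unicode; without it the text
-- would first be converted through the database's code page.
INSERT INTO dbo.greeting_demo (greeting)
VALUES (N'Hello'), (N'你好'), (N'Привет');

SELECT id, greeting FROM dbo.greeting_demo;
```

The N prefix matters here: a plain '你好' literal in a database with a Latin collation may lose characters before it ever reaches the nvarchar column.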
Character sets and collations in SQL Server
A collation specifies the sort (comparison) rules for the stored data, and in SQL Server the collation already contains the character set information.
Therefore, in SQL Server you do not need to care about the character set, only the collation; SQL Server lets you specify only the collation, not the character set directly.
For example, you specify only the collation, and that also determines the character set.
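A short T-SQL sketch of specifying only the collation might look like the following. The database name collation_demo is hypothetical, and Chinese_PRC_CI_AS is just one of the collations SQL Server ships with.

```sql
-- Hypothetical database name; only a collation is given, no character set.
CREATE DATABASE collation_demo COLLATE Chinese_PRC_CI_AS;
GO  -- batch separator understood by sqlcmd and SSMS

-- Read back the collation that the database ended up with.
SELECT DATABASEPROPERTYEX('collation_demo', 'Collation') AS database_collation;

-- List the available collations whose names start with Chinese_PRC.
SELECT name, description
FROM sys.fn_helpcollations()
WHERE name LIKE 'Chinese_PRC%';
```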
Character sets and collations in MySQL
As noted above, collations depend on character sets: a character set can have multiple collations, but a collation can be based on only one character set.
MySQL lists every collation together with the character set it belongs to, for example in the output of SHOW COLLATION.
MySQL's CREATE DATABASE syntax is more cumbersome, because you can specify both the character set and the collation.
If the specified collation belongs to the specified character set, there is no problem; if it does not, an error is raised.
For example, in the following statement the collation utf8_bin belongs to the character set utf8, so the statement executes without problems:
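One way to look at this correspondence yourself is sketched below. It assumes a MySQL server; on recent versions the utf8 character set is reported as utf8mb3, which is why the filters allow for both names.

```sql
-- Each row pairs a collation with the single character set it belongs to.
SHOW COLLATION WHERE Charset LIKE 'utf8%';

-- The same information from information_schema, for a few character sets.
SELECT COLLATION_NAME, CHARACTER_SET_NAME, IS_DEFAULT
FROM information_schema.COLLATIONS
WHERE CHARACTER_SET_NAME IN ('utf8', 'utf8mb3', 'latin1')
ORDER BY CHARACTER_SET_NAME, COLLATION_NAME;
```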
CREATE DATABASE Test_database2 CHARSET utf8 COLLATE utf8_bin;
As another example, in the following statement the collation latin1_bin does not belong to the character set utf8, so the statement fails with an error:
CREATE DATABASE Test_database2 CHARSET utf8 COLLATE latin1_bin;
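To confirm what the first (valid) statement produced, a query along these lines can be used. It is a sketch that assumes the database Test_database2 was created successfully; newer MySQL versions may report the character set as utf8mb3 rather than utf8.

```sql
-- Show the default character set and collation recorded for the database.
SELECT SCHEMA_NAME,
       DEFAULT_CHARACTER_SET_NAME,
       DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'Test_database2';
```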
The above covers the basic use of character sets and collations in SQL Server and MySQL; what follows are the differences between commonly used collations.
Characteristics of the common collations ***_general_ci, ***_general_cs, and ***_bin
These are three kinds of collations commonly found across character sets; the examples below use utf8.
utf8_general_ci is case-insensitive; ci is short for "case insensitive".
utf8_general_cs is case-sensitive; cs is short for "case sensitive". However, MySQL does not currently provide a utf8_general_cs collation, so utf8_bin is used instead.
utf8_bin compares each character in a string by its binary encoding, so it is also case-sensitive.
So, if both are case-sensitive, what is the difference between utf8_general_cs and utf8_bin?
cs means case-sensitive comparison; bin means binary, that is, comparison of the binary codes.
With a utf8_general_cs collation, case is distinguished, but some Western European and Latin characters are still treated as equal, such as ä=a; when ä=a is not wanted, only utf8_bin will do.
utf8_bin operates on the binary encoding of characters, and any two different binary encodings are considered different, so under utf8_bin: ä<>a.
Under utf8_general_ci: A=a, ä=a
Under utf8_bin: A<>a, ä<>a
So, to be case-sensitive when there are no special requirements, use utf8_bin directly (utf8_general_cs is not natively available in MySQL, whereas SQL Server does provide case-sensitive collations).
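The comparisons above can be tried directly in MySQL with a sketch like the one below. The column aliases are made up for readability, and CONVERT ... USING utf8 is used so that the literals are in the utf8 character set whatever the connection character set is; on newer servers the same collations may be listed under utf8mb3_ names. According to the rules described above, the two utf8_general_ci comparisons should return 1 and the two utf8_bin comparisons should return 0.

```sql
SELECT
    -- utf8_general_ci: case and accents are ignored
    CONVERT('A' USING utf8) = CONVERT('a' USING utf8) COLLATE utf8_general_ci AS ci_case_equal,
    CONVERT('ä' USING utf8) = CONVERT('a' USING utf8) COLLATE utf8_general_ci AS ci_accent_equal,
    -- utf8_bin: any difference in the encoded bytes makes the values differ
    CONVERT('A' USING utf8) = CONVERT('a' USING utf8) COLLATE utf8_bin AS bin_case_equal,
    CONVERT('ä' USING utf8) = CONVERT('a' USING utf8) COLLATE utf8_bin AS bin_accent_equal;
```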
Character sets and collations with these features behave similarly in SQL Server.
That is all.
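A rough SQL Server counterpart is sketched below. The collations Latin1_General_CI_AI, Latin1_General_CS_AS, and Latin1_General_BIN are picked here only as examples of case-insensitive, case-sensitive, and binary collations; they are not named in the text above, and the column aliases are hypothetical.

```sql
SELECT
    -- case-insensitive, accent-insensitive: both comparisons are expected to match
    CASE WHEN N'A' = N'a' COLLATE Latin1_General_CI_AI THEN 1 ELSE 0 END AS ci_ai_case_equal,
    CASE WHEN N'ä' = N'a' COLLATE Latin1_General_CI_AI THEN 1 ELSE 0 END AS ci_ai_accent_equal,
    -- case-sensitive, accent-sensitive: A and a are expected to differ
    CASE WHEN N'A' = N'a' COLLATE Latin1_General_CS_AS THEN 1 ELSE 0 END AS cs_as_case_equal,
    -- binary: ä and a are expected to differ
    CASE WHEN N'ä' = N'a' COLLATE Latin1_General_BIN THEN 1 ELSE 0 END AS bin_accent_equal;
```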