Character sets and collations
A character set is a set of symbols and their encodings. A collation is a set of rules for comparing the characters within a character set.
MySQL provides strong support for collations; I did not find corresponding information for Oracle.
Different character sets have different collations. Collation names follow a convention: they start with the name of the associated character set, usually include a language name, and end with _ci (case insensitive), _cs (case sensitive), or _bin (binary).
Collations generally fall into two categories:
· Binary collations, which compare the character encodings directly. They are effectively case-sensitive, because 'A' and 'a' have different encodings in the character set.
· Collations named charset_language. For example, the default collation for utf8 is utf8_general_ci.
MySQL has four levels of default settings for character sets and collations: server, database, table, and connection.
Specifically, our system uses the utf8 character set. SQL queries are case-sensitive under the utf8_bin collation and case-insensitive under utf8_general_ci. Do not use utf8_unicode_ci.
For example, after CREATE DATABASE demo CHARACTER SET utf8; the default collation of the database is utf8_general_ci.
Character set (charset): a set of symbols and their encodings.
Collation: a set of rules for comparing the characters within a character set, such as a rule that defines the relationship 'A' < 'B'. Different collations implement different comparison rules. For example, 'a' = 'A' holds under some collations and not under others: some collations are case-sensitive and some ignore case.
Each character set has one or more collations, and each collation belongs to exactly one character set.
A binary collation compares the character encodings directly. It can be considered case-sensitive, because 'A' and 'a' obviously have different encodings in the character set. Other collations add extra rules on top of the simple binary comparison and are therefore more complex to evaluate.
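As a minimal sketch of this difference, the statements below compare the same two strings under a case-insensitive and a binary collation. The _utf8 introducer is used here (an assumption, not part of the original text) so that the literals have a known character set regardless of the connection settings. On recent MySQL versions utf8 is an alias for utf8mb3, so names may be reported differently there.

```sql
-- Case-insensitive collation: 'a' and 'A' compare as equal.
SELECT _utf8'a' = _utf8'A' COLLATE utf8_general_ci AS equal_ci;   -- 1

-- Binary collation: the encoded bytes differ, so the strings are not equal.
SELECT _utf8'a' = _utf8'A' COLLATE utf8_bin AS equal_bin;         -- 0
```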
MySQL 5.1 handles character sets and collations in a much more advanced way than most other database management systems: they can be used and set at any level. To use these features effectively, you need to know which character sets and collations are available, how to change the defaults, and how they affect the behavior of string operators and string functions.
Collations generally have these characteristics:
· Two different character sets cannot have the same collation.
· Each character set has a default collation. For example, the default collation for utf8 is utf8_general_ci.
· Collation names follow a convention: they start with the name of the associated character set, usually include a language name, and end with _ci (case insensitive), _cs (case sensitive), or _bin (binary).
Determining the default character set and collation
Character sets and collations have four levels of default settings: server, database, table, and connection. Individual columns can also declare their own character set and collation.
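One way to see which defaults are currently in effect at each level is sketched below. The table name my_table is hypothetical; substitute one of your own tables.

```sql
-- Server, database, and connection character sets.
SHOW VARIABLES LIKE 'character\_set\_%';

-- Server, database, and connection collations.
SHOW VARIABLES LIKE 'collation%';

-- Table default: look for DEFAULT CHARSET (and COLLATE, if any) in the output.
-- my_table is a hypothetical table name.
SHOW CREATE TABLE my_table;
```

The character_set_database and collation_database variables reflect the currently selected database, so run USE db_name first if you want the values for a specific database.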
Database character set and collation
Each database has a database character set and a database collation, neither of which can be empty. The CREATE DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and collation:
CREATE DATABASE db_name DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;
MySQL chooses the database character set and database collation as follows:
· If both CHARACTER SET X and COLLATE Y are specified, character set X and collation Y are used.
· If CHARACTER SET X is specified without COLLATE, character set X and its default collation are used.
· Otherwise, the server character set and the server collation are used.
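The three cases above can be sketched as follows. The database names demo_both, demo_charset, and demo_default are made up for illustration, and latin1_german1_ci is simply one non-default latin1 collation chosen for the example.

```sql
-- Both given: latin1 with latin1_german1_ci.
CREATE DATABASE demo_both CHARACTER SET latin1 COLLATE latin1_german1_ci;

-- Only the character set given: latin1 with its default collation.
CREATE DATABASE demo_charset CHARACTER SET latin1;

-- Neither given: the server character set and collation are used.
CREATE DATABASE demo_default;

-- Check what each database ended up with.
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME LIKE 'demo\_%';
```

The result for demo_default depends on how the server is configured, so no particular value should be expected there.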
Using COLLATE in SQL statements
· Use the COLLATE clause to override any default collation for a comparison. COLLATE can be used in a variety of SQL statements.
SELECT * FROM pro_product WHERE product_code = 'ABCDEFG' COLLATE utf8_general_ci;
Unicode and UTF8
Unicode is only a symbol set: it assigns a binary code to each symbol but does not specify how that code should be stored. Unicode codes can be stored directly in UCS-2 format, and MySQL supports the ucs2 character set.
UTF-8 is the most widely used Unicode encoding on the Internet. Other encodings include UTF-16 and UTF-32, but they are rarely used on the Internet.
The utf8 character set (a transformed Unicode representation) is an alternative way to store Unicode data. It is implemented according to RFC 3629. The idea of the utf8 character set is that different Unicode characters are encoded with byte sequences of different lengths:
· Basic Latin letters, digits, and punctuation marks use one byte.
· Most European and Middle Eastern script letters fit in two-byte sequences: extended Latin letters (with tilde, macron, acute, grave, and other accents), Cyrillic, Greek, Armenian, Hebrew, Arabic, Syriac, and others.
· Korean, Chinese, and Japanese ideographs use three-byte sequences.
MySQL's utf8 character set stores at most three bytes per character, so characters that need four-byte sequences require utf8mb4 (available since MySQL 5.5.3).
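The byte lengths described above can be checked with a short query. This is only a sketch: the three code points (U+0041, U+00E9, U+4E2D) are examples picked for illustration, written as ucs2 hexadecimal literals so that the result does not depend on the client's encoding.

```sql
SELECT
  LENGTH(CONVERT(_ucs2 0x0041 USING utf8))      AS latin_letter_bytes,  -- U+0041, Latin capital A
  LENGTH(CONVERT(_ucs2 0x00E9 USING utf8))      AS accented_bytes,      -- U+00E9, e with acute accent
  LENGTH(CONVERT(_ucs2 0x4E2D USING utf8))      AS ideograph_bytes,     -- U+4E2D, a CJK ideograph
  CHAR_LENGTH(CONVERT(_ucs2 0x4E2D USING utf8)) AS ideograph_chars;
```

LENGTH counts bytes and CHAR_LENGTH counts characters, so the query should report 1, 2, and 3 bytes for the three characters, and 1 character for the ideograph.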
Excerpted from: Measuring Life with dreams, measuring passion with running
MySQL 5.5.8 has 39 character sets and 195 collations.
SHOW CHARACTER SET;
So one character set corresponds to multiple collations.
For example, the utf8 character set has 22 collations in total.
The default collation for the utf8 character set is utf8_general_ci.
You can view them with SHOW COLLATION LIKE 'utf8\_%';
utf8_general_ci sorts in ordinary alphabetical order and is case-insensitive (for example: a B c D).
utf8_bin sorts in binary order, so A comes before a (for example: B D a c).
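The two orderings can be reproduced with a small temporary table. The table name collation_demo and column name s are made up for this sketch.

```sql
-- Hypothetical scratch table holding the four sample letters.
CREATE TEMPORARY TABLE collation_demo (s VARCHAR(10)) CHARACTER SET utf8;
INSERT INTO collation_demo VALUES ('a'), ('B'), ('c'), ('D');

-- Case-insensitive alphabetical order: a, B, c, D
SELECT s FROM collation_demo ORDER BY s COLLATE utf8_general_ci;

-- Binary order, uppercase before lowercase: B, D, a, c
SELECT s FROM collation_demo ORDER BY s COLLATE utf8_bin;

DROP TEMPORARY TABLE collation_demo;
```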