Everyone knows the concept of a character set, but many people do not understand collations, and ordinary database development rarely touches the concept. MySQL appears to be quite advanced in this area, so here is a brief overview.
Brief description
Character sets and collations
A character set is a set of symbols and encodings. A collation is a set of rules for comparing the characters within a character set.
MySQL provides strong support for collations; I did not find corresponding information for Oracle.
Different character sets have different collations. The naming convention is: a collation name starts with the name of its associated character set, usually includes a language name, and ends with _ci (case insensitive), _cs (case sensitive), or _bin (binary).
Collations generally fall into two categories:
Binary collations, which compare the encoded values of characters directly. They can be considered case-sensitive, because the encodings of 'A' and 'a' in a character set are obviously different.
Collations named in the form charset_language. For example, the default collation for utf8 is utf8_general_ci.
MySQL has default settings for character sets and collations at 4 levels: server level, database level, table level, and column level. The character set and collation can also be set per connection.
Specifically, our system uses the utf8 character set. SQL queries are case-sensitive when the utf8_bin collation is used and case-insensitive when utf8_general_ci is used. Do not use utf8_unicode_ci.
For example, after CREATE DATABASE demo CHARACTER SET utf8; the default collation is utf8_general_ci.
Unicode and UTF8
Unicode is only a symbol set: it specifies the binary code of each symbol, but not how that code should be stored.
The utf8 character set is one way to store Unicode data. MySQL also supports another implementation, ucs2.
Detailed description
Character set (charset): a set of symbols and encodings.
Collation: a set of rules for comparing the characters within a character set, for example a rule that defines a relationship such as 'A' < 'B'. Different collations can implement different comparison rules: 'a' = 'A' holds under some collations and not under others, so some collations are case-sensitive and some ignore case.
Each character set has one or more collations, and each collation belongs to exactly one character set.
A binary collation compares the encoded values of characters directly. It can be considered case-sensitive, because the encodings of 'A' and 'a' in a character set are obviously different. There are also more complex collations, which add further rules on top of the simple binary comparison and are more expensive to evaluate.
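As a rough illustration of the difference, the following Python sketch compares two strings the way a binary collation would (by their encoded bytes) and the way a simple case-insensitive rule would. The helper names are hypothetical, and the case-folding rule only approximates what a _ci collation does; it is not MySQL's implementation.

```python
def binary_equal(a: str, b: str, encoding: str = "utf-8") -> bool:
    """Compare the encoded bytes directly, as a binary collation would."""
    return a.encode(encoding) == b.encode(encoding)


def case_insensitive_equal(a: str, b: str) -> bool:
    """Approximate a _ci rule by folding case before comparing."""
    return a.casefold() == b.casefold()


print(binary_equal("a", "A"))            # False: byte 0x61 differs from 0x41
print(case_insensitive_equal("a", "A"))  # True
```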
MySQL 5.1's handling of character sets and collations is considerably more advanced than that of most other database management systems: they can be used and set at any level. To use these features effectively, you need to know which character sets and collations are available, how to change the defaults, and how they affect the behavior of string operators and string functions.
Collations generally have these characteristics:
Two different character sets cannot have the same collation.
Each character set has a default collation. For example, the default collation for utf8 is utf8_general_ci.
Collation names follow a convention: they start with the name of their associated character set, usually include a language name, and end with _ci (case insensitive), _cs (case sensitive), or _bin (binary).
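The naming convention can be made concrete with a small Python sketch that splits a name into its parts. The function name is hypothetical, and the code simply assumes the convention described above; it does not check whether a name is actually defined by the server.

```python
def split_collation_name(name: str):
    """Split a collation name into (charset, language, suffix).

    Assumes the charset_language_suffix convention; parts that are
    absent from the name are returned as None.
    """
    parts = name.lower().split("_")
    charset = parts[0]
    has_suffix = len(parts) > 1 and parts[-1] in ("ci", "cs", "bin")
    suffix = parts[-1] if has_suffix else None
    middle = parts[1:-1] if has_suffix else parts[1:]
    language = "_".join(middle) or None
    return charset, language, suffix


print(split_collation_name("utf8_general_ci"))    # ('utf8', 'general', 'ci')
print(split_collation_name("latin1_swedish_ci"))  # ('latin1', 'swedish', 'ci')
print(split_collation_name("utf8_bin"))           # ('utf8', None, 'bin')
```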
Determining the default character set and collation
There are default settings for character sets and collations at 4 levels: server level, database level, table level, and column level. The character set and collation can also be set per connection.
Database character set and collation
Each database has a database character set and a database collation, and neither can be empty. The CREATE DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and collation:
For example:
CREATE DATABASE db_name DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;
MySQL chooses the database character set and database collation as follows:
· If both CHARACTER SET X and COLLATE Y are specified, character set X and collation Y are used.
· If CHARACTER SET X is specified without COLLATE Y, character set X and its default collation are used.
· Otherwise, the server character set and the server collation are used.
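The selection rules above can be sketched as a small Python function. Everything here is illustrative: the function name, the two-entry table of default collations, and the server defaults (latin1 and latin1_swedish_ci) are assumptions made for the example. The case where only COLLATE is given is not spelled out in the list above, so the sketch rejects it rather than guess.

```python
# Default collation per character set. Only two entries are listed,
# purely for illustration.
DEFAULT_COLLATION = {
    "utf8": "utf8_general_ci",
    "latin1": "latin1_swedish_ci",
}


def resolve_database_defaults(charset=None, collation=None,
                              server_charset="latin1",
                              server_collation="latin1_swedish_ci"):
    """Return (charset, collation) following the three rules listed above.

    The server defaults are assumed values for this sketch.
    """
    if charset and collation:
        return charset, collation
    if charset:
        return charset, DEFAULT_COLLATION[charset]
    if collation:
        # Not covered by the three rules, so it is not modeled here.
        raise ValueError("COLLATE without CHARACTER SET is not modeled")
    return server_charset, server_collation


print(resolve_database_defaults("latin1", "latin1_swedish_ci"))
print(resolve_database_defaults("utf8"))
print(resolve_database_defaults())
```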
Using COLLATE in SQL statements
Use the COLLATE clause to override whatever the default collation would be for a comparison. COLLATE can be used in many parts of an SQL statement.
With WHERE:
SELECT * FROM pro_product WHERE product_code = 'ABCDEFG' COLLATE utf8_general_ci;
Unicode and UTF8
Unicode is only a symbol set: it specifies the binary code of each symbol, but not how that code should be stored. Unicode codes can be stored directly in the UCS-2 format, and MySQL supports the ucs2 character set.
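To make the distinction between a code and its storage concrete, here is a short Python sketch that prints the Unicode code point of one character and the bytes it occupies in two storage formats. Python has no codec named ucs2, so the sketch uses utf-16-be, which produces the same bytes as big-endian UCS-2 for characters in the Basic Multilingual Plane; the choice of sample character is arbitrary.

```python
ch = "\u4e2d"  # the CJK character "中"

code_point = ord(ch)                 # the number Unicode assigns to the symbol
ucs2_bytes = ch.encode("utf-16-be")  # same bytes as big-endian UCS-2 for BMP characters
utf8_bytes = ch.encode("utf-8")      # the same symbol stored as UTF-8

print(f"U+{code_point:04X}")  # U+4E2D
print(ucs2_bytes.hex())       # 4e2d
print(utf8_bytes.hex())       # e4b8ad
```

The code point is the same in both cases; only the stored byte sequence differs.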
UTF-8 is the most widely used form of Unicode implementation on the Internet. Other implementations include UTF-16 and UTF-32, but they are largely unused on the Internet.
The utf8 character set (a transformed Unicode representation) is one way to store Unicode data. It is implemented according to RFC 3629. The idea behind the utf8 character set is that different Unicode characters are encoded with byte sequences of different lengths:
· Basic Latin letters, digits, and punctuation use one byte.
· Most European and Middle Eastern script letters fit into two-byte sequences: extended Latin letters (with tilde, macron, acute, grave, and other accents), Cyrillic, Greek, Armenian, Hebrew, Arabic, Syriac, and others.
· Korean, Chinese, and Japanese ideographs use three-byte sequences.
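The three length classes above can be checked with a few lines of Python. The sample characters are arbitrary picks from each group, and utf8_length is a hypothetical helper name.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


samples = [
    ("A", "basic Latin letter"),
    ("7", "digit"),
    ("\u00e9", "Latin e with acute accent"),
    ("\u0416", "Cyrillic letter Zhe"),
    ("\u05d0", "Hebrew letter Alef"),
    ("\u4e2d", "Chinese ideograph"),
    ("\u3042", "Japanese hiragana A"),
    ("\ud55c", "Korean Hangul syllable"),
]

for ch, description in samples:
    print(f"U+{ord(ch):04X} {description}: {utf8_length(ch)} byte(s)")
```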
Excerpted from: Measuring Life with a dream, measuring passion with a run
Collations
MySQL 5.5.8 has 39 character sets and 195 collations.
# Show all collations
SHOW COLLATION;
# Show all character sets
SHOW CHARACTER SET;
So one character set corresponds to multiple collations.
For example, the utf8 character set has 22 collations.
The default collation for the utf8 character set is utf8_general_ci.
You can view them with:
SHOW COLLATION LIKE 'utf8\_%';
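The backslash in the pattern 'utf8\_%' matters because in a LIKE pattern an unescaped _ matches any single character, while % matches any run of characters. The Python sketch below translates a LIKE pattern into a regular expression to show the effect; like_to_regex is a hypothetical helper written for this illustration, not part of MySQL or of any library.

```python
import re


def like_to_regex(pattern: str):
    """Translate a SQL LIKE pattern into a compiled regular expression.

    % matches any run of characters, _ matches exactly one character,
    and a backslash makes the following character literal.
    """
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if c == "%":
            out.append(".*")
        elif c == "_":
            out.append(".")
        else:
            out.append(re.escape(c))
        i += 1
    return re.compile("".join(out))


names = ["utf8_general_ci", "utf8_bin", "utf8mb4_bin", "latin1_bin"]
escaped = like_to_regex(r"utf8\_%")
unescaped = like_to_regex("utf8_%")

print([n for n in names if escaped.fullmatch(n)])    # only names starting with utf8_
print([n for n in names if unescaped.fullmatch(n)])  # also picks up utf8mb4_bin
```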
Note:
utf8_general_ci sorts in ordinary alphabetical order and is not case-sensitive (for example: a B c D).
utf8_bin sorts in binary order (for example, A comes before a: B D a c).
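The two orderings can be reproduced in Python as a rough analogy: sorting on a case-folded key stands in for a case-insensitive collation, and sorting on the encoded bytes stands in for a binary collation. This only mimics the behavior for plain ASCII letters and is not how MySQL implements either collation.

```python
words = ["c", "B", "a", "D"]

# Case-insensitive alphabetical order, similar in spirit to utf8_general_ci.
ci_order = sorted(words, key=str.casefold)

# Byte order, similar in spirit to utf8_bin: uppercase ASCII sorts first.
bin_order = sorted(words, key=lambda s: s.encode("utf-8"))

print(ci_order)   # ['a', 'B', 'c', 'D']
print(bin_order)  # ['B', 'D', 'a', 'c']
```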