[Reprint] [MySQL] MySQL Character Sets: The Essentials

Last Update: 2014-05-28 Source: Internet

Author: User


Source address: http://www.blogjava.net/zyskm/archive/2013/04/09/361888.html

Everyone knows the concept of a character set, but many people do not understand collations, and ordinary database development rarely uses the concept. MySQL's support in this area seems quite advanced, so here is a brief introduction.
Brief description
Character sets and collations: a character set is a set of symbols and encodings, and a collation is a set of rules for comparing characters within a character set. MySQL provides strong support for collations; I did not find corresponding material for Oracle.

Different character sets have different collations. Collation names follow a convention: they start with the name of their associated character set, usually include a language name, and end with _ci (case-insensitive), _cs (case-sensitive), or _bin (binary). Collations generally fall into two categories. Binary collations compare the encodings of characters directly and can be considered case-sensitive, because 'a' and 'A' obviously have different encodings in the character set. The others are named charset_language, like utf8_general_ci, the default collation for utf8.

MySQL has default character set and collation settings at 4 levels: server, database, table, and connection.

Specifically, our system uses the utf8 character set. SQL queries are case-sensitive under the utf8_bin collation and case-insensitive under utf8_general_ci. Do not use utf8_unicode_ci. For example, after CREATE DATABASE demo CHARACTER SET utf8; the database's default collation is utf8_general_ci.

Unicode and UTF-8: Unicode is just a set of symbols. It specifies the binary code of each symbol but not how that code should be stored. The utf8 character set is one way to store Unicode data; MySQL also supports another encoding, ucs2.
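To make the utf8_bin versus utf8_general_ci difference concrete, here is a minimal sketch that stores the same value under both collations and queries it with a lowercase literal. The table case_demo and its columns are hypothetical, and the sketch assumes a MySQL server with the utf8 character set (on MySQL 8.0, utf8 is a deprecated alias for utf8mb3, so expect warnings).

```sql
-- Hypothetical table: the same text stored under two utf8 collations
CREATE TABLE case_demo (
    code_bin VARCHAR(20) CHARACTER SET utf8 COLLATE utf8_bin,
    code_ci  VARCHAR(20) CHARACTER SET utf8 COLLATE utf8_general_ci
);

INSERT INTO case_demo (code_bin, code_ci) VALUES ('ABC', 'ABC');

-- utf8_bin compares encodings, so 'abc' does not match 'ABC': count is 0
SELECT COUNT(*) FROM case_demo WHERE code_bin = 'abc';

-- utf8_general_ci ignores case, so 'abc' matches 'ABC': count is 1
SELECT COUNT(*) FROM case_demo WHERE code_ci = 'abc';
```

The column's collation decides the comparison here because a column's collation takes precedence over that of a plain string literal.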

Detailed description

Character set (charset): a set of symbols and their encodings.
Collation: a set of rules for comparing characters within a character set, for example a rule that defines an ordering such as 'A' < 'B'. Different collations implement different comparison rules: under some, 'a' = 'A' holds and under others it does not; some are case-sensitive and some ignore case.
Each character set has one or more collations, and each collation belongs to exactly one character set.
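A minimal sketch of "different collations, different rules": the same pair of strings is ordered differently under two collations of utf8, and a collation cannot be applied to a string of another character set. It assumes the connection can be switched to utf8 with SET NAMES.

```sql
-- Make string literals utf8 so that utf8 collations are valid for them
SET NAMES utf8;

-- utf8_bin compares code values: 'a' (0x61) sorts after 'B' (0x42), so this returns 0
SELECT 'a' < 'B' COLLATE utf8_bin;

-- utf8_general_ci compares letters without regard to case: 'a' sorts before 'B', so this returns 1
SELECT 'a' < 'B' COLLATE utf8_general_ci;

-- A collation belongs to one character set only: applying utf8_bin to a latin1
-- string is rejected with an error
SELECT _latin1'a' COLLATE utf8_bin;
```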

A binary collation compares the encodings of characters directly, so it can be considered case-sensitive, because 'a' and 'A' obviously have different encodings in the character set. Other, more complex collations add extra rules on top of this simple binary comparison, so comparing under them is more involved.
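The binary comparison described above can be checked directly: the sketch shows that 'a' and 'A' have different encodings, and that only the binary collation treats them as different. It assumes the connection character set can be set to utf8.

```sql
SET NAMES utf8;

-- The encodings differ: returns 61 and 41
SELECT HEX('a'), HEX('A');

-- A binary collation compares those encodings directly: returns 0 (not equal)
SELECT 'a' = 'A' COLLATE utf8_bin;

-- A case-insensitive collation adds rules on top of the encodings: returns 1 (equal)
SELECT 'a' = 'A' COLLATE utf8_general_ci;
```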
MySQL 5.1 goes much further than most other database management systems in its use of character sets and collations: they can be used and set at every level. To use these features effectively, you need to know which character sets and collations are available, how to change the defaults, and how they affect the behavior of string operators and string functions.

Collations generally have these characteristics:

Two different character sets cannot have the same collation.
Each character set has a default collation. For example, the default collation for utf8 is utf8_general_ci.
Collation names follow a convention: they start with the name of their associated character set, usually include a language name, and end with _ci (case-insensitive), _cs (case-sensitive), or _bin (binary).
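These characteristics can be inspected on a running server. The sketch below lists each utf8-family character set with its default collation, then lists every collation of utf8 with a flag marking the default; the _ci, _cs and _bin suffixes are visible in the names. On recent MySQL 8.0 releases the character set is reported as utf8mb3, so the 'utf8' filter in the second query may need to be changed accordingly.

```sql
-- Character sets whose name starts with utf8, with their default collation
SHOW CHARACTER SET LIKE 'utf8%';

-- Every collation of the utf8 character set; IS_DEFAULT is 'Yes' for the default one
SELECT COLLATION_NAME, IS_DEFAULT
FROM information_schema.COLLATIONS
WHERE CHARACTER_SET_NAME = 'utf8'
ORDER BY COLLATION_NAME;
```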

Determining the default character set and collation
Character sets and collations have default settings at 4 levels: server, database, table, and connection.
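A minimal sketch touching each of the four levels in turn. The database level_demo and the table product are hypothetical names chosen for illustration; the statements themselves are standard MySQL.

```sql
-- Server level: the defaults the server was started with
SHOW VARIABLES LIKE 'character_set_server';
SHOW VARIABLES LIKE 'collation_server';

-- Database level (hypothetical database)
CREATE DATABASE level_demo CHARACTER SET utf8 COLLATE utf8_general_ci;
USE level_demo;

-- Table level (hypothetical table): overrides the database default for this table
CREATE TABLE product (
    name VARCHAR(100)
) CHARACTER SET utf8 COLLATE utf8_bin;

SELECT TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'level_demo' AND TABLE_NAME = 'product';

-- Connection level: the character set and collation used for this client's statements
SET NAMES utf8 COLLATE utf8_general_ci;
SHOW VARIABLES LIKE 'character_set_connection';
SHOW VARIABLES LIKE 'collation_connection';
```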
Database character set and collation
Every database has a database character set and a database collation, and neither can be empty. The CREATE DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and collation:
For example:
CREATE DATABASE db_name DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;
MySQL chooses the database character set and database collation as follows:
· If both CHARACTER SET X and COLLATE Y are specified, character set X and collation Y are used.
· If CHARACTER SET X is specified without COLLATE, character set X and its default collation are used.
· Otherwise, the server character set and the server collation are used.
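The three rules can be observed by creating one database per case and reading back what MySQL chose. The database names demo_a, demo_b and demo_c are hypothetical. demo_b should end up with utf8_general_ci (shown as utf8mb3_general_ci on recent MySQL 8.0 releases), and demo_c with whatever the server defaults are.

```sql
-- Rule 1: character set X and collation Y both given, so X and Y are used
CREATE DATABASE demo_a CHARACTER SET latin1 COLLATE latin1_general_ci;

-- Rule 2: only character set X given, so X and its default collation are used
CREATE DATABASE demo_b CHARACTER SET utf8;

-- Rule 3: neither given, so the server character set and collation are used
CREATE DATABASE demo_c;

SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME IN ('demo_a', 'demo_b', 'demo_c');

-- ALTER DATABASE accepts the same optional clauses
ALTER DATABASE demo_c CHARACTER SET utf8 COLLATE utf8_bin;
```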
Using COLLATE in SQL statements
The COLLATE clause lets you override the default collation for a comparison. COLLATE can be used in many kinds of SQL statements.
In a WHERE clause:
SELECT * FROM pro_product WHERE product_code = 'ABCDEFG' COLLATE utf8_general_ci;
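COLLATE is not limited to WHERE. The sketch below applies it in ORDER BY, in GROUP BY, and in a case-sensitive WHERE comparison. It assumes that pro_product.product_code is a utf8 column (a collation must belong to the operand's character set) and that the connection uses utf8 so the literal is compatible.

```sql
SET NAMES utf8;

-- ORDER BY: sort by raw code values for this query only
SELECT product_code
FROM pro_product
ORDER BY product_code COLLATE utf8_bin;

-- GROUP BY: count 'abc' and 'ABC' as different codes
SELECT product_code COLLATE utf8_bin AS code, COUNT(*)
FROM pro_product
GROUP BY code;

-- WHERE: a case-sensitive match, whatever the column's own collation is
SELECT *
FROM pro_product
WHERE product_code = 'ABCDEFG' COLLATE utf8_bin;
```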
Unicode and UTF8
Unicode is just a set of symbols: it specifies the binary code of each symbol but not how that binary code should be stored. Unicode codes can be stored directly in UCS-2 format, and MySQL supports the ucs2 character set.
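The difference between a symbol's code and how it is stored can be seen with a single character. This sketch builds 中 (U+4E2D) from its code value as a ucs2 string, then converts it to utf8; no client encoding is involved, so the results do not depend on the terminal.

```sql
-- UCS-2 stores the code U+4E2D directly in 2 bytes: returns 4E2D
SELECT HEX(CHAR(0x4E2D USING ucs2));

-- UTF-8 stores the same character as a 3-byte sequence: returns E4B8AD
SELECT HEX(CONVERT(CHAR(0x4E2D USING ucs2) USING utf8));
```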
UTF-8 is the most widely used Unicode implementation on the Internet. Other implementations include UTF-16 and UTF-32, but they are rarely used on the Internet.
The utf8 character set (UTF-8, Unicode Transformation Format) is one way to store Unicode data. It is implemented according to RFC 3629. The idea behind it is that different Unicode characters are encoded as byte sequences of different lengths:
· Basic Latin letters, digits, and punctuation use one byte.
· Most European and Middle Eastern script letters fit in a two-byte sequence: extended Latin letters (with tilde, macron, acute, grave, and other accents), Cyrillic, Greek, Armenian, Hebrew, Arabic, Syriac, and others.
· Korean, Chinese, and Japanese ideographs use three-byte sequences.
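The variable-length encoding described in the list above can be checked with LENGTH (bytes) and CHAR_LENGTH (characters). This is a minimal sketch that assumes the client really sends UTF-8 text, so that SET NAMES utf8 matches the bytes on the wire.

```sql
-- Assumes a UTF-8 client terminal
SET NAMES utf8;

-- LENGTH counts bytes, CHAR_LENGTH counts characters
SELECT LENGTH('A'),  CHAR_LENGTH('A');   -- 1, 1 (basic Latin)
SELECT LENGTH('é'),  CHAR_LENGTH('é');   -- 2, 1 (extended Latin)
SELECT LENGTH('中'), CHAR_LENGTH('中');  -- 3, 1 (CJK ideograph)
```

MySQL's utf8 stores at most three bytes per character, so characters that need four bytes in UTF-8 (emoji, for example) cannot be stored in it; MySQL 5.5.3 and later provide utf8mb4 for those.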

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
