MySQL Character Sets and Collations (MySQL collation)

Source: Internet
Author: User
Most people are clear on what a character set is, but many do not understand collations. The concept rarely comes up in general database development, and MySQL's support for it is quite advanced.

Brief Description

Character sets and collations
A character set is a set of symbols and their encodings. A collation is a set of rules for comparing the characters in a character set.
MySQL provides strong support for collations; the author found no comparable information for mongoel.
Different character sets have different collations. Naming convention: a collation name starts with the name of its character set, usually includes a language name, and ends with _ci (case-insensitive), _cs (case-sensitive), or _bin (binary).
There are two kinds of collation:
Binary collations (_bin), which compare the character encodings directly. They are effectively case-sensitive, because 'A' and 'a' obviously have different encodings in the character set.
Collations named charset_language. The default collation for utf8 is utf8_general_ci.
MySQL has default character set and collation settings at four levels: server, database, table, and connection.
Specifically, our system uses the utf8 character set. With the utf8_bin collation, SQL queries compare strings case-sensitively; with utf8_general_ci they are case-insensitive. Do not use utf8_unicode_ci.
For example, after CREATE DATABASE demo CHARACTER SET utf8; the default collation of the database is utf8_general_ci.
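The difference between the two collations can be checked with a direct comparison. This is a minimal sketch, assuming a MySQL 5.x server where the utf8 character set and its utf8_general_ci and utf8_bin collations are available; the _utf8 introducers are an addition here so that the result does not depend on the connection character set.

```sql
-- Case-insensitive collation: 'A' and 'a' compare as equal.
SELECT _utf8'A' = _utf8'a' COLLATE utf8_general_ci AS equal_ci;   -- returns 1

-- Binary collation: the encodings differ, so the strings are not equal.
SELECT _utf8'A' = _utf8'a' COLLATE utf8_bin AS equal_bin;         -- returns 0
```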

Unicode and UTF8
Unicode is only a set of symbols: it assigns each symbol a numeric code but does not specify how that code is stored.
The utf8 character set is one way of storing Unicode data. MySQL also supports ucs2.

Detailed description

Character set: a set of symbols and their encodings.
Collation: a set of rules for comparing the characters in a character set, for example rules that define relationships such as 'A' < 'B'. Different collations implement different comparison rules. For example, 'A' = 'a' holds under some collations but not under others; in other words, some collations are case-sensitive and some are not.
Each character set has one or more collations, and each collation belongs to exactly one character set.

A binary collation compares the character encodings directly. It is effectively case-sensitive, because 'A' and 'a' obviously have different encodings in the character set. Beyond that there are more complex collations, which layer additional rules on top of the simple binary comparison.
MySQL 5.1 goes well beyond most other database management systems in its support for character sets and collations: they can be used and set at any level. To use these features effectively, you need to know which character sets and collations are available, how to change the defaults, and how they affect the behavior of string operators and string functions.

Collations generally have these characteristics:

Two different character sets cannot share the same collation.
Each character set has a default collation. For example, the default collation for utf8 is utf8_general_ci.
Collation names follow a convention: they start with the name of the character set they belong to, usually include a language name, and end with _ci (case-insensitive), _cs (case-sensitive), or _bin (binary).
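These characteristics can be inspected on a running server. The queries below are a sketch that assumes a MySQL 5.x server on which the character set is still listed under the name utf8 (newer releases may report it under a different name); they read the standard information_schema tables.

```sql
-- The default collation of one character set.
SELECT CHARACTER_SET_NAME, DEFAULT_COLLATE_NAME
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME = 'utf8';

-- Every collation that belongs to that character set.
-- IS_DEFAULT is 'Yes' for the default collation and empty otherwise.
SELECT COLLATION_NAME, CHARACTER_SET_NAME, IS_DEFAULT
FROM information_schema.COLLATIONS
WHERE CHARACTER_SET_NAME = 'utf8'
ORDER BY COLLATION_NAME;
```

Each row of the second result names exactly one character set, which reflects the rule that a collation cannot be shared between character sets.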


Determining the default character set and collation
Character sets and collations have default settings at four levels: server, database, table, and connection.
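The current defaults at each level can be read back from the server. This is a sketch only; the table name pro_product is borrowed from the query used later in this article and is assumed to exist in the current database.

```sql
-- Server, database, and connection levels are exposed as system variables
-- (character_set_server, character_set_database, character_set_connection, ...).
SHOW VARIABLES LIKE 'character\_set\_%';
SHOW VARIABLES LIKE 'collation\_%';

-- Table level: the table's default collation.
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'pro_product';
```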
Database character set and collation
Each database has a database character set and a database collation, neither of which can be empty. The CREATE DATABASE and ALTER DATABASE statements take optional clauses that specify the database character set and collation.
For example:
CREATE DATABASE db_name DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;
MySQL chooses the database character set and database collation as follows:
· If both CHARACTER SET X and COLLATE Y are specified, character set X and collation Y are used.
· If CHARACTER SET X is specified without COLLATE, character set X and its default collation are used.
· Otherwise, the server character set and server collation are used.
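The three rules above can be sketched as follows. The database names are hypothetical and chosen only for illustration, and the statements assume a MySQL 5.x server with the utf8 character set.

```sql
-- Rule 1: both clauses given, so both are used as written.
CREATE DATABASE demo_both CHARACTER SET utf8 COLLATE utf8_bin;

-- Rule 2: only the character set is given, so its default collation
-- (utf8_general_ci for utf8) is used.
CREATE DATABASE demo_charset_only CHARACTER SET utf8;

-- Rule 3: neither clause given, so the server defaults apply.
CREATE DATABASE demo_server_default;

-- Compare the outcome of the three statements.
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME LIKE 'demo\_%';
```

What the third row shows depends on the server's character_set_server and collation_server settings.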
Using COLLATE in SQL statements
The COLLATE clause overrides the default collation for a comparison. COLLATE can be used in various parts of SQL statements.
With WHERE:
SELECT * FROM pro_product WHERE product_code = 'abcdefg' COLLATE utf8_general_ci;
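COLLATE is not limited to WHERE. The statements below are a sketch of a few other places it can appear, assuming pro_product.product_code is a utf8 column (the collations named here are only valid for utf8 data).

```sql
-- With ORDER BY: sort by binary code value instead of the column's default collation.
SELECT product_code
FROM pro_product
ORDER BY product_code COLLATE utf8_bin;

-- With AS: attach the collation in the select list and sort by the alias.
SELECT product_code COLLATE utf8_general_ci AS code_ci
FROM pro_product
ORDER BY code_ci;

-- With DISTINCT: values that differ only in case collapse into one row.
SELECT DISTINCT product_code COLLATE utf8_general_ci
FROM pro_product;
```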
Unicode and UTF8
Unicode is only a set of symbols: it assigns each symbol a numeric code but does not specify how that code is stored. Unicode codes can be stored directly in UCS-2 format, and MySQL supports the ucs2 character set.
UTF-8 is the most widely used Unicode encoding on the Internet. Other encodings include UTF-16 and UTF-32, but they are rarely used on the Internet.
The utf8 character set (a transformed Unicode representation) is an alternative way of storing Unicode data. It is implemented according to RFC 3629. The utf8 character set encodes different Unicode characters as byte sequences of varying length:
· Basic Latin letters, digits, and punctuation marks use one byte.
· Most European and Middle Eastern script letters fit in a two-byte sequence: extended Latin letters (with tilde, macron, acute, grave, and other accents), Cyrillic, Greek, Armenian, Hebrew, Arabic, Syriac, and others.
· Korean, Chinese, and Japanese ideographs use three-byte sequences.
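The byte lengths listed above can be observed with LENGTH (bytes) and CHAR_LENGTH (characters). This is a sketch under the assumption of a MySQL 5.x server with the utf8 and ucs2 character sets; the sample characters are chosen here for illustration and are written as hexadecimal literals so that the client encoding does not affect the result.

```sql
-- 'A' (U+0041): one byte, one character.
SELECT LENGTH(_utf8 X'41') AS bytes, CHAR_LENGTH(_utf8 X'41') AS chars;          -- 1, 1

-- 'é' (U+00E9), an accented Latin letter: two bytes, one character.
SELECT LENGTH(_utf8 X'C3A9') AS bytes, CHAR_LENGTH(_utf8 X'C3A9') AS chars;      -- 2, 1

-- '中' (U+4E2D), a Chinese ideograph: three bytes, one character.
SELECT LENGTH(_utf8 X'E4B8AD') AS bytes, CHAR_LENGTH(_utf8 X'E4B8AD') AS chars;  -- 3, 1

-- The same 'A' converted to ucs2 always occupies two bytes.
SELECT HEX(CONVERT(_utf8 X'41' USING ucs2)) AS ucs2_hex;                         -- 0041
```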
From: Measuring life with dreams and passion with running

Collations

MySQL has 39 character sets and 195 collations.

# Show all collations

SHOW COLLATION;

# Show all character sets

SHOW CHARACTER SET;

A character set therefore corresponds to several collations; that is, the same character set has several sets of sorting rules.

For example, the utf8 character set has 22 collations.

The default collation of the utf8 character set is utf8_general_ci.

You can list them with:

SHOW COLLATION LIKE 'utf8\_%';

Note:

utf8_general_ci sorts in ordinary alphabetical order and is case-insensitive (for example: a B c D).

utf8_bin sorts by binary code value, so A comes before a (for example: B D a c).
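The two orderings in the note can be reproduced without creating a table. This is a minimal sketch, assuming a MySQL 5.x server with the utf8 character set; the derived table t and its column s are names introduced only for this example.

```sql
-- Case-insensitive alphabetical order: a, B, c, D
SELECT s FROM (
  SELECT _utf8'a' AS s UNION ALL SELECT _utf8'B'
  UNION ALL SELECT _utf8'c' UNION ALL SELECT _utf8'D'
) AS t
ORDER BY s COLLATE utf8_general_ci;

-- Binary order, uppercase before lowercase: B, D, a, c
SELECT s FROM (
  SELECT _utf8'a' AS s UNION ALL SELECT _utf8'B'
  UNION ALL SELECT _utf8'c' UNION ALL SELECT _utf8'D'
) AS t
ORDER BY s COLLATE utf8_bin;
```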
