Mysql character set setting_mysql

Source: Internet
Author: User
Mysql character set setting has recently been working on a project related to mysql proxy, which involves parsing SQL statements. I didn't know much about mysql before. as the project progresses, I had to make up for it. Today, I suddenly saw mysql's charset settings, and I couldn't explain them clearly in many places. I had a thorough understanding of the working mechanism behind some related SQL statements, so I found some documents for further tutorial, take some notes by the way.

I. character set and verification rules

Character set is a set of symbols and numbers. A collation is a set of rules used to compare characters in a character set.

Character (character) is the smallest semantic symbol in human language. For example, the letters 'A' and 'B'. given A series of characters, each character is given A numerical value, which is used to represent the corresponding characters, this value is the character Encoding (Encoding ). For example, if we assign the value 0 to the character 'A' and 1 to the character 'B', then 0 is the encoding of the character 'A' and 1 is the encoding of the character 'B; after a series of characters are given and the corresponding encoding is granted, a Set of all these characters and encoding pairs is the Character Set ).

Collation rules refer to the comparison rules between characters in the same character set. after determining the rules, you can define what are equivalent characters in a character set, and the relationship between characters. each checking rule only corresponds to one character set, but one character set can correspond to multiple checking rules, one of which is the Default checking rule (Default Collation ).

Mysql can do these tasks:

  • Use multiple character sets to store strings
  • Compares strings using multiple checking rules
  • Use different character sets or proofreaders to mix character strings on the same server, database, or even table
  • Allows you to define character sets and verification rules at any level.

II. mysql character set and verification

The mysql server supports multiple character sets. you can use the show character set statement to list available character sets:

To list the checking rules, use the show collation statement:

Two different character sets cannot have the same checking rules. In addition, the names of the checking rules in mysql follow naming conventions: They start with the corresponding character set name; they start with _ ci (case insensitive), _ cs (case sensitive) or _ bin (compare by encoding value.

Mysql has four default settings for character sets and verification rules: Server-level, database-level, table-level, and connection-level.

1. server character set and verification

The mysql server has a server character set and a server verification rule. they cannot be set to null.

Mysql determines the server character set and server verification rules as follows:

  • Set according to valid options when the server is started
  • Based on the set value during runtime

When mysqld is started, the server character set and proofreading rules are determined based on the initial options used. You can use -- default-character-set to set the character set, and you can add -- default-collation to the proofreading rule after the character set. If no character set is specified, it is the same as -- default-character-set = latin1. If only one character set is specified (for example, latin1), but no verification rule is specified, it is the same as -- default-charset = latin1 -- default-collation = latin1_swedish_ci, because latin1_swedish_ci is the default proofreading rule of latin1. Therefore, the following three commands have the same effect:

shell> mysqldshell> mysqld --default-character-set=latin1shell> mysqld --default-character-set=latin1 --default-collation=latin1_swedish_ci

If you want to change the default server character set and verification rules when building from the source program, you can recompile and use: -- with-charset and -- with-collation as the parameters of configure. For example:

shell> ./configure --with-charset=latin1

Or:

shell> ./configure --with-charset=latin1 --with-collation=latin1_german1_ci

Both mysqld and configure verify that the character set/collation combination is valid. If it is invalid, each program will display an error message and terminate it.

The current server character set and verification rules can be used as the value of character_set_server and collation_server system variables. You can change the value of these variables at runtime to change the server character set and proofreading rules at runtime.

2. database character set and verification

Each database has a database character set and a database proofreading rule. it cannot be blank.

The create database and alter database statements have an optional clause to specify the database character set and collation:

create database db_name [[default] CHARACTER SET charset_name] [[default] COLLATE collation_name];alter database db_name [[default] CHARACTER SET charset_name] [[DEFAULT] COLLATE collation_name];
create database db_name default CHARACTER SET latin1 COLLATE latin1_swedish_ci;

Mysql selects the database character set and database verification rules as follows:

  • If character set x and collate y are specified, character set x and checking rule Y are used.
  • If character set x is specified but collate y is not specified, character set x is used as the default proofreading rule.
  • Otherwise, the server character set and server verification rules are used.

The create database... default character set... syntax of mysql is similar to the create table... character set... syntax of standard SQL. Therefore, you can create a database on the same mysql server that uses different character sets and verification rules.

If the table character set and collation are not specified in the create table statement, the database character set and collation are used as the default values.

The character set and collation rules of the default database can be used as character_set_database and collation_database system variables. The server sets the values of these two variables whenever the default database is changed. If no default database exists, these two variables have the same value as the corresponding server-level variables (character_set_server and collation_server.

3. Table character set and proofreading

Each table has a table character set and a proofreading rule. it cannot be blank.

To specify the table character set and collation, the create table and alter table statements have an optional clause:

create table tbl_name ( column_list ) [default CHARACTER SET charset_name [COLLATE collation_name ]];alter table tbl_name [default CHARACTER SET charset_name] [COLLATE collation_name ]
create table t1 ( ... ) default CHARACTER SET latin1 COLLATE latin1_danish_ci;

Mysql selects the table character set and proofreading rules as follows:

  • If character set x and collate y are specified, character set x and collate y are used.
  • If character set x is specified but collate y is not specified, character set x is used as the default proofreading rule.
  • Otherwise, the server character set and server verification rules are used.

If the column character set and collation are not specified in the column definition, the table character set and collation are used by default. Table character sets and proofreading rules are extensions of mysql; they are not in standard SQL.

4. column character set and verification

Each "character" column (that is, a CHAR, VARCHAR, or TEXT column) has a column character set and a column validation rule. it cannot be blank. The column definition syntax has an optional clause to specify the column character set and proofreading rules:

col_name {CHAR | VARCHAR | TEXT} ( col_length ) [CHARACTER SET charset_name [COLLATE collation_name ]]
create table table1(column1 VARCHAR(5) CHARACTER SET latin1 COLLATE latin1_german1_ci);

Mysql selects the column character set and proofreading rules as follows:

  • If character set x and collate y are specified, character set x and collate y are used.
  • If character set x is specified but collate y is not specified, character set x is used as the default proofreading rule.
  • Otherwise, use the table character set and proofreading rules.

The character set and COLLATE clauses are standard SQL statements.

5. connection character set and verification

Some character sets and checking rule system variables are related to the interaction between the client and the server. as mentioned above:

  • Server character sets and verification rules can be used as the value of character_set_server and collation_server variables.
  • L The character set and collation rules of the default database can be used as the values of character_set_database and collation_database variables.

Character set and checking rule variables are also involved in client-to-server connection. Each client has a connection-related character set and checking rule variable.

Consider what a "connection" means when connecting to the server. The client sends SQL statements, such as queries, to the server through connections. The server sends a response to the client through a connection, such as a result set. For client connections, this will cause some problems concerning the character set and verification rules of connections. these problems can be solved through system variables:

  • When the query leaves the client, which character set is used in the query?

The server uses the character_set_client variable as the character set used in the query sent by the client.

  • L What character set should the server convert to after receiving the query?

During conversion, the server uses character_set_connection and collation_connection system variables. It converts the query sent by the client from the character set indicated by character_set_client to the character set indicated by character_set_connection (unless the string text has a prefix like _ latin1 or _ utf8 ). Collation_connection is important for comparing text strings. It is not important to compare the column value strings because the column has a higher priority for checking rules.

Before performing an internal operation, convert the request data from character_set_connection to the internal operation character set. the method is as follows:

A. SET the character set value for each data field;

B. If the preceding value does not exist, use the default character set value of the corresponding data table (mysql extension, non-SQL standard );

C. If the preceding value does not exist, use the default character set value of the corresponding database;

D. If the preceding value does not exist, use character_set_server to set the value.

  • Which character set should the server convert to before sending a result set or returning an error message to the client?

The character_set_results variable indicates that the server returns the query result to the character set used by the client. Includes result data, such as column values and result metadata (such as column names ).

Can adjust the settings of these variables, or can depend on the default value.

Two statements affect the character set:

set names ' charset_name ' SET CHARACTER SET charset_name;

Set names displays the character sets used in the SQL statements sent by the client. Therefore, the set names 'cp1251 'statement tells the server that "the information sent from this client will use the character set cp1251 ". It also specifies the character set for the results sent from the server back to the client. (For example, if you use a SELECT statement, it indicates the character set used by the column value)

The set names 'X' statement is equivalent to the three statements:

set character_set_client = x ; set character_set_results = x ; set character_set_connection = x ;

Setting x to character_set_connection sets the default checking rule that collation_connection is x.

The set character set statement is similar, but sets the connection CHARACTER set and proofreading rules for the default database. The set character set x statement is equivalent to the three statements:

SET character_set_client = x ; SET character_set_results = x ; SET collation_connection = @@collation_database;

When a client connects, it sends the desired character set name to the server. The server sets character_set_client, character_set_results, and character_set_connection variables for that character set. (In fact, the server performs a set names operation for this character set .)

For mysql clients, if you want to use a character set different from the default character set, you do not need to execute the set names statement at each startup. You can add a -- default-character-set option in the mysql statement line or the option file. For example, each time you run mysql, the following option file sets three character set variables to utf8:

[mysql]default-character-set=utf8
For example, suppose column1 is defined as CHAR (5) character set latin2. If set names or set character set is not set, for select column1 from t, after the connection, the server returns all values of column column1 using the CHARACTER set specified by the client. On the other hand, if you set names 'latin1' or set character set latin1, the server converts latin2 to latin1 before sending the result. Conversion may lose characters that do not belong to two character sets.

If you do not want the server to perform any conversions, set character_set_results to NULL:

mysql> SET character_set_results = NULL;

6. character set and proofreading of strings

Each character string has a character set and a verification rule. it cannot be blank.

A string may have an optional character set prefix and COLLATE clause:

[_ charset_name ] ' string ' [COLLATE collation_name ]
select ' string ';select _latin1 ' string ';select _latin1' string ' COLLATE latin1_danish_ci;

For a simple select 'string' statement, the string uses the character set and proofreading rules defined by the character_set_connection and collation_connection system variables.

The _ charset_name expression is formally called as a quote. It tells the parser that "the character set X will be used for the subsequent strings ." The quote is valid in standard hexadecimal letters and numbers hexadecimal symbols (x 'literal' and 0x nnnn.

select _latin1 x'AABBCC'; select _latin1 0xAABBCC;

MySQL determines a character set and verification rules as follows:

  • If character set x and collate y are specified, character set x and collate y are used.
  • If character set x is specified but collate y is not specified, the character set x is used as the default proofreading rule.
  • Otherwise, use the character set and verification rules given through character_set_connection and collation_connection system variables.
  • Use the latin1 character set and latin1_german1_ci to proofread the rule strings:

Select_latin1 'müler' COLLATE latin1_german1_ci;

  • Use the latin1 character set and the default collation string (that is, latin1_swedish_ci ):

Select_latin1 'müler ';

  • Use a string that connects to the default character set and collation:

Select 'müler ';

Character set prefixes and COLLATE clauses are implemented according to standard SQL specifications.

COLLATE can be used in multiple SQL statements: order by, as, group by, aggregate function, distinct, where, having, etc. The COLLATE clause has a higher priority (higher than | ).

The BINARY operator is a stenographer of the COLLATE clause. BINARY 'X' is equivalent to 'X' COLLATE y. here, y is the name of the character set 'X' BINARY verification rule. Each character set has a binary verification rule. For example, the dual check rule for latin1 character set is latin1_bin. Therefore, if Column a is character set latin1, the following two statements have the same effect:

select * from t1 order by BINARY a;select * from t1 order by a COLLATE latin1_bin;

7. check whether the rules can be compressed"

Among the vast majority of queries, it is obvious which proofreading rules mysql uses for comparison. For example, in the following cases, the checking rules are clearly "Column x checking rules ":

select x from T order by x;select x from T where x = x;select distinct x from T;

However, it may not be clear when multiple operands are involved. For example:

select x from T where x = 'Y';

Should this query use the checking rules of Column x or the checking rules of string text 'Y?

Standardized SQL solves this problem by using "compression" rules. Basically, this means that since both x and 'Y' have school rules, which school rules take precedence? This may be difficult to solve, but the following rules are applicable in most cases:

  • An external COLLATE clause is 0 (compression is not allowed at all .)
  • The two strings connected using different proofread rules are 1.
  • The compression of column checking rules is 2.
  • The "system Constant" (for example, the string returned by the USER () or VERSION () function) is 3.
  • The compression of the text rule is 4.
  • If the expression is NULL or the expression derived from NULL is 5.

The above compression value is currently used by mysql.

The above rules can be fuzzy:

  • Proofread rules that use the lowest possible compression value.
  • If the two sides have the same condensibility, an error occurs if the proofreading rules are different.

Use the COERCIBILITY () function to determine the compression of a string expression:

mysql> SELECT COERCIBILITY('A' COLLATE latin1_swedish_ci);-> 0mysql> SELECT COERCIBILITY(VERSION());-> 3mysql> SELECT COERCIBILITY('A');-> 4

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.