MySQL character set encoding settings and viewing commands

Source: Internet
Author: User

In MySQL, SHOW VARIABLES LIKE 'character%' displays the current character set settings, and the defaults can be changed in my.ini. This article walks through both.

Recently, garbled data was being inserted into the MySQL database used by our project team. We have summarized the issue here, starting from the basic concepts and working toward the underlying causes of the errors and their solutions.

Basic concepts:
• A character is the smallest meaningful symbol in a human language, for example 'A' and 'B';
• Given a series of characters, each character is assigned a value that represents it. This value is the character's encoding. For example, if 'A' is given a value of 0 and 'B' is given a value of 1, then 0 is the encoding of 'A';
• Given a series of characters and their encodings, the set of all these character and encoding pairs is a character set. For example, if the given character list is {'A', 'B'}, then {'A' => 0, 'B' => 1} is a character set;
• A collation is the set of comparison rules between characters in the same character set;
• Only once the collation is determined can we define which characters in a character set are equivalent and how characters are ordered;
• Each collation corresponds to exactly one character set, but one character set can have multiple collations, one of which is the default collation;
• Collation names in MySQL follow a naming convention: they start with the name of the character set the collation belongs to and end with _ci (case insensitive), _cs (case sensitive), or _bin (binary, compare by encoded value). For example, in the collation "utf8_general_ci", "a" and "A" are equivalent;
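
The concepts above can be modelled in a few lines of Python. This is only a toy sketch: the four-character CHARSET mapping and the two comparison functions are made up for illustration and are not how MySQL implements collations.

```python
# Toy character set: each character is paired with a numeric encoding.
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}

def compare_bin(x, y):
    """_bin-style collation: order characters by their encoded value."""
    return (CHARSET[x] > CHARSET[y]) - (CHARSET[x] < CHARSET[y])

def compare_ci(x, y):
    """_ci-style collation: fold case first, then compare encoded values."""
    fx, fy = x.upper(), y.upper()
    return (CHARSET[fx] > CHARSET[fy]) - (CHARSET[fx] < CHARSET[fy])

print(compare_ci("a", "A"))   # 0  -> equivalent under the case-insensitive rule
print(compare_bin("a", "A"))  # 1  -> different under the binary rule
```

The same character set gives two different answers for "a" versus "A", which is why one character set can have several collations.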

MySQL character set support covers two aspects:
1. Character set.
2. Collation.

Run the SHOW CHARACTER SET and SHOW COLLATION commands to view the character sets and collations supported by MySQL.
mysql> SHOW CHARACTER SET;
mysql> SHOW COLLATION;

MySQL supports character sets at four levels:
1. server;
2. database;
3. table (and column);
4. connection.

MySQL character set variables:
- character_set_server: default character set for internal operations
- character_set_client: character set of the data sent by the client
- character_set_connection: connection-layer character set
- character_set_results: character set of query results
- character_set_database: default character set of the currently selected database
- character_set_system: character set of system metadata (column names, etc.)
- There are also variables starting with collation_, which describe the corresponding collations.

Use an introducer to specify the character set of a string literal:
Format: [_charset] 'string' [COLLATE collation]
For example:
- SELECT _latin1 'string';
- SELECT _utf8 'Hello' COLLATE utf8_general_ci;
A string literal marked with an introducer is not transcoded during request processing; it is converted directly to the internal character set.
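
Why the declared character set of a literal matters can be seen outside MySQL as well. The sketch below is a Python analogy rather than MySQL behaviour: it reads the same two bytes under two character sets. Python's "latin-1" codec (ISO 8859-1) stands in for MySQL's latin1 here, which is an approximation.

```python
raw = b"\xc3\xa9"  # two bytes as they might arrive from a client

as_utf8 = raw.decode("utf-8")      # read as UTF-8: one character, 'é'
as_latin1 = raw.decode("latin-1")  # read as latin1: two characters, 'Ã©'

print(len(as_utf8), len(as_latin1))  # 1 2
```

An introducer plays the role of the codec name in this sketch: it states which character set the bytes of the literal are written in.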

1. View the default character set (by default, the MySQL character set is latin1 (ISO_8859_1))
Generally, you can run the following two commands to view the character set and collation settings of the system:

mysql> SHOW VARIABLES LIKE 'character%';

+--------------------------+----------------------------------------------------------+
| Variable_name            | Value                                                    |
+--------------------------+----------------------------------------------------------+
| character_set_client     | utf8                                                     |
| character_set_connection | utf8                                                     |
| character_set_database   | utf8                                                     |
| character_set_filesystem | binary                                                   |
| character_set_results    | utf8                                                     |
| character_set_server     | utf8                                                     |
| character_set_system     | utf8                                                     |
| character_sets_dir       | D: Program FilesMySQLMySQL Server 5.1 character charsets |
+--------------------------+----------------------------------------------------------+

mysql> SHOW VARIABLES LIKE 'collation_%';

+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+


2. Modify the default character set
(1) The simplest method is to modify the character set keys (6 in total) in the MySQL my.ini file. The server options must sit in the [mysqld] section; placing the second default-character-set key under [mysql] is the usual layout and is an assumption here. For example:

[client]
default-character-set = utf8

[mysql]
default-character-set = utf8

[mysqld]
character-set-server = utf8
collation-server = utf8_general_ci
init_connect = 'SET collation_connection = utf8_general_ci'
init_connect = 'SET NAMES utf8'

After modification, restart the MySQL service.

(2) Another way to modify the character set is to use MySQL commands. Note that the collation variables take a collation name such as utf8_general_ci, not a character set name:

mysql> SET character_set_connection = utf8;
mysql> SET character_set_database = utf8;
mysql> SET character_set_server = utf8;
mysql> SET collation_server = utf8_general_ci;
mysql> SET collation_database = utf8_general_ci;
mysql> SET character_set_client = utf8;
mysql> SET character_set_results = utf8;
mysql> SET collation_connection = utf8_general_ci;

Often, even if the default character set of the table is set to utf8 and the query is sent in UTF-8 encoding, the data stored in the database still turns out garbled. The problem lies in the connection layer.

The solution is to execute the following statement before sending the query:

SET NAMES 'utf8';

It is equivalent to the following three commands:

SET character_set_client = utf8;
SET character_set_results = utf8;
SET character_set_connection = utf8;
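
In application code, this usually means issuing SET NAMES right after connecting and before any other statement. The following is a minimal Python sketch assuming a DB-API style cursor with an execute(sql) method; the helper set_connection_charset and the RecordingCursor stand-in are hypothetical names used only for illustration.

```python
import re

def set_connection_charset(cursor, charset="utf8"):
    """Issue SET NAMES on a DB-API style cursor before any query is sent."""
    # The name is interpolated into SQL, so accept only identifier characters.
    if not re.fullmatch(r"[A-Za-z0-9_]+", charset):
        raise ValueError(f"unexpected character set name: {charset!r}")
    cursor.execute(f"SET NAMES '{charset}'")

class RecordingCursor:
    """Hypothetical stand-in for a real cursor; it only records statements."""
    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)

cur = RecordingCursor()
set_connection_charset(cur)
print(cur.statements)  # ["SET NAMES 'utf8'"]
```

With a real driver, the cursor would come from the driver's connection object, and many drivers also accept a character set parameter at connect time, which avoids the extra statement.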


Methods for detecting character set problems

• SHOW CHARACTER SET; # view the character sets supported by MySQL
• SHOW COLLATION; # view the collations supported by MySQL
• SHOW VARIABLES LIKE 'character%'; # check the current character set settings
• SHOW VARIABLES LIKE 'collation%'; # check the current collation settings
• SQL functions HEX, LENGTH, and CHAR_LENGTH
• SQL functions CHARSET and COLLATION
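
What HEX, LENGTH, and CHAR_LENGTH reveal can be previewed in Python, where the distinction between bytes and characters is explicit. This is an analogy, not MySQL output: it assumes the value is stored as UTF-8, and the Python calls merely mirror what the three SQL functions report.

```python
text = "中文"
stored = text.encode("utf-8")  # the bytes a utf8 column would hold

print(stored.hex().upper())  # E4B8ADE69687  (compare with HEX(col))
print(len(stored))           # 6             (compare with LENGTH(col), in bytes)
print(len(text))             # 2             (compare with CHAR_LENGTH(col), in characters)
```

If the byte length of a stored Chinese string is far larger than three times its expected character count, or the hex dump shows C3/C2 pairs where E4-E9 lead bytes were expected, the data was probably converted twice.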

Character set conversion process in MySQL:

1. When receiving a request, MySQL Server converts the request data from character_set_client to character_set_connection;
2. Before performing internal operations, it converts the request data from character_set_connection to the internal operation character set, which is determined as follows:
- Use the CHARACTER SET value defined for the column;
- If that value does not exist, use the DEFAULT CHARACTER SET value of the corresponding table (a MySQL extension, not part of the SQL standard);
- If that value does not exist, use the DEFAULT CHARACTER SET value of the corresponding database;
- If that value does not exist, use the character_set_server value.
3. Convert the operation result from the internal character set to character_set_results.
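
The three steps can be imitated with Python's codecs to see what each conversion does to the bytes. This is a simplified model under stated assumptions: transcode and process_request are hypothetical helpers, Python codec names stand in for MySQL character set names, and real MySQL replaces unconvertible characters instead of raising an error.

```python
def transcode(data, source, target):
    """Interpret bytes in one character set and re-encode them in another."""
    return data.decode(source).encode(target)

def process_request(request, client, connection, internal, results):
    received = transcode(request, client, connection)   # step 1
    stored = transcode(received, connection, internal)  # step 2
    returned = transcode(stored, internal, results)     # step 3
    return stored, returned

request = "é".encode("utf-8")  # b'\xc3\xa9' sent by a UTF-8 client
stored, returned = process_request(
    request, client="utf-8", connection="utf-8",
    internal="latin-1", results="utf-8",
)
print(stored, returned)  # b'\xe9' b'\xc3\xa9'
```

When every setting describes the data truthfully, the bytes change form but the character survives the round trip.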

The garbled-data problem:
A. Our column does not specify a character set, so the table's character set is used;
B. Our table does not specify a character set, so the database's character set is used;
C. No character set was specified when the database was created, so the character_set_server value is used;
D. We did not change character_set_server, so the MySQL default is used;
E. The MySQL default character set is latin1, so latin1 is what ends up being used, while our character_set_connection is UTF-8. Garbled Chinese characters on insert are then hard to avoid.

MySQL default character set:

MySQL can specify character sets at database, table, and column granularity. However, typical applications do not use such detailed configuration when creating databases and tables; they rely on the defaults. So where do the defaults come from?
(1) When MySQL is compiled, a default character set is specified, which is latin1;
(2) When MySQL is installed, you can specify a default character set in the configuration file (my.ini). If it is not specified, the value is inherited from the one specified at compile time;
(3) When mysqld is started, you can specify a default character set in the command line parameters. If it is not specified, the value is inherited from the configuration file. At this point character_set_server is set to this default character set;
(4) When a new database is created, unless explicitly specified, its character set defaults to character_set_server;
(5) When a database is selected, character_set_database is set to the default character set of that database;
(6) When a table is created in the database, its default character set is set to character_set_database, that is, the default character set of the database;
(7) When a column is defined in a table, unless explicitly specified, its character set is the default character set of the table;
Simply put, if you do not modify anything, all columns of all tables in all databases are stored in latin1. However, when installing MySQL you will generally choose multi-language support, in which case the installer automatically sets default_character_set in the configuration file to UTF-8, which ensures that all columns of all tables in all databases are stored in UTF-8 by default.
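
The inheritance chain in steps (1) to (7) amounts to "use the nearest explicit setting, otherwise inherit from the level above". A minimal Python sketch follows; effective_defaults and its parameter names are hypothetical, and None means "not specified at this level".

```python
COMPILED_DEFAULT = "latin1"  # step (1): fixed when MySQL is compiled

def effective_defaults(config_file=None, command_line=None,
                       database=None, table=None, column=None):
    """Resolve the character set at each level; None means 'not specified'."""
    # Steps (2)-(3): command line overrides config file, which overrides the build.
    server = command_line or config_file or COMPILED_DEFAULT
    database = database or server   # step (4)
    table = table or database       # step (6)
    column = column or table        # step (7)
    return {"server": server, "database": database,
            "table": table, "column": column}

print(effective_defaults()["column"])                    # latin1
print(effective_defaults(config_file="utf8")["column"])  # utf8
```

Specifying the character set when the database is created corresponds to passing database="utf8" here: every table and column below it then inherits utf8 regardless of the server default.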

FAQs


1) UTF-8 encoded data is inserted into a table whose default character set is utf8 without setting the connection character set first, and the connection character set is set to utf8 when querying.
- During insertion, the MySQL server defaults apply: character_set_client, character_set_connection, and character_set_results are all latin1;
- The inserted data goes through the conversion latin1 => latin1 => utf8. In this process, each inserted Chinese character grows from its original 3 bytes to 6 bytes in storage;
- The query result goes through the conversion utf8 => utf8, so the 6 stored bytes are returned unchanged, resulting in garbled characters.
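
The growth from 3 bytes to 6 can be reproduced with Python's codecs. This is a model of the conversion, not MySQL itself; Python's "latin-1" (ISO 8859-1) is used as a stand-in for MySQL's latin1, which is an assumption that holds for the bytes in this example.

```python
original = "中".encode("utf-8")       # 3 bytes sent by the client: e4 b8 ad

# Insert with latin1 connection settings: each byte is taken as one latin1
# character, then converted to utf8 for the column.
misread = original.decode("latin-1")
stored = misread.encode("utf-8")
print(len(original), len(stored))     # 3 6

# Query with a utf8 connection: the stored bytes come back unchanged.
returned = stored.decode("utf-8")
print(returned == "中")                # False

# In this model the damage is reversible by undoing the wrong step.
print(returned.encode("latin-1").decode("utf-8") == "中")  # True
```

Each of the three original bytes is above 0x7F, so each becomes a two-byte UTF-8 sequence, which is where the factor of two comes from.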

2) UTF-8 encoded data is inserted into a table whose default character set is latin1 after setting the connection character set to utf8 (this is the error we encountered)
- During insertion, character_set_client, character_set_connection, and character_set_results are all utf8;
- The inserted data goes through the conversion utf8 => utf8 => latin1. Any Unicode characters in the original data outside the range \u0000 to \u00ff cannot be represented in latin1 and are converted to "?" (0x3F). That content cannot be recovered afterwards, whatever connection character set is used.
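
The lossy conversion can be imitated in Python by encoding to latin1 with replacement enabled. As before, this is a model rather than MySQL itself, with Python's "latin-1" codec standing in for MySQL's latin1.

```python
text = "café 中文"

# Characters outside \u0000-\u00ff have no latin1 encoding and become '?'.
stored = text.encode("latin-1", errors="replace")
print(stored)                    # b'caf\xe9 ??'
print(stored.decode("latin-1"))  # café ??
```

The "é" survives because it lies inside the latin1 range, while both Chinese characters collapse to the same 0x3F byte, so nothing in the stored data distinguishes them any more.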

Suggestions for using MySQL character sets

• When creating databases and tables and when performing database operations, explicitly specify the character set wherever possible instead of relying on the MySQL defaults. Otherwise, a MySQL upgrade may cause serious problems;
• When latin1 is used for both the database and the connection character set, garbled characters are avoided in most cases, but the drawback is that SQL operations cannot work on a per-character basis. Generally, setting both the database and connection character sets to utf8 is a better choice;
• When using the MySQL C API, you can call mysql_options to set the MYSQL_SET_CHARSET_NAME option to utf8 immediately after initializing the connection handle. You then do not need to issue SET NAMES explicitly, and when mysql_ping reconnects a dropped persistent connection, the connection character set is reset to utf8;
• For the MySQL PHP API, page-level PHP programs usually run only briefly, so after connecting to the database you can explicitly issue SET NAMES once to set the connection character set. With persistent connections, however, take care to keep the connection alive and to issue SET NAMES again after any reconnect.

Other considerations

• default_character_set in my.cnf only affects the connection character set when the mysql command-line client is used to connect to the server. It does not affect applications that use the libmysqlclient library!
• SQL function operations on columns are generally performed in the internal operation character set and are not affected by the connection character set settings.
• Bare string literals in SQL statements are affected by the connection character set or by an introducer. For comparisons and similar operations, this can produce completely different results. Be careful!

Summary


Based on the above analysis and suggestions, it should be clear how to solve our problem: specify the character set when creating the database, rather than trying to reach the goal by modifying the default configuration. You can of course specify the character set per table instead, but that is easy to overlook, especially when many people take part in the design.

Therefore, which database version is used, whether 3.x, 4.0.x, or 4.1.x, does not matter much to us. Two points matter:

1. Set the database encoding correctly. In MySQL 4.0 and earlier the character set is always the default ISO8859-1, while MySQL 4.1 lets you choose during installation. If you are going to use UTF-8, specify UTF-8 when creating the database (you can also change it after creation, and 4.1 or later versions can also specify the character set per table).
2. Set the database connection encoding correctly. After the database encoding is set, specify the connection encoding when connecting to the database. For example, when connecting via JDBC, specify utf8 as the connection encoding.
