Our project team recently ran into garbled text when inserting data into a MySQL database. This article summarizes the issue, starting from the most basic concepts and working through to the root cause of the error and its solutions.
Basic concepts:
• A character (Character) is the smallest meaningful symbol in human language, such as 'A' or 'B';
• Given a series of characters, we assign each one a numeric value and use that value to represent the character; this value is the character's encoding (Encoding). For example, if we assign the character 'A' the value 0 and the character 'B' the value 1, then 0 is the encoding of 'A';
• Given a series of characters and their encodings, the set of all these character/encoding pairs is a character set (Character Set). For example, given the characters {'A', 'B'}, {'A' => 0, 'B' => 1} is a character set;
• A collation (Collation) is the set of rules for comparing characters within the same character set;
• Only once the collation is determined can we define which characters of a character set are equivalent and how characters are ordered relative to one another;
• Each collation corresponds to exactly one character set, but a character set can have several collations, one of which is the default collation (Default Collation);
• Collation names in MySQL follow a naming convention: they start with the name of the character set the collation belongs to and end with _ci (case insensitive), _cs (case sensitive), or _bin (comparison by encoded value). For example, under the collation "utf8_general_ci", the characters "a" and "A" are equivalent.
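As a rough illustration of the concepts above, the following Python sketch models a toy character set and two collation styles. The lowercase entries and the names CHARSET, compare_bin, and compare_ci are hypothetical additions for illustration only; real MySQL collations are considerably more elaborate.

```python
# Toy character set: each character is paired with a numeric encoding.
# 'A' => 0 and 'B' => 1 come from the text; 'a' and 'b' are added for illustration.
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}


def compare_bin(x: str, y: str) -> int:
    """_bin-style rule: compare two characters by their encoded values."""
    cx, cy = CHARSET[x], CHARSET[y]
    return (cx > cy) - (cx < cy)


def compare_ci(x: str, y: str) -> int:
    """_ci-style rule: fold case first, so 'a' and 'A' compare as equal."""
    cx, cy = CHARSET[x.upper()], CHARSET[y.upper()]
    return (cx > cy) - (cx < cy)


print(compare_bin("a", "A"))  # non-zero: the encoded values differ
print(compare_ci("a", "A"))   # 0: equivalent under the case-insensitive rule
```

Both functions work on the same character set; only the comparison rule changes, which is why one character set can have several collations.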
MySQL's character set support covers two aspects:
1, the character set (Character set);
2, the collation, i.e. the sorting and comparison method (Collation).
The commands SHOW CHARACTER SET and SHOW COLLATION list the character sets and collations that MySQL supports, respectively:
–mysql> SHOW CHARACTER SET;
–mysql> SHOW COLLATION;
MySQL's character set support is refined into four levels:
1, server (server);
2, database (database);
3, table (table) and column (column);
4, connection (connection).
MySQL character set variables:
–character_set_server: default character set for internal operations
–character_set_client: character set of the data sent by the client
–character_set_connection: connection-layer character set
–character_set_results: character set of query results
–character_set_database: default character set of the currently selected database
–character_set_system: character set of system metadata (column names, etc.)
–There are also corresponding variables beginning with collation_ that describe the collations.
Use an introducer to specify the character set of a string literal:
The format is: [_charset] 'string' [COLLATE collation]
For example:
–SELECT _latin1 'string';
–SELECT _utf8 'Hello' COLLATE utf8_general_ci;
String literals marked with an introducer are converted directly to the internal character set for processing, without the extra transcoding that normally happens during a request.
1. View the default character set (by default, MySQL's character set is latin1 (ISO_8859_1))
In general, the system's character set and collation settings can be viewed with the following two commands:
–mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+--------------------------------------------------------+
| Variable_name            | Value                                                  |
+--------------------------+--------------------------------------------------------+
| character_set_client     | utf8                                                   |
| character_set_connection | utf8                                                   |
| character_set_database   | utf8                                                   |
| character_set_filesystem | binary                                                 |
| character_set_results    | utf8                                                   |
| character_set_server     | utf8                                                   |
| character_set_system     | utf8                                                   |
| character_sets_dir       | D:\Program Files\MySQL\MySQL Server 5.1\share\charsets |
+--------------------------+--------------------------------------------------------+
–mysql> SHOW VARIABLES LIKE 'collation_%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
2. Modify the default character set
(1) The simplest method is to edit the character set key values in MySQL's my.ini file (six in total), for example:
[client]
default-character-set = utf8
[mysql]
default-character-set = utf8
[mysqld]
character-set-server = utf8
collation-server = utf8_general_ci
init_connect = 'SET collation_connection = utf8_general_ci'
init_connect = 'SET NAMES utf8'
After the modification, restart the MySQL service.
(2) Another way to change the character set is to use MySQL commands:
–mysql> SET character_set_connection = utf8;
–mysql> SET character_set_database = utf8;
–mysql> SET character_set_server = utf8;
–mysql> SET collation_server = utf8_general_ci;
–mysql> SET collation_database = utf8_general_ci;
–mysql> SET character_set_client = utf8;
–mysql> SET character_set_results = utf8;
–mysql> SET collation_connection = utf8_general_ci;
In general, even when the table's default character set is utf8 and you send the query in UTF-8 encoding, you may find that the data stored in the database is still garbled. The problem lies in the connection layer.
The workaround is to execute the following statement before sending the query:
SET NAMES 'utf8';
It is equivalent to the following three statements:
SET character_set_client = utf8;
SET character_set_results = utf8;
SET character_set_connection = utf8;
Some ways to detect character set problems
SHOW CHARACTER SET; # list the character sets MySQL supports
SHOW COLLATION; # list MySQL's collations (sort orders)
SHOW VARIABLES LIKE 'character%'; # show the current default character set settings
SHOW VARIABLES LIKE 'collation%'; # show the current default collation settings
SQL functions HEX, LENGTH, CHAR_LENGTH
SQL functions CHARSET, COLLATION
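The difference between LENGTH (bytes) and CHAR_LENGTH (characters) can be previewed outside the database. The Python sketch below is only an analogy, assuming the value is stored as UTF-8; the helper names hex_of, byte_length, and char_length are made up for illustration and are not MySQL functions.

```python
def hex_of(text: str, charset: str = "utf-8") -> str:
    """Rough analogue of HEX(col): the stored bytes as uppercase hex."""
    return text.encode(charset).hex().upper()


def byte_length(text: str, charset: str = "utf-8") -> int:
    """Rough analogue of LENGTH(col): length in bytes."""
    return len(text.encode(charset))


def char_length(text: str) -> int:
    """Rough analogue of CHAR_LENGTH(col): length in characters."""
    return len(text)


sample = "中文"
print(hex_of(sample))       # E4B8ADE69687
print(byte_length(sample))  # 6
print(char_length(sample))  # 2
```

If a two-character Chinese string reports a byte length other than 6 in a utf8 column, the stored bytes are probably not what was intended, and the hex output shows exactly what was saved.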
The character set conversion process in MySQL:
1. When the MySQL server receives a request, it converts the request data from character_set_client to character_set_connection;
2. Before performing the internal operation, it converts the request data from character_set_connection to the internal operation character set, which is determined as follows:
-Use the CHARACTER SET value of each column;
-If that value is not set, use the DEFAULT CHARACTER SET value of the corresponding table (a MySQL extension, not part of the SQL standard);
-If that value is not set, use the DEFAULT CHARACTER SET value of the corresponding database;
-If that value is not set, use the character_set_server value.
3. It converts the result of the operation from the internal operation character set to character_set_results.
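The three conversion steps above can be approximated in Python, assuming Python's codecs are close enough to MySQL's character sets for illustration. The function names convert and round_trip are hypothetical, and characters that cannot be represented are replaced with "?" to mimic a lossy conversion.

```python
from typing import Tuple


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from one character set to another ("?" for unrepresentable characters)."""
    return data.decode(src).encode(dst, errors="replace")


def round_trip(raw: bytes, client: str, connection: str,
               column: str, results: str) -> Tuple[bytes, bytes]:
    """Simulate one insert followed by one select on the same connection."""
    on_connection = convert(raw, client, connection)     # step 1: client => connection
    stored = convert(on_connection, connection, column)  # step 2: connection => internal (column)
    returned = convert(stored, column, results)          # step 3: internal => results
    return stored, returned


raw = "中".encode("utf-8")
stored, returned = round_trip(raw, "utf-8", "utf-8", "utf-8", "utf-8")
print(stored == raw, returned == raw)  # True True: consistent settings preserve the data
```

When every stage uses the same character set, each conversion is a no-op and the bytes survive unchanged; problems appear only when one stage disagrees with the actual encoding of the data.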
How our garbled-text problem arose:
A. Our columns do not set a character set, so they use the table's character set;
B. Our table does not specify a character set, so it defaults to the database's character set;
C. Our database did not specify a character set when it was created, so it uses the character_set_server value;
D. We did not deliberately change character_set_server, so it uses the MySQL default;
E. MySQL's default character set is latin1, so our data is stored in latin1 while our character_set_connection is UTF-8; garbled Chinese on insert is then inevitable.
MySQL's default character set:
MySQL lets you specify the character set at the level of a database, a table, or a column. Traditional programs, however, do not use such detailed configuration when creating databases and tables; they rely on the defaults. So where do the defaults come from?
(1) When MySQL is compiled, a default character set is specified, which is latin1;
(2) When MySQL is installed, a default character set can be specified in the configuration file (my.ini); if it is not specified, the value is inherited from the compile-time setting;
(3) When mysqld is started, a default character set can be specified in the command-line arguments; if it is not specified, the value is inherited from the configuration file. At this point character_set_server is set to this default character set;
(4) When a new database is created, its character set defaults to character_set_server unless explicitly specified;
(5) When a database is selected, character_set_database is set to that database's default character set;
(6) When a table is created in this database, its default character set is set to character_set_database, that is, the database's default character set;
(7) When a column is defined in a table, its default character set is the table's default character set unless explicitly specified.
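The inheritance chain in steps (1) to (7) amounts to "use the most specific setting that exists". The Python sketch below expresses that rule under simplifying assumptions; first_set, effective_charset, and all parameter names are hypothetical and only model the description above, not MySQL's actual implementation.

```python
from typing import Optional

COMPILED_DEFAULT = "latin1"  # step (1): the compile-time default named in the text


def first_set(*values: Optional[str]) -> str:
    """Return the first value that was explicitly specified."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("no character set specified at any level")


def effective_charset(column: Optional[str] = None,
                      table: Optional[str] = None,
                      database: Optional[str] = None,
                      startup_option: Optional[str] = None,
                      config_file: Optional[str] = None) -> str:
    """Resolve the character set a column ends up with."""
    # Steps (2)-(3): character_set_server comes from the command line,
    # then the configuration file, then the compiled-in default.
    server = first_set(startup_option, config_file, COMPILED_DEFAULT)
    # Steps (4)-(7): column, then table, then database, then server.
    return first_set(column, table, database, server)


print(effective_charset())                    # latin1: nothing was specified anywhere
print(effective_charset(config_file="utf8"))  # utf8: inherited from the configuration file
print(effective_charset(database="utf8"))     # utf8: the database-level setting wins over the server
```

The first call reproduces the situation described in this article: with no explicit setting at any level, every column silently falls back to latin1.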
In short, if nothing is changed, every column of every table in every database is stored in latin1. However, when installing MySQL we usually choose multi-language support, in which case the installer automatically sets default_character_set to UTF-8 in the configuration file, ensuring that every column of every table in every database is stored in UTF-8 by default.
Common problem resolution
1) Inserting UTF-8 encoded data into a table whose default character set is utf8 without first setting the connection character set, and then querying with the connection character set set to utf8
– On insert, following the MySQL server's default settings, character_set_client, character_set_connection, and character_set_results are all latin1;
– The inserted data goes through the latin1=>latin1=>utf8 character set conversion, during which each inserted Chinese character grows from its original 3 bytes to 6 bytes when saved;
– On query, the result goes through the utf8=>utf8 character set conversion, and the saved 6 bytes are returned unchanged, producing garbled characters.
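The growth from 3 bytes to 6 bytes can be reproduced with plain Python codecs. This is a sketch that assumes Python's latin-1 codec is an adequate stand-in for MySQL's latin1 for this particular character; the variable names are illustrative only.

```python
original = "中"

sent = original.encode("utf-8")   # the client really sends 3 UTF-8 bytes
misread = sent.decode("latin-1")  # the server treats them as 3 separate latin1 characters
stored = misread.encode("utf-8")  # latin1 => utf8 on the way into the column

print(len(sent))    # 3
print(len(stored))  # 6

# Reading back over a utf8 connection returns the 6 bytes as-is: three
# accented/symbol characters instead of one Chinese character.
shown = stored.decode("utf-8")
print(shown == original)  # False

# In this model the damage is reversible by undoing the wrong step.
recovered = stored.decode("utf-8").encode("latin-1").decode("utf-8")
print(recovered == original)  # True
```

Each of the three original bytes has a value above 0x7F, so each one becomes a two-byte sequence when re-encoded as UTF-8, which is where the doubling comes from.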
2) Setting the connection character set to utf8 before inserting UTF-8 encoded data into a table whose default character set is latin1 (this is the error we ran into)
– On insert, following the connection character set setting, character_set_client, character_set_connection, and character_set_results are all utf8;
– The inserted data goes through the utf8=>utf8=>latin1 character set conversion. If the original data contains Unicode characters outside the U+0000 to U+00FF range, they cannot be represented in the latin1 character set and are converted to the "?" (0x3F) symbol, which cannot be recovered by later queries regardless of the connection character set setting.
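The lossy utf8=>latin1 step can be imitated with Python's "replace" error handler, which also substitutes "?" for characters the target encoding cannot represent. This is a sketch under the assumption that Python's latin-1 codec approximates MySQL's latin1; store_in_latin1 is a hypothetical helper name.

```python
def store_in_latin1(text: str) -> bytes:
    """Encode text for a latin1 column, replacing unrepresentable characters with '?'."""
    return text.encode("latin-1", errors="replace")


print(store_in_latin1("é"))           # b'\xe9': inside U+0000..U+00FF, stored intact
print(store_in_latin1("中文"))        # b'??': outside the range, replaced
print(store_in_latin1("中文").hex())  # 3f3f: only 0x3F bytes remain
```

Unlike the double-encoding case, nothing of the original characters survives here: the stored bytes are literally 0x3F, so no later conversion can bring the text back.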
Recommendations when using MySQL character sets
• When creating databases and tables and when performing database operations, explicitly state the character set to use wherever possible instead of relying on MySQL's default settings; otherwise a MySQL upgrade may cause serious trouble;
• When both the database and the connection character sets are latin1, garbled text can be avoided in most cases, but the drawback is that SQL operations cannot work on a per-character basis. In general, using utf8 for both the database and the connection character sets is a better choice;
• When using the MySQL C API (the API MySQL provides for the C language), use mysql_options right after initializing the database handle to set the MYSQL_SET_CHARSET_NAME option to utf8. This removes the need to specify the connection character set explicitly with a SET NAMES statement, and the connection character set is reset to utf8 when mysql_ping reconnects a dropped long-lived connection;
• With the MySQL PHP API, page-level PHP programs generally have a short total running time, so explicitly setting the connection character set with a SET NAMES statement after connecting to the database is enough. When using long-lived connections, however, take care to keep the connection alive and, after reconnecting a dropped connection, explicitly reset the connection character set with a SET NAMES statement.
Other considerations
• The default_character_set setting in my.cnf only affects the connection character set when the mysql command-line client connects to the server; it has no effect on applications that use the libmysqlclient library!
• SQL function operations on columns are usually performed in the internal operation character set and are not affected by the connection character set setting.
• Bare string literals in an SQL statement are affected by the connection character set or by an introducer, and for operations such as comparisons this can produce completely different results, so be careful!
Summary
Given the analysis and recommendations above, it should now be clear which approach to use. Specify the character set when creating the database rather than relying on changes to the default configuration. You can also specify the character set per table, but that is easy to forget, and mistakes become more likely when many people are involved in the design.
So which database version you use, whether 3.x, 4.0.x, or 4.1.x, does not really matter. Two points are what matter:
1. Set the database encoding correctly. In MySQL 4.0 and earlier the character set always defaults to ISO8859-1; MySQL 4.1 lets you choose at installation time. If you plan to use UTF-8, specify UTF-8 when creating the database (you can also change it afterwards, and in version 4.1 you can also specify the character set per table).
2. Set the database connection encoding correctly. Once the database encoding is set, specify the connection encoding when connecting to the database, for example by specifying utf8 for the connection when using JDBC.