Recently, data inserted into the MySQL database used by our project group came out garbled. This article summarizes the issue, starting from the most basic concepts and working toward the root cause of the error and its solutions.
Basic concepts
• A character (Character) is the smallest ideographic symbol in a human language, such as 'A', 'B', and so on;
• Given a series of characters, each character is assigned a numeric value that represents it; this value is the character's encoding (Encoding). For example, if we assign the character 'A' the value 0 and the character 'B' the value 1, then 0 is the encoding of the character 'A';
• Given a series of characters and their corresponding encodings, the set of all these character/encoding pairs is a character set (Character Set). For example, given the character list {'A', 'B'}, {'A' => 0, 'B' => 1} is a character set;
• A collation (Collation) is the set of comparison rules between characters within the same character set;
• Once the collation is determined, equivalence and ordering between the characters of a character set are defined;
• Each collation corresponds to exactly one character set, but a character set can have several collations, one of which is the default collation (Default Collation);
• Collation names in MySQL follow a naming convention: they start with the name of the character set the collation belongs to and end with _ci (case insensitive), _cs (case sensitive), or _bin (binary, that is, comparison by encoded values). For example, under the collation "utf8_general_ci", the characters "a" and "A" are equivalent;
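The relationship between a character set and its collations can be sketched in a few lines of Python. This is a toy model only: the four-character set and the two comparison functions are illustrative assumptions, not anything MySQL exposes.

```python
# Toy model of the concepts above; all names are illustrative, not MySQL APIs.

# A character set: characters paired with their encodings.
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}


def compare_bin(x: str, y: str) -> int:
    """_bin-style collation: compare by encoded value."""
    ex, ey = CHARSET[x], CHARSET[y]
    return (ex > ey) - (ex < ey)


def compare_ci(x: str, y: str) -> int:
    """_ci-style collation: fold case first, then compare by encoded value."""
    ex, ey = CHARSET[x.upper()], CHARSET[y.upper()]
    return (ex > ey) - (ex < ey)


bin_equal = compare_bin("a", "A") == 0  # encodings differ, so not equal
ci_equal = compare_ci("a", "A") == 0    # same letter once case is ignored
```

Both functions work on the same character set, which mirrors the rule that one character set can have several collations while each collation belongs to exactly one character set.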
MySQL Character set settings
• System Variables:
– character_set_server: default character set for internal operations
– character_set_client: character set of the data sent by the client
– character_set_connection: character set of the connection layer
– character_set_results: character set of query results
– character_set_database: default character set of the currently selected database
– character_set_system: character set of system metadata (field names, etc.)
– There are also variables beginning with collation_ that describe the corresponding collations.
• Specifying the character set of a string literal with an introducer:
– Format: [_charset] 'string' [COLLATE collation]
– For example:
SELECT _latin1 'string';
SELECT _utf8 'Hello' COLLATE utf8_general_ci;
– String literals marked with an introducer are converted directly to the internal character set for processing, without redundant transcoding during the request.
The character set conversion process in MySQL
1. When MySQL Server receives a request, it converts the request data from character_set_client to character_set_connection;
2. Before performing an internal operation, it converts the request data from character_set_connection to the internal operation character set, which is determined as follows:
- Use the CHARACTER SET value of each data field;
- If that value does not exist, use the DEFAULT CHARACTER SET value of the corresponding table (a MySQL extension, not part of the SQL standard);
- If that value does not exist, use the DEFAULT CHARACTER SET value of the corresponding database;
- If that value does not exist, use the character_set_server value.
3. It converts the operation result from the internal operation character set to character_set_results.
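The three conversion steps can be imitated with Python's codecs. This is a simplified sketch under stated assumptions: the function names are hypothetical, Python codec names stand in for MySQL character sets, and unmappable characters are replaced with "?".

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Reinterpret bytes from one character set in another, using '?' for unmappable characters."""
    return data.decode(source).encode(target, errors="replace")


def handle_request(data: bytes, client: str, connection: str,
                   internal: str, results: str) -> tuple:
    """Return (stored bytes, bytes sent back) for one insert-then-select round trip."""
    at_connection = transcode(data, client, connection)      # step 1
    stored = transcode(at_connection, connection, internal)  # step 2
    returned = transcode(stored, internal, results)          # step 3
    return stored, returned


payload = "中文".encode("utf-8")
# When every layer agrees on utf8, the data passes through unchanged.
stored, returned = handle_request(payload, "utf-8", "utf-8", "utf-8", "utf-8")
```

Garbling appears as soon as one of the four character set arguments does not match the encoding the bytes were actually written in.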
Now let us go back and analyze our garbled-data problem:
a. Our field has no character set specified, so the table's character set is used;
b. Our table has no character set specified, so the database's character set is used by default;
c. Our database was created without a character set, so the character_set_server value is used;
d. We never deliberately changed character_set_server, so the MySQL default is used;
e. MySQL's default character set is latin1, so we ended up storing data as latin1 while our character_set_connection was UTF-8. Garbled Chinese on insert was therefore inevitable.
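The fallback chain in this analysis can be written as a small lookup. The sketch below is a simplified model with a hypothetical function name; a real server records the inherited value when each object is created rather than resolving it per query.

```python
from typing import Optional


def resolve_charset(column: Optional[str], table: Optional[str],
                    database: Optional[str], server: str) -> str:
    """Return the first character set that is explicitly set, most specific level first."""
    for candidate in (column, table, database):
        if candidate is not None:
            return candidate
    return server


# Our situation: nothing specified at any level, server left at latin1.
ours = resolve_charset(None, None, None, server="latin1")
# Specifying the character set when creating the database changes the outcome.
fixed = resolve_charset(None, None, "utf8", server="latin1")
```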
FAQ Analysis
• Inserting UTF8-encoded data into a table whose default character set is utf8 without setting the connection character set first, then setting the connection character set to utf8 when querying
– On insert, character_set_client, character_set_connection, and character_set_results are all latin1, following the MySQL server's default settings;
– The inserted data undergoes the latin1 => latin1 => utf8 conversion, during which each inserted Chinese character grows from its original 3 bytes to 6 bytes when saved;
– The query result undergoes the utf8 => utf8 conversion, so the saved 6 bytes are returned unchanged, resulting in garbled characters.
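The growth from 3 bytes to 6 bytes can be reproduced with Python's codecs. One assumption to note: Python's "latin-1" codec is ISO-8859-1, whereas MySQL's latin1 is based on cp1252; the two differ only for bytes 0x80 to 0x9F, which this example does not use.

```python
original = "中".encode("utf-8")  # the 3 bytes the client sends: E4 B8 AD

# The server believes the bytes are latin1, so each byte becomes one character...
misread = original.decode("latin-1")
# ...and each of those characters is saved as a 2-byte utf8 sequence.
stored = misread.encode("utf-8")

# Querying over a utf8 connection returns the saved bytes unchanged.
displayed = stored.decode("utf-8")

# Undoing the mistaken step recovers the original text.
recovered = stored.decode("utf-8").encode("latin-1").decode("utf-8")
```

Because no information is lost in this case, reading the data back through a latin1 connection (or reversing the conversion as in the last line) still yields the original text.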
• Setting the connection character set to utf8 before inserting UTF8-encoded data into a table whose default character set is latin1 (this is the mistake we ran into)
– On insert, character_set_client, character_set_connection, and character_set_results are all utf8, following the connection character set;
– The inserted data undergoes the utf8 => utf8 => latin1 conversion. If the original data contains Unicode characters outside the \u0000~\u00FF range, they are converted to "?" (0x3F) because they cannot be represented in the latin1 character set, and they cannot be recovered later, regardless of the connection character set settings.
Some means of detecting character set problems
• SHOW CHARACTER SET;
• SHOW COLLATION;
• SHOW VARIABLES LIKE 'character%';
• SHOW VARIABLES LIKE 'collation%';
• SQL functions HEX, LENGTH, CHAR_LENGTH
• SQL functions CHARSET, COLLATION
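To show what HEX, LENGTH, and CHAR_LENGTH reveal, here is a rough Python imitation of the three functions applied to a stored value. The helper name describe_value is hypothetical, and the sketch assumes LENGTH counts bytes while CHAR_LENGTH counts characters.

```python
def describe_value(stored: bytes, charset: str) -> dict:
    """Mimic HEX(col), LENGTH(col) and CHAR_LENGTH(col) for one stored value."""
    return {
        "hex": stored.hex().upper(),
        "length": len(stored),                       # bytes
        "char_length": len(stored.decode(charset)),  # characters
    }


healthy = describe_value("中".encode("utf-8"), "utf-8")
double_encoded = describe_value(
    "中".encode("utf-8").decode("latin-1").encode("utf-8"), "utf-8"
)
```

A single Chinese character that reports 6 bytes and 3 characters in a utf8 column is a strong hint that it was written through a latin1 connection.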
Recommendations when using MySQL character sets
• Be as explicit as possible about the character set when creating databases and tables and when performing database operations, rather than relying on MySQL's default settings; otherwise a MySQL upgrade may cause a lot of trouble;
• Using latin1 for both the database and the connection character set solves the garbled-text problem in most cases, but the drawback is that SQL operations cannot work in units of characters. Using utf8 for both the database and the connection character set is generally the better choice;
• When using the MySQL C API, call mysql_options to set the MYSQL_SET_CHARSET_NAME option to utf8 immediately after initializing the database handle. That way you do not have to specify the connection character set explicitly with a SET NAMES statement, and the connection character set is reset to utf8 when mysql_ping reconnects a broken long-lived connection;
• For the MySQL PHP API, a typical page-level PHP program has a short total running time, so setting the connection character set once with an explicit SET NAMES statement after connecting to the database is enough. When using long-lived connections, however, keep the connection alive and, after any reconnect, explicitly reset the connection character set with a SET NAMES statement.
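The advice about long-lived connections amounts to re-issuing SET NAMES whenever a new underlying connection is opened. The sketch below illustrates the pattern with a hypothetical RecordingConnection that only records statements; it is not a real driver API, and a real client would call its driver's connect and execute functions instead.

```python
class RecordingConnection:
    """Hypothetical stand-in for a driver connection; it only records statements."""

    def __init__(self):
        self.statements = []

    def execute(self, sql: str) -> None:
        self.statements.append(sql)


class CharsetAwareClient:
    """Re-issues SET NAMES every time a new underlying connection is opened."""

    def __init__(self, connect, charset: str = "utf8"):
        self._connect = connect  # factory returning a fresh connection
        self._charset = charset
        self.conn = None
        self.reconnect()

    def reconnect(self) -> None:
        self.conn = self._connect()
        # A fresh connection starts from the server defaults, so set it again.
        self.conn.execute(f"SET NAMES '{self._charset}'")

    def query(self, sql: str) -> None:
        self.conn.execute(sql)


client = CharsetAwareClient(RecordingConnection)
client.query("SELECT 1")
first = client.conn.statements
client.reconnect()  # e.g. after a long-lived connection dropped
client.query("SELECT 2")
second = client.conn.statements
```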
Other Precautions
• The default_character_set setting in my.cnf only affects the connection character set used when the mysql command-line client connects to the server; it has no effect on applications that use the libmysqlclient library!
• SQL function operations on fields are usually performed in the internal operation character set and are not affected by the connection character set settings;
• Bare strings in SQL statements are affected by the connection character set or by introducer settings. For operations such as comparisons this can produce completely different results, so be careful!
Summary
From the analysis and recommendations above, it should be clear which approach solves our problem: specify the character set when creating the database, rather than relying on changes to the default configuration. You could also specify the character set per table, but that is easy to forget, especially when many people are involved in the design.
Although solving the problem by modifying MySQL's default character set is not recommended, some ways of modifying it are given here for reference only.
MySQL default character set
MySQL allows the character set to be specified at the granularity of a database, a table, or a column. Traditional programs do not use such fine-grained configuration when creating databases and tables; they rely on the defaults.
(1) When MySQL is compiled, a default character set is specified, and this character set is latin1 (this holds for versions before 8.0; MySQL 8.0 changed the default to utf8mb4);
(2) When installing MySQL, you can specify a default character set in the configuration file (my.ini); if none is specified, the value is inherited from the compile-time setting;
(3) When starting mysqld, you can specify a default character set in the command-line arguments; if none is specified, the value is inherited from the configuration file. At this point character_set_server is set to this default character set;
(4) If you select multi-language support when installing MySQL, the installer automatically sets default_character_set in the configuration file to UTF-8, so that by default all columns of all tables in all databases are stored as UTF-8.
Viewing the default character set
By default, MySQL's character set is latin1 (ISO_8859_1); the commands for viewing it are given above.
Modifying the default character set
(1) The simplest way is to modify the character set settings in MySQL's my.ini file, such as:
default-character-set = utf8
character_set_server = utf8
After modifying the file, restart the MySQL service.
(2) Another way to modify the character set is to use MySQL commands:
mysql> SET character_set_client = UTF8;
mysql> SET character_set_connection = UTF8;
mysql> SET character_set_database = UTF8;
mysql> SET character_set_results = UTF8;
mysql> SET character_set_server = UTF8;
mysql> SET collation_connection = utf8_general_ci;
mysql> SET collation_database = utf8_general_ci;
mysql> SET collation_server = utf8_general_ci;
If the table's default character set is utf8 and queries are sent in UTF-8 encoding, yet the data stored in the database is still garbled, the problem may lie in the connection layer. The workaround is to execute the following statement before sending the query: SET NAMES 'utf8'; it is equivalent to the following three statements:
SET character_set_client = UTF8;
SET character_set_results = UTF8;
SET character_set_connection = UTF8;