First, the basic concept (quoted here http://www.laruence.com/2008/01/05/12.html)
1, given a series of characters, each character is given a numeric value to represent the corresponding character, this value is the character encoding (Encoding). For example, we give the character ' a ' a value of 0, give the character ' B ' a value of 1, then 0 is the encoding of the character ' a ';
2, given a series of characters and given the corresponding encoding, all of these characters and encoding pairs are composed of a set of character sets (Character set). For example, given a word list characters {' A ', ' B '}, {' A ' =>0, ' B ' =>1} is a character set;
3. Character Fu She (Collation) refers to the comparison rules between characters within the same character set;
4, to determine the character Fu She, in order to define what is equivalent to a character, and the size of the relationship between the characters;
5, each character Fu She unique corresponding to a character set, but a character set can correspond to a variety of characters Fu She, one of which is the default word Fu She (defaults Collation);
6. The word Fu She name in MySQL follows the naming convention: begins with the character set name of the Fu She corresponding to _ci (which is case insensitive), _cs (which is case sensitive), or _bin (which means comparison by encoded value). For example: Under the word Fu She "Utf8_general_ci", the characters "a" and "a" are equivalent;
Second, the noun explanation
1, character_set_client: Client data parsing, coded character set.
2, Character_set_connection: Connection layer Character set.
3, Character_set_server: The server internal operation character set.
4, Character_set_results: Query result character Set.
5, Character_set_database: The character set of the current database.
6, Character_set_system: System source data (field name, etc.) character set.
Note:
1, there is a collation_ beginning with the corresponding variable, used to describe the word Fu She.
2, the service-side coding, parsing, according to the previous link of the code to parse, according to the respective character set encoding.
3, Character_set_server is the operation character set of MySQL database memory. If you do not specify a character set for the database when you create the database, the Character_set_server character set is used as the default character set, and if you do not specify a character set for the table when you create the table, use the character_set_database character set as the default character set If the character set of the field is not specified when the field is created, the character set of the table is used as the default character set.
4, set names GBK, equivalent to set character_set_client,character_set_connection,character_set_results these three character sets.
Third, the character set encoding and parsing during data transmission
1. Client and encoding
We use JDBC to manipulate the data of the program, navicate operation Tools, operating system operations database these are considered to be clients. The encoding for client navicate is utf8,windows the default encoding is GBK. In general, UTF8-encoded Chinese accounts for three bytes, and GBK occupies two bytes (one byte is 8-bit binary, which is two hexadecimal).
navicate operation (UTF8) MySQL> Show variables like '%char% ';+--------------------------+----------------------------+| variable_name | Value |+--------------------------+----------------------------+| character_set_client | UTF8 | | character_set_connection | UTF8 | | Character_set_database | UTF8 | | Character_set_filesystem | binary | | Character_set_results | UTF8 | | Character_set_server | UTF8 | | Character_set_system | UTF8 | | Character_sets_dir | /usr/share/mysql/charsets/|+--------------------------+----------------------------+8rows in set MySQL> select Hex (' I'm Handsome ');+--------------------+| Hex (' I'm Handsome ') |+--------------------+| e68891e5be88e5b885 |+--------------------+1 row in Set
Operations on Windows (GBK) MySQL> Show variables like '%char% ';+--------------------------+----------------------------+| variable_name | Value |+--------------------------+----------------------------+| character_set_client | GBK | | character_set_connection | GBK | | Character_set_database | UTF8 | | Character_set_filesystem | binary | | Character_set_results | GBK | | Character_set_server | UTF8 | | Character_set_system | UTF8 | | Character_sets_dir | /usr/share/mysql/charsets/|+--------------------------+----------------------------+8rows in set MySQL> select Hex (' I'm Handsome ');+--------------------+| Hex (' I'm Handsome ') |+--------------------+| Ced2badccaa7 |+--------------------+1 row in Set
2. parsing Process
The A.sql statement is sent to the MySQL server via client-side encoding;
B.character_set_client the received data to decode, here decoding according to character_set_client encoding decoding, and finally according to its own character set encoding.
C.character_set_connection receives the encoding from the client, where character set conversions are performed. Note here S.decode (character_set_client). Encode (character_set_connection).
D.character_set_server This is the character set used internally by the server, if you add a character set to a field, the field character set is taken here. Here you receive the connection encoding for character set conversion. E.decode (character_set_connection). Encode (Character_set_server).
3. query Process
A.mysql server to character_set_results sent to the client, in fact, here you just know that the last time you came out from the server is encoded according to Character_set_results.
B. After sending to the client, decode it according to the client code. So if the Character_set_results and the client code are inconsistent, it will cause the query garbled.
PS: Here I create a GBK table with data inserted in it (self-constructed, with Chinese).
navicate operation (UTF8) MySQL>SELECT @ @character_set_results;+-------------------------+| @ @character_set_results |+-------------------------+| UTF8 |+-------------------------+1row in set MySQL> select Name_man from wsyy_marry where id = 1;+----------+| Name_man |+----------+| Heliguan |+----------+1row in set MySQL> SET @ @session. Character_set_results = 28; Query OK,0rows affected MySQL>SELECT @ @character_set_results;+-------------------------+| @ @character_set_results |+-------------------------+| GBK |+-------------------------+1row in set MySQL> select Name_man from wsyy_marry where id = 1;+----------+| Name_man |+----------+|?????? |+----------+1 Row in Set
Windows Operations (GBK) MySQL>SELECT @ @character_set_results;+-------------------------+| @ @character_set_results |+-------------------------+| GBK |+-------------------------+1row in set MySQL> select Name_man from wsyy_marry where id = 1;+----------+| Name_man |+----------+| Heliguan |+----------+1row in set MySQL> SET @ @session. Character_set_results = 33; Query OK,0rows affected MySQL>SELECT @ @character_set_results;+-------------------------+| @ @character_set_results |+-------------------------+| UTF8 |+-------------------------+1row in set MySQL> select Name_man from test.wsyy_marry where id = 1;+----------+| Name_man |+----------+| Choi Hygiene Qian? |+----------+1 Row in Set
Iv. Summary
1, Character set 33, representing utf8;28 for GBK character Set 33;
2, the character set appears garbled place maximum possibly in two places, Character_set_client and Character_set_results. If the encoding of these two places the client code inconsistency will be garbled. I'm telling you, it's possible that you can't save it.
3, there may be coding problems, if the Chinese string, latin1 decoding not Chinese, it will appear garbled. In other words, when encoding conversion may appear incompatible, latin1 encoding can be UTF8 compatible, and vice versa may appear "?? "Such a situation.
4, look down and honestly don't mess up the character_set_client these values. If we can keep all the utf8, that's sure to be fine.
V. Questions
Client-side encoding |
Client |
Connection |
Server |
Results |
Utf8 |
Gbk |
Gbk |
Gbk/utf8 |
Insert failed |
Utf8 |
Gbk |
Utf8 |
Gbk/utf8 |
inserting garbled characters |
Utf8 |
Utf8 |
Gbk |
Gbk/utf8 |
Normal insertion |
Utf8 |
Urf8 |
Utf8 |
Gbk/utf8 |
Normal insertion |
I did the following statistics, the client-side code and character_set_client encoding inconsistencies may appear garbled, there is also the possibility of data insertion can not be plugged in. I don't know why I can't insert a database. The following two situations are most likely to be garbled caused by the error.
1, incorrect string value: ' \xb6 ' for column ' Name_man ' at row 1.
2, SQLException Errorcom.mysql.jdbc.MysqlDataTruncation:Data truncation:data too long for column ' Name_man ' at row 1.
If the introduction of the objection or can have a more comprehensive understanding can be in the following message, we learn together.
Vi. references
1.http://www.jianshu.com/p/96ee5b2adef3
2.http://blog.csdn.net/kxcfzyk/article/details/37723367
3.http://www.laruence.com/2008/01/05/12.html
Learn more about MySQL data transfer coding principles