In-depth understanding of mysql Data Transmission coding principles, in-depth understanding of mysql
I. Basic concepts (Here we reference http://www.laruence.com/2008/01/05/12.html)
1. Given a series of characters, assign a value to each character and use a value to represent the corresponding character. This value is the character Encoding (Encoding ). For example, if 'A' is given A value of 0 and 'B' is given A value of 1, 0 is the encoding of 'A;
2. After a series of characters are given and the corresponding encoding is granted, a Set of all these characters and encoding pairs is the Character Set ). For example, if the given character list is {'A', 'B'}, {'A' => 0, 'B' => 1} is A character set;
3. Collation refers to the comparison rules between characters in the same character set;
4. Only after confirming the character order can an equivalent character set be defined and the size relationship between the characters be defined;
5. Each Character Sequence corresponds to one character set, but one character set can correspond to multiple character sequences, one of which is the Default Collation );
6. Names in the collation of MySQL follow naming conventions: names starting with character sets corresponding to the collation; names starting with _ ci (Case Insensitive) and _ cs (case sensitive) or end with _ bin (compare by encoding value. For example, in the collation "utf8_general_ci", "a" and "A" are equivalent;
II,Glossary
1. character_set_client: character set for parsing and encoding client data.
2. character_set_connection: Connection layer character set.
3. character_set_server: Internal Operation character set of the server.
4. character_set_results: Query Result character set.
5. character_set_database: Character Set of the current database.
6. character_set_system: Character Set of the system source data (such as field names.
Note:
1. There are also variables that begin with collation _ and correspond to the same face, used to describe the collation.
2. During server-side encoding and resolution, the server-side encoding is performed according to the encoding in the previous step and the encoding is performed according to their respective character sets.
3. character_set_server is the operational character set of the mysql database memory. If the character set of the database is not specified during database creation, character_set_server is used as the default character set. If the character set of the table is not specified during table creation, character_set_database is used as the default character set; if no character set is specified when a field is created, the character set of the table is used as the default character set.
4. set names gbk; it is equivalent to setting character_set_client, character_set_connection, and character_set_results character sets at the same time.
III,Character Set encoding and parsing during Data Transmission
1.Client and code
We use jdbc to operate data programs, navicate tools, and operating system database. The client navicate is encoded as utf8, and windows uses gbk by default. Generally, UTF-8 encoding occupies three Chinese characters, and gbk occupies two bytes (one byte is an 8-bit binary, that is, two hexadecimal ).
Navicate operation (utf8) mysql> show variables like '% char % '; + bytes + | Variable_name | Value | + bytes + | character_set_client | utf8 | character_set_connection | utf8 | character_set_database | utf8 | bytes | binary | utf8 | | character_set_server | utf8 | character_set_system | utf8 | character_sets_dir |/usr/share/mysql/charsets/| + rows + 8 rows in set mysql> select hex ('I am very handsome '); + -------------------- + | hex (' ') | + -------------------- + | E68891E5BE88E5B885 | + ------------------ + 1 row in set
On Windows (gbk) mysql> show variables like '% char % '; + bytes + | Variable_name | Value | + bytes + | character_set_client | gbk | character_set_connection | gbk | character_set_database | utf8 | bytes | binary | gbk | | character_set_server | utf8 | character_set_system | utf8 | character_sets_dir |/usr/share/mysql/charsets/| + rows + 8 rows in set mysql> select hex ('I am very handsome '); + -------------------- + | hex (' ') | + -------------------- + | CED2BADCCAA7 | + ------------------ + 1 row in set
2.Parsing process
A. the SQL statement is sent to the mysql server by client encoding;
B. character_set_client decodes the received data. Here, the data is decoded according to character_set_client encoding, and then encoded according to the character set.
C. character_set_connection receives the encoding from the client. Character Set conversion is performed here. Note that s. decode (character_set_client). encode (character_set_connection ).
D. character_set_server the character set used inside the server. If you add a character set to a field separately, the field character set is used here. The connection code is received here to convert the character set. E. decode (character_set_connection). encode (character_set_server ).
3.Query Process
A. the mysql server is converted to character_set_results and sent to the client. In fact, you only need to know That character_set_results encoding is used when the server comes out.
B. decode the data sent to the client according to the client encoding. Therefore, if character_set_results is inconsistent with the client encoding, the query will be garbled.
Ps: Here I create a gbk table with data inserted (self-built with Chinese characters ).
Navicate operation (utf8) mysql> select @ character_set_results; + rows + | @ character_set_results | + ------------------------- + | utf8 | + ------------------------- + 1 row in set mysql> select name_man from wsyy_marry where id = 1; + ---------- + | name_man | + ---------- + | Heli Guang | + ---------- + 1 row in set mysql> set @ session. character_set_results = 28; Query OK, 0 rows affected mysql> select @ character_set_results; + rows + | @ character_set_results | + ------------------------- + | gbk | + ------------------------- + 1 row in set mysql> select name_man from wsyy_marry where id = 1; + ---------- + | name_man | + ---------- + | �� | + ---------- + 1 row in set
Windows operations (gbk) mysql> select @ character_set_results; + rows + | @ character_set_results | + ------------------------- + | gbk | + ------------------------- + 1 row in set mysql> select name_man from wsyy_marry where id = 1; + ---------- + | name_man | + ---------- + | Heli Guang | + ---------- + 1 row in set mysql> set @ session. character_set_results = 33; Query OK, 0 rows affected mysql> select @ character_set_results; + ------------------------- + | @ character_set_results | + ------------------------- + | utf8 | + ----------------------- + 1 row in set mysql> select name_man from test. wsyy_marry where id = 1; + ---------- + | name_man | + ---------- + | � | + ---------- + 1 row in set
Iv. Summary
1. Character Set 33, indicating utf8; 28 representing gbk character set setting 33;
2. character_set_client and character_set_results are the largest places where character sets are garbled. If the encoding of the two locations is inconsistent with the client encoding, garbled characters will occur. Tell you, it may not be saved.
3. Encoding Problems may also occur later. If the Chinese character string cannot be decoded by latin1, garbled characters may occur. That is to say, there may be incompatibility During encoding conversion. latin1 encoding can be utf8 compatible, and vice versa "??" This is the case.
4. Do not randomly set character_set_client values after reading them. If we can keep all of them utf8, it will certainly be okay.
V. Questions
Client code |
Client |
Connection |
Server |
Result |
Utf8 |
Gbk |
Gbk |
Gbk/utf8 |
Insertion failed |
Utf8 |
Gbk |
Utf8 |
Gbk/utf8 |
Insert garbled characters |
Utf8 |
Utf8 |
Gbk |
Gbk/utf8 |
Normal insert |
Utf8 |
Urf8 |
Utf8 |
Gbk/utf8 |
Normal insert |
According to the following statistics, the client encoding is inconsistent with character_set_client encoding, which may cause insertion garbled characters or data insertion failures. I don't know why I cannot insert a database. In the following two cases, errors may be caused by garbled characters.
1. Incorrect string value: '\ xB6' for column 'name _ man' at row 1.
2. SQLException errorcom. mysql. jdbc. MysqlDataTruncation: Data truncation: Data too long for column 'name _ man' at row 1.
If you have any objection to the introduction or have a more comprehensive understanding, you can leave a message below for everyone to learn.
Vi. References
1. http://www.jianshu.com/p/96ee5b2adef3
2. http://blog.csdn.net/kxcfzyk/article/details/37723367
3. http://www.laruence.com/2008/01/05/12.html