Character Set & character encoding
A character set is a collection of characters. There are many character sets, and each contains a different number of characters. The characters can be English letters, Chinese characters, or characters from other languages.
Common character sets include ASCII, LATIN1, GB2312, GBK, GB18030, and Unicode. A character encoding uses one or more bytes to represent each character in a character set. Character sets generally define their own encoding, so the same character is represented by different binary codes under different encodings.
ASCII is a character set based on the Roman alphabet. It uses the low 7 bits of one byte to represent a character, and the high bit is always 0. LATIN1 also uses one byte per character, but it puts the high bit to use to extend the range of characters it can represent. GB2312, GBK, and GB18030 all support Chinese characters, in increasing order of coverage: GB2312, then GBK, then GB18030.
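The difference between these encodings is easiest to see on actual bytes. The following is a minimal sketch that uses Python's built-in codecs as a stand-in for the encoding tables; the helper names encode_hex and high_bit_set are made up for this illustration.

```python
def encode_hex(text, charset):
    """Encode `text` with `charset` and return the bytes as uppercase hex."""
    return text.encode(charset).hex().upper()

def high_bit_set(byte):
    """True if the highest bit of a single byte value is 1."""
    return bool(byte & 0x80)

# ASCII: one byte per character, the high bit is always 0.
ascii_byte = "A".encode("ascii")[0]
print("ascii  A:", encode_hex("A", "ascii"), "high bit:", high_bit_set(ascii_byte))

# LATIN1: still one byte per character, but the high bit is put to use.
latin1_byte = "é".encode("latin1")[0]
print("latin1 é:", encode_hex("é", "latin1"), "high bit:", high_bit_set(latin1_byte))

# The same Chinese character has different bytes under different encodings.
for charset in ("gbk", "gb18030", "utf-8"):
    print(charset, "小:", encode_hex("小", charset))
```

The last loop shows the point made above: one character, several binary representations, depending on the encoding chosen.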
MySQL and Character Set
Wherever text is involved, character sets and encodings are involved. In MySQL, an SQL statement is entered in the MySQL client, passed to the MySQL server process over TCP/IP, and finally stored in files on the server. Every one of these steps handles characters, so every one involves a character set encoding. MySQL exposes the relevant settings as system variables. The following table lists the character set system variables and their meanings:
Variable name               Description
character_set_server        Default character set for internal operations
character_set_client        Character set used by the client's source data
character_set_connection    Character set of the connection layer
character_set_results       Character set of query results
character_set_database      Default character set of the currently selected database
character_set_system        Character set of system metadata (column names, etc.)
How these parameters work
1. Where the database, table, and column character sets come from
(1) If no character set is explicitly specified when a database is created, the character set given by character_set_server is used.
(2) If no character set is explicitly specified when a table is created, the character set of the current database is used.
(3) When a table column is added or modified, if no character set is explicitly specified, the character set of the table is used.
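The three rules above form a simple fallback chain. The sketch below models only that lookup order; effective_charset is a hypothetical helper, not a MySQL API, and it ignores the fact that MySQL fixes the inherited value at creation time.

```python
def effective_charset(server, database=None, table=None, column=None):
    """Return the character set a column ends up with.

    Looks from the most specific level (column) upward and falls back to
    character_set_server when nothing was specified explicitly.
    """
    for explicit in (column, table, database):
        if explicit is not None:
            return explicit
    return server

# Nothing specified anywhere: character_set_server wins.
print(effective_charset("latin1"))
# Database created with an explicit character set, table and column without.
print(effective_charset("latin1", database="gbk"))
# An explicit table character set overrides the database default.
print(effective_charset("latin1", database="gbk", table="utf8"))
```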
2. Character set variables used for updates and queries
When users update (insert, delete, modify) or query a database, the character set variables that matter most are character_set_client, character_set_connection, and character_set_results.
Character set conversion during an update: character_set_client -> character_set_connection -> table character set.
Character set conversion during a query: table character set -> character_set_results.
PS: I personally think the character_set_connection setting is somewhat redundant, because the data is eventually converted to the table character set anyway.
3. character_set_database
This variable holds the character set of the current default database. For example, after "use xxx" is executed, the current database becomes xxx; if the character set of xxx is utf8, the value of this variable becomes utf8. The variable is set by the system and does not need to be set manually.
MySQL character encoding and conversion process
If these system variables are set differently, for example character_set_client is UTF8 and character_set_database is GBK, encoding conversion is required. So how does character set conversion work? Suppose the GBK string "小明" needs to be converted to UTF8 for storage. For each Chinese character in "小明", the corresponding binary code is looked up in the UTF8 encoding table and then stored; encoding conversion does not involve any complex algorithm. MySQL character set conversion involves the following steps:
1) convert the data from the character_set_client encoding to the character_set_connection encoding;
2) convert it from the character_set_connection encoding to the character set of the table column;
3) convert the result of the operation from the table column character set to the character_set_results encoding.
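The three steps can be imitated outside MySQL. This is a rough sketch under the assumption that each conversion is a plain decode and re-encode, skipped when the two character sets have the same name; convert, store, and fetch are illustrative names, and Python's codecs stand in for MySQL's conversion tables.

```python
def convert(data, source, target):
    """Re-encode `data` from `source` to `target`.

    When both names are equal, the bytes pass through untouched.
    """
    if source == target:
        return data
    return data.decode(source).encode(target)

def store(data, client, connection, column):
    # Steps 1 and 2: client -> connection -> column character set.
    return convert(convert(data, client, connection), connection, column)

def fetch(stored, column, results):
    # Step 3: column character set -> character_set_results.
    return convert(stored, column, results)

sent = "小明".encode("utf-8")                     # bytes from a UTF-8 client
stored = store(sent, "utf-8", "utf-8", "gbk")    # table column uses GBK
returned = fetch(stored, "gbk", "utf-8")         # results requested as UTF-8

print(len(sent), len(stored), returned.decode("utf-8"))
```

With every setting describing the data truthfully, the text survives the round trip even though the stored bytes differ from the bytes that were sent.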
Next, I will walk through the character set conversion process using a common scenario. From the mysql command line (connected remotely through SecureCRT), run "insert into T values (1, '小明')" and follow the binary content of the string '小明' as it is transferred.
A) The terminal (SecureCRT) uses the UTF8 character set, character_set_client = gbk, character_set_connection = gbk, and table T uses the gbk character set.
Because character_set_client, character_set_connection, and the table character set are all GBK, no encoding conversion takes place. So although the table character set is GBK, the stored encoding of "小明" is not a GBK byte stream but a UTF8 byte stream, and the two Chinese characters occupy 6 bytes. Reading is the reverse process and also involves no encoding conversion, so the query still returns "小明" correctly.
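Scenario A can be replayed on the bytes alone. A minimal sketch, assuming that "no conversion" means the bytes sent by the terminal are stored and returned unchanged:

```python
# A UTF-8 terminal sends the string as UTF-8 bytes.
terminal_bytes = "小明".encode("utf-8")

# Client, connection and table are all declared GBK, so the server sees no
# reason to convert and stores the bytes exactly as received.
stored = terminal_bytes
print(stored.hex().upper(), len(stored))

# Reading back is also conversion-free; the UTF-8 terminal decodes the
# same bytes it originally sent.
print(stored.decode("utf-8"))
```

The table holds 6 bytes for two characters, which is the UTF8 width, not the 4 bytes a real GBK encoding would need.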
B) Starting from A), change character_set_client to utf8 (along with character_set_results, which determines the encoding of the returned data) and query the inserted value.
We can see that the returned value is "灏忔槑". Because the table character set is GBK and the client requests UTF8, the server treats the stored byte stream E5B08FE6988E as the GBK characters "灏忔槑" and converts it to the byte stream E7818FE5BF94E6A791, which is the UTF8 encoding of those characters. The query result is therefore displayed as "灏忔槑" in SecureCRT. This is what we call garbled characters.
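The conversion in scenario B can be reproduced with two calls. This sketch assumes Python's gbk codec agrees with MySQL's GBK table for these bytes:

```python
# Bytes sitting in the GBK table after scenario A (really UTF-8 data).
stored = bytes.fromhex("E5B08FE6988E")

# The server trusts the table definition and reads the bytes as GBK ...
as_gbk_text = stored.decode("gbk")
# ... then converts that text to the UTF-8 the client asked for.
returned = as_gbk_text.encode("utf-8")

print(returned.hex().upper())     # byte stream sent back to the client
print(returned.decode("utf-8"))   # what a UTF-8 terminal displays
```

Six bytes read as GBK form three 2-byte characters, and each of them needs 3 bytes in UTF8, which is why the returned stream is 9 bytes long.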
C) Starting from B), set the SecureCRT character set to GBK to see how the SecureCRT character set setting affects the result.
We can see that a different group of garbled characters is returned. The transfer process is the same as in B); the only difference is the last step, where the SecureCRT character set setting changes how the returned byte stream is displayed.
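The effect of the terminal setting in scenario C can be sketched by decoding the same returned bytes in two ways. The use of errors="replace" is an assumption about how a terminal copes with bytes it cannot decode; SecureCRT may render them differently.

```python
# Byte stream returned by the server in scenario B.
returned = bytes.fromhex("E7818FE5BF94E6A791")

# Terminal set to UTF-8: the garbled text from scenario B.
shown_utf8 = returned.decode("utf-8")

# Terminal set to GBK: 9 high bytes cannot all pair up into 2-byte GBK
# characters, so the leftover byte is replaced instead of raising an error.
shown_gbk = returned.decode("gbk", errors="replace")

print(shown_utf8)
print(shown_gbk)
```

The server sends identical bytes in both cases; only the interpretation on the terminal side changes.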
SQL statements related to character sets
1) View the character set encoding settings
show variables like 'character_set%';
2) Set the character set encoding
set names xxx;
This statement sets the client-side character set. It sets three system variables at once: character_set_client, character_set_connection, and character_set_results.
3) Modify the database character set
alter database xxx character set xxx;
This statement only modifies the character set of the database, which affects the default for tables created afterwards. The character set of tables that already exist is not affected.
4) Modify the character set of a table
alter table xxx character set xxx;
This statement only modifies the character set of the table, which affects the default for columns added afterwards. The character set of existing columns is not affected.
alter table xxx convert to character set xxx;
This statement modifies both the table character set and the character set of existing columns, and converts the encoding of the existing data.
5) Modify the character set of a column
alter table xxx modify col_name varchar(col_length) character set xxx;
6) Query the byte encoding of characters
select hex(col_name) from table_name; select length(col_name) from table_name;
In a GBK table, a Chinese character should occupy 2 bytes. If each character occupies 3 bytes, the character set was set incorrectly at some stage, typically because the terminal uses UTF8 while both the MySQL client settings and the database use GBK.
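The bytes-per-character check described above can be tried without a database. A small sketch, where hex_and_length is a made-up helper that mirrors what hex() and length() report for a stored value:

```python
def hex_and_length(data):
    """Mimic hex(col) and length(col) for a stored byte string."""
    return data.hex().upper(), len(data)

text = "小明"
correct = text.encode("gbk")      # what a healthy GBK column would hold
suspect = text.encode("utf-8")    # UTF-8 bytes stored without conversion

for label, data in (("gbk", correct), ("utf-8", suspect)):
    hex_value, length = hex_and_length(data)
    print(label, hex_value, length, "bytes per character:", length // len(text))
```

A result of 3 bytes per Chinese character in a GBK table points to UTF8 data that was written without conversion.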
The default MySQL character set is latin1 (before MySQL 8.0, which defaults to utf8mb4).