One, Character set (Character set)
is a collection of more than one character (English characters, kanji characters, or other national languages ), with a variety of character sets, and each character set contains a different number of characters.
Characteristics:
① character encoding is a character in a character set that is represented by one or more bytes
② Each character set has its own unique encoding, so the same character , encoded in different character sets, produces a different binary
Common character Sets:
ASCII Character set: a set of character sets based on the Roman alphabet, which uses a 1-byte low 7-bit representation of the character, and the high position is always 0.
LATIN1 Character Set: extended with respect to the ASCII character set, a byte is still used to represent the character, but a high level is enabled, extending the representation of the character set.
GBK Character Set: supports Chinese characters with one-byte encoding and two-byte encoding.
UTF8 Character Set: One of the Unicode character sets is an industry standard in the field of computer science that supports the characters of all countries, and UTF8 uses 1-4 bytes to represent characters.
1. MySQL and character set:
the character set and encoding will exist wherever the text is involved. mysql system variable value:
2. Correct use of character set
The character set of the database service side depends on what characters are stored
How the above parameters work:
1. The origin of library, table, and column character sets
① If a character set is not explicitly specified when building a library, the character set specified by Character_set_server is used.
② when a table is built, the character set used by the current library is used if the character set is not explicitly specified.
When ③ is added, the character set used in the current table is used if the character set is not explicitly specified when modifying a table field.
2. Update and query related character set variables
Update process Character set conversion process:character_set_client-->character_set_connection--> table character set.
Query flow Character set conversion process: Table Character Set-->character_set_result
3.character_set_database
The current default database character set, such as the execution of the use of XXX, the current database becomes xxx, if XXX's character set is UTF8, then this variable value becomes UTF8 (for system settings, no manual setting).
3. mysql client and character set
1. For input:
The character set used by the client must be manifested through character_set_client, character_set_connection :
① encode data on the client (Linux:utf8, WINDOWS:GBK)
②mysql after receiving the SQL statement (for example, insert), a character is found, asking the client how to encode the character: the client informs the MySQL client of the encoding by the Character_set_client parameter (so this parameter needs to correctly reflect the corresponding encoding of the client )
③ the client's character set is converted to the connection character set when MySQL discovers that the character set transmitted by the clients ' connection is different from their own.
④mysql stores the converted encoding on a column in the MySQL table and, when stored, determines whether the encoding is consistent with the encoding on the internal storage character set (according to the priority of the character set type), and if the inconsistency needs to be converted again
2. For queries:
The character set used by the client must be represented by Character_set_results , the server asks the client character set, and through character_set_results translates the result to the same character set as the client. (character_set_results default equals Character_set_client)
4, MySQL character encoding conversion principle:
Q: If Character_set_client is UTF8, and Character_set_database is GBK, what is the principle of character set conversion if the encoding conversion is required?
A: Assuming the string "hello" of the GBK character set, you need to switch to UTF8 character set storage, which is actually for each character in the "Hello" string to utf8 the encoding table inside the corresponding binary, and then stored.
Schematic character set conversion process:
①mysql server converts the request data from Character_set_client to Character_set_connectionwhen the request is received;
② Convert request data from character_set_connection to internal operation character set before internal operation
To determine the steps:
--Use the character set set value for each data field;
-If the above value does not exist, the default CHARACTER set value of the corresponding data table is used;
-If the above value does not exist, the default CHARACTER set value of the corresponding database is used;
-If the above value does not exist, use the Character_set_server setting value;
③ converts the operation result from the internal operation character set to Character_set_results.
5, String common processing operation
1. View Character set encoding settings
Mysql> Show variables like '%character% ';
2. Setting Character Set encoding
mysql> set names ' UTF8 ';
Equivalent at the same time:
Set character_set_client = UTF8;
Set character_set_results = UTF8;
Set character_set_connection = UTF8;
3. Modifying the database character set
mysql> ALTER DATABASE database_name character set XXX;
Modifies only the library's character set, affects the default definition of the table that is created later, and is not affected for the created table's character set. (The character set is typically implemented in a database, and tables and columns default to the database's character set )
4. Modifying the character set of a table
Mysql> ALTER TABLE table_name character set XXX;
Only the character set of the table is modified, which affects the default definition of the columns that are added to the table, and the character set for the existing columns is unaffected.
Mysql> ALTER TABLE table_name convert to character set xxx;
both the table character set and the existing column character set are modified and the existing data is converted to character set encoding.
5. Modifying the column character set
Format:
ALTER TABLE table_name MODIFY
column_name {CHAR | VARCHAR | TEXT} (Column_length)
[CHARACTER SET Charset_name]
[COLLATE Collation_name]
Mysql> ALTER TABLE table_name modify COL_NAME varchar (col_length) character set xxx;
6, the correct practice of the character set:
The MySQL software tool itself does not have a character set, mainly because the tool's OS character set (WINDOWS:GBK, Linux:utf8), so the correct practice of the character set is very important:
1. For insert, character_set_client, character_set_connection are the same, and correctly reflect the character set used by the client
2. For select, Character_set_results correctly reflects the client character set
3. The database character set depends on the type of character we want to store
4. Character set conversions occur at most once, which requires the same character_set_client, character_set_connection
5. All character set conversions occur on the database side
Review:
1, the establishment of the database when attention to the character set (GBK, UTF8);
2. after connecting to a database, whether you are executing DML or SELECT, you need to set the correct character set parameters as long as the varchar and char columns are involved .
second, character set and proofreading rules collation proofreading
View all character sets supported by the database (CharSet): is a set of symbols and corresponding numbers
Mysql> show Character set;
Proofing Rules (Collation):
is a set of rules for character comparisons and sorting within a character set , such as rules that are case-sensitive and others ignored.
mysql> CREATE TABLE t1 (ID int,name varchar); #t1建表没有指定校对规则
Mysql> show collation; #查看数据库支持的所有校对规则
Mysql> Show variables like ' collation_% '; #查看当前字符集和校对规则设置
Proofing Rule Features:
① Two different character sets cannot have the same proofing rules ;
② Each character set has a default proofing rule;
③ There is a collation rule naming convention : start with its associated character set name , include a language name in the Middle , and be _ci (case insensitive), _cs (case sensitive), or _ Bin (two Yuan) ended .
Attention:
The system uses the UTF8 character set, which is case-sensitive when executing SQL queries using Utf8_bin proofing rules, and utf8_general_ci is case-insensitive (the default UTF8 character set corresponds to the collation Rule utf8_general_ci).
Example:
Mysql> CREATE TABLE t2 (ID int,name varchar) character SET=GBK Collate=gbk_bin; #t2建表指定校对规则 (case sensitive)
MySQL Character set-proofing rules