MySQL Character Set Knowledge Summary

Source: Internet
Author: User

Character Set & character encoding

A character set is a collection of characters. There are many character sets, and each contains a different number of characters. The characters may be English letters, Chinese characters, or characters from other languages.

Common character sets include ASCII, LATIN1, GB2312, GBK, GB18030, and Unicode. A character encoding represents each character in a character set as one or more bytes. Each character set has its own encoding, so the same character generally maps to different byte sequences in different character sets.

ASCII is a character set based on the Latin alphabet. It uses the low 7 bits of one byte to represent a character, and the high bit is always 0. LATIN1 also uses one byte per character, but it uses the high bit as well, which extends the range of characters it can represent.

GB2312, GBK, and GB18030 all support Chinese characters, and each is a superset of the previous one: GB2312 < GBK < GB18030. GBK encodes a character in one or two bytes. Bytes 00-7F are identical to ASCII, and each Chinese character takes two bytes, with the first byte in the range 81-FE so that it never collides with 00-7F.

Unicode is an industry standard that aims to cover the characters of all writing systems. It has several encodings, such as UTF-8, UTF-16, and UTF-32. UTF-8 uses 1 to 4 bytes per character, UTF-16 uses 2 or 4 bytes, and UTF-32 always uses 4 bytes. Note that the MySQL character set named utf8 has historically been an alias for utf8mb3, which stores at most 3 bytes per character; utf8mb4 covers the full 1 to 4 byte range.
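To make the byte counts concrete, here is a small Python sketch that encodes the same text with several codecs. The sample string "A小明" (one ASCII letter plus two Chinese characters) and the helper name encoded_hex are assumptions chosen for illustration, and Python's codec names merely stand in for the character sets discussed here.

```python
# Encode the same text with several codecs and compare the byte sequences.
# The sample text and helper name are illustrative, not taken from MySQL.
text = "A小明"

def encoded_hex(s: str, codec: str) -> str:
    """Return the bytes of s in the given codec as uppercase hex."""
    return s.encode(codec).hex().upper()

for codec in ("gbk", "gb18030", "utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(codec)
    print(f"{codec:10} {len(data):2} bytes  {encoded_hex(text, codec)}")

# An ASCII character keeps the same single byte in GBK and in UTF-8.
assert "A".encode("gbk") == "A".encode("utf-8") == b"\x41"
```

The output shows the same three characters taking a different number of bytes in each encoding, which is why a byte stream is only meaningful together with the character set that produced it.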

MySQL and Character Set

Wherever text is involved, character sets and encodings are involved. In MySQL, an SQL statement is typed into the MySQL client, sent to the MySQL server process over TCP/IP, and finally stored in files on the server. Every one of these steps handles characters, and therefore involves a character set encoding. MySQL exposes this through system variables. The following table lists the system variables for character set settings and their meanings:

Variable name              Description
character_set_server       Default internal operation character set
character_set_client       Character set used by client source data
character_set_connection   Connection layer character set
character_set_results      Query result character set
character_set_database     Default character set of the currently selected database
character_set_system       System metadata (field names, etc.) character set

How these parameters work

1. Where the database, table, and column character sets come from
(1) If the character set is not explicitly specified when a database is created, the character set specified by character_set_server is used.
(2) If the character set is not explicitly specified when a table is created, the character set of the current database is used.
(3) When a table column is added or modified, if the character set is not explicitly specified, the character set of the table is used.
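The fallback chain in these three rules can be modeled in a few lines of Python. This is only a sketch of the rules as stated here: effective_charset is a hypothetical helper, not a MySQL API, and the character set names passed to it are arbitrary examples.

```python
from typing import Optional

def effective_charset(server: str,
                      database: Optional[str] = None,
                      table: Optional[str] = None,
                      column: Optional[str] = None) -> str:
    """Return the character set a column ends up with.

    Each level falls back to the level above it when no character set is
    given explicitly: column -> table -> database -> character_set_server.
    """
    for explicit in (column, table, database):
        if explicit is not None:
            return explicit
    return server

# Nothing explicit anywhere: the server default applies.
print(effective_charset("latin1"))
# The database was created with an explicit character set.
print(effective_charset("latin1", database="gbk"))
# The table overrides the database default.
print(effective_charset("latin1", database="gbk", table="utf8"))
```

The most specific explicit setting wins, and character_set_server only matters when nothing below it was specified.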

2. Updated and queried character set variables

When users update (insert, delete, modify) and query data, the character set variables that matter most are character_set_client, character_set_connection, and character_set_results.
Update path conversion: character_set_client -> character_set_connection -> table character set.
Query path conversion: table character set -> character_set_results.

PS: I personally think the character_set_connection setting is somewhat redundant, because the data is eventually converted to the table character set anyway.

3. character_set_database

This variable holds the character set of the current default database. For example, after USE xxx is executed, the current database becomes xxx; if the character set of xxx is utf8, the variable's value becomes utf8. The system sets this variable itself, so it does not need to be set manually.

MySQL character encoding and conversion process

If the preceding system variables are set differently, for example character_set_client is UTF8 and character_set_database is GBK, encoding conversion is required. So how does character set conversion work? Assume that the GBK string "小明" needs to be converted to UTF8 for storage. In effect, each Chinese character in "小明" is looked up in the UTF8 encoding table to find its corresponding bytes, and those bytes are stored. Encoding conversion does not involve complex algorithms. MySQL character set conversion involves the following steps:

1) convert data from character_set_client settings to character_set_connection settings;

2) convert character_set_connection to Character Set settings of table fields;

3) convert the operation result from the table field character set to character_set_results settings.
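The three steps above can be sketched as a small Python model. The function names convert, write_path, and read_path are hypothetical, Python's "utf-8" and "gbk" codecs stand in for the MySQL character sets utf8 and gbk, and the sample string is illustrative. It is a simplified model of the conversion order, not of the server's implementation.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode data from src to dst; a no-op when both are the same."""
    if src == dst:
        return data
    return data.decode(src).encode(dst)

def write_path(sent: bytes, client: str, connection: str, column: str) -> bytes:
    """Steps 1 and 2: client -> connection -> column character set."""
    return convert(convert(sent, client, connection), connection, column)

def read_path(stored: bytes, column: str, results: str) -> bytes:
    """Step 3: column character set -> character_set_results."""
    return convert(stored, column, results)

# A correctly configured session: the terminal sends UTF-8 and declares it.
sent = "小明".encode("utf-8")
stored = write_path(sent, client="utf-8", connection="utf-8", column="gbk")
returned = read_path(stored, column="gbk", results="utf-8")

print(len(sent), "bytes sent,", len(stored), "bytes stored")
print(returned.decode("utf-8"))
```

When the declared character sets match what is really being sent, the column holds genuine GBK bytes and the round trip is lossless. The scenarios that follow show what happens when the declaration is wrong.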

Next I will describe the character set conversion process through a common scenario. In the mysql command line (connected remotely through SecureCRT), run "insert into T values (1, '小明')" and follow the binary content of the string '小明' as it is transferred.

A) The terminal (SecureCRT) uses the utf8 character set, character_set_client = gbk, character_set_connection = gbk, and table T uses the gbk character set.

Because character_set_client, character_set_connection, and the table character set are all GBK, no encoding conversion takes place. As a result, although the table's character set is GBK, the stored bytes for "小明" are not GBK-encoded but UTF8-encoded, and the two Chinese characters occupy 6 bytes. Reading is the reverse process and likewise involves no conversion, so the query still returns "小明" correctly.

B) Continuing from A), change the client character set to utf8 and query the inserted value.

We can see that the returned value is "灏忔槑". The table's character set is GBK and the client now requests UTF8, so the server treats the stored byte stream E5B08FE6988E as GBK characters and converts it to the UTF8 byte stream E7818FE5BF94E6A791 for those characters. The query result is therefore displayed as "灏忔槑" in SecureCRT. This is what we call garbled characters.
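The byte streams quoted above can be reproduced with a short Python sketch. It assumes that Python's "gbk" codec maps these bytes the same way the MySQL gbk character set does; the variable names are illustrative.

```python
# Scenario A: the terminal sends UTF-8 bytes, but every MySQL setting says
# GBK, so the bytes are stored untouched.
sent = bytes.fromhex("E5B08FE6988E")    # UTF-8 bytes of the inserted string
stored = sent                            # gbk -> gbk -> gbk: no conversion

# Scenario B: results are now requested as UTF-8, so the server interprets
# the stored bytes as GBK text and re-encodes that text as UTF-8.
misread = stored.decode("gbk")           # three unintended characters
returned = misread.encode("utf-8")

print(len(stored), "bytes stored,", len(returned), "bytes returned")
print(returned.hex().upper())
print(misread)                           # what a UTF-8 terminal displays
```

The conversion itself is performed correctly. The garbling comes from the stored bytes having been labeled GBK when they were really UTF-8.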

C) Continuing from B), set the SecureCRT character set to GBK to see how the terminal's character set setting affects the result.

We can see that yet another group of garbled characters is returned. The transfer process is the same as in B), and the byte stream returned by the server is unchanged. The only difference is the final step: SecureCRT now decodes the bytes as GBK instead of UTF8, so what is displayed changes.
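The effect of the terminal setting can be imitated in Python by decoding the same returned bytes in two ways. The display helper is hypothetical, and replacing undecodable bytes with U+FFFD is an assumption about terminal behavior; a real terminal may render such bytes differently.

```python
# Bytes the server returns in scenario B (taken from the text above).
returned = bytes.fromhex("E7818FE5BF94E6A791")

def display(data: bytes, terminal_charset: str) -> str:
    """Decode bytes the way a terminal might; bad bytes become U+FFFD."""
    return data.decode(terminal_charset, errors="replace")

as_utf8 = display(returned, "utf-8")   # terminal set to UTF-8 (scenario B)
as_gbk = display(returned, "gbk")      # terminal set to GBK (scenario C)

print(as_utf8)
print(as_gbk)
```

Nothing changes on the server between the two calls, so the terminal's own character set is one more place where a mismatch can creep in.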

SQL statements related to character sets

1) view character set encoding settings

SHOW VARIABLES LIKE '%CHARACTER%';

2) Set character set encoding

SET NAMES xxx;

This statement sets the client-side character set. It sets three system variables at once: character_set_client, character_set_connection, and character_set_results.

3) modify the database Character Set

ALTER DATABASE  DATABASENAME  CHARACTER SET XXX;

This statement only modifies the character set of the database, which becomes the default for tables created afterwards. The character set of existing tables is not affected.

4) modify the character set of a table

ALTER TABLE TABLENAME CHARACTER SET XXX;

This statement only modifies the character set of the table and affects the default definition of new columns in the table. The character set of existing columns is not affected.

ALTER TABLE TABLENAME CONVERT TO CHARACTER SET XXX;

This statement modifies both the table character set and the character set of the existing columns, and converts the encoding of the existing data.

5) modify the column Character Set

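ALTER TABLE `TABLE_NAME` MODIFY COLUMN `COLUMN_NAME` COLUMN_TYPE CHARACTER SET xxx;

MODIFY COLUMN requires the full column definition, so COLUMN_TYPE stands for the column's data type, for example VARCHAR(255).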

6) query the binary encoding of Characters

SELECT HEX(COL_NAME) FROM TABLE_NAME;
SELECT LENGTH(COL_NAME) FROM TABLE_NAME;

In a GBK table, a Chinese character should occupy 2 bytes. If a character occupies 3 bytes instead, the character set was set incorrectly at some stage. Figure 1 (scenario A above) shows such a case: the terminal is UTF8, while both the MySQL client setting and the database are GBK.
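The values returned by HEX() and LENGTH() can be checked offline with a sketch like the following. Both helper names are hypothetical, and the UTF-8 test is only a heuristic, since some short GBK byte sequences also happen to be valid UTF-8.

```python
def bytes_per_char(stored_hex: str, char_count: int) -> float:
    """Average stored bytes per character, given the HEX() output of a
    column and the number of characters that were typed."""
    return len(bytes.fromhex(stored_hex)) / char_count

def looks_like_utf8(stored_hex: str) -> bool:
    """Heuristic: the bytes decode as UTF-8 and include multi-byte
    sequences. Illustrative only, not a reliable detector."""
    data = bytes.fromhex(stored_hex)
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return len(text) < len(data)

# HEX() output from scenario A: two Chinese characters stored in a GBK table.
suspect_hex = "E5B08FE6988E"
# What the same two characters look like when really encoded as GBK.
proper_hex = "小明".encode("gbk").hex().upper()

print(bytes_per_char(suspect_hex, 2), looks_like_utf8(suspect_hex))
print(bytes_per_char(proper_hex, 2), looks_like_utf8(proper_hex))
```

A ratio of 3 bytes per Chinese character in a GBK column is the signal described above that UTF-8 bytes were stored without conversion.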

Default MySQL character set: latin1

MySQL 4.x uses the latin1 character set (also known as ISO-8859-1) by default. latin1 is a single-byte encoding. This raises a question: can Chinese characters be written to and read from a latin1 table without problems? The answer is yes, with proper settings. Assume that SecureCRT is set to UTF8 and that character_set_client and the table character set are both set to latin1. As described in the conversion process section above, no encoding conversion takes place when data is read or written: the UTF8 bytes of the Chinese characters are written to the database unchanged, and when they are read back, SecureCRT decodes the same bytes into the original Chinese characters. Users are therefore not affected. However, if character_set_client and character_set_connection are inconsistent with the table character set, garbled characters may occur.
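The latin1 pass-through can be illustrated in Python. This sketch assumes that Python's "latin-1" codec is an acceptable stand-in for the MySQL latin1 character set and uses an illustrative sample string. The last step only approximates what a lossy conversion to latin1 looks like.

```python
utf8_bytes = "小明".encode("utf-8")    # what a UTF-8 terminal sends

# client = table = latin1: no conversion, so the bytes are stored as-is.
stored = utf8_bytes

# Every byte value is a valid latin-1 character, so a round trip through
# latin-1 text never changes the bytes.
assert stored.decode("latin-1").encode("latin-1") == utf8_bytes

# Reading back with the same settings returns the same bytes, which the
# UTF-8 terminal decodes correctly.
shown = stored.decode("utf-8")
print(shown)

# If the client were declared as utf8 instead, the characters would have to
# be converted to latin1, which cannot represent them.
lossy = "小明".encode("latin-1", errors="replace")
print(lossy)
```

The first half works only because both ends agree to treat the data as opaque bytes. Once any setting along the path disagrees, a conversion takes place and the characters are lost.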
