Detailed description of character sets and Chinese characters

Source: Internet
Author: User


Many posts on the forum are about character sets and Chinese characters. I have read them carefully, and few of them explain things in much detail. I have only been studying MySQL for a few days, so here I will share a little of what I have learned and hope it is useful.


Handling Chinese is actually not complicated. Just pay attention to the following two points:

1) Any character set that supports multi-byte encoding of Chinese, such as gb2312, gbk, or utf8, can store Chinese characters. gb2312 covers far fewer Chinese characters than gbk, and every character in gb2312 or gbk can also be encoded in utf8. So to store Chinese characters in a table, you only need to choose a character set that can represent them, according to your needs.

When creating a table, if the database's character set does not support Chinese, you can add CHARACTER SET gb2312 (or gbk, or utf8) to the CREATE TABLE statement. Otherwise, the new table uses the same character set as the database.
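As a minimal sketch of the table-level override described above (demo_db and articles are hypothetical names chosen only for illustration):

```sql
-- Hypothetical database whose default character set cannot store Chinese.
CREATE DATABASE demo_db CHARACTER SET latin1;
USE demo_db;

-- Without a CHARACTER SET clause this table would inherit latin1 from demo_db.
-- The explicit clause makes gbk the table's default character set instead.
CREATE TABLE articles (
    id    INT NOT NULL PRIMARY KEY,
    title VARCHAR(100)
) CHARACTER SET gbk;
```

Columns that do not name their own character set, such as title here, take the table default.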

Another nice thing about MySQL is that you can specify the character set for an individual column, which I like a lot.

In short, use SHOW CREATE TABLE table_name; to check whether the character set of your table can store Chinese.
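A small sketch of a column-level character set and of the check mentioned above. The table notes and its columns are hypothetical, and utf8 is used only because it is the name this post uses:

```sql
-- Hypothetical table: the table default is latin1, but one column overrides it.
CREATE TABLE notes (
    id   INT NOT NULL PRIMARY KEY,
    tag  VARCHAR(20),                      -- inherits the table default (latin1)
    body VARCHAR(200) CHARACTER SET utf8   -- column-level override, can hold Chinese
) CHARACTER SET latin1;

-- Shows the full definition, including the table default and any column overrides.
SHOW CREATE TABLE notes;

-- The Collation column of this output indicates each column's character set.
SHOW FULL COLUMNS FROM notes;
```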

2) As for displaying and entering Chinese on the front end: I do not use PHP or other client software, so I will only use the command-line client under cmd as an example. Other clients should be configured in a similar way. Please point out any mistakes or omissions.

Many posts say that the client character set must be the same as the server character set and the database character set. In fact, it does not need to be that complicated.

Whatever the current character set settings are, we can insert Chinese into a table that is able to store it. You only need to put an introducer before the Chinese string to indicate which encoding it uses. For example:

INSERT INTO table_name (column_with_chinese) VALUES (_gbk 'Chinese to be inserted');

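In this statement, _gbk is a "character set introducer". It tells the server that the string 'Chinese to be inserted' that follows is encoded in GBK. Note the difference between an introducer and the CONVERT() function: an introducer only declares the encoding of the literal and leaves its bytes unchanged, whereas CONVERT() actually converts a string from one character set to another.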
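The difference between an introducer and CONVERT() can be seen with a hexadecimal literal, which avoids depending on the terminal's encoding. This is only an illustrative sketch. It uses the fact that the character 中 is the byte pair D6 D0 in GBK and the three bytes E4 B8 AD in UTF-8:

```sql
-- The introducer only labels the bytes as gbk; HEX() shows them unchanged.
SELECT HEX(_gbk X'D6D0');                       -- D6D0

-- CONVERT() transcodes: the same character re-encoded in utf8 takes three bytes.
SELECT HEX(CONVERT(_gbk X'D6D0' USING utf8));   -- E4B8AD
```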

This way, whatever the current character set settings are, Chinese text (or text in any other character set) is inserted correctly.

That said, adding an introducer every time may be too tedious, so let's look at MySQL's settings to see how to simplify this.

Run SHOW VARIABLES LIKE 'character\_set\_%'; to view the current character set settings:

+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | latin1 |
| character_set_connection | latin1 |
| character_set_database   | utf8   |
| character_set_filesystem | binary |
| character_set_results    | latin1 |
| character_set_server     | utf8   |
| character_set_system     | utf8   |
+--------------------------+--------+

(I set these to latin1, a single-byte character set that cannot represent Chinese.)

Many people run into this error: Data too long for column 'xxx' at row 1

The cause is the character_set_connection setting. If we use an introducer, this parameter is ignored for that string. If we change this parameter to the same Chinese encoding as the table, such as gbk, we can omit the introducer, and strings passed to the server are automatically handled in that character set.

At this point, if we use SELECT to view the data in the table, we find garbled characters. The parameter to look at is character_set_results, which is the character set the server uses for result sets returned to the client. Set it to gbk with SET character_set_results = gbk; and the Chinese content in the table displays normally.
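Put together, the two settings discussed so far might look like the sketch below. It assumes the command-line window really sends and displays GBK (for example, a cmd window on Chinese Windows), and the changes apply only to the current connection:

```sql
-- Session-scoped: these values last only for the current connection.
SET character_set_connection = gbk;
SET character_set_results = gbk;

-- Confirm the two values the text discusses.
SELECT @@character_set_connection, @@character_set_results;
```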

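As we can see, the two parameters that matter most for the Chinese problem are the two above. Of the others, character_set_client is worth checking as well, since it tells the server which encoding the client's statements arrive in, and the server converts them from character_set_client to character_set_connection. The rest are left for your own study.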

If you find it too cumbersome to set these character sets separately, the simplest approach is SET NAMES charset_name;, which sets character_set_client, character_set_connection, and character_set_results in one statement, so you can be lazy.
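For reference, here is a sketch of what the shortcut stands for, using gbk as the example character set:

```sql
-- One statement for the whole connection:
SET NAMES gbk;

-- It has the same effect on these three session variables as:
SET character_set_client = gbk;
SET character_set_connection = gbk;
SET character_set_results = gbk;

-- Check the result.
SHOW VARIABLES LIKE 'character\_set\_%';
```

Like the individual SET statements, SET NAMES affects only the current connection, so it has to be issued again after reconnecting.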


However, note that if we handle several different character sets at the same time, the introducer is still indispensable!

