MySQL Character set GBK, GB2312, UTF8 to solve the problem of MySQL Chinese garbled _mysql

Source: Internet
Author: User
Tags mysql in create database mysql command line
several character sets involved in MySQL

Character-set-server/default-character-set: Server character set, by default.
Character-set-database: Database character set.
Character-set-table: Database table character Set.
Priority increases in order. Therefore, in general, you only need to set the character-set-server, while creating the database and tables do not specifically specify the character set, so that the unified use of the Character-set-server character set.
Character-set-client: The character set of the client. The client default character set. When a client sends a request to the server, the request is encoded with the character set.
Character-set-results: Result character Set. When the server returns results or information to the client, the result is encoded in that character set.
At the client, if no character-set-results is defined, the character-set-client character set is used as the default character set. So you just need to set up the character-set-client character set.

To handle Chinese, you can set both Character-set-server and Character-set-client to GB2312, and if you want to handle multiple languages at the same time, set to UTF8.

questions about MySQL in Chinese

To solve the garbled method, the following three system parameters are set to the same character set as the server character set character-set-server before executing the SQL statement.
Character_set_client: The character set of the client.
Character_set_results: Result character Set.
Character_set_connection: Connection character Set.
Set these three system parameters by sending a statement to MySQL: Set names gb2312

about GBK, GB2312, UTF8

Utf-8:unicode transformation Format-8bit is allowed with a BOM, but usually does not contain a BOM. is a multi-byte encoding used to solve international characters, which uses 8 bits in English (that is, one byte), and the Chinese uses 24 for (three bytes) to encode. UTF-8 contains the characters that all countries in the world need to use, is international code, strong universality. UTF-8 encoded text can be displayed on browsers that support UTF8 character sets in countries. For example, if the UTF8 code, the foreigner in English IE can also display Chinese, they do not need to download IE Chinese Language support package.

GBK is the standard of GB2312 compatible GB2312 on the basis of national standard. GBK's text encoding is expressed in two-byte notation, that is, both Chinese and English characters are expressed in double-byte notation, and the highest bits are set to 1 in order to differentiate Chinese. GBK contains all the Chinese characters, is the country code, the universality is worse than the UTF8, but UTF8 occupies the database to be bigger than the GBD.

GBK, GB2312, and UTF8 must all be converted to each other through Unicode encoding:
GBK, Gb2312--unicode--utf8
UTF8--UNICODE--GBK, GB2312

For a website, forum, if English characters are more, it is recommended to use UTF-8 to save space. But now many forum plug-ins generally only support GBK.

GB2312 is a subset of GBK, GBK is a subset of GB18030.
GBK is a large set of characters including Chinese, Japanese and Korean
If it is a Chinese website recommend GB2312 GBK Sometimes there is a problem
In order to avoid all garbled problems, should adopt UTF-8, in the future to support internationalization is also very convenient
UTF-8 can be thought of as a large character set, which contains the encoding of most of the text.
One advantage of using UTF-8 is that users in other areas (such as Hong Kong, Taiwan) will be able to view your text without any garbled characters without installing Simplified Chinese support.

GB2312 is a code in Simplified Chinese
GBK supports Simplified Chinese and Traditional Chinese
Big5 supports Traditional Chinese
Utf-8 supports almost any character

first, analyze the situation of garbled
1. Write as garbled when writing to the database
2. Query results with garbled return
What is the situation when garbled characters occur?
Let's first type in the MySQL command line
Show variables like '%char% ';
To view MySQL character set settings:

Mysql> Show variables like '%char% ';
+--------------------------+----------------------------------------+
| variable_name | Value |
+--------------------------+----------------------------------------+
| character_set_client | GBK |
| character_set_connection | GBK |
| Character_set_database | GBK |
| Character_set_filesystem | binary |
| Character_set_results | GBK |
| Character_set_server | GBK |
| Character_set_system | UTF8 |
| Character_sets_dir | /usr/local/mysql/share/mysql/charsets/|
+--------------------------+----------------------------------------+

In the query results can be seen in the MySQL database system client, database connection, database, file system, query
Results, server, system set of character sets
In this case, the file system character set is fixed, the system, the server's character set at the time of installation, and the garbled problem is irrelevant
The problem of garbled characters is related to the setting of the client, database connection, database, query result
* Note: The client is to see the way to access the MySQL database, access through the command line, the command line window is the client, pass
JDBC Connection access, the program is the client
We write the Chinese data to MySQL, the client, database connection, write to the database, respectively, to encode the
Change
When the query is executed, the return result, the database connection, and the client encode the conversion
Now we should be aware that garbled characters occur in databases, clients, query results, and database connections, one or more of these
A link
And then we're going to solve this problem.
When we log in to the database, we connect using the MySQL--default-character-set= character set-U root-p, when we
Then use show variables like '%char% '; command to view character set settings, you can find clients, database connections,
The character set of the query result has been set to the character set that was selected at logon
If you are already logged in, you can use the set names character set to achieve this effect, equivalent to the following command:
Set character_set_client = Character Set
Set character_set_connection = Character Set
Set character_set_results = Character Set
If you are connecting to a database through JDBC, you can write a URL like this:
url=jdbc:mysql://localhost:3306/abs?useunicode=true&characterencoding= Character Set
JSP page and other terminals to set the corresponding character set
The database's character set can modify the MySQL startup configuration to specify the character set, or you can add it to the CREATE database
Default character set character set to enforce the character set of the database
Through this setting, the entire data write read out process in the unified character set, there will be no garbled
Why write Chinese directly from the command line without setting or garbled?
It is clear from the command line, the client, database connection, query results of the character set has not changed
The input of Chinese after a series of transcoding and back to the original character set, we look at what we see is not garbled
But that does not mean that Chinese are stored correctly in a database as a medium character.
For example, now there is a UTF8 encoded database, the client connection uses GBK encoding, connection uses the default
Iso8859-1 (that is, the latin1 in MySQL), we send the "Chinese" string on the client, the client
Sends a string of GBK format binaries to the connection layer, the connection layer in ISO8859-1 format
Binary code sent to the database, the database will be stored in the UTF8 format, we will this field to UTF8
Format read out, it must be garbled, that is, Chinese data in writing to the database is in garbled form of storage,
When querying with the same client, a set of actions opposite to writing is done and the wrong UTF8 format binary
The code is then converted to the correct GBK code and displayed correctly.
Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.