Character encoding system of MySQL (I): data storage encoding

Source: Internet
Author: User

I have installed MySQL many times, and every time I get tangled up in the database's character encoding configuration, so this time I decided to sort it out thoroughly.

MySQL's character encoding system is fairly fine-grained. It consists of two parts: data storage encoding and data transmission encoding. This article covers data storage encoding; data transmission encoding is covered in the next article, MySQL character encoding system (II)-Data Transmission Encoding.


Encoding level

The character encoding configuration for data storage specifies the default character encoding used for data stored in the database. These defaults are set at four levels: server, database, table, and column. That is, you can set a default encoding for the server, a different default for each database on the server, a different default for each table in a database, and a different encoding for each column in a table.


Logical Structure of the MySQL database server

So how do the settings at these four levels work together? If no character encoding is specified when a database is created, the database takes the server encoding by default. If none is specified when a table is created, the table takes the database encoding by default. If you do not specify an encoding for a column when creating a table or adding columns to it, the column takes the table encoding by default. Note that the encodings at these levels act only as defaults: an encoding specified explicitly when creating a database, creating a table, or adding a column always takes priority.

On the other hand, changing the default at the server, database, or table level does not change the encoding already assigned to lower-level objects. For example, if you change only the server-level encoding, the default encoding of every existing database stays the same, as do the encodings of existing tables, columns, and stored rows; however, a database created afterwards without an explicit encoding takes the new server default. The same applies to database and table defaults: changing them leaves existing columns, and the data stored in them, in their original encoding. New rows are always stored in the current encoding of the column they are inserted into. The column level is the exception, because a column's encoding is the actual storage encoding rather than a default: changing it with ALTER TABLE ... MODIFY makes MySQL convert the values already stored in that column.
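The inheritance rules above can be illustrated with a small toy model. This is only a sketch of the "copy the parent's default at creation time" behaviour described in the text, not MySQL's implementation; the class and method names (Server, Database, Table, create_database, create_table, add_column) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Table:
    charset: str
    columns: Dict[str, str] = field(default_factory=dict)

    def add_column(self, name: str, charset: Optional[str] = None) -> None:
        # An explicit charset wins; otherwise copy the table's current default.
        self.columns[name] = charset or self.charset


@dataclass
class Database:
    charset: str
    tables: Dict[str, Table] = field(default_factory=dict)

    def create_table(self, name: str, charset: Optional[str] = None) -> Table:
        # An explicit charset wins; otherwise copy the database default.
        table = Table(charset or self.charset)
        self.tables[name] = table
        return table


@dataclass
class Server:
    charset: str
    databases: Dict[str, Database] = field(default_factory=dict)

    def create_database(self, name: str, charset: Optional[str] = None) -> Database:
        # An explicit charset wins; otherwise copy the server default.
        db = Database(charset or self.charset)
        self.databases[name] = db
        return db


server = Server("latin1")
db_old = server.create_database("db_old")      # inherits latin1 from the server
t1 = db_old.create_table("t1", charset="gbk")  # explicit charset overrides the database
t1.add_column("c1")                            # inherits gbk from the table
t1.add_column("c2", charset="utf8")            # explicit charset overrides the table

server.charset = "utf8"                        # change only the server default
db_new = server.create_database("db_new")      # new database picks up utf8

print(db_old.charset, t1.columns, db_new.charset)
```

Because each object copies its parent's default only at creation time, changing server.charset afterwards affects db_new but leaves db_old, t1, and its columns untouched.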

Setting Method

Use the following SQL statements to change the encoding at each level below the server:

ALTER {DATABASE | SCHEMA} [db_name] [DEFAULT] CHARACTER SET [=] charset_name
ALTER TABLE tbl_name [DEFAULT] CHARACTER SET [=] charset_name
ALTER TABLE tbl_name MODIFY [COLUMN] col_name {CHAR[(length)] | TEXT} CHARACTER SET charset_name

Note that the third statement changes a column's character encoding by completely redefining the column. The syntax is the same as a column definition in CREATE TABLE. Therefore, even if you only want to change the column's character encoding, you must restate every attribute from the column's original definition.

You can change the server's default encoding at runtime by setting the character_set_server variable directly. However, the change is temporary: set for the session, it is lost when the client disconnects; set globally, it is lost when the server restarts. To change the server's default encoding permanently, set the variable in the [mysqld] section of the my.ini or my.cnf configuration file, and then restart the server:

[mysqld]
character_set_server=charset_name



Chinese characters in a MySQL table are garbled. Which encoding should I choose?

MySQL supports many character sets. I personally suggest using utf8 (UTF-8) as the database character set wherever possible so that your data can be migrated smoothly, because utf8 is currently the character set best suited to conversion to and from other character sets. Even though you may not be able to view the stored content correctly in a command-line tool, I still strongly recommend utf8 as the default character set.

If you want to work with gb2312-encoded data, I recommend using latin1 as the table's default character set, rather than gb2312 or gbk, so that you can insert Chinese text directly in the command-line tool and have it displayed directly. If you are worried about query sorting and similar issues, you can use the binary attribute.

What mainly affects programming is the client character set and the database character set (there is also a server character set, whose purpose I am not sure of). The common database operations are saving data and reading data. Whether garbled characters appear during these operations seems to have little to do with the database character set: we only need to ensure that the character set selected when writing is the same as the one selected when reading, that is, that the client character set is the same for both operations.
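The reason the latin1 suggestion appears to work can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python's latin-1 codec stands in for MySQL's latin1, the terminal is assumed to use gbk, and both the client character set and the column character set are assumed to be latin1, so no conversion changes any byte.

```python
FAN = "\u8303"  # the Chinese character 范

# Bytes the terminal sends for 范 when the terminal itself uses gbk.
terminal_bytes = FAN.encode("gbk")

# Client charset and column charset are both latin1, so the latin1 -> latin1
# "conversion" leaves every byte untouched on the way in.
stored = terminal_bytes.decode("latin-1").encode("latin-1")

# Reading back with the same client charset returns the same bytes, which the
# gbk terminal renders as the original character.
round_trip_ok = stored.decode("gbk") == FAN

# The server, however, sees two latin1 characters, not one Chinese character.
chars_seen_by_server = len(stored.decode("latin-1"))

print(stored.hex(), round_trip_ok, chars_seen_by_server)
```

The bytes survive the round trip only because the write and the read use the same client character set. The server counts and compares two single-byte characters, which is why sorting and similar operations may not behave as expected for such data.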
When a MySQL client executes a query, the process is generally as follows: you type an SQL statement at the client prompt and press Enter, and the terminal displays the query result. Only when the terminal and three MySQL system variables all specify the correct character sets can we be sure that a correct SQL statement reaches the server, that the server returns the correct result, and that the result is displayed correctly on the terminal.
The three MySQL system variables are:
1. character_set_client, the client character set, tells the server the encoding of the SQL statements submitted by the client.
2. character_set_connection, the connection character set, is the encoding the server converts SQL statements into before processing them.
3. character_set_results, the result character set, is the encoding the server converts the result set into before returning it.
Run show variables like 'char%' in the MySQL client to view the values of these variables. The three variables are usually set to the same character set, and the set names [charset name] command sets all three at once. In general, as long as you set a character set that can represent your data and that matches the terminal's encoding, query results are displayed correctly on the terminal.
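The read side of this process can be modelled roughly in Python. This is a simplified sketch, not MySQL's code: the display function and its parameters are hypothetical, character_set_connection is left out, and Python's "replace" error handler stands in for the server substituting characters it cannot represent in character_set_results.

```python
def display(stored: bytes, column_charset: str, results_charset: str,
            terminal_encoding: str) -> str:
    """Model the read path: column bytes -> character_set_results -> terminal."""
    text = stored.decode(column_charset)
    # Server side: convert the value to character_set_results. Characters that
    # cannot be represented are replaced (assumption modelled with "replace").
    wire = text.encode(results_charset, errors="replace")
    # Terminal side: interpret the received bytes in the terminal's encoding.
    return wire.decode(terminal_encoding, errors="replace")


stored = bytes.fromhex("e88c83")  # utf8 bytes of the character 范 (U+8303)

# character_set_results matches the terminal and can represent the data.
ok = display(stored, "utf-8", "gbk", "gbk")
# character_set_results cannot represent the character at all.
lossy = display(stored, "utf-8", "latin-1", "gbk")
# character_set_results differs from the terminal's encoding.
mismatch = display(stored, "utf-8", "utf-8", "gbk")

print(ascii(ok), ascii(lossy), ascii(mismatch))
```

Only the first call returns the original character. The second loses it during conversion, and the third delivers intact bytes that the terminal then misreads.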
For example, suppose table t1 uses utf8 and its column c1 inherits that encoding. The table is created as follows:
mysql> create table t1 (c1 text not null) character set utf8;
The character used is the Chinese character 范 (fan), whose gbk encoding is B7 B6 and whose utf8 encoding is E8 8C 83.
Insert the data with the following SQL statement:
mysql> insert into t1 values ('范');
A) If the terminal is set to utf8 and set names utf8 is executed, the utf8 encoding of 范 is written to the database. MySQL does not need to perform any encoding conversion in this case. You can verify what was written by running select hex(c1) from t1, which returns the hexadecimal encoding of the stored data.

B) If the terminal is set to utf8 and set names gbk is executed, the bytes written by the insert are E9 91 BC, which is the utf8 encoding of the Chinese character 鑼 (gong). This happens because the terminal sends 范 (fan) in utf8 while the server believes the terminal is sending gbk, so it performs a gbk-to-utf8 conversion when inserting into t1, and the result is naturally wrong.

C) If the terminal is set to gbk and set names gbk is executed, the utf8 encoding of 范 (fan), E8 8C 83, is still written to t1 after the insert completes. During insertion, the terminal sends the gbk encoding of the character, B7 B6, and the server was told that what the terminal sends is...
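The three cases can be reproduced outside MySQL with Python's codecs. This is a minimal sketch of the write path only: stored_hex is a hypothetical helper, character_set_connection is left out, and dropping the leftover byte in case B with errors="ignore" is an assumption (a real server may warn, substitute, or reject instead).

```python
def stored_hex(char: str, terminal_encoding: str, client_charset: str,
               column_charset: str = "utf-8") -> str:
    """Model the write path: terminal bytes -> character_set_client -> column."""
    sent = char.encode(terminal_encoding)  # bytes the terminal actually sends
    # The server trusts character_set_client when interpreting those bytes.
    # Assumption: bytes that do not form a complete character are dropped.
    understood = sent.decode(client_charset, errors="ignore")
    # The server then converts what it understood to the column's encoding.
    return understood.encode(column_charset).hex()


FAN = "\u8303"  # the Chinese character 范

case_a = stored_hex(FAN, "utf-8", "utf-8")  # terminal utf8, set names utf8
case_b = stored_hex(FAN, "utf-8", "gbk")    # terminal utf8, set names gbk
case_c = stored_hex(FAN, "gbk", "gbk")      # terminal gbk,  set names gbk

print(case_a, case_b, case_c)
```

Cases A and C store the same correct bytes because in both the declared client character set matches what the terminal really sends; case B stores the bytes of a different character because it does not.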

The data stored in MySQL appears garbled, but the character set configuration is correct.

At the mysql> prompt, run set names utf8; and the data should then display correctly.
