MySQL Character set knowledge summary

Source: Internet
Author: User

Character sets & character encodings

A character set is a collection of characters. There are many character sets, each containing a different number of characters, which may be English letters, Chinese characters, or characters of other languages.

Common character sets include ASCII, LATIN1, GB2312, GBK, GB18030 and Unicode. A character encoding is the way each character of a character set is represented by one or more bytes. Each character set has its own encoding, so the same character can be represented by different bytes in different character sets.

ASCII is a character set based on the Latin alphabet; it represents each character with the low 7 bits of a single byte, and the high bit is always 0. LATIN1 extends ASCII: it still uses one byte per character but also uses the high bit, which enlarges the set to 256 characters. GB2312, GBK and GB18030 are character sets that support Chinese characters, and their coverage grows in that order: GB2312 < GBK < GB18030. GBK uses both one-byte and two-byte encodings: bytes 00-7F are single-byte characters identical to ASCII, while Chinese characters take 2 bytes whose first byte lies in the range 81-FE, so they cannot be confused with ASCII. Unicode is an industry-standard character set intended to cover the characters of all languages, and it has several encodings, most commonly UTF-8, UTF-16 and UTF-32. UTF-8 uses 1 to 4 bytes per character, UTF-16 uses 2 bytes for characters in the Basic Multilingual Plane and 4 bytes (a surrogate pair) for all others, and UTF-32 always uses 4 bytes.
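To make these byte counts concrete, here is a small Python sketch that encodes a few characters with Python's built-in codecs. It is only an illustration: the codec names are Python's ("latin-1", "utf-16-le", ...), not MySQL's, and the "-le" variants are used so that no byte-order mark is prepended.

```python
# Python codec names (not MySQL charset names); "-le" avoids a byte-order mark.
CODECS = ("ascii", "latin-1", "gbk", "gb18030", "utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(ch: str) -> dict:
    """Return how many bytes `ch` takes in each codec, or None if unencodable."""
    sizes = {}
    for codec in CODECS:
        try:
            sizes[codec] = len(ch.encode(codec))
        except UnicodeEncodeError:
            sizes[codec] = None
    return sizes


for ch in ("A", "小", "\U0001F600"):  # ASCII letter, Chinese character, emoji
    print(ch, encoded_sizes(ch))

# The same character has different bytes in different encodings:
print("小".encode("gbk").hex(), "小".encode("utf-8").hex())  # d0a1 e5b08f
```

For "小", GBK needs 2 bytes and UTF-8 needs 3; the emoji U+1F600 cannot be encoded in GBK at all but takes 4 bytes in GB18030, UTF-8, UTF-16 and UTF-32.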

MySQL and character sets

Character sets and encodings are involved wherever text is involved. In MySQL, an SQL statement typed on the client side travels over TCP/IP to the MySQL server process and finally ends up in files on the server; every one of these stages stores characters, and therefore involves a character set and an encoding. MySQL exposes these settings as system variables, whose meanings are listed in the following table:

Variable name               Meaning
character_set_server        The server's default character set
character_set_client        Character set of the statements sent by the client
character_set_connection    Character set of the connection layer
character_set_results       Character set of the query results returned to the client
character_set_database      Default character set of the currently selected database
character_set_system        Character set of system metadata (column names, etc.)

MySQL character encoding conversion process

If these system variables are set inconsistently, for example character_set_client is UTF8 while character_set_database is GBK, encoding conversion takes place. How does character set conversion work? Suppose the GBK string "小明" (Xiao Ming) must be stored as UTF8. The server simply looks up each character of the string in the UTF8 encoding table and stores the corresponding bytes; no complex algorithm is involved. MySQL character set conversion mainly involves the following steps:

1) Convert the data from character_set_client to character_set_connection;

2) Convert the data from character_set_connection to the character set of the table column;

3) Convert the operation result from the column character set to character_set_results.
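The three steps can be modelled with a few lines of Python. This is a simplified sketch under stated assumptions, not MySQL's implementation: `convert`, `write_path` and `read_path` are hypothetical names, charsets are given as Python codec names, and a conversion is skipped entirely when the source and target charsets are equal, which is the behaviour the scenarios below rely on. Where MySQL would substitute '?' or report an error for a character the target charset cannot represent, this sketch simply raises Python's UnicodeError.

```python
# Toy model of MySQL's three conversion steps (assumption, not MySQL source).
# Charset names are Python codec names such as "utf-8" and "gbk".


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Decode with the source charset, then look up each character in the
    target charset. Identical charsets mean no conversion at all."""
    if src == dst:
        return data
    return data.decode(src).encode(dst)


def write_path(data: bytes, client: str, connection: str, column: str) -> bytes:
    data = convert(data, client, connection)   # step 1: client -> connection
    return convert(data, connection, column)   # step 2: connection -> column


def read_path(stored: bytes, column: str, results: str) -> bytes:
    return convert(stored, column, results)    # step 3: column -> results


# '小明' sent as UTF-8 into a GBK column, with client/connection/results = UTF-8
stored = write_path("小明".encode("utf-8"), "utf-8", "utf-8", "gbk")
print(stored.hex())                                        # d0a1c3f7
print(read_path(stored, "gbk", "utf-8").decode("utf-8"))   # 小明
```

Converting the stored GBK bytes d0a1c3f7 back to UTF-8 yields e5b08fe6988e, which is exactly the table lookup described in the paragraph above.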

Below, a common scenario illustrates the character set conversion process. A user works in the MySQL command-line client (through SecureCRT, for a remote connection) and types the command "insert into T values (1, '小明')". We follow which bytes the string '小明' is stored as along the way.

a) The terminal (SecureCRT) uses UTF8, character_set_client=GBK, character_set_connection=GBK, and table T uses GBK.

Because character_set_client, character_set_connection and the table character set are all GBK, no encoding conversion takes place. So although the table's character set is GBK, "小明" is stored not as a GBK byte stream but as the UTF8 byte stream e5b08fe6988e, and the two Chinese characters occupy 6 bytes. Reading is the reverse process and also involves no conversion, so the query still returns "小明" correctly.

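b) Starting from a), change the client settings to UTF8 and query the inserted value again. (For a query, the variable that matters is character_set_results; SET NAMES utf8 changes it together with character_set_client and character_set_connection.)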

The value returned is "灏忔槑". The table's character set is GBK while the client requests UTF8, so the server interprets the stored bytes e5b08fe6988e as the GBK characters "灏忔槑" and converts them to the corresponding UTF8 byte stream e7818fe5bf94e6a791. SecureCRT therefore displays "灏忔槑", which is what we usually call garbled text.

c) Starting from b), set SecureCRT's character set to GBK to see how the terminal's character set affects the result.

Yet another garbled string is returned, beginning with "鐏". On the server side the process is the same as in b): the server returns the same byte stream, and changing SecureCRT's encoding only changes how those bytes are decoded and displayed.
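Scenarios a) to c) can be replayed in Python with a simplified model. Assumptions: MySQL converts only when two charsets differ, the terminal is modelled by encoding or decoding with its own charset, and in b) the setting that changes is character_set_results. The `convert` helper is hypothetical; the byte values it produces match the hex strings quoted above.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: re-encode from src to dst, or pass through if equal."""
    return data if src == dst else data.decode(src).encode(dst)


# a) SecureCRT uses UTF-8; client, connection and table are all GBK.
sent = "小明".encode("utf-8")                                  # bytes the terminal sends
stored = convert(convert(sent, "gbk", "gbk"), "gbk", "gbk")    # steps 1-2: no conversion
print(stored.hex(), len(stored))                               # e5b08fe6988e 6
print(convert(stored, "gbk", "gbk").decode("utf-8"))           # step 3 skipped -> 小明

# b) character_set_results is now UTF-8, so step 3 converts GBK -> UTF-8.
returned = convert(stored, "gbk", "utf-8")
print(returned.hex())                                          # e7818fe5bf94e6a791
print(returned.decode("utf-8"))                                # 灏忔槑

# c) The server sends the same bytes as in b); only the terminal changes.
# errors="replace" is needed because the trailing byte 0x91 is an
# incomplete GBK sequence.
print(returned.decode("gbk", errors="replace"))                # another garbled string
```

The model makes the key point of c) visible: `returned` is identical in b) and c); only the final decode differs.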

Character set-related SQL statements

1) View character set encoding settings

SHOW VARIABLES LIKE '%character%';

2) Set the character set encoding

SET NAMES xxx;

This statement sets the client-side character set, which consists of three system variables: character_set_client, character_set_connection and character_set_results.
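As a toy model of what SET NAMES changes, the hypothetical `set_names` function below updates a dictionary of session variables. It mirrors only the three variables named above; the real statement also sets collation_connection to the default collation of the chosen charset, which is not modelled here.

```python
# The three session variables that SET NAMES changes.
CLIENT_SIDE_VARS = ("character_set_client",
                    "character_set_connection",
                    "character_set_results")


def set_names(session: dict[str, str], charset: str) -> dict[str, str]:
    """Toy model of SET NAMES: copy the session variables and set the three
    client-side ones to `charset`; every other variable is left alone."""
    updated = dict(session)
    for var in CLIENT_SIDE_VARS:
        updated[var] = charset
    return updated


session = {
    "character_set_client": "gbk",
    "character_set_connection": "gbk",
    "character_set_results": "gbk",
    "character_set_database": "gbk",
}
session = set_names(session, "utf8")
print(session)
```

Note that character_set_database stays "gbk": SET NAMES changes how the client talks to the server, not the default of the database.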

3) Modify the database character set

ALTER DATABASE database_name CHARACTER SET xxx;

This statement changes only the database's default character set, which applies to tables created afterwards; the character sets of existing tables are not affected.

4) Modify the character set of a table

ALTER TABLE table_name CHARACTER SET xxx;

This statement changes only the table's default character set, which applies to columns added later; the character sets of existing columns are not affected.

ALTER TABLE table_name CONVERT TO CHARACTER SET xxx;

This statement changes both the table's default character set and the character set of every existing column, and converts the existing data to the new encoding.
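The difference between the two ALTER TABLE forms can be sketched with a toy in-memory model. `Table`, `Column`, `alter_character_set` and `convert_to_character_set` are hypothetical names; charsets are Python codec names, and values are kept as the raw bytes a column would store.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    charset: str          # Python codec name, e.g. "gbk"
    values: list[bytes]   # raw bytes stored in the column


@dataclass
class Table:
    default_charset: str
    columns: dict[str, Column] = field(default_factory=dict)


def alter_character_set(table: Table, charset: str) -> None:
    # ALTER TABLE ... CHARACTER SET: only the default for future columns changes.
    table.default_charset = charset


def convert_to_character_set(table: Table, charset: str) -> None:
    # ALTER TABLE ... CONVERT TO CHARACTER SET: re-encode every existing column.
    table.default_charset = charset
    for col in table.columns.values():
        col.values = [v.decode(col.charset).encode(charset) for v in col.values]
        col.charset = charset


t = Table("gbk", {"name": Column("gbk", ["小明".encode("gbk")])})
alter_character_set(t, "utf-8")
print(t.columns["name"].charset, t.columns["name"].values[0].hex())  # gbk d0a1c3f7
convert_to_character_set(t, "utf-8")
print(t.columns["name"].charset, t.columns["name"].values[0].hex())  # utf-8 e5b08fe6988e
```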

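5) Modify the character set of a column

ALTER TABLE table_name MODIFY col_name VARCHAR(20) CHARACTER SET xxx;

MODIFY must restate the column's full definition; VARCHAR(20) here is only a placeholder.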

6) Query the binary encoding of characters

SELECT HEX(col_name) FROM table_name;
SELECT LENGTH(col_name) FROM table_name;

HEX() shows the stored bytes and LENGTH() returns the length in bytes. For a GBK table, if a Chinese character turns out to occupy 3 bytes, as in case a) above, the character set settings are inconsistent somewhere; in case a) this happens because the terminal uses UTF8 while the MySQL client settings and the table are GBK.
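To see what HEX() and LENGTH() reveal in case a), here is a small Python sketch. `hex_and_length` mirrors the two SQL functions on a raw byte string, and `looks_like_utf8` is a hypothetical heuristic (not a MySQL feature) that flags GBK-column contents which are really valid multi-byte UTF-8. Genuine GBK bytes can occasionally also be valid UTF-8, so treat the result as a hint rather than proof.

```python
def hex_and_length(raw: bytes) -> tuple[str, int]:
    """Python equivalents of MySQL HEX() (uppercase hex) and LENGTH() (bytes)."""
    return raw.hex().upper(), len(raw)


def looks_like_utf8(raw: bytes) -> bool:
    """Heuristic: True if raw decodes as UTF-8 and contains multi-byte characters."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return len(text) < len(raw)


utf8_in_gbk = bytes.fromhex("e5b08fe6988e")  # '小明' as stored in case a)
real_gbk = "小明".encode("gbk")                # '小明' stored correctly as GBK

print(hex_and_length(utf8_in_gbk), looks_like_utf8(utf8_in_gbk))  # ('E5B08FE6988E', 6) True
print(hex_and_length(real_gbk), looks_like_utf8(real_gbk))        # ('D0A1C3F7', 4) False
```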

MySQL default character set: latin1

From the MySQL 4.x era until MySQL 8.0 (which switched the default to utf8mb4), MySQL's default character set was latin1 (also known as ISO-8859-1), a single-byte encoding. A natural question: can users write and read Chinese characters in a latin1 table? The answer is yes, as long as the settings are right. Suppose SecureCRT is set to UTF8 while character_set_client, character_set_connection, character_set_results and the table are all latin1. Following the analysis of case a) above, no character set conversion occurs while the data is written and read: the UTF8 bytes of the Chinese characters are written to the database as they are, read back unchanged, and SecureCRT decodes them into the correct Chinese characters, so the user notices no problem. However, if character_set_client, character_set_connection and the table character set are not consistent, garbled text can appear.
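The following Python sketch contrasts the two latin1 cases. It assumes Python's "latin-1" codec (ISO-8859-1) as a stand-in for MySQL's latin1, which MySQL actually implements as cp1252; since no conversion happens in the first case, the bytes are stored untouched either way. For the mismatched case, `errors="replace"` imitates MySQL substituting '?' for characters that latin1 cannot represent (in strict SQL mode MySQL rejects such values with an error instead).

```python
sent = "小明".encode("utf-8")  # bytes typed in a UTF-8 SecureCRT

# Case 1: client, connection, results and table all latin1 -> no conversion.
# Every byte value is a latin-1 character, so a round trip loses nothing.
stored = sent
roundtrip_ok = stored.decode("latin-1").encode("latin-1") == sent
print(roundtrip_ok)              # True
print(stored.decode("utf-8"))    # 小明 (the terminal decodes UTF-8)

# Case 2: character_set_client is utf8 but the table is latin1, so the server
# must convert, and 小明 has no latin1 representation.
converted = sent.decode("utf-8").encode("latin-1", errors="replace")
print(converted)                 # b'??'
```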
