MySQL Character Set Details

Source: Internet
Author: User

Common character sets include ASCII, GB2312, GBK, UTF-8, and Unicode.

First, some background:


ASCII code:

ASCII uses one byte per character to represent the digits 0-9, uppercase and lowercase letters, punctuation, and some non-printing (control) characters. A byte has 8 bits and can therefore take 256 values, but standard ASCII uses only the low 7 bits (128 values). The highest bit is 0 (historically it was sometimes used as a parity bit during transmission).

The range is 0000 0000-0111 1111, that is, 0-127.
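A small Python sketch of the rule above: every ASCII byte has its highest bit set to 0, so its value is below 128. The helper name `is_ascii_bytes` is illustrative only.

```python
def is_ascii_bytes(data: bytes) -> bool:
    """True if every byte is in 0-127, i.e. its highest bit is 0."""
    return all(b < 128 for b in data)


encoded = "A9z!".encode("ascii")
print(list(encoded))           # [65, 57, 122, 33]
print(format(65, "08b"))       # 01000001: the top bit of 'A' is 0
print(is_ascii_bytes(encoded)) # True
```

Python also has a built-in `bytes.isascii()` that performs the same check.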


Because one byte can hold at most 256 values while there are thousands of Chinese characters, more than one byte is needed to represent a Chinese character. Common encodings for Chinese characters are GB2312 and GBK.


GB2312 encoding:

GB2312 is the national standard encoding for Simplified Chinese on Chinese computers. It uses two bytes to represent a Chinese character. To stay compatible with ASCII, both bytes take values in 128-255 rather than 0-127, so they can never be mistaken for ASCII bytes. In theory, GB2312 can therefore have at most 128 × 128 = 16384 combinations (in practice both bytes fall in 0xA1-0xFE, i.e. 161-254), which is enough for commonly used Chinese characters.

Each number below represents one byte. Which bytes are GB2312-encoded and which are ASCII? (The numbers are for illustration only.)

128 200 65 189 178 23 213 186

128 200: both bytes are in 128-255, so together they form one GB2312-encoded Chinese character.

65: this byte is in 0-127, so it is a single ASCII character, the uppercase letter 'A'.

189 178: likewise, both bytes are in 128-255, so this is one GB2312-encoded Chinese character.

23: this byte is in 0-127, so it is a single ASCII byte (a non-printing control character).

213 186: also one GB2312-encoded Chinese character.
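The grouping walked through above can be sketched in Python. `split_gb2312` is a hypothetical helper that applies only the byte-range rule described here (both bytes of a Chinese character in 128-255); it is not a validating GB2312 decoder and does not check that a pair is an assigned character.

```python
from typing import List, Tuple


def split_gb2312(data: bytes) -> List[Tuple[int, ...]]:
    """Group bytes by the GB2312 range rule.

    A byte in 0-127 is one ASCII character; a byte in 128-255 must be
    followed by another byte in 128-255, and the pair is one Chinese character.
    """
    units: List[Tuple[int, ...]] = []
    i = 0
    while i < len(data):
        if data[i] < 128:                           # single-byte ASCII
            units.append((data[i],))
            i += 1
        elif i + 1 < len(data) and data[i + 1] >= 128:
            units.append((data[i], data[i + 1]))    # two-byte Chinese character
            i += 2
        else:
            raise ValueError(f"invalid GB2312 sequence at offset {i}")
    return units


print(split_gb2312(bytes([128, 200, 65, 189, 178, 23, 213, 186])))
# [(128, 200), (65,), (189, 178), (23,), (213, 186)]

real = "中A".encode("gb2312")   # a real GB2312 byte string from Python's codec
print(list(real), split_gb2312(real))
```

Because the second byte must also be 128 or more, a pair such as 128 65 is rejected by this rule; GBK, described next, relaxes exactly that restriction.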


GBK encoding:

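GBK is an extended version of GB2312 that contains more Chinese characters and is also compatible with ASCII. It also represents a Chinese character with two bytes. The first byte is still in 128-255, but the second byte may be anywhere in 0-255, so in theory up to 128 × 256 combinations are possible (in practice the first byte is 0x81-0xFE and the second is 0x40-0xFE, excluding 0x7F). Because the second byte can be below 128, only the first byte tells a decoder whether it is looking at a one-byte ASCII character or the start of a two-byte Chinese character.

For example: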

128 65 65 189 178 23 213 186

128 65: 128 is not in 0-127, so it starts a two-byte sequence; 128 and the following 65 together represent one Chinese character, even though 65 on its own would be ASCII.

65: the next byte, 65, is in 0-127, so it is a single byte representing the uppercase letter 'A'.

189 178: 189 is not in 0-127, so this pair is one Chinese character.

23: this byte is in 0-127, so it is a single ASCII byte.

213 186: 213 is not in 0-127, so this pair is one Chinese character.
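The GBK walkthrough above can be sketched the same way. The hypothetical helper `split_gbk` looks only at the first byte of each unit, which is the key difference from GB2312: the trail byte may be below 128. Like the example bytes, it does not check that a pair is an assigned GBK character.

```python
from typing import List, Tuple


def split_gbk(data: bytes) -> List[Tuple[int, ...]]:
    """Group bytes the GBK way: only the first byte decides.

    A byte in 0-127 is one ASCII character; a byte in 128-255 starts a
    two-byte character whose second byte may be any value, even below 128.
    """
    units: List[Tuple[int, ...]] = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 128:                          # single-byte ASCII
            units.append((lead,))
            i += 1
        elif i + 1 < len(data):                 # lead byte + any trail byte
            units.append((lead, data[i + 1]))
            i += 2
        else:
            raise ValueError(f"truncated two-byte character at offset {i}")
    return units


print(split_gbk(bytes([128, 65, 65, 189, 178, 23, 213, 186])))
# [(128, 65), (65,), (189, 178), (23,), (213, 186)]

# Real data: characters that exist in GB2312 keep the same bytes in GBK,
# and GBK accepts a trail byte below 128 (0x40 here).
print("中A".encode("gbk") == "中A".encode("gb2312"))
print(len(bytes([0x81, 0x40]).decode("gbk")))
```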


Unicode Character Set:

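Unicode is essentially a table that assigns a unique number (a code point) to every character in the world's writing systems. Code points range from U+0000 to U+10FFFF, so any of them fits in four bytes (as in the fixed-width UTF-32 encoding).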
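A quick Python check of the idea that Unicode is a numbered table: `ord` returns a character's code point, and the fixed-width UTF-32 form always uses four bytes per character.

```python
import sys

for ch in ["A", "中"]:
    code_point = ord(ch)              # the character's number in the Unicode table
    utf32 = ch.encode("utf-32-be")    # fixed-width form: 4 bytes per character
    print(ch, f"U+{code_point:04X}", len(utf32))
# A U+0041 4
# 中 U+4E2D 4

print(f"U+{sys.maxunicode:X}")        # U+10FFFF, the largest code point
```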


UTF-8 code:

UTF-8 relates to Unicode roughly the way a compressed file relates to its source file: it is a compact way of storing Unicode code points as bytes. UTF-8 is a variable-length encoding. Its original design allowed 1 to 6 bytes per character, but since RFC 3629 (2003) it is limited to 1 to 4 bytes, which covers every code point up to U+10FFFF.


If the highest bit of the first byte is 0, the character takes 1 byte (this is identical to ASCII).

If the first byte begins with n consecutive 1 bits (n ≥ 2), the character takes n bytes in total, and each of the following bytes begins with the bits 10.
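The two rules above can be sketched as a Python function that counts the leading 1 bits of the first byte. `utf8_sequence_length` is a hypothetical, simplified helper: it rejects continuation bytes and the obsolete 5- and 6-byte forms, but not every byte that RFC 3629 forbids (for example 0xC0 or 0xF5).

```python
def utf8_sequence_length(lead: int) -> int:
    """Number of bytes in a UTF-8 sequence, judged from its first byte."""
    if (lead & 0x80) == 0:        # 0xxxxxxx: one byte, identical to ASCII
        return 1
    n = 0
    mask = 0x80
    while lead & mask:            # count the consecutive leading 1 bits
        n += 1
        mask >>= 1
    if n == 1:                    # 10xxxxxx marks a continuation byte
        raise ValueError("continuation byte cannot start a character")
    if n > 4:                     # 5- and 6-byte forms are invalid since RFC 3629
        raise ValueError("lead byte not allowed in modern UTF-8")
    return n


for ch in ["A", "é", "中", "\U0001F600"]:
    data = ch.encode("utf-8")
    # character, its UTF-8 bytes in hex, predicted length, actual length
    print(repr(ch), data.hex(" "), utf8_sequence_length(data[0]), len(data))
```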


MySQL garbled text problem:

Garbled text has two possible causes: 1. the data is decoded with a different character set than the one it was encoded with; 2. the data itself is corrupted.

The first can be fixed; in the second case, the damaged data cannot be restored.
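A minimal Python illustration of the two causes, assuming UTF-8 text that is misread as latin-1 (chosen because it accepts any byte, so the wrong decode never fails). The mismatch can be undone; the truncated bytes cannot.

```python
original = "中文"
data = original.encode("utf-8")           # the bytes actually stored

# Cause 1: decoded with the wrong character set -> garbled but recoverable.
garbled = data.decode("latin-1")
recovered = garbled.encode("latin-1").decode("utf-8")   # undo the wrong decode
print(repr(garbled), recovered == original)

# Cause 2: the bytes themselves are damaged (here the last byte is missing).
damaged = data[:-1].decode("utf-8", errors="replace")
print(repr(damaged))                      # the lost byte cannot be brought back
```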

To fix the first case, you first need to understand how MySQL handles character sets at each stage.


The client first encodes the data in its own encoding and sends it to the server. The server interprets the incoming statement as being in character_set_client; if that differs from character_set_connection, the statement is converted to character_set_connection. When the data is compared with or stored in a column whose character set differs from character_set_connection, it is converted again, this time to the column's character set.

On the way back, the server converts the stored column values to character_set_results before sending them, and the client then decodes the result using its own encoding.

In short, as long as character_set_client, character_set_connection, and character_set_results agree with each other and with the encoding the client actually uses, garbled text is avoided in most cases.
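The conversion steps described above can be simulated in Python. This is a sketch under stated assumptions, not MySQL's implementation: `CODECS`, `convert`, `insert`, and `select` are hypothetical names; MySQL's `latin1` is really cp1252 but Python's `latin-1` is used for simplicity; and unconvertible characters become '?', although depending on version and SQL mode MySQL may raise an error instead.

```python
# Hypothetical mapping from a few MySQL character set names to Python codecs.
# Assumption: MySQL's "latin1" is really cp1252; latin-1 maps every byte 0-255.
CODECS = {"utf8mb4": "utf-8", "gbk": "gbk", "latin1": "latin-1"}


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from character set src to dst; unrepresentable chars become '?'."""
    text = data.decode(CODECS[src], errors="replace")
    return text.encode(CODECS[dst], errors="replace")


def insert(text: str, client_encoding: str, cs_client: str,
           cs_connection: str, column_cs: str) -> bytes:
    """Simulate an INSERT and return the bytes stored in the column."""
    sent = text.encode(CODECS[client_encoding])    # bytes the client really sends
    conn = convert(sent, cs_client, cs_connection)  # client -> connection
    return convert(conn, cs_connection, column_cs)  # connection -> column


def select(stored: bytes, column_cs: str, cs_results: str,
           client_encoding: str) -> str:
    """Simulate a SELECT and return the text the client displays."""
    sent = convert(stored, column_cs, cs_results)   # column -> results
    return sent.decode(CODECS[client_encoding], errors="replace")


# 1. Everything consistent: the text survives the round trip.
ok = insert("中文", "utf8mb4", "utf8mb4", "utf8mb4", "utf8mb4")
print(select(ok, "utf8mb4", "utf8mb4", "utf8mb4"))

# 2. The client sends UTF-8 but declares latin1: the column is double-encoded.
bad = insert("中文", "utf8mb4", "latin1", "latin1", "utf8mb4")
print(len(bad))                                       # 12 bytes instead of 6
print(select(bad, "utf8mb4", "latin1", "utf8mb4"))    # same wrong settings: looks fine
print(select(bad, "utf8mb4", "utf8mb4", "utf8mb4"))   # correct settings: garbled

# 3. The connection charset cannot represent the text: it is lost as '?'.
lost = insert("中文", "utf8mb4", "utf8mb4", "latin1", "utf8mb4")
print(select(lost, "utf8mb4", "utf8mb4", "utf8mb4"))
```

Case 2 shows why a mismatch can go unnoticed: a client with the same wrong settings reads its own data back correctly, while any correctly configured client sees garbled text.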

You can set these three variables with:

SET character_set_client = charset_name;      -- sets the client character set

SET character_set_connection = charset_name;  -- sets the connection character set

SET character_set_results = charset_name;     -- sets the result set character set


These three statements can be replaced by a single one: SET NAMES charset_name;
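A small Python sketch of checking the session settings, assuming you have already fetched them into a dictionary (for example from the rows of `SHOW VARIABLES LIKE 'character_set_%';`). The helpers `apply_set_names` and `mismatched` and the sample values are hypothetical; `apply_set_names` only models the documented effect of SET NAMES on the three variables.

```python
from typing import Dict, List

# The three session variables that SET NAMES changes together.
NAMES_VARIABLES = ("character_set_client",
                   "character_set_connection",
                   "character_set_results")


def apply_set_names(variables: Dict[str, str], charset: str) -> Dict[str, str]:
    """Return a copy of variables as it would look after SET NAMES charset."""
    updated = dict(variables)
    for name in NAMES_VARIABLES:
        updated[name] = charset
    return updated


def mismatched(variables: Dict[str, str], client_encoding: str) -> List[str]:
    """List the variables that differ from the encoding the client really uses."""
    return [n for n in NAMES_VARIABLES if variables.get(n) != client_encoding]


# Hypothetical session values for a client that really sends UTF-8.
session = {"character_set_client": "latin1",
           "character_set_connection": "latin1",
           "character_set_results": "latin1",
           "character_set_database": "utf8mb4"}

print(mismatched(session, "utf8mb4"))   # all three variables are listed
session = apply_set_names(session, "utf8mb4")
print(mismatched(session, "utf8mb4"))   # []
```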

