MySQL: Character Sets Explained

Source: Internet
Author: User

Common character sets and encodings include ASCII, GB2312, GBK, Unicode, and UTF-8.

First, some background on each of them.


ASCII encoding:

One byte is used to represent the digits 0-9, uppercase and lowercase letters, some punctuation marks, and non-printing control characters. One byte has 8 bits and so can take 256 different values. Standard ASCII uses only the low 7 bits of the byte (128 values); the highest bit is always 0 in the encoding itself, although some older systems used it as a parity bit.

The range is 0000 0000 to 0111 1111 in binary, or 0-127 in decimal.
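
As a small illustration of the 7-bit range, the following Python sketch checks whether a byte value has its highest bit clear. The helper name is_ascii_byte is made up for this example.

```python
def is_ascii_byte(value: int) -> bool:
    """Return True if the highest bit is 0, i.e. the value is in 0-127."""
    return (value & 0b1000_0000) == 0

# Code values of a few ASCII characters.
digits = [ord(c) for c in "09"]      # first and last digit
letters = [ord(c) for c in "AZaz"]   # first and last letter of each case

print(digits)                        # [48, 57]
print(letters)                       # [65, 90, 97, 122]
print(is_ascii_byte(65))             # True
print(is_ascii_byte(128))            # False
print(format(127, "08b"))            # 01111111
```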


A single byte has only 256 possible values, while there are thousands of Chinese characters, so more than one byte is needed to represent a Chinese character. The common Chinese encodings are GB2312 and GBK.


GB2312 encoding:

GB2312 is the national standard encoding used on Chinese-language computers. It uses two bytes to represent one Chinese character. To stay compatible with ASCII, neither byte falls in 0-127; both lie in 128-255. In theory this allows at most 128*128=16384 combinations, which is enough for the commonly used Chinese characters. (The actual standard uses a narrower range, 161-254 for each byte, and defines 6,763 Chinese characters.)

Here is a sequence of numbers, each standing for one byte. Which bytes are GB2312-encoded and which are ASCII-encoded? The values are arbitrary and chosen only for illustration.

128 200 65 189 178 23 213 186

128 200: both bytes are in 128-255, so under this rule they form one Chinese character encoded with GB2312.

65: one byte in 0-127, so it is ASCII-encoded; it is the uppercase letter 'A'.

189 178: likewise one Chinese character encoded with GB2312.

23: one byte in 0-127, so it is ASCII-encoded (a control character).

213 186: also one Chinese character encoded with GB2312.
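
The walk-through above can be sketched in Python. This follows the simplified rule from the text (a byte in 0-127 stands alone; a byte in 128-255 pairs with the next byte, which must also be in 128-255), not the exact byte ranges of the real standard. The function name split_gb2312_simplified is made up for this example.

```python
def split_gb2312_simplified(values):
    """Group byte values into 1-byte ASCII units and 2-byte units.

    Simplified rule: 0-127 is a single ASCII byte; 128-255 must be
    followed by another byte in 128-255 to form one character.
    """
    units = []
    i = 0
    while i < len(values):
        b = values[i]
        if b <= 127:
            units.append(("ascii", (b,)))
            i += 1
        elif i + 1 < len(values) and values[i + 1] >= 128:
            units.append(("double", (b, values[i + 1])))
            i += 2
        else:
            raise ValueError(f"byte {b} at index {i} has no valid second byte")
    return units

sample = [128, 200, 65, 189, 178, 23, 213, 186]
for kind, unit in split_gb2312_simplified(sample):
    print(kind, unit)
```

Because the sample values are arbitrary, the two-byte units are not guaranteed to be real GB2312 characters; the sketch only shows where the boundaries fall.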


GBK encoding:

GBK is an extension of GB2312 that covers more Chinese characters. It also uses two bytes per Chinese character. The first byte still has a theoretical range of 128-255, but the second byte can be anywhere in 0-255, so in theory 128*256=32768 combinations are possible. (In the actual standard the first byte is 129-254 and the second byte is 64-254, excluding 127.) GBK is also compatible with ASCII.

For example:

128 65 65 189 178 23 213 186

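128 65: 128 is not in 0-127, so the two bytes starting with 128 together represent one Chinese character, even though the second byte, 65, is itself in 0-127.

65: in 0-127, so it is the uppercase letter 'A', represented by one byte.

189 178: 189 is not in 0-127, so these two bytes represent one Chinese character.

23: in 0-127, so it is a single ASCII byte (a control character).

213 186: 213 is not in 0-127, so these two bytes represent one Chinese character.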
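
The same splitting can be sketched for the simplified GBK rule, where only the first byte decides: a byte in 0-127 stands alone, and a byte in 128-255 always takes the next byte with it, whatever its value. The function name split_gbk_simplified is made up for this example, and the arbitrary sample is not necessarily valid under the real standard.

```python
def split_gbk_simplified(values):
    """Split byte values into units by looking only at the first byte."""
    units = []
    i = 0
    while i < len(values):
        b = values[i]
        if b <= 127:
            units.append((b,))                 # single ASCII byte
            i += 1
        elif i + 1 < len(values):
            units.append((b, values[i + 1]))   # two-byte character
            i += 2
        else:
            raise ValueError(f"byte {b} at index {i} is missing its second byte")
    return units

sample = [128, 65, 65, 189, 178, 23, 213, 186]
print(split_gbk_simplified(sample))
# [(128, 65), (65,), (189, 178), (23,), (213, 186)]

# Real-codec check: Python's built-in "gbk" codec encodes the Chinese
# character U+4E2D as two bytes and the letter A as one byte.
real = list("\u4e2dA".encode("gbk"))
print(real)                                    # [214, 208, 65]
print(split_gbk_simplified(real))              # [(214, 208), (65,)]
```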


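Unicode character set:

Unicode is essentially a table that assigns a unique number, called a code point, to every character in the world's writing systems. Code points run from U+0000 to U+10FFFF, so any of them fits in 4 bytes.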
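
To make the idea of a lookup table concrete, this short Python sketch prints the Unicode number of a few characters using the built-in ord and chr functions. The choice of sample characters is arbitrary.

```python
import sys

# Characters written as escapes: A, e-acute, a Chinese character, an emoji.
samples = ["A", "\u00e9", "\u4e2d", "\U0001f600"]

# Map each character to its Unicode number (code point).
code_points = [ord(ch) for ch in samples]
for ch, cp in zip(samples, code_points):
    print(ch, cp, f"U+{cp:04X}")

# chr() is the reverse lookup: number back to character.
print(chr(0x4E2D) == "\u4e2d")       # True

# The largest number Python accepts for a character.
print(hex(sys.maxunicode))           # 0x10ffff
```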


UTF-8 encoding:

UTF-8 relates to Unicode roughly as a compressed file relates to its source file: Unicode defines the code points, and UTF-8 is a compact way to store them as bytes. UTF-8 is a variable-length encoding. The original design allowed 1 to 6 bytes per character; the current standard (RFC 3629) limits it to 1 to 4 bytes.


If the highest bit of the first byte is 0, the character takes 1 byte.

If the first byte starts with n consecutive 1 bits (n is at least 2), the character takes n bytes, and each of the following bytes starts with the bits 10.
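
The leading-bit rule can be sketched in Python as follows. The function name utf8_sequence_length is made up for this example, and the sketch accepts at most 4 bytes, which is the limit in current UTF-8 (RFC 3629), rather than the theoretical 6.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return how many bytes a UTF-8 character occupies, from its first byte."""
    if (first_byte & 0b1000_0000) == 0:
        return 1                      # 0xxxxxxx: single byte
    n = 0
    mask = 0b1000_0000
    while first_byte & mask:          # count consecutive leading 1 bits
        n += 1
        mask >>= 1
    if n == 1:
        raise ValueError("10xxxxxx is a continuation byte, not a first byte")
    if n > 4:
        raise ValueError("more than 4 leading 1 bits is not valid in current UTF-8")
    return n

for ch in ["A", "\u00e9", "\u4e2d", "\U0001f600"]:
    encoded = ch.encode("utf-8")
    print(encoded.hex(" "), utf8_sequence_length(encoded[0]), len(encoded))
```

For each sample character the length computed from the first byte matches the length of the bytes that Python's own encoder produces.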


MySQL garbled text problem:

Garbled text has two possible causes: 1. the decoding does not match the encoding that was used; 2. the data itself is corrupted.

The first can be fixed. In the second, the data is damaged and cannot be restored.
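
The difference between the two cases can be shown outside MySQL with plain Python. This is only a sketch: the "latin-1" codec stands in for any wrong decoding, and converting to ASCII with replacement stands in for any lossy step that destroys data.

```python
text = "\u4e2d\u6587"                 # two Chinese characters
stored = text.encode("utf-8")         # the bytes as correctly encoded

# Case 1: decoding does not match the encoding. The text looks garbled,
# but the bytes are intact, so undoing the wrong step recovers it.
garbled = stored.decode("latin-1")
recovered = garbled.encode("latin-1").decode("utf-8")
print(garbled != text)                # True
print(recovered == text)              # True

# Case 2: data corruption. Characters that do not fit are replaced by
# "?", and the original bytes are gone for good.
lossy = text.encode("ascii", errors="replace")
print(lossy)                          # b'??'
```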

To deal with the first case, you first need to understand the stages that data passes through in MySQL.


The client sends data in the client encoding (character_set_client). The server converts it to the connection encoding (the connector's encoding, character_set_connection) if the two differ. The connection encoding is then compared with the encoding in which the database stores the data, and if they differ the data is converted once more to the storage encoding. Results travel the other way: if the storage encoding differs from the result set encoding (character_set_results), the data is converted to the result set encoding before it is sent back to the client.
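
The conversion chain on the way in can be imitated in Python to show how a wrong client setting leads to garbled storage. This is a rough simulation, not MySQL itself: the functions transcode and store are made up for this example, and Python's "utf-8" and "latin-1" codecs stand in for the corresponding MySQL character sets.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from charset src to charset dst; no-op if they match."""
    if src == dst:
        return data
    return data.decode(src).encode(dst)

def store(client_bytes: bytes, client_cs: str, connection_cs: str, column_cs: str) -> bytes:
    """Imitate client -> connection -> storage conversion."""
    in_connection = transcode(client_bytes, client_cs, connection_cs)
    return transcode(in_connection, connection_cs, column_cs)

text = "\u4e2d\u6587"
sent = text.encode("utf-8")           # the client really sends UTF-8 bytes

# Settings match what the client sends: stored bytes are correct.
good = store(sent, "utf-8", "utf-8", "utf-8")

# Client encoding declared as latin-1 although the bytes are UTF-8:
# every byte is treated as a separate character and re-encoded.
bad = store(sent, "latin-1", "latin-1", "utf-8")

print(good == text.encode("utf-8"))   # True
print(len(good), len(bad))            # 6 12
print(bad.decode("utf-8") == text)    # False
```

In the second call no step raises an error, which is why this kind of mismatch often goes unnoticed until the data is read back with correct settings.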

In short, as long as the client, connection (connector), and result set encodings are consistent with one another and with the bytes the client actually sends, garbled text is avoided in most cases.

This can be done with:

SET character_set_client = charset_name;  (sets the client encoding)

SET character_set_connection = charset_name;  (sets the connection encoding)

SET character_set_results = charset_name;  (sets the result set encoding)


These three statements can also be shortened to one: SET NAMES charset_name;
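
As a small illustration of that equivalence, the Python helper below builds the three statements from one character set name. The function name expand_set_names is made up for this example, and it only produces SQL text; it does not talk to a server. Per the MySQL documentation, setting character_set_connection also sets the connection collation to that character set's default.

```python
import re

def expand_set_names(charset):
    """Return the three SET statements that SET NAMES <charset> stands for."""
    # Accept only simple identifiers so the name is safe to place in SQL.
    if not re.fullmatch(r"[A-Za-z0-9_]+", charset):
        raise ValueError(f"unexpected character set name: {charset!r}")
    return [
        f"SET character_set_client = {charset};",
        f"SET character_set_results = {charset};",
        f"SET character_set_connection = {charset};",
    ]

for statement in expand_set_names("utf8mb4"):
    print(statement)
```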

