utf8 in MySQL

Source: Internet
Author: User

Recently I ran into a strange problem: when MySQL inserted a UTF-8 string, the string was truncated. Printing the executed SQL statement showed that it was cut off at a character that could not be displayed. At first I suspected an invalid UTF-8 sequence, but hexdump showed that it was a valid UTF-8 character encoded in 4 bytes. I then decoded it to its Unicode code point and, after a quick Google search, found that it was an emoji (emoji are Unicode characters that are common on iOS and Android phones).
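The check described above can be reproduced in a few lines of Python. This is only a sketch: the article does not say which emoji caused the problem, so U+1F600 is an arbitrary stand-in, and `describe_utf8` is a hypothetical helper name.

```python
def describe_utf8(ch: str) -> dict:
    """Return the code point and UTF-8 bytes of a single character."""
    raw = ch.encode("utf-8")
    return {
        "code_point": f"U+{ord(ch):04X}",
        # Space-separated hex, similar to what hexdump shows.
        "hex_bytes": " ".join(f"{b:02x}" for b in raw),
        "byte_length": len(raw),
    }


# U+1F600 is an assumed example emoji, not the one from the article.
info = describe_utf8("\U0001F600")
print(info)
```

A valid 4-byte sequence like this one decodes without error, which is why the character itself turned out not to be the problem.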

Why was the emoji truncated? As I understood it, MySQL's utf8 should accept any valid UTF-8 string. I was then reminded that earlier versions of MySQL only support UTF-8 sequences of up to 3 bytes. The documentation confirms this:

  • utf8, A UTF-8 encoding of the Unicode character set using one to three bytes per character.

This is still the case in the latest version of MySQL. The largest Unicode code point that three-byte UTF-8 can encode is 0xFFFF, which is the end of the Basic Multilingual Plane (BMP) in Unicode. In other words, Unicode characters outside the Basic Multilingual Plane cannot be stored in MySQL's utf8 character set. This includes the emoji above, many rare Chinese characters, and most newly added Unicode characters.
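A minimal sketch of how an application could detect such characters before they reach the database, assuming the only criterion is a code point above 0xFFFF. The function names and the sample string are made up for illustration.

```python
BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def first_non_bmp_index(text: str):
    """Return the index of the first character outside the BMP, or None."""
    for i, ch in enumerate(text):
        if ord(ch) > BMP_MAX:
            return i
    return None


def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes in UTF-8."""
    return first_non_bmp_index(text) is None


sample = "hello \U0001F600 world"  # hypothetical input containing an emoji
idx = first_non_bmp_index(sample)
# Everything from idx onward is what would be lost in the truncation
# described in this article.
kept = sample if idx is None else sample[:idx]
print(idx, repr(kept))
```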

This utf8 is not that UTF-8.

UTF-8 (8-bit Unicode Transformation Format) is a transformation format for Unicode. The original UTF-8 design used one to six bytes and could encode up to 31 bits. The current UTF-8 specification uses only one to four bytes and can encode up to 21 bits, which is enough to cover all 17 Unicode planes.
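The relationship between code point and encoded length in the current one-to-four-byte scheme can be written out as a small function. This is an illustrative sketch; `utf8_length` is a hypothetical name, and the result is cross-checked against Python's own encoder.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a Unicode code point."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point <= 0x7F:      # 7 bits: ASCII
        return 1
    if code_point <= 0x7FF:     # 11 bits
        return 2
    if code_point <= 0xFFFF:    # 16 bits: rest of the BMP
        return 3
    return 4                    # 21 bits: supplementary planes


# Cross-check a few sample code points against the built-in encoder.
for cp in (0x41, 0xE9, 0x4E2D, 0x1F600):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```

Only the last branch, code points above 0xFFFF, produces sequences that MySQL's utf8 cannot store.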

utf8 is a character set in MySQL that only supports UTF-8 characters of up to three bytes, that is, the Basic Multilingual Plane in Unicode.

Why does utf8 in MySQL only support UTF-8 characters of up to three bytes? My guess is that when MySQL development began, Unicode did not yet have supplementary planes. At that time the Unicode committee still had the dream that "65,535 characters are enough for the whole world". String lengths in MySQL are counted in characters rather than bytes, and for the CHAR data type enough space must be reserved for the longest possible string. With the utf8 character set, the reserved length is the maximum byte length of a utf8 character multiplied by the declared string length. So, with a maximum of 3 bytes per character, MySQL reserves 300 bytes for CHAR(100). As for why later versions still do not support 4-byte UTF-8 characters in utf8, I think one reason is backward compatibility, and another is that characters outside the Basic Multilingual Plane are rarely used.
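The CHAR arithmetic above is simple enough to spell out. The sketch below only restates the rule described in this article (declared length times maximum bytes per character); actual on-disk storage depends on the storage engine and row format, which this does not model. The names are hypothetical.

```python
# Maximum bytes per character for the two MySQL character sets
# discussed in this article.
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def char_reserved_bytes(declared_length: int, charset: str) -> int:
    """Bytes reserved for CHAR(declared_length) under the rule above."""
    return declared_length * MAX_BYTES_PER_CHAR[charset]


utf8_bytes = char_reserved_bytes(100, "utf8")
utf8mb4_bytes = char_reserved_bytes(100, "utf8mb4")
print(utf8_bytes, utf8mb4_bytes)
```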

To store 4-byte UTF-8 characters in MySQL, you need to use the utf8mb4 character set, which is only supported from MySQL 5.5 onward. I think utf8mb4 should always be used instead of utf8. For CHAR columns, utf8mb4 consumes more space, so the official MySQL recommendation is to use VARCHAR instead of CHAR.
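As a rough sketch of what switching an existing table might look like, the helper below builds an `ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4` statement. The table name `messages`, the helper name, and the choice of `utf8mb4_unicode_ci` as collation are assumptions for illustration and do not come from the article. The client connection must also use utf8mb4, which this snippet does not handle.

```python
def convert_table_sql(table: str, collation: str = "utf8mb4_unicode_ci") -> str:
    """Build a statement that converts a table's text columns to utf8mb4."""
    # Quote the identifier with backticks, doubling any embedded backtick.
    quoted = "`" + table.replace("`", "``") + "`"
    return (
        f"ALTER TABLE {quoted} "
        f"CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation}"
    )


# `messages` is a hypothetical table name.
statement = convert_table_sql("messages")
print(statement)
```

Converting a large table rewrites it, so a statement like this is best tried on a copy of the data first.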

