What is the difference between UTF8 and utf8mb4 in MySQL?

Source: Internet
Author: User
This article explains the difference between utf8 and utf8mb4 in MySQL. It has some reference value, and I hope it helps readers who need it.

I. Introduction

MySQL added the utf8mb4 character set in version 5.5.3. "mb4" stands for "most bytes 4", meaning a character takes at most 4 bytes; it was designed specifically to support four-byte Unicode characters. Fortunately, utf8mb4 is a superset of utf8, so apart from switching the character set to utf8mb4, no other conversion is needed. Of course, if you want to save space, utf8 is generally enough.
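To make "most bytes 4" concrete, here is a small Python illustration (not MySQL code) that prints how many bytes UTF-8 needs for a few sample characters. The sample characters and the helper name `utf8_byte_length` are arbitrary choices for this sketch; the 3-byte and 4-byte limits are the ones described above.

```python
def utf8_byte_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs to encode the single character ch."""
    return len(ch.encode("utf-8"))


# Arbitrary sample characters: ASCII letter, accented Latin letter,
# common Chinese character, emoji.
samples = ["a", "\u00e9", "\u4e2d", "\U0001F600"]

for ch in samples:
    n = utf8_byte_length(ch)
    # MySQL utf8 allows at most 3 bytes per character; utf8mb4 allows up to 4.
    print(f"U+{ord(ch):04X}: {n} byte(s), utf8 ok: {n <= 3}, utf8mb4 ok: {n <= 4}")
```

The first three characters need 1, 2 and 3 bytes and fit in utf8; only the emoji (U+1F600) needs 4 bytes and therefore utf8mb4.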

II. Description

You might ask: since utf8 can store most Chinese characters, why use utf8mb4 at all? The original utf8 encoding supported by MySQL has a maximum character length of 3 bytes, so inserting a 4-byte character fails (typically with an "Incorrect string value" error). The largest Unicode code point that three-byte UTF-8 can encode is 0xFFFF, the upper end of the Basic Multilingual Plane (BMP). In other words, no Unicode character outside the BMP can be stored in MySQL's utf8 character set. This includes emoji (special Unicode characters common on iOS and Android phones), many rarely used Chinese characters, and any newly added Unicode characters that fall outside the BMP.
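Because the limit is simply "code point above 0xFFFF", you can check text before inserting it into a utf8 column. The following Python sketch uses hypothetical helper names (`non_bmp_chars`, `fits_mysql_utf8`) and assumes well-formed text (no lone surrogates); it is an application-side check, not something MySQL provides.

```python
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def non_bmp_chars(text: str) -> list:
    """Return the characters of text that lie outside the BMP.

    For well-formed text these are exactly the characters whose UTF-8
    encoding needs 4 bytes, so MySQL's 3-byte utf8 cannot store them.
    """
    return [ch for ch in text if ord(ch) > BMP_MAX]


def fits_mysql_utf8(text: str) -> bool:
    """True if every character of text is within the BMP (utf8-safe)."""
    return not non_bmp_chars(text)


print(fits_mysql_utf8("\u4e2d\u6587"))                # True  (the two characters of "Chinese")
print(fits_mysql_utf8("ok \U0001F600"))               # False (contains an emoji)
print(len(non_bmp_chars("a\U00020000b\U0001F600")))  # 2     (rare CJK character + emoji)
```

U+20000 is the first character of CJK Extension B, an example of the rarely used Chinese characters mentioned above.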

III. Root Cause of the Problem

The original UTF-8 format used one to six bytes and could encode characters of up to 31 bits. The current UTF-8 specification (RFC 3629) uses only one to four bytes and encodes up to 21 bits, which is exactly enough to represent all 17 Unicode planes.
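The byte counts follow directly from code point ranges. This Python sketch (the function name `utf8_length_for` is hypothetical) encodes the current one-to-four-byte rules, checks that 17 planes of 65,536 code points end at U+10FFFF, which needs 21 bits, and cross-checks the table against Python's own UTF-8 encoder.

```python
MAX_CODE_POINT = 0x10FFFF  # last code point of plane 16, i.e. the 17th plane


def utf8_length_for(cp: int) -> int:
    """Bytes needed to encode code point cp under the current 1-4 byte UTF-8 rules."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"U+{cp:X} is outside the Unicode code space")
    if cp < 0x80:
        return 1  # 7 payload bits
    if cp < 0x800:
        return 2  # 11 payload bits
    if cp < 0x10000:
        return 3  # 16 payload bits: the whole BMP, MySQL utf8's limit
    return 4      # 21 payload bits: supplementary planes 1-16


# 17 planes x 65,536 code points end exactly at U+10FFFF, which needs 21 bits.
print(17 * 0x10000 - 1 == MAX_CODE_POINT)  # True
print(MAX_CODE_POINT.bit_length())         # 21

# Cross-check against Python's encoder; the sample avoids the surrogate
# range U+D800-U+DFFF, which cannot be encoded as UTF-8.
for cp in (0x41, 0x7FF, 0x800, 0xFFFF, 0x10000, MAX_CODE_POINT):
    assert utf8_length_for(cp) == len(chr(cp).encode("utf-8"))
```

The 3-byte branch covers exactly the BMP, which is why a 3-byte-per-character limit excludes everything from U+10000 upward.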

In MySQL, utf8 is a character set that supports only UTF-8 characters of at most three bytes, i.e., the Basic Multilingual Plane of Unicode.

Why does utf8 in MySQL support only UTF-8 characters of at most three bytes?
My guess is that when MySQL was first being developed, Unicode did not yet have supplementary planes. At that time the Unicode Consortium still dreamed that "65,535 characters are enough for the whole world". In MySQL, string lengths are counted in characters, not bytes, and for the CHAR data type enough space must be reserved for the full declared length. With the utf8 character set, the space to reserve is the length of the longest utf8 character multiplied by the declared string length, so it was natural to cap utf8 characters at 3 bytes; for example, MySQL reserves 300 bytes for a CHAR(100) column. As for why later versions did not extend utf8 to 4-byte UTF-8 characters, I think one reason is backward compatibility, and another is that characters outside the Basic Multilingual Plane really are rarely used.
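The reservation rule above is simple multiplication. This Python sketch models it with a hypothetical helper `char_reserved_bytes`; it is the simplified "maximum bytes per character x declared length" model described in the paragraph, and real on-disk size can differ depending on the storage engine and row format.

```python
# Maximum bytes per character of the two MySQL character sets discussed above.
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def char_reserved_bytes(length: int, charset: str) -> int:
    """Bytes reserved for a CHAR(length) column under the simple model
    'declared length in characters x maximum bytes per character'."""
    if charset not in MAX_BYTES_PER_CHAR:
        raise ValueError(f"unknown charset: {charset!r}")
    return length * MAX_BYTES_PER_CHAR[charset]


for cs in ("utf8", "utf8mb4"):
    print(f"CHAR(100) {cs}: {char_reserved_bytes(100, cs)} bytes")
```

For CHAR(100) this gives 300 bytes with utf8 and 400 bytes with utf8mb4, the extra CHAR space that comes with utf8mb4.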

To store 4-byte UTF-8 characters in MySQL, you need the utf8mb4 character set, which is supported only in version 5.5.3 and later (check your version with SELECT VERSION();). I think that for better compatibility you should always use utf8mb4 instead of utf8. For CHAR columns utf8mb4 consumes more space, so, following MySQL's official recommendation, use VARCHAR instead of CHAR.
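The two practical points here, the 5.5.3 version requirement and the space advantage of VARCHAR, can be sketched in Python. Both helpers are hypothetical: `supports_utf8mb4` assumes the string returned by SELECT VERSION(); starts with "major.minor.patch" (e.g. "5.7.44-log"), and `varchar_bytes` applies MySQL's VARCHAR length-prefix rule (1 byte if the column can need at most 255 bytes, otherwise 2) while ignoring other row overhead.

```python
import re


def supports_utf8mb4(version: str) -> bool:
    """True if a MySQL version string is 5.5.3 or later (utf8mb4 available)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in m.groups()) >= (5, 5, 3)


def varchar_bytes(value: str, declared_length: int, max_bytes_per_char: int) -> int:
    """Approximate bytes for a VARCHAR(declared_length) value: the UTF-8 bytes
    actually used plus a 1-byte length prefix, or 2 bytes when the column's
    maximum byte length exceeds 255."""
    prefix = 1 if declared_length * max_bytes_per_char <= 255 else 2
    return len(value.encode("utf-8")) + prefix


print(supports_utf8mb4("5.5.2"))       # False
print(supports_utf8mb4("5.7.44-log"))  # True
print(varchar_bytes("abc", 100, 4))    # 5
```

Storing "abc" in a utf8mb4 VARCHAR(100) takes about 5 bytes, whereas the simple CHAR(100) model reserves 400 bytes, which is why VARCHAR is preferred once you switch to utf8mb4.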
