Comprehensive understanding of the differences between utf8 and utf8mb4 in MySQL

Source: Internet
Author: User

I. Introduction

MySQL added the utf8mb4 encoding in version 5.5.3. The "mb4" suffix means "most bytes 4", that is, at most four bytes per character, and the encoding exists specifically to support four-byte Unicode characters. Fortunately, utf8mb4 is a superset of utf8, so apart from switching the encoding to utf8mb4 no other conversion is needed. Of course, if saving space matters, utf8 is sufficient in most cases.

II. Content Description

Given that utf8 can already store most Chinese characters, why is utf8mb4 needed? MySQL's utf8 character set supports UTF-8 encoded characters of at most three bytes, so inserting a character that needs four bytes causes an error. The largest Unicode code point that three-byte UTF-8 can encode is 0xFFFF, which is the end of the Basic Multilingual Plane (BMP) in Unicode. In other words, any Unicode character outside the Basic Multilingual Plane cannot be stored with MySQL's utf8 character set. This includes emoji (a special group of Unicode characters common on iOS and Android phones), many rarely used Chinese characters, and any newly added Unicode characters.
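The three-byte limit can be checked without a database. The following is a minimal Python sketch, not MySQL's own validation logic; the helper names `max_utf8_bytes` and `fits_mysql_utf8` are made up for illustration. It assumes that "fits in MySQL utf8" is equivalent to "every code point is at most 0xFFFF", as described above.

```python
def max_utf8_bytes(text: str) -> int:
    """Return the largest number of UTF-8 bytes used by any single character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_mysql_utf8(text: str) -> bool:
    """True if every character lies in the BMP (code point <= 0xFFFF).

    Assumption: this mirrors the 3-byte limit of MySQL's utf8 character set.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


# ASCII, common Chinese characters, an emoji (U+1F600), a rare CJK character (U+20BB7)
samples = ["hello", "中文", "\U0001F600", "\U00020BB7"]
for s in samples:
    print(ascii(s), max_utf8_bytes(s), fits_mysql_utf8(s))
```

The first two samples need at most three bytes per character and pass the check; the emoji and the rare CJK character each need four bytes and fail it, so they would require utf8mb4.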

III. Root Cause

The original UTF-8 format used one to six bytes and could encode up to 31 bits. The latest UTF-8 specification uses only one to four bytes and can encode up to 21 bits, which is enough to represent all 17 Unicode planes.
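The bit counts in this paragraph follow from the UTF-8 byte layout: a lead byte of an n-byte sequence (n >= 2) carries 7 - n payload bits, and each continuation byte carries 6. A small Python sketch of that arithmetic, with the hypothetical helper name `payload_bits`:

```python
def payload_bits(num_bytes: int) -> int:
    """Number of code-point bits an n-byte UTF-8 sequence can carry."""
    if not 1 <= num_bytes <= 6:
        raise ValueError("UTF-8 sequences are 1 to 6 bytes (1 to 4 in the current spec)")
    if num_bytes == 1:
        return 7  # 0xxxxxxx
    # Lead byte: n one-bits, a zero bit, then (7 - n) payload bits.
    # Each continuation byte (10xxxxxx) adds 6 payload bits.
    return (7 - num_bytes) + 6 * (num_bytes - 1)


for n in range(1, 7):
    print(n, payload_bits(n))  # 7, 11, 16, 21, 26, 31

UNICODE_CODE_POINTS = 17 * 0x10000  # 17 planes of 65,536 code points each
assert UNICODE_CODE_POINTS - 1 == 0x10FFFF
assert 2 ** payload_bits(3) == 0x10000               # three bytes cover exactly the BMP
assert 2 ** payload_bits(4) >= UNICODE_CODE_POINTS   # four bytes cover all 17 planes
```

Three bytes give 16 bits, which is why a three-byte limit stops at 0xFFFF, while four bytes give 21 bits, more than the 0x10FFFF upper bound of Unicode requires.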

utf8 is a MySQL character set that supports only UTF-8 characters of up to three bytes, that is, the Basic Multilingual Plane in Unicode.

Why does utf8 in MySQL only support UTF-8 characters of at most three bytes? After thinking about it for a while, my guess is that when MySQL development began, Unicode had no supplementary planes yet. At that time, the Unicode committee was still dreaming that "65,535 characters are enough for the whole world". String lengths in MySQL are counted in characters rather than bytes, and a CHAR column must reserve enough space for the longest string it can hold. With the utf8 character set, the space reserved is the maximum byte length of a utf8 character multiplied by the declared length, so the maximum length of a utf8 character was fixed at 3 bytes; for example, for CHAR(100) MySQL reserves 300 bytes. As for why later versions did not extend utf8 to 4-byte UTF-8 characters, I think one reason is backward compatibility, and another is that characters outside the Basic Multilingual Plane are rarely used.
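The CHAR reservation rule described above is simple multiplication. The Python sketch below models only that worst-case rule (declared length times maximum bytes per character); it is not a description of the actual on-disk layout, which depends on the storage engine and row format. The table and function names are assumptions made for illustration.

```python
# Maximum bytes per character for a few MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def char_reserved_bytes(declared_length: int, charset: str) -> int:
    """Worst-case bytes reserved for a CHAR(declared_length) column."""
    return declared_length * MAX_BYTES_PER_CHAR[charset]


print(char_reserved_bytes(100, "utf8"))     # 300, the figure quoted in the text
print(char_reserved_bytes(100, "utf8mb4"))  # 400
```

Under this model, moving a CHAR(100) column from utf8 to utf8mb4 raises the reservation from 300 to 400 bytes.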

To store 4-byte UTF-8 characters in MySQL, you need the utf8mb4 character set, which is supported only from version 5.5.3 onward (to check the version: select version();). In my opinion, utf8mb4 should always be used instead of utf8 for better compatibility. Note that utf8mb4 consumes more space for CHAR-type data; following MySQL's official recommendation, use VARCHAR instead of CHAR.
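An application that must run against older servers could compare the string returned by select version(); with 5.5.3 before relying on utf8mb4. This is a rough Python sketch under the assumption that the version string starts with three dot-separated numbers (for example "5.7.33-log"); the function names are hypothetical, and obtaining the string from the server is left out.

```python
import re


def parse_version(version: str) -> tuple:
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_utf8mb4(version: str) -> bool:
    """True if the version is 5.5.3 or later, per the text above."""
    return parse_version(version) >= (5, 5, 3)


print(supports_utf8mb4("5.1.73-log"))  # False
print(supports_utf8mb4("5.5.3"))       # True
print(supports_utf8mb4("8.0.32"))      # True
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "5.10.0" would sort before "5.5.3".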

That is everything the editor has to share about the differences between utf8 and utf8mb4 in MySQL. I hope it serves as a useful reference.
