What is the difference between utf8 and utf8mb4 in MySQL?

Source: Internet
Author: User

utf8mb4 is MySQL's complete UTF-8 character set: it stores up to 4 bytes per character and can therefore represent any Unicode text.

Why does utf8mb4 exist?

If utf8 causes no problems in everyday use, why is utf8mb4 needed? The utf8 character set in earlier MySQL versions stores at most 3 bytes per character. Inserting a 4-byte character fails: depending on the SQL mode, MySQL either raises an error or truncates the value with a warning. The largest Unicode code point that three-byte UTF-8 can encode is 0xFFFF, the upper limit of the Basic Multilingual Plane (BMP). In other words, Unicode characters outside the BMP cannot be stored in MySQL's original utf8 character set. Which characters fall outside the BMP? The most common are emoji (special Unicode characters widely used on iOS and Android phones), along with some rare Chinese characters and many newly added Unicode characters.
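
As a rough illustration of the BMP limit described above, the following Python sketch checks whether a string would fit into a 3-byte-per-character encoding. The function names `outside_bmp` and `fits_three_byte_utf8` are hypothetical helpers written for this example; they are not part of MySQL or of any client library.

```python
def outside_bmp(text):
    """Return the characters in `text` whose code point is above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def fits_three_byte_utf8(text):
    """True if every character encodes to at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


# Sample strings: ASCII, common Chinese, an emoji (U+1F600),
# and a CJK Extension B ideograph (U+20BB7).
samples = ["hello", "中文", "I \U0001F600 MySQL", "\U00020BB7"]

for s in samples:
    print(repr(s), fits_three_byte_utf8(s), [hex(ord(c)) for c in outside_bmp(s)])
```

A check like this could be run in application code before writing to a legacy utf8 column, although converting the column to utf8mb4 is usually the cleaner fix.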

UTF-8 encoding

In theory, the original UTF-8 design uses one to six bytes per character and can encode up to 31 bits. The current UTF-8 specification uses only one to four bytes and can encode up to 21 bits, which is enough to cover all 17 Unicode planes. For more information about UTF encodings, see Common coding summary.
utf8 is the character set supported by earlier MySQL versions. It stores only UTF-8 characters of up to three bytes, that is, the Basic Multilingual Plane of Unicode. This may be because very few characters outside the BMP were in use when MySQL was first released. Starting with MySQL 5.5.3, the utf8mb4 character set can store 4-byte UTF-8 characters. For example, with utf8mb4 you can store emoji directly instead of replacing them with substitute text.
For better compatibility, utf8mb4 should generally be used instead of utf8. In fact, the default character set of the latest phpMyAdmin version is utf8mb4. The trade-off is that utf8mb4 can consume more space for CHAR data, because up to 4 bytes rather than 3 must be allowed for each character.
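
To make the byte-length rule in this section concrete, here is a minimal Python sketch of how many bytes the current UTF-8 specification needs for a given code point. The function `utf8_length` and the constant names are assumptions made for this example; the sketch ignores the surrogate range (U+D800 to U+DFFF), which cannot be encoded on its own.

```python
PLANES = 17                      # Unicode defines 17 planes
CODE_POINTS_PER_PLANE = 0x10000  # 65,536 code points in each plane
MAX_CODE_POINT = PLANES * CODE_POINTS_PER_PLANE - 1  # 0x10FFFF


def utf8_length(code_point):
    """Number of bytes UTF-8 uses for one code point (1 to 4)."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("code point outside the Unicode range")
    if code_point <= 0x7F:
        return 1  # ASCII
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3  # rest of the BMP: the most that MySQL utf8 can store
    return 4      # supplementary planes: requires utf8mb4


for ch in ("A", "é", "中", "\U0001F600"):
    # Compare the rule with what Python's own encoder produces.
    print(hex(ord(ch)), utf8_length(ord(ch)), len(ch.encode("utf-8")))

# The highest code point needs 21 bits, matching the figure in the text.
print(MAX_CODE_POINT.bit_length())
```

Only the last branch, code points above 0xFFFF, is out of reach for the legacy utf8 character set.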

So what does utf8mb4 offer beyond utf8?

Supports emoji encoding.

In practice, you can set utf8mb4 on the database or table that needs to store emoji.

For example, tables that hold comments and articles are typical candidates for emoji support.

We recommend that you use utf8 for normal tables. If the table needs to support emoji, use utf8mb4.
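
As a sketch of how an existing table might be switched over, the following Python helper only builds the SQL text; it does not connect to a server. The helper names `quote_identifier` and `convert_table_sql`, and the table names `comments` and `articles`, are hypothetical. The generated statement uses MySQL's `ALTER TABLE ... CONVERT TO CHARACTER SET` syntax.

```python
def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def convert_table_sql(table, collation="utf8mb4_general_ci"):
    """Build an ALTER TABLE statement that converts a table to utf8mb4."""
    if not collation.startswith("utf8mb4_"):
        raise ValueError("collation must belong to the utf8mb4 character set")
    return (
        "ALTER TABLE " + quote_identifier(table)
        + " CONVERT TO CHARACTER SET utf8mb4 COLLATE " + collation + ";"
    )


# Hypothetical tables that need emoji support.
for table in ("comments", "articles"):
    print(convert_table_sql(table))
```

Review the generated statements before running them, and back up the data first. The client connection also has to use utf8mb4 (for example via `SET NAMES utf8mb4`), otherwise 4-byte characters can still be lost on the way to the server.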

A collation (sorting rule) must also be chosen when creating a MySQL database or table.

utf8_unicode_ci is more accurate, while utf8_general_ci is faster. In most cases the accuracy of utf8_general_ci is sufficient. Having read the source code of many programs, I find that most of them also use utf8_general_ci, so it is a reasonable choice when creating a database.
For utf8mb4, the corresponding collations are utf8mb4_general_ci and utf8mb4_unicode_ci.
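
The pairing of character set and collation can be sketched as follows. This Python helper is an illustration only: `create_database_sql`, the `COLLATIONS` table and the database name `blog` are assumptions made for this example, and the table lists only the four collations named in this article, not everything MySQL supports.

```python
# Collations mentioned in this article, grouped by character set.
COLLATIONS = {
    "utf8": ("utf8_general_ci", "utf8_unicode_ci"),
    "utf8mb4": ("utf8mb4_general_ci", "utf8mb4_unicode_ci"),
}


def create_database_sql(name, charset="utf8mb4", collation=None):
    """Build a CREATE DATABASE statement with a matching collation."""
    if charset not in COLLATIONS:
        raise ValueError("unknown character set: " + charset)
    if collation is None:
        # Default to the faster general_ci variant, as the article suggests.
        collation = charset + "_general_ci"
    if collation not in COLLATIONS[charset]:
        raise ValueError(collation + " does not belong to " + charset)
    quoted = "`" + name.replace("`", "``") + "`"
    return (
        "CREATE DATABASE " + quoted
        + " CHARACTER SET " + charset + " COLLATE " + collation + ";"
    )


print(create_database_sql("blog"))
print(create_database_sql("blog", collation="utf8mb4_unicode_ci"))
```

The check in the middle reflects a rule MySQL itself enforces: a collation can only be combined with the character set it belongs to, so utf8_general_ci cannot be used with utf8mb4.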
