Filtering emoji from text in PHP and enabling emoji support in MySQL

Source: Internet
Author: User
Tags: base64, mysql, in mysql, mysql version, mysql command line

1. Why filter emoji

In project development, emoji are troublesome. Even if we can store them, support is never complete, because the set is updated quickly. On platforms other than iOS, such as PC or Android, displaying emoji means preparing a large set of emoji images and using a third-party front-end library. Even then, some emoji may fail to display because the image set is incomplete.
In most business scenarios emoji are not essential, so we can consider filtering them out where appropriate and save the associated costs.

2. How emoji filtering works in PHP

Emoji (絵文字, "picture characters"; from the Japanese えもじ, e-moji, where moji means "character") are a set of 12x12 pixel pictograms that originated in Japan, created by Shigetaka Kurita. They first became popular among Japanese internet and mobile phone users. After emoji were added to the input methods of Apple's iOS 5, they swept across the globe; they have since been adopted by most modern Unicode-compatible computer systems and are widely used in mobile messaging and social networks. More recently, many internet users have taken to playing word-guessing games with emoji and enjoying the fun this pictographic culture brings.

A note on pronunciation: many people instinctively misread the word the first time they see it. Transliterated from Japanese, it is read roughly as "eh-mo-ji", where the initial "e" sounds much like the name of the letter A in "ABC".

Originally, Japan's three major telecom operators, DoCoMo, KDDI and SoftBank, each defined their own emoji characters. Versions of iOS before iOS 5 shipped with the SoftBank set, and that is how emoji became popular worldwide. Google also defined its own set of emoji characters. From iOS 5 onward, Apple adopted the Unicode-defined emoji characters.

Most Unicode-defined emoji take four bytes in UTF-8, while SoftBank emoji take three. If a system was not designed with four-byte characters in mind, from storage through to display, they are simply a disaster.
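To make the byte counts concrete, here is a minimal sketch that reports how many bytes each character of a UTF-8 string occupies. The function name utf8_byte_lengths and the sample string are illustrative only and do not come from the original article.

```php
<?php
// Hypothetical helper: returns the number of bytes used by each character
// of a UTF-8 string.
function utf8_byte_lengths(string $text): array
{
    // The /u modifier makes "." match one whole UTF-8 character;
    // the /s modifier lets "." match newlines as well.
    preg_match_all('/./us', $text, $matches);
    return array_map('strlen', $matches[0]);
}

// "a", a CJK character (U+4E2D) and the grinning face emoji (U+1F600).
$sample = "a\u{4E2D}\u{1F600}";
print_r(utf8_byte_lengths($sample));
```

The last entry in the result is the one that a 3-byte utf8 column in MySQL cannot hold.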

3. Filtering Unicode-defined emoji

①. Unicode-defined emoji take four bytes, so we filter on that basis

  

// Filter out emoji
function filter_emoji($str)
{
    // Perform a regular expression search and replace with a callback.
    // The /u modifier makes "." match one whole UTF-8 character.
    $str = preg_replace_callback(
        '/./u',
        function (array $match) {
            // Drop any character that takes 4 or more bytes; keep the rest.
            return strlen($match[0]) >= 4 ? '' : $match[0];
        },
        $str
    );
    return $str;
}
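The same filtering can be expressed without a callback. UTF-8 uses four bytes for exactly the code points U+10000 to U+10FFFF, so removing that range should have the same effect as the byte-length check. This is a sketch of that alternative, not the article's own method; the function name strip_supplementary_chars is hypothetical.

```php
<?php
// Hypothetical alternative to the byte-length check: remove every character
// in the range U+10000 to U+10FFFF, which is the range UTF-8 encodes with 4 bytes.
function strip_supplementary_chars(string $text): string
{
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $text);
    // preg_replace returns null when the input is not valid UTF-8;
    // in that case this sketch returns the input unchanged.
    return $result === null ? $text : $result;
}

echo strip_supplementary_chars("Hello \u{1F600} world"), "\n";
```

Like the byte-length version, this leaves three-byte symbols (for example U+2600, the sun symbol) untouched, because MySQL's 3-byte utf8 can already store them.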

  

②. Unicode emoji take 4 bytes, while SoftBank-defined emoji take 3 bytes of storage. With emoji for PHP, Unicode emoji can be converted to the SoftBank encoding, so emoji can be stored without modifying the database. Compared with solving the problem at the database level, this is a much smaller change and raises no performance or operational concerns. The unavoidable drawback is that the SoftBank set is no longer maintained, so newly added emoji have no SoftBank equivalent and are lost in conversion. For that reason this approach is not recommended.

Some of the methods below have not been tested in practice by the author; they are offered for reference.

  

1. Using the utf8mb4 character set

If your MySQL version is >= 5.5.3, you can upgrade directly from utf8 to the utf8mb4 character set.
This 4-byte UTF-8 encoding is fully compatible with the old 3-byte utf8 character set and can store emoji directly, which makes it one of the better solutions.
As for the performance cost of the extra bytes, estimate it for your own project.

2. Using Base64 encoding

If you cannot use the utf8mb4 character set for some reason, Base64 offers a workaround.
Encode the text with a function such as base64_encode and the emoji can be stored in a utf8 table as is; decode it again when reading it back.
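A minimal sketch of that round trip follows. The helper names encode_for_storage and decode_from_storage are hypothetical, and the actual database write is left out.

```php
<?php
// Hypothetical helpers: encode text before saving it to a 3-byte utf8 column,
// and decode it again after reading it back.
function encode_for_storage(string $text): string
{
    return base64_encode($text);
}

function decode_from_storage(string $stored): string
{
    // Strict mode: base64_decode returns false if the input is not valid Base64.
    $decoded = base64_decode($stored, true);
    if ($decoded === false) {
        throw new InvalidArgumentException('Stored value is not valid Base64');
    }
    return $decoded;
}

$original = "Nice \u{1F44D}"; // text followed by the thumbs-up emoji
$stored = encode_for_storage($original);
echo strlen($original), ' bytes become ', strlen($stored), " bytes\n";
```

Two trade-offs are worth keeping in mind: Base64 output is roughly one third larger than the input, and the stored value can no longer be searched or sorted as text in SQL.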

1. utf8mb4 requires MySQL 5.5.3 or later; if your version is older, upgrade to a newer one.

To check your MySQL version, see: "Four ways to view the MySQL version". For installation steps, see: "How to upgrade MySQL to the latest version of MySQL in Linux".
2. Modify the database, table, and column character sets. Refer to the following statements:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
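The article does not say why the column is declared as VARCHAR(191). A likely reason, stated here as an assumption, is that the column is indexed and that InnoDB's older row formats (COMPACT and REDUNDANT) limit an index key prefix to 767 bytes. The arithmetic can be sketched as follows; the constant and function names are illustrative.

```php
<?php
// Assumption: the column is indexed, and InnoDB limits index key prefixes
// to 767 bytes (the limit for the older COMPACT and REDUNDANT row formats).
const INDEX_PREFIX_LIMIT_BYTES = 767;

// Largest number of characters that fits in the index prefix
// when every character may need $bytesPerChar bytes.
function max_indexed_chars(int $bytesPerChar): int
{
    return intdiv(INDEX_PREFIX_LIMIT_BYTES, $bytesPerChar);
}

echo 'utf8 (3 bytes per character): ', max_indexed_chars(3), "\n";
echo 'utf8mb4 (4 bytes per character): ', max_indexed_chars(4), "\n";
```

Under that assumption, an indexed VARCHAR(255) column that was fine with utf8 has to shrink to 191 characters once it is converted to utf8mb4.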
3. Modify the MySQL configuration file my.cnf (my.ini on Windows)

my.cnf is usually located at /etc/mysql/my.cnf. Once you have found it, add the following settings to these three sections:

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect = 'SET NAMES utf8mb4'

4. Restart the MySQL server and check the character set

1.) Restart command, for reference: /etc/init.d/mysql restart

2.) Enter the command mysql to open the MySQL command line (if you are told you do not have permission, try mysql -uroot -p and enter your password)

3.) In the MySQL command line, enter: SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';

Check that the output matches the following:

+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)

Special note: it does not matter if collation_connection, collation_database and collation_server are utf8mb4_general_ci. However, character_set_client, character_set_connection, character_set_database, character_set_results and character_set_server must all be utf8mb4. For what each of these character set settings does, see: "Drill down to MySQL character set settings".

5. Set the database connection character set to utf8mb4

'db' => [
    'class' => 'yii\db\Connection',
    'dsn' => 'mysql:host=192.168.1.130;dbname=',
    'username' => '',
    'password' => '',
    'charset' => 'utf8mb4',
],
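For projects that do not use Yii, the same connection setting can be requested through the charset parameter of a PDO MySQL DSN. The sketch below is an illustration only: the function names build_mysql_dsn and connect and the database name example_db are placeholders, not values from the article.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that requests utf8mb4
// for the connection.
function build_mysql_dsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Not called in this sketch: opening the connection needs a reachable
// server and real credentials.
function connect(string $host, string $dbname, string $user, string $password): PDO
{
    return new PDO(build_mysql_dsn($host, $dbname), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // throw exceptions on errors
    ]);
}

// "example_db" is a placeholder database name.
echo build_mysql_dsn('192.168.1.130', 'example_db'), "\n";
```

Setting the character set on the connection matters even when the server and tables are already utf8mb4, because the client connection must also use utf8mb4 for emoji to survive the trip.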
