1. Why filter emoji
In project development, emoji are troublesome. Even if we can store them, support is rarely complete, because the emoji set is updated quickly. On platforms other than iOS, such as PC or Android, displaying emoji means preparing a large set of emoji images and using a third-party front-end library. Even then, some emoji may fail to display because the image set is incomplete.
In most business scenarios, emoji are not essential. We can consider filtering them out where appropriate and save the associated costs.
2. How emoji filtering works in PHP
Emoji (絵文字, from Japanese えもじ, e-moji; moji means "character" in Japanese) is a set of 12x12 pixel pictographs originating in Japan, created by Shigetaka Kurita, which first became popular among Japanese internet and mobile phone users. After emoji were added to the input method in Apple's iOS 5, they swept across the globe. Emoji have since been adopted by most modern Unicode-compatible computer systems and are commonly used in mobile text messages and on social networks. Recently, many netizens have been using emoji to play word-guessing games and enjoying the fun this culture brings.
About the pronunciation of emoji: many people, on first seeing the word, instinctively read the first syllable as "ee". The transliteration is actually closer to "eh-mo-ji", with the "e" sounding roughly like the name of the letter A.
Originally, Japan's three major telecom operators, DoCoMo, KDDI and SoftBank, each had their own emoji character definitions. Because iOS shipped with the SoftBank set built in (before iOS 5), emoji became popular worldwide. Google also defined its own set of emoji characters. From iOS 5 onward, Apple adopted the Unicode-defined emoji characters.
Most Unicode-defined emoji take four bytes in UTF-8, while SoftBank emoji take three. If the systems along the path from storage to display were not designed with four-byte characters in mind, the result is simply a disaster.
3. Filtering Unicode-defined emoji
①. Most Unicode-defined emoji take four bytes in UTF-8, so we filter on that basis
// Filter out emoji
function filter_emoji($str)
{
    // Perform a regular expression search and replace using a callback
    $str = preg_replace_callback(
        '/./u',
        function (array $match) {
            // Drop any character that takes four or more bytes in UTF-8
            return strlen($match[0]) >= 4 ? '' : $match[0];
        },
        $str
    );
    return $str;
}
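The same four-byte rule can also be written as a character class over the code points that need four bytes in UTF-8 (U+10000 to U+10FFFF). This is only a sketch: the name strip_four_byte_chars is a hypothetical helper, not something from a library. It also shows the limit of the rule, since some emoji such as U+263A take only three bytes and pass through untouched.

```php
<?php
// Hypothetical helper (assumption, not from the original article): removes every
// code point above U+FFFF, i.e. exactly those that need 4 bytes in UTF-8.
function strip_four_byte_chars(string $str): string
{
    // The /u modifier makes PCRE treat the pattern and the subject as UTF-8.
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $str);
    // preg_replace() returns null on failure, e.g. for invalid UTF-8 input;
    // in that case the input is handed back unchanged.
    return $result ?? $str;
}

$grin  = "\u{1F600}"; // U+1F600, 4 bytes in UTF-8
$smile = "\u{263A}";  // U+263A, only 3 bytes in UTF-8

echo strip_four_byte_chars("hi{$grin}!"), "\n"; // prints "hi!"
echo strlen($smile), "\n";                      // prints 3, so this one is not removed
```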
②. Unicode emoji take 4 bytes, while SoftBank-defined emoji take 3 bytes of storage. With emoji for PHP, we can convert Unicode emoji to the SoftBank form, so emoji can be stored without modifying the database. Compared with solving the problem at the database level, this is a much smaller change and raises no performance or operations concerns. The unavoidable drawback is that the SoftBank set is no longer maintained, so newer emoji have no SoftBank equivalent and are lost in conversion. For that reason this approach is not recommended.
I have not tried the following methods myself; they are offered for reference.
1. Using the utf8mb4 character set
If your MySQL version is >= 5.5.3, you can upgrade directly from the utf8 character set to the utf8mb4 character set.
This 4-byte UTF-8 encoding is fully backward compatible with the old 3-byte utf8 character set and can store emoji directly, which makes it one of the better solutions.
As for the performance cost of the extra bytes, you will have to estimate that for your own project.
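A rough sketch of what the switch could look like from PHP, assuming PDO with the MySQL driver. The helper names, the host localhost, the database app and the table comments are all placeholders for illustration. The helpers only build strings; the commented lines at the end show where a real connection would use them.

```php
<?php
// Hypothetical helper: builds a PDO DSN that asks the MySQL driver to talk utf8mb4.
function utf8mb4_dsn(string $host, string $dbname): string
{
    return "mysql:host={$host};dbname={$dbname};charset=utf8mb4";
}

// Hypothetical helper: builds the statement that converts one table to utf8mb4.
function utf8mb4_alter_sql(string $table): string
{
    // Backticks are doubled so the identifier cannot break out of its quoting.
    $quoted = '`' . str_replace('`', '``', $table) . '`';
    return "ALTER TABLE {$quoted} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
}

echo utf8mb4_dsn('localhost', 'app'), "\n";
echo utf8mb4_alter_sql('comments'), "\n";

// Against a real server (placeholder credentials) the pieces would be used like this:
// $pdo = new PDO(utf8mb4_dsn('localhost', 'app'), 'user', 'secret');
// $pdo->exec(utf8mb4_alter_sql('comments'));
```

Both the connection and the table need to be on utf8mb4. If only the table is converted, four-byte characters can still be damaged on their way through the connection.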
2. Using Base64 encoding
If you can't use the utf8mb4 character set for some reason, Base64 offers a roundabout way out. After encoding with a function such as base64_encode, emoji can be stored directly in a utf8 table; decode the value when reading it back.
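The round trip might look like the sketch below. encode_for_storage and decode_from_storage are hypothetical wrapper names chosen for this example; the only real calls are PHP's built-in base64_encode and base64_decode.

```php
<?php
// Hypothetical helper: wraps text in Base64 before it goes into a 3-byte utf8 column.
function encode_for_storage(string $text): string
{
    // Base64 output is plain ASCII, so any column character set can hold it.
    return base64_encode($text);
}

// Hypothetical helper: reverses encode_for_storage() when the value is read back.
function decode_from_storage(string $stored): string
{
    // Strict mode returns false when the input is not valid Base64.
    $decoded = base64_decode($stored, true);
    if ($decoded === false) {
        throw new InvalidArgumentException('Stored value is not valid Base64');
    }
    return $decoded;
}

$original = "Nice \u{1F44D}"; // text containing a 4-byte emoji
$stored   = encode_for_storage($original);

echo $stored, "\n";                                    // ASCII-only representation
var_dump(decode_from_storage($stored) === $original);  // bool(true)
```

The trade-off is that Base64 makes the stored value about a third larger, and the encoded column can no longer be searched with LIKE or read directly in the database.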