Over the past few months I have had to deal with emoji in PHP development several times. Storing user nicknames is essential, but nicknames can contain emoji, and supporting them is a bit of a headache.
MySQL tables are usually created with the utf8 character set. Insert a nickname that contains emoji into such a column and the whole value ends up stored as an empty string. What is going on?
It turns out that MySQL's utf8 character set stores at most 3 bytes per character, while most emoji take 4 bytes in UTF-8, so the nickname cannot be stored intact. What can be done about it? Here are several methods.
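To make the byte counts concrete, here is a minimal sketch using PHP's strlen(), which counts bytes rather than characters. The helper name utf8ByteLength is made up for this illustration, and the "\u{...}" escapes assume PHP 7.0 or later.

```php
<?php
// Hypothetical helper for illustration: number of bytes in a UTF-8 string.
function utf8ByteLength(string $s): int
{
    return strlen($s); // strlen() counts bytes, not characters
}

echo utf8ByteLength('a'), "\n";          // ASCII letter: prints 1
echo utf8ByteLength("\u{4E2D}"), "\n";   // CJK character U+4E2D: prints 3
echo utf8ByteLength("\u{1F600}"), "\n";  // emoji U+1F600: prints 4
```

Anything that needs 4 bytes, like the last example, is what a 3-byte utf8 column cannot hold.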
1. Use the utf8mb4 character set
If your MySQL version is 5.5.3 or later, you can switch from utf8 to the utf8mb4 character set directly.
This 4-byte encoding is a superset of the old 3-byte utf8 character set, so existing data stays valid, and it can store emoji directly. It is the best solution.
As for the performance cost of the extra byte, the evaluations I have read suggest it is almost negligible.
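A minimal sketch of what the switch can look like from PHP follows. It assumes PDO with the MySQL driver; the function names, the host "localhost", the database "app" and the table "users" are all placeholders invented for this example. The connection itself also has to use utf8mb4, otherwise 4-byte characters are still lost on the way to the server.

```php
<?php
// Build a PDO DSN that asks the MySQL connection itself to use utf8mb4.
// Host and database names are placeholders supplied by the caller.
function buildUtf8mb4Dsn(string $host, string $dbName): string
{
    return "mysql:host={$host};dbname={$dbName};charset=utf8mb4";
}

// Build the statement that converts an existing table to utf8mb4.
// Only simple identifiers are accepted, because the name is interpolated.
function buildConvertTableSql(string $table): string
{
    if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
        throw new InvalidArgumentException("Unexpected table name: {$table}");
    }
    return "ALTER TABLE `{$table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
}

// Usage sketch (not executed here; it needs a real server and credentials):
// $pdo = new PDO(buildUtf8mb4Dsn('localhost', 'app'), $user, $password);
// $pdo->exec(buildConvertTableSql('users'));
```

It is worth trying the conversion on a copy of the data first, since it rewrites the whole table.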
2. Use base64 encoding
If you cannot use utf8mb4 for some reason, you can take a roundabout route with base64.
A nickname encoded with a function such as base64_encode contains only ASCII characters, so it can be stored directly in a table that uses the utf8 character set. Decode it with base64_decode when you read it back.
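The round trip can be sketched as follows. The wrapper names encodeNickname and decodeNickname are hypothetical; only base64_encode and base64_decode come from PHP itself.

```php
<?php
// Encode a nickname to plain ASCII so a 3-byte utf8 column can hold it.
function encodeNickname(string $nickname): string
{
    return base64_encode($nickname);
}

// Decode a stored value; returns null if it is not valid base64.
function decodeNickname(string $stored): ?string
{
    $decoded = base64_decode($stored, true); // strict mode rejects stray characters
    return $decoded === false ? null : $decoded;
}

$original = "Tom\u{1F600}";            // "Tom" followed by an emoji
$stored   = encodeNickname($original); // ASCII only, safe for a utf8 column
$restored = decodeNickname($stored);   // identical to $original
```

Keep in mind that the encoded text is roughly a third longer than the original, so the column needs room for it, and that the stored value can no longer be searched or sorted by nickname in SQL.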
3. Get rid of emoji
Emoji are troublesome: even if you can store them, they may not display properly. On platforms other than iOS, such as PC or Android, displaying emoji means preparing a large set of emoji images and using a third-party front-end library. Even then, the image set may be incomplete and some emoji will not display. In most business scenarios emoji are not essential, so it is reasonable to consider removing them and saving those costs.
After a fair amount of searching on Google, I finally found code that works reliably:
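// Filter out emoji
function filterEmoji($str)
{
    // '/./u' matches one UTF-8 character at a time
    $str = preg_replace_callback(
        '/./u',
        function (array $match) {
            // Drop any character encoded in 4 or more bytes, keep the rest
            return strlen($match[0]) >= 4 ? '' : $match[0];
        },
        $str
    );
    return $str;
}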
The basic idea is to walk through the string one character at a time and delete any character whose UTF-8 encoding is 4 bytes long.
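The same idea can also be written without a callback, as a sketch: in UTF-8, the 4-byte characters are exactly the code points U+10000 to U+10FFFF, so a single character class covers them. The function name stripFourByteChars is made up here, and this is an alternative formulation, not the code found above.

```php
<?php
// Remove every character that needs 4 bytes in UTF-8 (U+10000..U+10FFFF).
function stripFourByteChars(string $str): string
{
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $str);
    // preg_replace() returns null on failure, e.g. for invalid UTF-8 input;
    // in that case this sketch hands the input back unchanged.
    return $result === null ? $str : $result;
}

$clean = stripFourByteChars("Tom\u{1F600}"); // "Tom"
```

Either version is a byte-length filter rather than a true emoji detector: it also removes rare 4-byte non-emoji characters, and it keeps the few emoji that fit in 3 bytes, such as U+263A.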
Reprinted from: pein0119
For a small project I recently worked on, I used Method 3 to solve the problem easily.
This article is also published on my blog. If you like it, please come and visit.