Introduction to Carmela, a PHP extension-based class library for processing Emoji
Carmela Introduction
Carmela provides a set of solutions for processing 4-byte UTF-8 characters, such as common Emoji, implemented in PHP, as a PHP extension, and in Java, C++ and other languages.
Background:
A UTF-8 string containing Emoji cannot be inserted directly into a database that has not been adjusted for it: the database reports an error. Changing the character set of the database and tables to utf8mb4 (for example with the utf8mb4_general_ci collation) avoids the problem. However, in many large systems and architectures, changing a database's character set can cause other problems, such as display on the PC side and compatibility between new and old data. There is another way to address this: replace the Emoji before the data is written to the database, and reverse the replacement according to the client type after the data is read back.
Carmela
Carmela provides a PHP extension-based solution for 4-byte UTF-8 characters. It replaces UTF-8 characters longer than 3 bytes with ubb mode. For example, the UTF-8 character %F0%9F%91%A4 (shown URL-encoded for ease of display) becomes [u]1f464[/u] after replacement. When data is read back from the database, the replacement is reversed according to the requesting client (iOS, Android, PC).
The name Carmela comes from the picture book series "Different Carmela", which tells the adventures of the hen Carmela and her children Carmelito and Carmen. Everyone in the Carmela family is distinctive, and they dare to think about things that others do not dare to think about.
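To make the replacement concrete, here is a minimal pure-PHP sketch of the same idea. It is not the extension's code: the function name sketch_str2ubb is hypothetical, and it assumes PHP 7.2 or later with the mbstring extension and valid UTF-8 input.

```php
<?php
// Hypothetical pure-PHP helper, not part of the Carmela extension.
// Replaces every code point above U+FFFF (4 bytes in UTF-8) with [u]hex[/u].
function sketch_str2ubb(string $str): string
{
    return preg_replace_callback(
        '/[\x{10000}-\x{10FFFF}]/u',
        static function (array $m): string {
            // mb_ord() returns the Unicode code point of the matched character.
            return '[u]' . dechex(mb_ord($m[0], 'UTF-8')) . '[/u]';
        },
        $str
    ) ?? $str; // preg_replace_callback() returns null on invalid UTF-8
}

echo sketch_str2ubb(urldecode('user %F0%9F%91%A4 online')), "\n";
// prints: user [u]1f464[/u] online
```

Characters of 3 bytes or fewer pass through unchanged, so the result can be stored in a column that only accepts 3-byte UTF-8.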
Install
1. Compile and Package
git clone https://github.com/ugg/Carmela
<php-bin>/phpize
./configure --with-php-config=<php-path>/php-config-path
make
make install
2. Modify the configuration file
vim php.ini
Add the following content
[carmela]
extension=carmela.so
Methods
carmela_str2ubb: converts a string containing emoji characters to ubb mode. After replacement, an emoji looks like [u]1f464[/u]. Example:
$str = urldecode("This is test %F0%9F%98%9C+%F0%9F%98%99 by ugg");
echo "str:".$str."\n";
echo "ubb:".carmela_str2ubb($str)."\n";
Output result:
str:This is test 😜 😙 by ugg
ubb:This is test [u]1f61c[/u] [u]1f619[/u] by ugg
carmela_ubb2str: converts ubb tags back to UTF-8 characters. For conversion on the PC platform, see the carmela_ubb2str method in encode.class.php. Example:
$str = urldecode("This is test %F0%9F%98%9C+%F0%9F%98%99 by ugg");
$str = carmela_str2ubb($str);
echo "ubb:".$str."\n";
echo "str:".carmela_ubb2str($str)."\n";
Output result:
ubb:This is test [u]1f61c[/u] [u]1f619[/u] by ugg
str:This is test 😜 😙 by ugg
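The reverse direction can be sketched in pure PHP as well. This is only an illustration of what the conversion does, not the extension's implementation: sketch_ubb2str is a hypothetical name, and the code assumes PHP 7.2 or later with mbstring.

```php
<?php
// Hypothetical pure-PHP helper, not the extension's carmela_ubb2str.
// Turns each [u]hex[/u] tag back into the UTF-8 character it encodes.
function sketch_ubb2str(string $str): string
{
    return preg_replace_callback(
        '/\[u\]([0-9a-f]{5,6})\[\/u\]/i',
        static function (array $m): string {
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // Leave the tag untouched if the value is not a valid code point.
            return $char === false ? $m[0] : $char;
        },
        $str
    ) ?? $str;
}

// URL-encode the result so the emoji bytes are visible in any terminal.
echo urlencode(sketch_ubb2str('test [u]1f61c[/u] ok')), "\n";
// prints: test+%F0%9F%98%9C+ok
```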
carmela_substr:
Truncates a string that contains emoji characters.
carmela_sububb:
Truncates a string that contains ubb tags.
carmela_delstr:
Deletes the emoji characters from a string. It is not a strict mode: 3-byte emoji characters are not deleted. It is mainly intended for special cases.
carmela_delubb:
Deletes the ubb tags from a string that contains ubb tags.
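The signatures of these four methods are not shown here, so the following is only a sketch of the behaviour one would expect from the ubb variants. The names sketch_sububb and sketch_delubb, their parameters, and the rule that a whole tag counts as one character are all assumptions.

```php
<?php
// Hypothetical helpers; the real carmela_sububb and carmela_delubb may behave differently.
const SKETCH_UBB_TAG = '\[u\][0-9a-f]{5,6}\[\/u\]';

// Truncate without cutting a tag in half: each tag counts as one character.
function sketch_sububb(string $str, int $start, int $length): string
{
    // Tokenize: a whole ubb tag is one token, otherwise one UTF-8 character.
    // On invalid UTF-8 nothing matches and an empty string is returned.
    preg_match_all('/' . SKETCH_UBB_TAG . '|./isu', $str, $m);
    return implode('', array_slice($m[0], $start, $length));
}

// Remove every ubb tag and keep the rest of the string.
function sketch_delubb(string $str): string
{
    return preg_replace('/' . SKETCH_UBB_TAG . '/i', '', $str) ?? $str;
}

echo sketch_sububb('Hi [u]1f61c[/u] there', 0, 4), "\n"; // Hi [u]1f61c[/u]
echo sketch_delubb('Hi[u]1f61c[/u]!'), "\n";             // Hi!
```

Counting a tag as one character keeps the truncated length the same as it would be after the tags are converted back to emoji.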
Performance
Three implementations were tested with the same data: two written in PHP (one using PHP's str_replace, the other searching for 4-byte emoji characters) and the PHP extension. The test results are as follows.
========== Solution 1: PHP str_replace method ==========
========== emoji to string ==========
TIME: 781.94 ms, rows processed: 100, characters processed: 10100, bytes processed: 31028
average processing time per line: 7.819 ms
========== string to emoji ==========
TIME: 118.566 ms, rows processed: 100, characters processed: 18710, bytes processed: 37793
average processing time per line: 1.186 ms

========== Solution 2: PHP character search method ==========
========== emoji to string ==========
TIME: 51.526 ms, rows processed: 100, characters processed: 10100, bytes processed: 31028
average processing time per line: 0.515 ms
========== string to emoji ==========
TIME: 27.959 ms, rows processed: 100, characters processed: 23092, bytes processed: 41236
average processing time per line: 0.28 ms

========== Solution 3: PHP extension method ==========
========== emoji to string ==========
TIME: 0.721 ms, rows processed: 100, characters processed: 10100, bytes processed: 31028
average processing time per line: 0.007 ms
========== string to emoji ==========
TIME: 0.956 ms, rows processed: 100, characters processed: 20308, bytes processed: 38452
average processing time per line: 0.01 ms
The test results show that the str_replace method performs very poorly. Writing the replacement function directly in PHP improves performance by more than 10 times (51.526 ms versus 781.94 ms for emoji to string). The extension improves performance far more: when converting emoji characters to ubb mode it is about 1000 times faster than str_replace (0.721 ms versus 781.94 ms).
The test data can be generated dynamically with create_file.php. In this test case, 100 lines of data are generated; each line contains 100 characters, of which 3 to 10 are emoji characters. To test, run benchmark.php directly and check the results.
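The contents of create_file.php and benchmark.php are not reproduced here. As a rough sketch of how such a test can be set up, the following generates data of the shape described above and times a conversion over it. All function names are hypothetical; the choice of ASCII letters and of the emoticon range U+1F600 to U+1F64F is an assumption, and the code needs PHP 7.3 or later with mbstring.

```php
<?php
// Hypothetical generator; the real create_file.php may differ.
// Builds one line of $length characters, 3 to 10 of which are emoji.
function sketch_make_line(int $length = 100): string
{
    $emojiCount = random_int(3, 10);
    $chars = [];
    for ($i = 0; $i < $length; $i++) {
        $chars[] = $i < $emojiCount
            ? mb_chr(random_int(0x1F600, 0x1F64F), 'UTF-8') // 4-byte emoticon
            : chr(random_int(ord('a'), ord('z')));          // 1-byte letter
    }
    shuffle($chars); // spread the emoji over the line
    return implode('', $chars);
}

// Runs $fn once per line and returns the elapsed time in milliseconds.
function sketch_time_ms(callable $fn, array $lines): float
{
    $start = hrtime(true); // nanoseconds
    foreach ($lines as $line) {
        $fn($line);
    }
    return (hrtime(true) - $start) / 1e6;
}

$lines = [];
for ($i = 0; $i < 100; $i++) {
    $lines[] = sketch_make_line();
}

// Stand-in workload: strip 4-byte characters. Swap in the function under test.
$ms = sketch_time_ms(function (string $s) {
    return preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $s);
}, $lines);

printf("%d lines in %.3f ms\n", count($lines), $ms);
```

Passing the function under test as a callable lets the same data and the same timer be reused for each of the three solutions.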
Principle
The principle of processing 4-byte emoji is very simple: find the emoji characters by character comparison and replace them. The difficulty lies in improving performance, that is, finding and replacing quickly on top of this basic principle. The PHP extension offers one approach, which you can use as a reference for implementing Java, C#, JavaScript, and other versions.
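The character comparison can be illustrated with a single pass over the bytes. In UTF-8, a lead byte of the form 11110xxx starts a 4-byte sequence, so one mask test per byte is enough to find the characters to replace. This is a sketch of the principle only, not the extension's C code; the function name is hypothetical and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical byte-scanning sketch; assumes valid UTF-8 input.
function sketch_scan_str2ubb(string $str): string
{
    $out = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; ) {
        $b0 = ord($str[$i]);
        // Lead byte 11110xxx marks a 4-byte sequence.
        if (($b0 & 0xF8) === 0xF0 && $i + 3 < $len) {
            // Combine 3 payload bits from the lead byte with 6 from each
            // continuation byte to obtain the code point.
            $cp = (($b0 & 0x07) << 18)
                | ((ord($str[$i + 1]) & 0x3F) << 12)
                | ((ord($str[$i + 2]) & 0x3F) << 6)
                | (ord($str[$i + 3]) & 0x3F);
            $out .= '[u]' . dechex($cp) . '[/u]';
            $i += 4;
        } else {
            // Bytes of 1- to 3-byte characters are copied unchanged.
            $out .= $str[$i];
            $i++;
        }
    }
    return $out;
}

echo sketch_scan_str2ubb("a\xF0\x9F\x91\xA4b"), "\n";
// prints: a[u]1f464[/u]b
```

The same loop translates almost line by line to C, Java, or JavaScript, since it relies only on byte access and bit operations.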
How does the PC side support Emoji display?
Find the images directory under the emoji directory in the project, create an emoji folder under the web root directory, copy the entire images folder into that emoji folder, and call the carmela_ubb2str method in encode.class.php:
Util_Encode::carmela_ubb2str($str, "PC");
Emoji can then be displayed on the PC. Currently only the initial set of emoji is included, and some newer emoji are missing. Note that this method has not yet been written into the PHP extension, so its performance is relatively low.
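The code of Util_Encode::carmela_ubb2str is not shown here. One plausible way to render ubb tags on a PC page is to turn each tag into an image element, sketched below. The function name, the .png extension, and the assumption that each image file is named after its lowercase hex code are all hypothetical; only the /emoji/images/ location follows from the copy step above.

```php
<?php
// Hypothetical sketch; the real Util_Encode::carmela_ubb2str may work differently.
// Assumes image files such as /emoji/images/1f61c.png exist.
function sketch_ubb2img(string $str, string $baseUrl = '/emoji/images/'): string
{
    // Escape first so ordinary text cannot inject HTML; the tags survive escaping.
    $safe = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');
    return preg_replace_callback(
        '/\[u\]([0-9a-f]{5,6})\[\/u\]/i',
        static function (array $m) use ($baseUrl): string {
            return '<img class="emoji" src="' . $baseUrl
                . strtolower($m[1]) . '.png" alt="">';
        },
        $safe
    ) ?? $safe;
}

echo sketch_ubb2img('Hi [u]1f61c[/u]'), "\n";
// prints: Hi <img class="emoji" src="/emoji/images/1f61c.png" alt="">
```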
Contact ugg.xchj@gmail.com for all questions