Carmela Introduction
Carmela provides a 4-byte UTF-8 solution for PHP, PHP extensions, Java, C++ and other languages, including support for common emoji characters.
Background:
When a UTF-8 string containing emoji is inserted directly into the database, the database reports an error unless it has been adjusted. Changing the character set of the database and its tables to utf8mb4 (for example with the utf8mb4_general_ci collation) avoids the problem. In many large systems and architectures, however, changing a database's character set causes many other problems, such as display problems on the PC side and compatibility issues between old and new data. There is another solution to this type of problem: replace the emoji before the data is stored, then reverse the replacement according to the client type after the data is read back.
Carmela
Carmela offers a 4-byte UTF-8 solution based on a PHP extension. It replaces every UTF-8 character longer than 3 bytes with a UBB tag. For example, the UTF-8 character %f0%9f%91%a4 (shown here in URL-encoded form for display convenience) is replaced with [u]1f464[/u]. When the text is read back from the database, the tag is reverse-replaced according to the requesting client (iOS, Android, PC).
The name comes from the story series "The Different Carmela", which tells of the adventures of the hen Carmela and her children Carmelito and Carmen. Everyone in the Carmela family is different: they dare to dream, and even more, they dare to try what others dare not imagine.
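The forward replacement described above can be sketched in plain PHP. This is only an illustration of the idea, not the extension's actual C implementation; the function name sketch_str2ubb is hypothetical. It assumes that "longer than 3 bytes" means any code point above U+FFFF.

```php
<?php
// Hypothetical pure-PHP sketch; the real carmela_str2ubb lives in the C extension.
function sketch_str2ubb(string $text): string
{
    // Code points above U+FFFF are exactly the ones that need 4 bytes in UTF-8.
    $result = preg_replace_callback(
        '/[\x{10000}-\x{10FFFF}]/u',
        function (array $m): string {
            // Decode the 4-byte sequence into its code point by hand.
            $b = unpack('C4', $m[0]);
            $cp = (($b[1] & 0x07) << 18)
                | (($b[2] & 0x3F) << 12)
                | (($b[3] & 0x3F) << 6)
                | ($b[4] & 0x3F);
            return '[u]' . dechex($cp) . '[/u]';
        },
        $text
    );
    // preg_replace_callback returns null on invalid UTF-8; fall back to the input.
    return $result ?? $text;
}

echo sketch_str2ubb('user ' . urldecode('%f0%9f%91%a4')), "\n"; // user [u]1f464[/u]
```

Characters of 3 bytes or fewer pass through unchanged, so existing non-emoji data is not affected by the replacement.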
Installation
1. Compiling the package
git clone https://github.com/ugg/Carmela
/phpize
./configure --with-php-config=/php-config-path
make
make install
2. Modifying the configuration file
vim /php.ini
Add the following content:
[Carmela]
extension=carmela.so
Methods:
carmela_str2ubb: Converts a string containing emoji into UBB form. Each emoji is replaced with a tag such as [u]1f464[/u].
An example:
$str = urldecode("This is test%f0%9f%98%9c+%f0%9f%98%99 by Ugg");
echo "str:" . $str . "\n";
echo "ubb:" . carmela_str2ubb($str) . "\n";
Output Result:
str:This is test xxxx by Ugg (xxxx stands in for the emoji, which CSDN cannot display)
ubb:This is test [u]1f61c[/u] [u]1f619[/u] by Ugg
carmela_ubb2str: Converts a string containing UBB tags back into a UTF-8 string. For conversion on the PC platform, refer to the carmela_ubb2str method in encode.class.php.
An example:
$str = urldecode("This is test%f0%9f%98%9c+%f0%9f%98%99 by Ugg");
$str = carmela_str2ubb($str);
echo "ubb:" . $str . "\n";
echo "str:" . carmela_ubb2str($str) . "\n";
Output Result:
ubb:This is test [u]1f61c[/u] [u]1f619[/u] by Ugg
str:This is test xxxx by Ugg (xxxx stands in for the emoji, which CSDN cannot display)
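For readers without the extension installed, the reverse direction can be approximated in plain PHP. The sketch below is an assumption about how such a conversion could work, not the extension's code; sketch_ubb2str is a hypothetical name, and the tag pattern (5 or 6 hexadecimal digits) is inferred from the examples above.

```php
<?php
// Hypothetical pure-PHP sketch of the UBB-to-UTF-8 direction.
function sketch_ubb2str(string $text): string
{
    $result = preg_replace_callback(
        '/\[u\]([0-9a-f]{5,6})\[\/u\]/i',
        function (array $m): string {
            $cp = hexdec($m[1]);
            if ($cp < 0x10000 || $cp > 0x10FFFF) {
                return $m[0]; // not a 4-byte code point: leave the tag as it is
            }
            // Encode the code point as a 4-byte UTF-8 sequence.
            return chr(0xF0 | ($cp >> 18))
                 . chr(0x80 | (($cp >> 12) & 0x3F))
                 . chr(0x80 | (($cp >> 6) & 0x3F))
                 . chr(0x80 | ($cp & 0x3F));
        },
        $text
    );
    return $result ?? $text;
}

// Prints the URL-encoded bytes so the result is visible on any terminal.
echo urlencode(sketch_ubb2str('[u]1f61c[/u]')), "\n"; // %F0%9F%98%9C
```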
carmela_substr:
Truncates a string containing emoji characters to the specified number of characters.
carmela_sububb:
Truncates a string containing UBB tags to the specified number of characters.
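The reason a dedicated truncation function is needed is that a plain byte-based cut can split a UBB tag in half. The sketch below shows one way to avoid that. It assumes that each UBB tag counts as a single character, which the text above does not state explicitly; the function name and signature are hypothetical and may differ from carmela_sububb.

```php
<?php
// Hypothetical sketch: truncate without ever cutting a [u]...[/u] tag in half.
// Assumption: one UBB tag counts as one character.
function sketch_sububb(string $text, int $length): string
{
    // Each match is either one whole UBB tag or one single UTF-8 character.
    if (!preg_match_all('/\[u\][0-9a-f]{4,6}\[\/u\]|./isu', $text, $m)) {
        return ''; // empty input or invalid UTF-8
    }
    return implode('', array_slice($m[0], 0, $length));
}

echo sketch_sububb('ab[u]1f61c[/u]cd', 3), "\n"; // ab[u]1f61c[/u]
```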
carmela_delstr:
Deletes the emoji characters in a string. This is a non-strict mode: 3-byte emoji characters cannot be deleted. It is mainly used in some cases.
carmela_delubb:
Deletes the UBB tags from a string that contains them.
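The two deletion functions can be approximated with regular expressions. This is a sketch under the assumption that "emoji" here means any 4-byte UTF-8 character, which matches the non-strict behavior described above; the function names are hypothetical.

```php
<?php
// Hypothetical sketches of the two deletion helpers.

// Remove 4-byte characters only; 3-byte symbols such as U+2764 are kept,
// which mirrors the "non-strict" behavior described in the text.
function sketch_delstr(string $text): string
{
    return preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $text) ?? $text;
}

// Remove every [u]hex[/u] tag.
function sketch_delubb(string $text): string
{
    return preg_replace('/\[u\][0-9a-f]{4,6}\[\/u\]/i', '', $text) ?? $text;
}

echo sketch_delubb('test [u]1f61c[/u]done'), "\n"; // test done
```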
Performance
Two pure-PHP implementations were written for comparison: one based on PHP's str_replace, and one that scans for 4-byte emoji characters and replaces them. Both were tested against the PHP extension with the same kind of data. The results are as follows.
=========================== Scheme 1: PHP str_replace ===========================
=========== EMOJI to STRING ==========
Time: 781.94ms, lines processed: 100, characters processed: 10100, bytes processed: 31028
Average time per line: 7.819ms
=========== STRING to EMOJI ==========
Time: 118.566ms, lines processed: 100, characters processed: 18710, bytes processed: 37793
Average time per line: 1.186ms
=========================== Scheme 2: PHP character lookup ===========================
=========== EMOJI to STRING ==========
Time: 51.526ms, lines processed: 100, characters processed: 10100, bytes processed: 31028
Average time per line: 0.515ms
=========== STRING to EMOJI ==========
Time: 27.959ms, lines processed: 100, characters processed: 23092, bytes processed: 41236
Average time per line: 0.28ms
=========================== Scheme 3: PHP extension ===========================
=========== EMOJI to STRING ==========
Time: 0.721ms, lines processed: 100, characters processed: 10100, bytes processed: 31028
Average time per line: 0.007ms
=========== STRING to EMOJI ==========
Time: 0.956ms, lines processed: 100, characters processed: 20308, bytes processed: 38452
Average time per line: 0.01ms
The results show that the str_replace approach performs very poorly. Writing the replacement function directly in PHP improves performance by more than 10 times in the EMOJI to STRING direction (781.94ms versus 51.526ms). The extension improves it much further: when converting emoji from character form to UBB form, it is roughly 1000 times faster than str_replace (781.94ms versus 0.721ms).
The test data can be generated dynamically with create_file.php. This test case generates 100 lines of data with 100 characters per line, and each line contains 3 to 10 emoji characters. To view the performance results, run benchmark.php directly.
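To give an idea of what such a data set looks like, the sketch below generates lines with the same shape (100 lines, 100 characters, 3 to 10 emoji each) and times a converter over them. It is not the project's create_file.php or benchmark.php; all function names are hypothetical, the emoji range U+1F600 to U+1F64F is an assumption, and the timed converter is only a stand-in.

```php
<?php
// Hypothetical data generator and timing harness.

// Encode a code point in U+10000..U+10FFFF as 4 UTF-8 bytes.
function sketch_utf8_4byte(int $cp): string
{
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Build one line of $length characters (at least 10), 3 to 10 of them emoji.
function sketch_make_line(int $length = 100): string
{
    $chars = [];
    for ($i = 0; $i < $length; $i++) {
        $chars[] = chr(random_int(97, 122)); // ASCII filler, a-z
    }
    // array_rand returns an array of distinct keys when asked for 2 or more.
    foreach (array_rand($chars, random_int(3, 10)) as $pos) {
        $chars[$pos] = sketch_utf8_4byte(random_int(0x1F600, 0x1F64F)); // assumed range
    }
    return implode('', $chars);
}

// Run a converter over every line and return the elapsed milliseconds.
function sketch_time_ms(callable $convert, array $lines): float
{
    $start = microtime(true);
    foreach ($lines as $line) {
        $convert($line);
    }
    return (microtime(true) - $start) * 1000;
}

$lines = [];
for ($i = 0; $i < 100; $i++) {
    $lines[] = sketch_make_line();
}

// Stand-in converter: strips 4-byte characters with a regular expression.
$strip = function (string $s): string {
    return preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $s) ?? $s;
};

printf("lines: %d, bytes: %d\n", count($lines), strlen(implode("\n", $lines)));
printf("time: %.3fms\n", sketch_time_ms($strip, $lines));
```

Replacing the stand-in closure with 'carmela_str2ubb' (when the extension is loaded) or with a pure-PHP implementation allows the same lines to be timed against each approach.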
How does the PC support emoji?
Locate the images directory under the emoji directory in the project directory. Create an emoji folder under the Web root, copy the images folder into it, and call the carmela_ubb2str method in encode.class.php:
Util_encode::carmela_ubb2str($str, "PC");
This displays emoji on the PC. The collection currently contains 845 emoji; some newer emoji are not included. Note that this method is not implemented in the PHP extension, so its performance is comparatively low.
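One plausible way to render UBB tags on a PC browser is to turn each tag into an image element that points at the copied images folder. The sketch below illustrates that idea only. It is not the code in encode.class.php, and the file naming scheme (lowercase hex code point plus .png under /emoji/images/) is an assumption that must be checked against the actual images directory.

```php
<?php
// Hypothetical sketch: turn UBB tags into <img> elements for PC browsers.
// Assumption: images are named <hex>.png under /emoji/images/.
function sketch_ubb2img(string $text, string $baseUrl = '/emoji/images/'): string
{
    // Escape first so user text cannot inject HTML; the characters used by
    // UBB tags ([, ], /) are not changed by htmlspecialchars.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    $result = preg_replace_callback(
        '/\[u\]([0-9a-f]{4,6})\[\/u\]/i',
        function (array $m) use ($baseUrl): string {
            $hex = strtolower($m[1]);
            return '<img class="emoji" src="' . $baseUrl . $hex . '.png" alt="">';
        },
        $safe
    );
    return $result ?? $safe;
}

echo sketch_ubb2img('hi [u]1f61c[/u]'), "\n";
```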
Contact Ugg.xchj@gmail.com for all questions