Emoji characters are now widely supported in apps, but MySQL's utf8 character set handles them poorly, so we often run into an exception like this:
Incorrect string value: '\xf0\x90\x8d\x83...' for column
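The reason is that MySQL's utf8 character set stores at most 3 bytes per character, while many emoji lie outside the Basic Multilingual Plane and take 4 bytes in UTF-8. (Standard UTF-8 never needs more than 4 bytes for a single code point; the 6-byte figure that is sometimes quoted applies to encodings such as CESU-8, which encode the two halves of a surrogate pair separately.)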
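The byte counts are easy to check from Java. The following is a minimal sketch, not part of the original code: the class name Utf8ByteLength and the method utf8Length are made up for illustration. It encodes a few strings as UTF-8 and prints how many bytes each one needs, including U+10343, the character behind the \xf0\x90\x8d\x83 bytes in the error message above.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ByteLength {
    // Hypothetical helper: number of bytes the string occupies in standard UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "A";                                       // U+0041
        String cjk = new String(Character.toChars(0x4E2D));       // inside the BMP
        String gothic = new String(Character.toChars(0x10343));   // bytes F0 90 8D 83
        String emoji = new String(Character.toChars(0x1F600));    // outside the BMP

        System.out.println(utf8Length(ascii));   // 1
        System.out.println(utf8Length(cjk));     // 3
        System.out.println(utf8Length(gothic));  // 4
        System.out.println(utf8Length(emoji));   // 4

        // String.length() counts UTF-16 code units, so a supplementary
        // character is stored as a surrogate pair and reports 2.
        System.out.println(emoji.length());      // 2
    }
}
```

Anything that reports 4 here is a character that a 3-byte utf8 column in MySQL would reject.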
There are two types of solutions:
1. Use MySQL's utf8mb4 character set, which can store these characters.
2. Filter out these special emoji characters.
For the first solution, see http://segmentfault.com/a/1190000000616820 and http://info.michael-simons.eu/2013/01/21/java-mysql-and-multi-byte-utf-8-support/
There are many details to get right, such as the MySQL version, the MySQL configuration, and the MySQL connector version.
Because we use a cloud database, I chose to filter out these special characters instead. The filtering itself is simple: use a regular expression to match the relevant code point range and replace each match.
Here's my code.
For more information, see: http://stackoverflow.com/questions/27820971/why-a-surrogate-java-regexp-finds-hypen-minus
import org.apache.commons.lang3.StringUtils;

public class EmojiFilterUtils {

    /**
     * Replaces emoji characters with "*".
     *
     * @param source the string to filter
     * @return the filtered string
     */
    public static String filterEmoji(String source) {
        if (StringUtils.isNotBlank(source)) {
            // Matches supplementary code points (U+10000 to U+10FFFF, written as
            // surrogate pairs) as well as any unpaired surrogates.
            return source.replaceAll("[\\ud800\\udc00-\\udbff\\udfff\\ud800-\\udfff]", "*");
        } else {
            return source;
        }
    }

    public static void main(String[] arg) {
        try {
            String text = "This is a smiley \ud83c\udfa6 face \ud860\udd5d \ud860\ude07 \ud860\udee2 \ud863\udcca \ud863\udccd \ud863\udcd2 \ud867\udd98 ";
            System.out.println(text);
            System.out.println(text.length());
            // Narrower variant: only a few common emoji blocks.
            System.out.println(text.replaceAll("[\\ud83c\\udc00-\\ud83c\\udfff]|[\\ud83d\\udc00-\\ud83d\\udfff]|[\\u2600-\\u27ff]", "*"));
            // Broader variant: everything outside the BMP.
            System.out.println(filterEmoji(text));
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
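If pulling in commons-lang3 or relying on surrogate ranges inside a regular expression is not desirable, the same filtering can probably be done by walking the string one code point at a time. This is only a sketch under that assumption, not the code used above; the class SupplementaryFilter and the method replaceSupplementary are hypothetical names.

```java
public class SupplementaryFilter {
    // Hypothetical alternative to the regex: replaces every code point above
    // U+FFFF, and every unpaired surrogate, with '*'.
    public static String replaceSupplementary(String source) {
        if (source == null || source.isEmpty()) {
            return source;
        }
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); ) {
            int cp = source.codePointAt(i);
            i += Character.charCount(cp);  // advance by 1 or 2 UTF-16 units
            boolean loneSurrogate = cp >= 0xD800 && cp <= 0xDFFF;
            if (Character.isSupplementaryCodePoint(cp) || loneSurrogate) {
                out.append('*');
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(replaceSupplementary("hi " + emoji + " there")); // hi * there
    }
}
```

Like the regular expression in filterEmoji, this leaves symbols inside the BMP (for example U+2600) untouched, since they fit in 3 bytes and MySQL's utf8 can store them.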
"Exception handling" incorrect string value: ' \xf0\x90\x8d\x83 ... ' for column ... Java implementation of emoji expression character filtering