Emoji are now widely supported in apps, but MySQL's utf8 character set does not handle them well. As a result, we often run into exceptions like this:
Incorrect string value: ' \xf0\x90\x8d\x83 ... ' for column
The reason is that MySQL's utf8 character set stores at most 3 bytes per character, while many emoji need 4 bytes in UTF-8, and emoji built from several code points need even more.
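To make the byte counts concrete, here is a small sketch that measures how many bytes a string occupies in UTF-8. The class name Utf8LengthDemo and the method utf8Length are illustrative only and are not part of the original code. A character in the Basic Multilingual Plane needs at most 3 bytes, while a supplementary character such as an emoji needs 4, which is exactly what MySQL's 3-byte utf8 cannot store.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {

    // Returns the number of bytes the string occupies when encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));            // ASCII letter
        System.out.println(utf8Length("\u4e2d"));       // CJK character in the BMP
        System.out.println(utf8Length("\ud83d\ude04")); // emoji U+1F604, a surrogate pair in Java
    }
}
```

Note that in Java the emoji is two char values (a surrogate pair), so String.length() reports 2 for it even though it is a single code point.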
There are two kinds of solution:
1. Switch MySQL to the utf8mb4 character set, which can store these characters.
2. Filter out these special characters.
For the first solution, see: http://segmentfault.com/a/1190000000616820 and http://info.michael-simons.eu/2013/01/21/java-mysql-and-multi-byte-utf-8-support/
There are many details to watch, such as the MySQL version, the MySQL configuration, the MySQL connector version, and so on.
Because we use a cloud database, I chose to filter out these special characters. The filtering itself is very simple: use a regular expression to match the code range, then replace whatever matches.
For more background, see: http://stackoverflow.com/questions/27820971/why-a-surrogate-java-regexp-finds-hypen-minus
Here is my code:
import org.apache.commons.lang3.StringUtils;

public class EmojiFilterUtils {

    /**
     * Replaces emoji with "*".
     *
     * @param source the input string
     * @return the filtered string
     */
    public static String filterEmoji(String source) {
        if (StringUtils.isNotBlank(source)) {
            // Matches any supplementary code point (a surrogate pair) or a lone surrogate.
            return source.replaceAll("[\\ud800\\udc00-\\udbff\\udfff\\ud800-\\udfff]", "*");
        } else {
            return source;
        }
    }

    public static void main(String[] arg) {
        try {
            String text = "This is a smiley \ud83c\udfa6 face\ud860\udd5d \ud860\ude07 \ud860\udee2 \ud863\udcca \ud863\udccd \ud863\udcd2";
            System.out.println(text);
            System.out.println(text.length());
            // A narrower pattern that only targets common emoji blocks.
            System.out.println(text.replaceAll(
                    "[\\ud83c\\udc00-\\ud83c\\udfff]|[\\ud83d\\udc00-\\ud83d\\udfff]|[\\u2600-\\u27ff]", "*"));
            System.out.println(filterEmoji(text));
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
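As a variation on the regex filter above, the same idea can be sketched without a regular expression or the commons-lang3 dependency by walking the string code point by code point. The class name CodePointFilter and the method replaceNonBmp are hypothetical names chosen for this illustration. It is meant to behave like filterEmoji for typical input, but treat it as a sketch rather than a drop-in replacement.

```java
public class CodePointFilter {

    // Replaces every supplementary code point (4 bytes in UTF-8) and every
    // unpaired surrogate with the given replacement string.
    public static String replaceNonBmp(String source, String replacement) {
        if (source == null || source.isEmpty()) {
            return source;
        }
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); ) {
            int cp = source.codePointAt(i);
            int width = Character.charCount(cp); // 2 for a surrogate pair, otherwise 1
            boolean supplementary = Character.isSupplementaryCodePoint(cp);
            boolean loneSurrogate = width == 1 && Character.isSurrogate((char) cp);
            if (supplementary || loneSurrogate) {
                out.append(replacement);
            } else {
                out.appendCodePoint(cp);
            }
            i += width;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceNonBmp("smiley \ud83d\ude04 face", "*"));
    }
}
```

Symbols in the Basic Multilingual Plane, such as those in the U+2600 to U+27FF range, are left untouched here because they fit in 3 bytes and MySQL's utf8 can store them.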