I recently worked on a requirement to sync the user information of our official account's followers to the server, and found that many nicknames contain emoji. The usual fix is to change the MySQL character set to utf8mb4, but after some discussion we decided the emoji serve no purpose, so it is simpler to strip them before storing.
Filtering method
How do you filter emoji in Python? Here is a snippet that strips emoji from a string, tested under Python 2.7.
import re

# Emoji are matched as UTF-16 surrogate pairs, which is how a narrow
# Python 2.7 build stores characters above U+FFFF.
emoji_pattern = re.compile(
    u"(\ud83d[\ude00-\ude4f])|"  # emoticons
    u"(\ud83c[\udf00-\uffff])|"  # symbols & pictographs (1 of 2)
    u"(\ud83d[\u0000-\uddff])|"  # symbols & pictographs (2 of 2)
    u"(\ud83d[\ude80-\udeff])|"  # transport & map symbols
    u"(\ud83c[\udde0-\uddff])"   # flags (iOS)
    "+", flags=re.UNICODE)

def remove_emoji(text):
    # Replace every match with the empty string.
    return emoji_pattern.sub(r'', text)
See Removing-emojis-from-a-string-in-python for reference. If the regular expression is not written correctly, you may also run into errors such as sre_constants.error: bad character range.
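For readers on Python 3, the same range-based idea can be sketched without surrogate pairs, because Python 3 strings are sequences of full code points. This is only a sketch under that assumption: the name remove_emoji_py3 is made up for illustration, and the ranges are the same four blocks as in the snippet above, so it is no more complete than the original.

```python
import re

# Python 3 strings hold whole code points, so each block can be written
# as a direct range instead of a surrogate pair.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (regional indicator symbols)
    "]+"
)


def remove_emoji_py3(text):
    """Return text with every character in the ranges above removed."""
    return EMOJI_PATTERN.sub("", text)


print(remove_emoji_py3("nick\U0001F600name"))  # nickname
```

Writing each range with the eight-digit \U escape keeps the start of every range below its end, which is the condition the regex engine checks when it reports a bad character range.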
Here the emoji are removed by Unicode range. The ranges for generic and iOS emoji are probably not complete, and I could not find a complete list either. So even with a filter in place, it is still best to change the field to utf8mb4. If you know a more complete filtering method, please share it.
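One blunter alternative, if the only goal is to keep the value storable in a 3-byte MySQL utf8 column, is to drop every character above U+FFFF instead of listing emoji ranges. The sketch below assumes Python 3, and strip_non_bmp is a hypothetical helper name. It is not an emoji filter in the strict sense: it also removes non-emoji characters outside the Basic Multilingual Plane (for example rare CJK ideographs), and it keeps emoji-like symbols inside it, such as U+2764.

```python
def strip_non_bmp(text):
    """Drop every code point above U+FFFF (hypothetical helper).

    Characters above U+FFFF need 4 bytes in UTF-8, which a MySQL
    utf8 (3-byte) column cannot store.
    """
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


# The 4-byte emoji is removed; the 3-byte heart symbol U+2764 stays.
print(strip_non_bmp("a\U0001F600b\u2764") == "ab\u2764")  # True
```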
Modify character encoding
ALTER TABLE `table_name` MODIFY `nickname` VARCHAR(40) CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci;
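To judge whether existing data actually needs the utf8mb4 change, a small check like the following may help. It is a sketch assuming Python 3; needs_utf8mb4 and the sample nicknames are made up for illustration.

```python
def needs_utf8mb4(text):
    """Return True if text has a character that takes 4 bytes in UTF-8."""
    # Code points above U+FFFF are exactly the ones UTF-8 encodes in 4 bytes.
    return any(ord(ch) > 0xFFFF for ch in text)


# Hypothetical sample data.
nicknames = ["alice", "bob\U0001F600", "\u5f20\u4e09"]
print([needs_utf8mb4(name) for name in nicknames])  # [False, True, False]
print(len("\U0001F600".encode("utf-8")))  # 4
```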