http://my.oschina.net/jiemachina/blog/189460
Note that this replacement only covers the standard emoji characters. Each emoji is originally 2 bytes; once it is replaced by a string, it becomes 12 characters. That wastes a lot of space, but it is simple, and there is no need to write a map that lists every emoji one by one.
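To make the space cost concrete, here is a minimal sketch, assuming the string form is a pair of `\uXXXX` escapes (a UTF-16 surrogate pair). The post does not say how its strings were produced, so using `json.dumps` here is only an illustration, and `escaped_length` is a hypothetical helper name. The sketch is written for Python 3.

```python
import json

def escaped_length(ch):
    """Return the \\uXXXX-escaped form of ch and the length of that form."""
    # json.dumps escapes non-ASCII characters by default (ensure_ascii=True);
    # [1:-1] strips the surrounding double quotes it adds.
    escaped = json.dumps(ch)[1:-1]
    return escaped, len(escaped)

text, size = escaped_length("\U0001f61c")
print(text, size)  # \ud83d\ude1c 12
```

A single emoji character therefore turns into 12 ASCII characters under this assumed encoding.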
Convert emoji to a string
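import re

def filter_emoji(desstr, restr=''):
    '''
    Filter out emoji, replacing each one with restr.
    '''
    try:
        # Wide (UCS-4) Python 2 builds: match any code point above U+FFFF.
        co = re.compile(u'[\U00010000-\U0010ffff]')
    except re.error:
        # Narrow (UCS-2) builds: match a UTF-16 surrogate pair instead.
        co = re.compile(u'[\ud800-\udbff][\udc00-\udfff]')
    return co.sub(restr, desstr)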
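The function above is Python 2 code. As a rough Python 3 equivalent, here is a minimal sketch. The name `filter_emoji_py3` is hypothetical, and the fallback branch is dropped on the assumption that Python 3.3 or later is used, where strings always hold full code points.

```python
import re

# Code points U+10000..U+10FFFF: the supplementary planes, where most emoji live.
_SUPPLEMENTARY = re.compile("[\U00010000-\U0010ffff]")

def filter_emoji_py3(text, replacement=""):
    """Replace every supplementary-plane character in text with replacement."""
    return _SUPPLEMENTARY.sub(replacement, text)

print(filter_emoji_py3("hi \U0001f61c there"))       # hi  there
print(filter_emoji_py3("a\U0001f61cb", "?"))         # a?b
```

Like the original pattern, this only matches characters above U+FFFF, so a symbol such as U+274C, which sits in the Basic Multilingual Plane, is left untouched.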
Convert the string back to emoji
import re
import HTMLParser  # Python 2 module

def str_2_emoji(emoji_str):
    '''
    Convert a string to emoji
    '''
    if not emoji_str:
        return emoji_str
    h = HTMLParser.HTMLParser()
    # Unescape twice in case the HTML entities were escaped twice.
    emoji_str = h.unescape(h.unescape(emoji_str))
    # Match emoji literals such as u"\U0001f61c" and u"\u274c"
    co = re.compile(ur"u[\'\"]\\[Uu]([\w\"]{9}|[\w\"]{5})")
    pos_list = []
    result = emoji_str
    # First find the position of every match
    for m in co.finditer(emoji_str):
        pos_list.append((m.start(), m.end()))
    # Then rebuild the string piece by piece from those positions
    for pos in range(len(pos_list)):
        if pos == 0:
            result = emoji_str[0:pos_list[0][0]]
        else:
            result = result + emoji_str[pos_list[pos-1][1]:pos_list[pos][0]]
        # eval turns the matched literal back into a character (trusted input only)
        result = result + eval(emoji_str[pos_list[pos][0]:pos_list[pos][1]])
        if pos == len(pos_list) - 1:
            result = result + emoji_str[pos_list[pos][1]:len(emoji_str)]
    return result
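The same conversion can be sketched in Python 3 without `eval`, by letting `re.sub` call a function for each match and building the character with `chr`. This is an illustration under assumptions, not the author's method: the name `str_to_emoji` and the stricter pattern (hex digits only, closing quote must match the opening one) are choices made here.

```python
import html
import re

# Matches u"\UXXXXXXXX" or u"\uXXXX", with either quote style.
_LITERAL = re.compile(r"""u(['"])\\(?:U([0-9a-fA-F]{8})|u([0-9a-fA-F]{4}))\1""")

def str_to_emoji(text):
    """Replace escaped emoji literals in text with the characters they name."""
    if not text:
        return text
    # Unescape HTML entities twice, mirroring the original function.
    text = html.unescape(html.unescape(text))

    def decode(match):
        code_point = int(match.group(2) or match.group(3), 16)
        if code_point > 0x10FFFF:
            return match.group(0)  # not a valid code point, leave as-is
        return chr(code_point)

    return _LITERAL.sub(decode, text)

print(str_to_emoji('a u"\\U0001f61c" b') == "a \U0001f61c b")  # True
```

Using a callback removes the manual position bookkeeping, since `re.sub` copies the text between matches on its own.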