I have been playing with web crawlers recently, and as someone new to Python I find transcoding Chinese text troublesome (the principles of the conversion are omitted here).
Usually encode('gbk'), encode('utf8'), decode('gbk') and decode('utf8') solve most of the problems.
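For readers on Python 3, where str is already Unicode text and only bytes carry an encoding, the four calls above can be sketched roughly as follows. This is a minimal illustration, not code from the original post; the variable names are made up for the example.

```python
# Python 3 sketch: str is Unicode text, bytes carry a concrete encoding.
text = "\u6210\u529f"              # the two characters of the word "success"

gbk_bytes = text.encode("gbk")     # str -> GBK bytes (2 bytes per character here)
utf8_bytes = text.encode("utf-8")  # str -> UTF-8 bytes (3 bytes per character here)

# Transcoding GBK bytes to UTF-8 bytes always goes through Unicode text.
transcoded = gbk_bytes.decode("gbk").encode("utf-8")

print(transcoded == utf8_bytes)
```

In Python 2 the same idea applies, except that a plain str plays the role of bytes and unicode plays the role of text.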
But today I ran into a string in the '\\u6210\\u529f' format and got stuck on it.
Through the almighty Baidu, I found this first post, which was impressive:
http://bbs.chinaunix.net/thread-3674073-1-1.html
The code is as follows
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 script: reads stdin and replaces each literal \uXXXX with its UTF-8 bytes.
import re
import sys

def main():
    for line in sys.stdin:
        sys.stdout.write(re.sub(r'\\u\w{4}',
                                lambda e: unichr(int(e.group(0)[2:], 16)).encode('utf-8'),
                                line))

if __name__ == '__main__':
    main()
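The script above relies on unichr, which exists only in Python 2. A rough Python 3 equivalent of the same regex idea might look like the sketch below. The function name unescape_unicode is made up for illustration, and the pattern is narrowed to hex digits (an assumption on my part) so that int(..., 16) cannot fail on a non-hex match.

```python
import re

# Matches a literal backslash, the letter u, and exactly four hex digits.
ESCAPE_RE = re.compile(r"\\u([0-9a-fA-F]{4})")

def unescape_unicode(text):
    """Replace each literal backslash-u escape with the character it names."""
    return ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), text)

raw = r"status: \u6210\u529f"   # raw string: the backslashes are literal
print(unescape_unicode(raw))
```

This sketch converts each escape independently, so characters written as surrogate pairs (two consecutive escapes) are not merged into one code point.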
Then I went back to Baidu (yes, her again) and found a more convenient and quicker way:
http://blog.csdn.net/garinwang/article/details/6329262
The code is as follows
str = str.decode (' unicode_escape ') str = str.encode (' GBK ')
The first step decodes the escaped string into a Unicode string, and the second step encodes that Unicode string into GBK-encoded Chinese characters.
This method converts every escape sequence in the whole string in one go.
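The two steps above are written for Python 2, where str has a decode method. In Python 3 a str cannot be decoded directly, so a sketch of the same idea has to pass through bytes first. The example assumes the input contains only ASCII characters plus the escapes; anything else would make the encode("ascii") call raise an error.

```python
raw = r"\u6210\u529f"   # literal backslash-u text: 12 ASCII characters

# Step 1: turn the escapes into real Unicode characters.
# unicode_escape is a bytes-to-str codec, hence the detour through ASCII bytes.
text = raw.encode("ascii").decode("unicode_escape")

# Step 2: encode the Unicode text as GBK bytes.
gbk_bytes = text.encode("gbk")

print(len(raw), len(text), len(gbk_bytes))
```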
That's it, done!