As the title says, my question is simple. While writing a crawler, I found that the page content contained a string like "\u65b0\u6d6a\u5fae\u535a\u6ce8\u518c". This is Chinese text written as Unicode escape sequences; the corresponding Chinese is "新浪微博注册" (Sina Weibo registration). All I wanted was a function that would display this string as Chinese, but it took a long Baidu search before I found the right one. For this problem, do not search with keywords such as "Python encoding", "Unicode encoding", or "Unicode decoding": they return a large number of irrelevant pages.
It turns out that a single function call solves the problem, as follows:
Example 1:
>>> s = r"\u65b0\u6d6a\u5fae\u535a\u6ce8\u518c"
>>> s
'\\u65b0\\u6d6a\\u5fae\\u535a\\u6ce8\\u518c'
>>> print s
\u65b0\u6d6a\u5fae\u535a\u6ce8\u518c
>>> s = s.decode("unicode_escape")  # this is the function
>>> print s
新浪微博注册
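The session above is Python 2, where a plain string is a byte string and has a decode method. As a minimal sketch of the same idea in Python 3 (an assumption, since this post only uses Python 2), the text first has to be turned back into bytes. The helper name unescape_unicode is made up for illustration, and it assumes the input contains only ASCII characters:

```python
def unescape_unicode(text):
    """Turn literal \\uXXXX sequences in an ASCII-only str into real characters.

    Hypothetical helper for illustration; not part of any library.
    """
    # str has no .decode() in Python 3, so go through bytes first.
    return text.encode("ascii").decode("unicode_escape")


escaped = r"\u65b0\u6d6a\u5fae\u535a\u6ce8\u518c"  # 36 literal ASCII characters
result = unescape_unicode(escaped)                 # 新浪微博注册
```

If the input already mixes real non-ASCII characters with escape sequences, encode("ascii") raises UnicodeEncodeError, so this sketch only covers the fully escaped case shown in Example 1.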
Example 2:
>>> str_ = "russopho\xe9bic, clichd and just pl\xe9ain stupid."
>>> print str_
russopho?bic, clichd and just pl?ain stupid.
>>> str_ = str_.decode("unicode_escape")
>>> print str_
russophoébic, clichd and just pléain stupid.
(This method solved the "bson.errors.InvalidStringData: strings in documents must be valid UTF-8" error I ran into when inserting data into MongoDB.)
A related blog post on this question: http://www.cnblogs.com/yangze/archive/2010/11/16/1878469.html
There is one more problem with Unicode strings and byte strings. You may run into this message: "Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal". It means that the two sides of the equals sign have different types: a unicode string on one side and a byte string (str) on the other, and Python could not convert the byte string to unicode for the comparison, so it treats the two as unequal. See http://stackoverflow.com/questions/3400171/python-utf-8-comparison.
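Both issues in Example 2 come down to mixing raw bytes with text. The following is a rough Python 3 sketch of that idea, not the fix used in this post: it assumes the bytes are Latin-1 (which is what a lone \xe9 for "é" suggests), and to_text is a hypothetical helper name.

```python
def to_text(value, encoding="latin-1"):
    """Return value as str, decoding bytes with the assumed encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b"pl\xe9ain"  # a lone 0xE9 byte: fine as Latin-1, invalid as UTF-8

# Check whether the raw bytes would pass a "must be valid UTF-8" check.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

text = to_text(raw)                # decode once, at the boundary
utf8_bytes = text.encode("utf-8")  # now safe to store as UTF-8

# Compare text with text, never bytes with text.
same = to_text(raw) == "pl\xe9ain"
```

Decoding once, as soon as the data arrives, keeps every later comparison and every database insert working on the same type.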
Summary:
In the future, when you run into a strange problem, work out the right keywords before searching; otherwise you are likely to come up empty.
Python: turning a Unicode-escaped byte string into Chinese