First, why I ran into this problem: after learning to write web crawlers, I wanted to practice by scraping the Douban Movie Top250. This is the reference link (http://blog.csdn.net/fighting_no1/article/details/50926008). The code in that blog scrapes almost all of the Douban movie content, so I first saved the code and ran it locally. The result was that the Excel file was not generated. After a day of struggling I found out that the special character &nbsp; in the parsed page could not be encoded correctly during transcoding:
UnicodeEncodeError: 'gbk' codec can't encode character u'\xa0' in position
I believe you will run into similar problems too: either garbled text, or this kind of error outright. In the end it all comes down to Python character encoding and decoding. For example, the same Python file raised this error when I ran it on Windows but ran fine on Ubuntu, and encoding and decoding is the reason for that as well.
I will give the solutions first. For the specific causes, see the main reference link, which explains them in great detail, so I will not reinvent the wheel.
If you are running on Windows, the reason is:
To print a Unicode string, the Python interpreter must first encode it for the console. On Windows the console is cmd, whose default code page (on a Chinese-language system) is CP936, that is, GBK, so the interpreter encodes the Unicode string as GBK and then displays it in cmd. However, because the Unicode string contains characters that GBK cannot represent, the "'gbk' codec can't encode" error is raised at this point.
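To make the cause concrete, here is a minimal sketch that reproduces the failure without a Windows console, by encoding to GBK by hand. The sample string and the helper name `can_encode` are my own illustrations, not taken from the referenced blog. It is written so that it should run on Python 3 as well as on Python 2.

```python
# -*- coding: utf-8 -*-
# Hypothetical scraped text: \xa0 is the no-break space that &nbsp; turns into.
text = u"Rating:\xa09.6"


def can_encode(s, codec="gbk"):
    """Return True if every character of s can be encoded with codec."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        # Raised when the codec has no mapping for some character.
        return False


print(can_encode(text))            # False: \xa0 has no GBK mapping
print(can_encode(u"Rating: 9.6"))  # True: an ordinary space encodes fine
```

Printing `text` in a cmd window whose code page is CP936 goes through the same GBK encoding step, which is where the error in this post comes from.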
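Workaround 1:
Do some preprocessing on the string before it is encoded:
string = string.replace(u'\xa0', u' ')
This replaces u'\xa0' with an ordinary space u' '. The \xa0 character is the &nbsp; (no-break space) from the HTML. Note that replace() returns a new string, so the result has to be assigned back.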
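A small sketch of this workaround follows. The function name `replace_nbsp` and the sample text are assumptions made for illustration only.

```python
# -*- coding: utf-8 -*-
def replace_nbsp(s):
    """Return a copy of s with each no-break space replaced by a plain space."""
    return s.replace(u"\xa0", u" ")


# Hypothetical scraped field containing two no-break spaces.
raw = u"1994\xa0/\xa0USA"
cleaned = replace_nbsp(raw)

# After the replacement the string encodes to GBK without an error.
encoded = cleaned.encode("gbk")
print(cleaned)
```

The advantage of this approach is that the spacing of the original text is preserved. The drawback is that it only handles \xa0, so any other character that GBK lacks would still raise the error.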
Workaround 2:
When encoding the Unicode string, add the 'ignore' argument. Characters that cannot be encoded are then skipped, so the rest of the string can be encoded as GBK.
print your_string.encode("GBK", 'ignore')
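The effect of the error handler can be checked with a short sketch like the one below. The sample text is made up, and the 'replace' handler is shown only for comparison; it is a standard option of `encode()` but is not mentioned in the original workaround.

```python
# -*- coding: utf-8 -*-
# Hypothetical scraped field containing two no-break spaces.
text = u"1994\xa0/\xa0USA"

# 'ignore' silently drops every character GBK cannot represent.
ignored = text.encode("gbk", "ignore")

# 'replace' substitutes a question mark instead of dropping the character.
replaced = text.encode("gbk", "replace")

# Decode back to text so the result prints the same way on Python 2 and 3.
print(ignored.decode("gbk"))
print(replaced.decode("gbk"))
```

Keep in mind that 'ignore' loses data: here the words end up with no space between them, which is why Workaround 1 may be preferable when only \xa0 is involved.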
Main reference link
The author's exploration process
Other minor reference links