Printing crawled Chinese articles to the console with the print function raises an error:
UnicodeEncodeError: 'gbk' codec can't encode character '\U0001f602' in position...
The article is encoded in UTF-8, but the error message shows that the GBK codec cannot encode the character '\U0001f602' (an emoji).
This means the text is converted to GBK when it is printed, and GBK cannot encode some Unicode characters.
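The mismatch can be reproduced without a console. The sketch below is only an illustration: the helper name can_encode and the sample string are made up for this example. It checks which codecs accept a string containing the character from the error message.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Sample text: two Chinese characters followed by the emoji from the error message.
sample = "\u4e2d\u6587\U0001f602"

for name in ("gbk", "gb18030", "utf8"):
    # Print only the codec name and the result, so this loop itself
    # cannot trigger the encoding error on a GBK console.
    print(name, can_encode(sample, name))
```

GBK accepts the Chinese characters but rejects the emoji, while GB18030 and UTF-8 can represent both.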
There are two solutions:
① Change the encoding of the standard output stream:
import io
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')
or
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')
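To see what the wrapper does without touching the real console, the sketch below wraps an in-memory buffer the same way. The io.BytesIO object stands in for sys.stdout.buffer and the function name encode_via_wrapper is hypothetical, so treat this as an illustration rather than the original author's code.

```python
import io


def encode_via_wrapper(text: str, encoding: str) -> bytes:
    """Write `text` through a TextIOWrapper and return the bytes that reach the buffer."""
    raw = io.BytesIO()  # assumption: stands in for sys.stdout.buffer
    writer = io.TextIOWrapper(raw, encoding=encoding)
    writer.write(text)  # raises UnicodeEncodeError if the codec cannot encode the text
    writer.flush()
    return raw.getvalue()


emoji = "\U0001f602"

for name in ("gb18030", "utf8", "gbk"):
    try:
        print(name, encode_via_wrapper(emoji, name))
    except UnicodeEncodeError as exc:
        print(name, "failed:", exc.reason)
```

With gb18030 or utf8 the write succeeds, while a gbk wrapper fails in the same way as the original print call. On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') is another way to change the encoding without rewrapping the stream.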
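② Alternatively, change the encoding of the console itself, which affects everything printed in the current console window rather than a single script:
Enter chcp at the command line.
Output: Active code page: 936
This indicates that the console is using the default encoding, GBK.
To change the encoding, enter chcp 65001 at the command line.
This switches the console to UTF-8.
Set the console font to Lucida Console.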
Then, you can print the crawled Chinese articles.
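After applying either fix, it can be useful to check which encoding Python is actually using for standard output. The following is a minimal sketch; the helper names stdout_encoding and stdout_can_print are made up for illustration.

```python
import sys


def stdout_encoding() -> str:
    """Return the encoding of sys.stdout, or "unknown" if it is not available."""
    return getattr(sys.stdout, "encoding", None) or "unknown"


def stdout_can_print(text: str) -> bool:
    """Return True if `text` can be encoded with the current stdout encoding."""
    try:
        text.encode(stdout_encoding())
    except (UnicodeEncodeError, LookupError):
        # LookupError covers the "unknown" placeholder or an unrecognized codec name.
        return False
    return True


print("stdout encoding:", stdout_encoding())
print("emoji printable:", stdout_can_print("\U0001f602"))
```

If the second line still reports False, the stream is most likely still using GBK (code page 936), and neither fix has taken effect in that session.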