Character encoding really is a perpetual puzzle. I ran into the same thing before with garbled Chinese text in R, and it drove me crazy! Holy shit!
Either it throws an error:
UnicodeEncodeError: 'gbk' codec can't encode character u'\u200e' in position 43: illegal multibyte sequence
Or the page can be read, but the text comes out garbled.
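Both symptoms are easy to reproduce. What follows is only a minimal sketch, assuming Python 3; the sample strings are made up for illustration and are not taken from the page I was crawling.

```python
# Symptom 1: U+200E (LEFT-TO-RIGHT MARK) has no GBK representation,
# so encoding a string that contains it to GBK raises UnicodeEncodeError.
text = "abc\u200edef"  # hypothetical sample text
try:
    text.encode("gbk")
    encode_failed = False
except UnicodeEncodeError as exc:
    encode_failed = True
    print(exc)

# Symptom 2: bytes written as UTF-8 but decoded with the wrong codec.
raw = "中文".encode("utf-8")
garbled = raw.decode("gbk", errors="replace")
print(garbled == "中文")  # prints False: the text no longer round-trips
```

The first case usually shows up when printing to a console that uses GBK; the second shows up when the page's real encoding is not the one you decoded with.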
My own environment is UTF-8.
import sys; print(sys.getdefaultencoding())  # utf-8
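The default encoding is not the only one involved, which may be why the GBK error still appears in a "UTF-8" environment. A small sketch, assuming Python 3, that prints the other encodings worth checking; the values in the comments are typical, not guaranteed.

```python
import locale
import sys

# Codec used for str/bytes conversions when none is given
# ("utf-8" on Python 3).
print(sys.getdefaultencoding())

# Codec used when print() writes to the console. On a Chinese-locale
# Windows console this may be "gbk" or "cp936" (assumption: it depends
# on your terminal), which would explain the 'gbk' codec error.
print(sys.stdout.encoding)

# The locale's preferred encoding, typically what open() falls back to
# when no encoding argument is passed.
print(locale.getpreferredencoding(False))
```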
Add this as the first line:
# -*- coding: utf-8 -*-
Save your .py file in UTF-8 format, and then
↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓ ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
The perfect solution:
bytes.decode("GBK", 'ignore').encode("GBK").decode('UTF-8', 'ignore')
# In short, just keep throwing GBK and UTF-8 at it until it works, damn it!
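If the chained one-liner feels too much like black magic, here is a more explicit variant of the same idea. It is only a sketch under my own assumptions: the helper names `decode_page` and `printable` are hypothetical, the candidate list ("utf-8", then "gb18030") is a guess at what Chinese pages commonly use, and the target console codec is assumed to be GBK.

```python
def decode_page(raw, encodings=("utf-8", "gb18030")):
    """Return (text, encoding) for the first codec that decodes raw cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: decode anyway and mark undecodable bytes with U+FFFD.
    return raw.decode(encodings[0], errors="replace"), encodings[0]


def printable(text, target="gbk"):
    """Drop characters that the target console codec cannot represent."""
    return text.encode(target, errors="ignore").decode(target)


# Hypothetical page body: UTF-8 bytes containing a left-to-right mark.
page = "中文\u200e".encode("utf-8")
text, enc = decode_page(page)
safe = printable(text)  # the U+200E character is dropped here
print(enc)
```

Decoding strictly first and only falling back to 'ignore' or 'replace' at the end means characters are not silently lost when the page was perfectly valid.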
Garbled text when crawling web pages with Python