The code is as follows:
"""
If you simply fetch the page the ordinary way:
import urllib.request
html = urllib.request.urlopen("http://www.sina.com").read()
print(html.decode('gbk'))
The following error occurs:
builtins.UnicodeDecodeError: 'gbk' codec can't decode byte 0x8b in position 1: illegal multibyte sequence
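What should I do? It turns out that some websites serve their pages gzip-compressed.
The byte 0x8b at position 1 is the second byte of the gzip magic number (1f 8b), so the
response must be decompressed before it can be decoded as GBK.
See the following code. The commonly recommended version is written for Python 2: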
import urllib2
from StringIO import StringIO
import gzip

# Tell the server that gzip-compressed responses are accepted.
request = urllib2.Request('http://www.sina.com')
request.add_header('Accept-Encoding', 'gzip')
response = urllib2.urlopen(request)
# Only decompress when the server actually sent gzip.
if response.info().get('Content-Encoding') == 'gzip':
    buf = StringIO(response.read())
    f = gzip.GzipFile(fileobj=buf)
    data = f.read()
    print data.decode("GBK").encode('utf-8')
"""
import io
import urllib.request as r
import gzip

# Ask explicitly for gzip and send a browser-like User-Agent.
req = r.Request("http://www.sina.com", headers={
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36",
    "Accept-Encoding": "gzip",
})
bs = r.urlopen(req).read()                 # raw, still-compressed bytes
bi = io.BytesIO(bs)                        # wrap the bytes in a file-like object
gf = gzip.GzipFile(fileobj=bi, mode="rb")  # decompresses as it is read
print(gf.read().decode("gbk"))             # decode the page text as GBK
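To see where the byte 0x8b in the UnicodeDecodeError comes from, here is a small offline sketch. The sample string and its GBK encoding are assumptions standing in for a real page, and `gzip.compress` imitates a server that answers with gzip. It shows the 1f 8b magic number at the start of the body, reproduces the decode failure, and then decodes correctly once the body is decompressed.

```python
import gzip

# Assumed sample: a short Chinese string encoded as GBK, then gzip-compressed,
# standing in for a body sent with "Content-Encoding: gzip".
page = "新浪首页".encode("gbk")
body = gzip.compress(page)

# Every gzip stream starts with the magic bytes 1f 8b.
print(body[:2].hex())  # → 1f8b

try:
    # Decoding the compressed bytes directly, as in the first snippet, fails:
    # 0x8b looks like a GBK lead byte, but the byte after it is not a valid trail byte.
    body.decode("gbk")
except UnicodeDecodeError as e:
    print(e)

# Decompressing first recovers the original GBK bytes.
print(gzip.decompress(body).decode("gbk"))  # → 新浪首页
```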
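The Python 3 code above always gunzips, so it fails if the server ignores `Accept-Encoding` and sends plain bytes; `urllib.request` does not decompress responses on its own. A minimal sketch of a more defensive variant, assuming GBK as the page charset: it checks the `Content-Encoding` header or the 1f 8b magic bytes before decompressing, and uses `gzip.decompress` as a one-shot alternative to `GzipFile` plus `BytesIO`. The names `decode_body` and `fetch_text` are hypothetical helpers, not part of any library.

```python
import gzip
import urllib.request
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_body(raw: bytes, content_encoding: Optional[str], charset: str = "gbk") -> str:
    """Hypothetical helper: gunzip `raw` only if it is gzip, then decode it."""
    header_says_gzip = (content_encoding or "").strip().lower() == "gzip"
    if header_says_gzip or raw[:2] == GZIP_MAGIC:
        # One-shot equivalent of gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
        raw = gzip.decompress(raw)
    return raw.decode(charset)


def fetch_text(url: str, charset: str = "gbk") -> str:
    """Hypothetical wrapper around urllib; the charset default is an assumption."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"), charset)
```

Usage would be `print(fetch_text("http://www.sina.com"))`. If a page declares a different charset, pass it as `charset` instead of relying on the GBK default.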