If the crawled content is compressed, it has to be decompressed before it can be parsed or saved. Some sites, such as www.qq.com, return gzip-compressed responses no matter what the client claims to support, so the crawler should check the Content-Encoding response header and decompress the body itself, as in the example below.
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from StringIO import StringIO
import urllib2
import gzip

# Some websites, such as www.qq.com, return gzip-compressed data regardless of
# whether the client says it supports gzip decompression.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36"}

request = urllib2.Request("http://www.qq.com/", headers=headers)
response = urllib2.urlopen(request)
html = ""

# If the response header Content-Encoding is gzip, the response body is
# gzip-compressed and must be decompressed before use.
if response.info().get('Content-Encoding') == 'gzip':
    # load the compressed byte stream into memory via StringIO
    data = StringIO(response.read())
    # gzip.GzipFile decompresses the data and returns a file-like object
    f = gzip.GzipFile(fileobj=data)
    # read the decompressed string
    html = f.read()
else:
    # otherwise read the response data directly
    html = response.read()

# write the data to a disk file
with open("qq.html", "w") as f:
    f.write(html)
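The code above relies on the Python 2 modules urllib2 and StringIO, which do not exist in Python 3. A minimal sketch of the same logic for Python 3, assuming only the standard-library urllib.request, io and gzip modules (the URL and output filename are simply the example values from above):

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import gzip
import io
import urllib.request

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36"}

request = urllib.request.Request("http://www.qq.com/", headers=headers)
response = urllib.request.urlopen(request)

# The body is bytes in Python 3; check the Content-Encoding header first
if response.info().get("Content-Encoding") == "gzip":
    # wrap the compressed bytes in an in-memory buffer and decompress
    buf = io.BytesIO(response.read())
    html = gzip.GzipFile(fileobj=buf).read()
else:
    html = response.read()

# html is bytes, so open the output file in binary mode
with open("qq.html", "wb") as f:
    f.write(html)

Since Python 3.2, gzip.decompress(response.read()) can replace the BytesIO/GzipFile pair when only the decompressed bytes are needed.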
Handling compressed HTML data