On a whim I tried crawling 51job, but ran into an encoding problem. Here is a simple piece of code.
Fetch the entire page:
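# -*- coding: utf-8 -*-
import requests
import sys
reload(sys)                       # Python 2 only
sys.setdefaultencoding('utf-8')

# Placeholder: the original post does not show how headers was defined.
headers = {'User-Agent': 'Mozilla/5.0'}

def spider(url):
    session = requests.Session()
    html = session.get(url, headers=headers)
    return html

url = 'http://www.51job.com/'
html = spider(url)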
Results:
print html.encoding
>>> ISO-8859-1
A fragment of html.text:
print html.text
>>> langs: {ts_qxjzw:'??????? °??', Queren:'è è?', Guanbi:'1?±?', YXDD:'Ò??? Μ?μ?', YXZN:'Ò???? °?ü', Yxhy:'Ò??? Ddòμ', NZDNXJ:'? úx?? À?ü????', Xiang:'??', XJDQ:'???? μ???', Xj_xg:'???? /dt??', Zycs:'? ÷òa3?êd', SYSF:'? ùódê? Y', TSPD:'Ì?êa?μμà', Qxjgzdd:'?????? 1¤x÷μ?μ?', QXJZNLB:'??????? °?üàà±e', qxjhylb:'?????? Ddòμàà±e', Gzdd:'1¤x÷μ?μ?', Buxian:'2?? T' } ,
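I then tried html.text.decode('iso-8859-1'), which raised an error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 249-254: ordinal not in range(128). Since html.text is already a unicode object, calling .decode() on it makes Python 2 implicitly encode it first, and that implicit encode is what fails.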
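To illustrate why the text looks garbled, here is a minimal offline sketch of the likely mechanism: GBK bytes decoded as ISO-8859-1. The sample string is a hypothetical stand-in for the page content, and the sketch assumes the page really is GBK-encoded (which the fix further down suggests). It is written to run on Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text; the real page content is not reproduced here.
original = u"确认"

# What the server sends: the text encoded as GBK bytes.
raw_bytes = original.encode("gbk")

# What a client sees if it assumes ISO-8859-1: one wrong character per byte.
garbled = raw_bytes.decode("iso-8859-1")

# ISO-8859-1 maps every byte to exactly one character, so the wrong decode
# loses nothing and can be undone: encode back, then decode as GBK.
recovered = garbled.encode("iso-8859-1").decode("gbk")

print(repr(garbled))
print(recovered == original)
```

The round trip works only because ISO-8859-1 covers all 256 byte values; it is a way to understand the problem, not a substitute for decoding with the right codec in the first place.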
After some reading, I added html.encoding = 'GBK' before accessing html.text, and that solved it.
Code:
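# -*- coding: utf-8 -*-
import requests
import sys
reload(sys)                       # Python 2 only
sys.setdefaultencoding('utf-8')

# Placeholder: the original post does not show how headers was defined.
headers = {'User-Agent': 'Mozilla/5.0'}

def spider(url):
    session = requests.Session()
    html = session.get(url, headers=headers)
    html.encoding = 'GBK'         # override the ISO-8859-1 guess before reading html.text
    return html

url = 'http://www.51job.com/'
html = spider(url)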
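Hard-coding 'GBK' works for this site, but one possible alternative is to read the charset the page declares about itself. The sketch below is only an illustration under assumptions: it supposes the page carries a charset in a meta tag near the top (not verified for 51job here), and the names sniff_charset, decode_page and the sample page are hypothetical. It uses only the standard library and targets Python 3.

```python
# -*- coding: utf-8 -*-
import re

# Matches charset=... inside a <meta> tag, with or without quotes.
META_CHARSET = re.compile(br'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)

def sniff_charset(raw, default="utf-8"):
    """Return the charset declared in the first 2 KB of raw bytes, else default."""
    match = META_CHARSET.search(raw[:2048])
    if match:
        return match.group(1).decode("ascii").lower()
    return default

def decode_page(raw):
    """Decode raw page bytes using the declared charset (hypothetical helper)."""
    return raw.decode(sniff_charset(raw), errors="replace")

# Hypothetical sample page, encoded the way a GBK site might send it.
sample = (u'<html><head><meta http-equiv="Content-Type" '
          u'content="text/html; charset=gb2312"><title>招聘</title>'
          u'</head></html>').encode("gbk")

print(sniff_charset(sample))
print(u"招聘" in decode_page(sample))
```

With requests, the raw bytes would come from html.content, and the sniffed name could be assigned to html.encoding instead of the fixed 'GBK'.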
The same fragment of the HTML now displays correctly:
print html.text
>>> langs: {ts_qxjzw:'Please select a position', Queren:'Confirm', Guanbi:'Close', YXDD:'Selected Location', YXZN:'Selected Functions', Yxhy:'Selected Industries', NZDNXJ:'you can select up to', Xiang:'Items', XJDQ:'Select Region', Xj_xg:'Select/Modify', Zycs:'Major Cities', SYSF:'all Provinces', TSPD:'Special Channels', Qxjgzdd:'Please select a place of work', QXJZNLB:'Please select a functional category', qxjhylb:'Please select industry category', Gzdd:'Work Place', Buxian:'Not Limited' } ,
The encoding problem when crawling 51job job listings