Recently I have been looking for a job, so I crawled all the Python positions on Lagou (lagou.com) to give myself some direction. Lagou's data is fairly easy to crawl: the site returns JSON, which can be parsed directly. Without further ado, here is the code:
# Python 2 script (uses urllib2, xrange and the print statement)
import json
import urllib
import urllib2
from openpyxl import load_workbook

# The workbook must already exist; a new sheet is added to it.
filename = r'E:\excel\position_number_11_2.xlsx'
ws = load_workbook(filename=filename)
sheet = ws.create_sheet(index=0)
sheet.title = 'position'
count = 1  # next row to write

for page in xrange(100):
    # Form fields sent with each POST: page number and search keyword
    from_data = {
        'first': 'false',
        'pn': page,
        'kd': 'Python'
    }

    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0',
        'Referer': 'https://www.lagou.com/jobs/list_Python?px=default&city=%E5%85%A8%E5%9B%BD',
    }
    request_url = 'https://www.lagou.com/jobs/positionAjax.json?px=default&needAddtionalResult=false'
    data = urllib.urlencode(from_data)

    request = urllib2.Request(request_url, headers=header, data=data)
    try:
        html = urllib2.urlopen(request).read().decode('utf-8')
    except Exception:
        print 'No job information'
        break
    # print html
    jsonobj = json.loads(html)
    # print jsonobj
    dict_obj = jsonobj['content']['positionResult']['result']
    for item in dict_obj:
        if item:
            # One row per position, one column per field
            sheet.cell(row=count, column=1).value = item['companySize']
            sheet.cell(row=count, column=2).value = item['workYear']
            sheet.cell(row=count, column=3).value = item['education']
            sheet.cell(row=count, column=4).value = item['financeStage']
            sheet.cell(row=count, column=5).value = item['city']
            sheet.cell(row=count, column=6).value = item['industryField']
            sheet.cell(row=count, column=7).value = item['formatCreateTime']
            sheet.cell(row=count, column=8).value = item['positionName']
            sheet.cell(row=count, column=9).value = item['companyFullName']
            sheet.cell(row=count, column=10).value = item['salary']
            count += 1

ws.save(filename)
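For anyone on Python 3, here is a rough sketch of the two pure steps in the script above: encoding the POST form and turning one JSON response into spreadsheet rows. It makes no network calls. The names FIELDS, build_form and extract_rows are my own for illustration, the sample response is made up, and using .get() so a missing field becomes None is an assumption about what is desirable, not something the original script does.

```python
import json
from urllib.parse import urlencode

# Column order used by the spreadsheet in the script above.
FIELDS = [
    "companySize", "workYear", "education", "financeStage", "city",
    "industryField", "formatCreateTime", "positionName",
    "companyFullName", "salary",
]


def build_form(page, keyword="Python"):
    """Encode the POST form for one result page as bytes."""
    form = {"first": "false", "pn": page, "kd": keyword}
    return urlencode(form).encode("utf-8")


def extract_rows(text):
    """Turn one JSON response body into a list of spreadsheet rows."""
    payload = json.loads(text)
    results = payload["content"]["positionResult"]["result"]
    # Skip empty entries; a missing field becomes None instead of
    # raising KeyError (assumption, see the note above).
    return [[item.get(name) for name in FIELDS] for item in results if item]


# Made-up sample shaped like the response the script expects.
sample = json.dumps({
    "content": {"positionResult": {"result": [
        {"positionName": "Python Developer", "city": "Beijing",
         "salary": "15k-25k"},
        None,
    ]}}
})

rows = extract_rows(sample)
print(rows[0])
print(build_form(1))
```

Each row can then be written with sheet.append(row) in openpyxl instead of setting the ten cells one by one.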
The code was written in a hurry, so it is not very polished. In a couple of days I will post my crawler code for Weibo and Douban as well. I hope the experts here can give me some pointers ^_^
Crawling all Python job postings from Lagou