I scraped Zhaopin today, searching for Python job listings. There isn't much to say about the code itself; it follows the same pattern as the last post, and I'll add more to it later. It captures the recruitment details for every job linked from the search-results page.
from bs4 import BeautifulSoup
import requests

url2 = 'http://sou.zhaopin.com/jobs/searchresult.ashx?kw=python&sm=0&p=1'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'Host': 'sou.zhaopin.com',
}
web_data = requests.get(url2, headers=headers)
# print(web_data.text)
soup = BeautifulSoup(web_data.text, 'html.parser')
for i in soup.select('a'):
    # if i['href'][:24] == 'http://jobs.zhaopin.com/':
    #     print(i['href'])
    try:
        if i['href'].startswith('http://jobs.zhaopin.com/'):
            info = requests.get(i['href'])
            infosoup = BeautifulSoup(info.text, 'html.parser')
            for a in infosoup.select('.tab-inner-cont'):
                try:
                    print(a.text)
                except KeyError:
                    pass
    except KeyError:
        pass
Below are the problems I ran into while writing this code.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'Host': 'sou.zhaopin.com',
}
When crawling the search-results page I could get the source code without trouble, but requests for the individual job pages always came back with an error. It turned out to be an anti-scraping measure: you have to send browser-like request headers, namely the headers above.
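The fix above can be sketched without hitting the network by preparing the request and inspecting what would actually be sent (the URL and header values match the code above; `requests.Request`/`prepare()` is standard requests API):

```python
import requests

# Browser-like headers, as in the fix above
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'Host': 'sou.zhaopin.com',
}

# Build the request without sending it, so we can inspect it offline
req = requests.Request(
    'GET',
    'http://sou.zhaopin.com/jobs/searchresult.ashx',
    params={'kw': 'python', 'sm': '0', 'p': '1'},
    headers=headers,
)
prepared = req.prepare()

print(prepared.url)                       # full URL with the query string
print(prepared.headers['User-Agent'])     # the header the server will see
```

With the `User-Agent` in place the job-detail pages stop rejecting the request.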
try:
    if i['href'].startswith('http://jobs.zhaopin.com/'):
        info = requests.get(i['href'])
        infosoup = BeautifulSoup(info.text, 'html.parser')
        for a in infosoup.select('.tab-inner-cont'):
            try:
                print(a.text)
            except KeyError:
                pass
except KeyError:
    pass
The error being raised was a KeyError: some `<a>` tags have no `href` attribute, and indexing a missing attribute with `i['href']` raises KeyError in BeautifulSoup. The accepted answer was to add exception handling, i.e. the `try: ... except KeyError: pass` above.
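A small self-contained sketch (with made-up HTML) of why the KeyError appears, and an alternative fix: `tag.get('href')` returns None for a missing attribute instead of raising, so the try/except is no longer needed:

```python
from bs4 import BeautifulSoup

# Two anchors: one with an href, one without (e.g. a named anchor).
# Indexing the second with a['href'] would raise KeyError.
html_doc = ('<a href="http://jobs.zhaopin.com/1.htm">job</a>'
            '<a name="top">anchor</a>')
soup = BeautifulSoup(html_doc, 'html.parser')

links = []
for a in soup.select('a'):
    href = a.get('href')  # returns None instead of raising KeyError
    if href and href.startswith('http://jobs.zhaopin.com/'):
        links.append(href)

print(links)
```

Either approach works; `.get()` just makes the intent (skip anchors without links) more explicit than swallowing the exception.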
In the code above, you can see that because the selection is done by class, everything inside that class gets crawled, not just the Python job requirements. Look at the version below.
from lxml import html
import requests

# page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
# tree = html.fromstring(page.content)
# # This will create a list of buyers:
# buyers = tree.xpath('//div[@title="buyer-name"]/text()')
# # This will create a list of prices:
# prices = tree.xpath('//span[@class="item-price"]/text()')
# print('Buyers: ', buyers)
# print('Prices: ', prices)

page = requests.get("http://jobs.zhaopin.com/450575810250022.htm?ssidkey=y&ss=201&ff=03&sg=d382e8f6a66b4c9e800b41c98de68d55&so=1&uid=689899307")
tree = html.fromstring(page.content)
content = tree.xpath('//div[@class="tab-inner-cont"]/p/text()')
print(content)
This code has been tested: it crawls only the Python job-requirements section, because it uses XPath. But the XPath copied straight from the browser always returned an empty result. The commented-out part is the sample code from the web; I just adapted it.

//div[@class="tab-inner-cont"]/p/text()

It turns out the right thing to do is to write the XPath by hand, as above. I won't go into it further here.
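A quick way to verify a hand-written XPath before pointing it at a live site is to run it against a small inline snippet. Below is a minimal sketch with made-up HTML mimicking the job page's structure, showing that the expression pulls only the `<p>` text inside the target div:

```python
from lxml import html

# Made-up HTML imitating the structure of the job-detail page
snippet = '''
<div>
  <div class="tab-inner-cont">
    <p>Job requirements: Python</p>
    <p>3+ years experience</p>
  </div>
  <div class="other"><p>unrelated</p></div>
</div>
'''

tree = html.fromstring(snippet)
# The same hand-written XPath as above: div filtered by class,
# then the text nodes of its <p> children
content = tree.xpath('//div[@class="tab-inner-cont"]/p/text()')
print(content)
```

Testing the expression offline like this makes it obvious whether an empty result comes from the XPath itself or from the page being served differently to the scraper.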