A Python crawler for scraping Python job requirements from Zhaopin

Source: Internet
Author: User
Tags: xpath
I scraped Zhaopin today for Python job postings. There is not much to say about this code; it is similar to the last one, and the additions will be explained below. It captures the recruitment details behind every job listing linked from the search results page.

from bs4 import BeautifulSoup
import requests

url2 = 'http://sou.zhaopin.com/jobs/searchresult.ashx?kw=python&sm=0&p=1'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'Host': 'sou.zhaopin.com',
}
web_data = requests.get(url2, headers=headers)
# print(web_data.text)
soup = BeautifulSoup(web_data.text, 'html.parser')
for i in soup.select('a'):
    # if i['href'][:24] == 'http://jobs.zhaopin.com/':
    #     print(i['href'])
    try:
        if i['href'].startswith('http://jobs.zhaopin.com/'):
            info = requests.get(i['href'])
            infosoup = BeautifulSoup(info.text, 'html.parser')
            for a in infosoup.select('.tab-inner-cont'):
                try:
                    print(a.text)
                except KeyError:
                    pass
    except KeyError:
        pass

Here are the problems I ran into while writing this code.

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'Host': 'sou.zhaopin.com',
}
When crawling the search results page I could get the source code, but the individual job pages always returned an error. It turned out to be anti-crawler protection: you need to add a browser-like User-Agent, which is the headers dictionary above.
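As a sketch of why the header matters, the snippet below builds the same request with the stdlib urllib (substituted here for requests so it runs without third-party packages) and shows the User-Agent the server will see. Without it, urllib announces itself as "Python-urllib/x.y", which anti-crawler filters commonly block.

```python
import urllib.request

url = 'http://sou.zhaopin.com/jobs/searchresult.ashx?kw=python&sm=0&p=1'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'Host': 'sou.zhaopin.com',
}

# Build the request object without sending it; the headers are attached
# exactly as they would be on the wire.
req = urllib.request.Request(url, headers=headers)
print(req.get_header('User-agent'))
```

Sending it with `urllib.request.urlopen(req)` would then fetch the page with the spoofed identity.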

try:
    if i['href'].startswith('http://jobs.zhaopin.com/'):
        info = requests.get(i['href'])
        infosoup = BeautifulSoup(info.text, 'html.parser')
        for a in infosoup.select('.tab-inner-cont'):
            try:
                print(a.text)
            except KeyError:
                pass
except KeyError:
    pass

Without this, the loop kept raising a KeyError, because some <a> tags have no href attribute. The accepted fix is to add exception handling: the try: ... except KeyError: pass shown above.
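BeautifulSoup tags behave like dictionaries for their attributes, so tag['href'] on a tag without an href raises KeyError. A minimal sketch of the pattern, simulated with plain dicts (standing in for parsed <a> tags, so it runs without bs4):

```python
# Fake "<a> tags": the second one has no href attribute at all.
links = [
    {'href': 'http://jobs.zhaopin.com/12345.htm'},
    {'class': 'nav'},
]

found = []
for tag in links:
    try:
        # Raises KeyError when 'href' is missing, just like a bs4 Tag.
        if tag['href'].startswith('http://jobs.zhaopin.com/'):
            found.append(tag['href'])
    except KeyError:
        pass  # skip attribute-less tags instead of crashing

print(found)  # only the link that actually has an href
```

With real BeautifulSoup tags, `tag.get('href')` returning None is an equally common way to sidestep the exception.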


Because the code above selects by CSS class, it scrapes the whole block, not only the Python job requirements. For that, see the XPath version below.

from lxml import html
import requests

# page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
# tree = html.fromstring(page.content)
# # This will create a list of buyers:
# buyers = tree.xpath('//div[@title="buyer-name"]/text()')
# # This will create a list of prices:
# prices = tree.xpath('//span[@class="item-price"]/text()')
#
# print('Buyers:', buyers)
# print('Prices:', prices)

page = requests.get("http://jobs.zhaopin.com/450575810250022.htm?ssidkey=y&ss=201&ff=03&sg=d382e8f6a66b4c9e800b41c98de68d55&so=1&uid=689899307")
tree = html.fromstring(page.content)
content = tree.xpath('//div[@class="tab-inner-cont"]/p/text()')
print(content)

This code has been tested: it extracts only the Python job-requirements section, because it uses an XPath expression. But the XPath copied straight from the browser always returned just whitespace. The commented-out part is the example code from the web; I only adapted it.

//div[@class="tab-inner-cont"]/p/text()

It turns out the right approach is to write the XPath by hand. I won't go into that further here.
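The "always get a space" symptom comes from text() nodes keeping the page's indentation whitespace. A small sketch using the stdlib ElementTree on a made-up HTML-like fragment (substituted for lxml so it runs anywhere; the requirement strings are invented for illustration):

```python
import xml.etree.ElementTree as ET

fragment = """
<div class="tab-inner-cont">
    <p>
        Familiar with Python and common crawler libraries
    </p>
    <p>
        Experience with XPath and BeautifulSoup
    </p>
</div>
"""

root = ET.fromstring(fragment)
# Raw text nodes still carry the page's newlines and indentation.
raw = [p.text for p in root.findall('./p')]
# Stripping each string (and dropping whitespace-only ones) recovers
# the actual requirement lines.
clean = [t.strip() for t in raw if t and t.strip()]
print(clean)
```

With lxml the same cleanup applies to the list returned by `tree.xpath(...)`, or `normalize-space()` can be used inside the expression itself.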
