Crawling Zhaopin (智聯招聘) with the Scrapy framework and saving the scraped data

Last updated: 2018-03-17. Source: Internet

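1. In the Terminal of the existing JobSpider project, generate a new spider file with the command below. The domain argument only sets the initial allowed_domains, which the spider code in step 2 replaces with zhaopin.com.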

scrapy genspider zlzp baidu.com

2. Parse the data
1) First sketch out roughly which functions the spider needs.

2) To hand control from function 1 to function 2, use yield scrapy.Request(url, callback, meta, dont_filter).
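To make the hand-off in 2) concrete, here is a toy model of the callback chain that runs without Scrapy. FakeRequest and run are hypothetical stand-ins, not Scrapy's engine or its Request class; in real Scrapy the callback receives a response object and reads the dictionary through response.meta.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request (not the real class)."""
    url: str
    callback: Callable
    meta: dict = field(default_factory=dict)


def parse(url, meta):
    # "Function 1": hand the same URL to the next callback, passing context via meta.
    yield FakeRequest(url=url, callback=parse_job_info, meta={"source": "zlzp"})


def parse_job_info(url, meta):
    # "Function 2": yields a plain dict, which the loop below treats as a scraped item.
    yield {"url": url, "source": meta["source"]}


def run(start_url):
    """Drive the callbacks until no requests remain and return the collected items."""
    queue = deque([FakeRequest(start_url, parse)])
    items = []
    while queue:
        request = queue.popleft()
        for result in request.callback(request.url, request.meta):
            if isinstance(result, FakeRequest):
                queue.append(result)   # follow-up request: schedule its callback
            else:
                items.append(result)   # anything else counts as an item
    return items


print(run("http://example.com/jobs?p=1"))
```

Each yielded request names the function that should handle it next, which is the same pattern the spider below uses to move from parse to parse_job_info to parse_next_page.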

# -*- coding: utf-8 -*-
import scrapy

from ..items import JobspiderItem


# Scrapes job listings from Zhaopin (智聯招聘)
class ZlzpSpider(scrapy.Spider):
    name = 'zlzp'
    allowed_domains = ['zhaopin.com']
    start_urls = [
        'http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E5%8C%97%E4%BA%AC%2B%E4%B8%8A%E6%B5%B7%2B%E5%B9%BF%E5%B7%9E%2B%E6%B7%B1%E5%9C%B3%2B%E6%AD%A6%E6%B1%89&kw=python&p=1&isadv=0',
        'http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E5%8C%97%E4%BA%AC%2B%E4%B8%8A%E6%B5%B7%2B%E5%B9%BF%E5%B7%9E%2B%E6%B7%B1%E5%9C%B3%2B%E6%AD%A6%E6%B1%89&kw=php&p=1&isadv=0',
        'http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E5%8C%97%E4%BA%AC%2B%E4%B8%8A%E6%B5%B7%2B%E5%B9%BF%E5%B7%9E%2B%E6%B7%B1%E5%9C%B3%2B%E6%AD%A6%E6%B1%89&kw=html&p=1&isadv=0',
    ]

    def parse(self, response):
        # Pass the start page straight on to the listing parser.
        yield scrapy.Request(
            url=response.url,
            callback=self.parse_job_info,
            meta={},
            dont_filter=True,
        )

    def parse_job_info(self, response):
        """
        Parse the job listings on one results page.

        :param response:
        :return:
        """
        zl_table_list = response.xpath("//div[@id='newlist_list_content_table']/table[@class='newlist']")
        # The first table is skipped; each remaining table holds one job.
        for zl_table in zl_table_list[1:]:
            # <tbody> is generated automatically by the browser; run the spider
            # or view the page source to check the real structure.
            # zl_td_list = zl_table.xpath("tr[1]/td")
            # Problem: a row does not always have 5 <td> cells, so indexing
            # raises an index-out-of-range error.
            # td1 = zl_td_list[0]
            # td2 = zl_td_list[1]
            # td3 = zl_td_list[2]
            # td4 = zl_td_list[3]
            # td5 = zl_td_list[4]
            # Prefer locating elements with XPath over positional indexes,
            # which can go out of range.
            # Use exception handling only for errors you cannot anticipate.
            # //text() selects all text inside a tag, including nested tags.
            # extract() converts the selected nodes to strings; the result is still a list.
            # extract_first('default') returns the first match as a string,
            # or the default when nothing matches.
            td1 = zl_table.xpath("tr/td[@class='zwmc']/div/a//text()").extract()
            # In Python 3, map() returns an iterator, which join() accepts directly.
            # Wrap it if a list is needed: td1 = list(map(str.strip, td1))
            td1 = map(str.strip, td1)
            job_name = "".join(td1).replace(",", "/")
            # strip() only removes whitespace at the two ends of a string.
            fan_kui_lv = zl_table.xpath("tr/td[@class='fk_lv']/span/text()").extract_first('No feedback rate').strip()
            job_company_name = zl_table.xpath("tr/td[@class='gsmc']/a[1]/text()").extract_first('No company name').strip()
            job_salary = zl_table.xpath("tr/td[@class='zwyx']/text()").extract_first('Negotiable').strip()
            job_place = zl_table.xpath("tr/td[@class='gzdd']/text()").extract_first('No work location').strip()
            print(job_name, fan_kui_lv, job_company_name, job_salary, job_place)

            item = JobspiderItem()
            item['job_name'] = job_name
            item['job_company_name'] = job_company_name
            item['job_place'] = job_place
            item['job_salary'] = job_salary
            item['job_time'] = "No time"
            item['job_type'] = "智聯招聘"
            item['fan_kui_lv'] = fan_kui_lv
            yield item

        # Re-request the same page so parse_next_page can look for the next-page link.
        yield scrapy.Request(
            url=response.url,
            callback=self.parse_next_page,
            meta={},
            dont_filter=True,
        )

    def parse_next_page(self, response):
        """
        Find the next results page and schedule it.

        :param response:
        :return:
        """
        # //div[@class='pagesDown']/ul/li/a[text()='下一頁']/@href
        # The link text must match the page exactly; the live site may use
        # the Simplified Chinese form '下一页'.
        # No default is passed here: extract_first() returns None on the last
        # page, so the check below stops the crawl instead of requesting a
        # placeholder string.
        next_page = response.xpath("//a[text()='下一頁']/@href").extract_first()
        if next_page:
            yield scrapy.Request(
                url=next_page,
                callback=self.parse_job_info,
                meta={},
                dont_filter=True,
            )
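The clean-up applied to job_name in the spider can be tried on its own, without Scrapy. The sketch below assumes the XPath returned the made-up fragments shown; the helper name clean_job_name is hypothetical and does not appear in the original spider.

```python
def clean_job_name(fragments):
    """Strip every text fragment, join them, and swap commas for slashes."""
    stripped = map(str.strip, fragments)  # lazy iterator in Python 3
    return "".join(stripped).replace(",", "/")


# Hypothetical fragments, as extract() might return for a link with nested tags.
fragments = ["\n  ", "Python", "/Go engineer,remote \n"]
print(clean_job_name(fragments))  # Python/Go engineer/remote
```

Whitespace-only fragments collapse to empty strings, so they vanish in the join. The source does not say why commas are replaced; a plausible reason is that the data is later stored in a comma-separated format.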

3. Nothing else needs to be configured; the spider reuses the files that already exist in the JobSpider project.
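The article does not show those existing files. As an illustration only, a storage pipeline in such a project could look like the sketch below. The class name CsvExportPipeline, the output path jobs.csv, and the choice of CSV are assumptions; only the field names come from the spider above.

```python
import csv

# Field names taken from the item assignments in the spider.
FIELDS = ["job_name", "job_company_name", "job_place", "job_salary",
          "job_time", "job_type", "fan_kui_lv"]


class CsvExportPipeline:
    """Hypothetical pipeline that writes each scraped item as one CSV row."""

    def __init__(self, path="jobs.csv"):
        self.path = path

    def open_spider(self, spider):
        # Called once when the spider starts: create the file and write the header.
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.DictWriter(self.file, fieldnames=FIELDS)
        self.writer.writeheader()

    def process_item(self, item, spider):
        # Missing fields are written as empty strings.
        self.writer.writerow({name: item.get(name, "") for name in FIELDS})
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.file.close()
```

Scrapy only runs a pipeline that is listed in the ITEM_PIPELINES setting in settings.py; the exact module path depends on the project layout, which the article does not give.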

4. Run the spider and inspect the results.
