Scrapy CSS selectors

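Compared with XPath selectors, CSS selectors feel somewhat easier: the syntax is essentially what you would write in a .css stylesheet. The one difference to watch for is how the content itself is extracted.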

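This post shows how to use CSS selectors to extract the data of a single article.
The extracted fields are the same as in the XPath post.
Where an XPath expression ends in /text() or /@href, a CSS selector uses the ::text and ::attr() pseudo-elements instead: an element's text is selected with .entry-header h1::text, and an attribute with .entry-header a::attr(href).
A commonly used function is extract_first().
It is equivalent to extract()[0], except that extract()[0] raises an IndexError when the list is empty, that is, when nothing matched. extract_first() returns None in that case, and it also accepts a default value to return instead, for example an empty string: extract_first("").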
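
The difference between the two calls can be sketched in plain Python. The helper first_or_default below is a hypothetical stand-in that only mimics the behaviour described above; it is not Scrapy's implementation, and the sample list contents are made up.

```python
def first_or_default(items, default=None):
    """Return the first item, or `default` when the list is empty.

    Hypothetical stand-in that mimics what extract_first() does with
    the list of matches; not Scrapy's actual code.
    """
    return items[0] if items else default


matched = ["Scrapy CSS selectors"]  # pretend the selector matched one node
empty = []                          # pretend the selector matched nothing

title = first_or_default(matched)     # first match
missing = first_or_default(empty)     # None instead of an exception
blank = first_or_default(empty, "")   # caller-supplied default

# Indexing the empty list directly is what makes extract()[0] fail.
try:
    empty[0]
except IndexError:
    raised = True
else:
    raised = False
```

Passing a default such as "" keeps later calls like .strip() from failing on None.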

import re

title = response.css(".entry-header h1::text").extract_first()
# the leading "p" is optional
create_date = response.css("p.entry-meta-hide-on-mobile::text").extract()[0].strip().replace('·', '').strip()
# number of likes (this id appears to embed the post number, so it only fits this one article)
praise_nums = response.css('#110287votetotal::text').extract()[0]
# number of bookmarks
fav_nums = response.css('.btn-bluet-bigger.href-style.bookmark-btn .register-user-only::text').extract()[0].strip()
match_re = re.match(r'.*?(\d+).*', fav_nums)
if match_re:
    fav_nums = match_re.group(1)
# number of comments (the regex must be applied to comment_nums, not fav_nums)
comment_nums = response.css('.btn-bluet-bigger.href-style.hide-on-480::text').extract()[0].strip()
match_re = re.match(r'.*?(\d+).*', comment_nums)
if match_re:
    comment_nums = match_re.group(1)
tag_list = response.css('.entry-meta-hide-on-mobile a::text').extract()
content = response.css('div.entry').extract()[0]
# drop the "N 評論" (comment count) link that sits among the tags
tag_list = [element for element in tag_list if not element.strip().endswith('評論')]
tag = ','.join(tag_list)

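When the element to select carries more than one class name, for example class="post floated-thumb", the selector should be written like this: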

post_urls = response.css('#archive .post.floated-thumb .post-thumb a::attr(href)').extract()

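In other words, .post.floated-thumb must be written together with no space in between; alternatively, write just .floated-thumb.

Complete code (accurate version)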

import re

def parse_detail(self, response):
    title = response.css(".entry-header h1::text").extract_first()
    create_date = response.css("p.entry-meta-hide-on-mobile::text").extract()[0].strip().replace("·", "").strip()
    praise_nums = response.css(".vote-post-up h10::text").extract()[0]
    # bookmark count: take the first number in the button text, else 0
    fav_nums = response.css(".bookmark-btn::text").extract()[0]
    match_re = re.match(r".*?(\d+).*", fav_nums)
    if match_re:
        fav_nums = int(match_re.group(1))
    else:
        fav_nums = 0
    # comment count: same pattern as above
    comment_nums = response.css("a[href='#article-comment'] span::text").extract()[0]
    match_re = re.match(r".*?(\d+).*", comment_nums)
    if match_re:
        comment_nums = int(match_re.group(1))
    else:
        comment_nums = 0
    content = response.css("div.entry").extract()[0]
    tag_list = response.css("p.entry-meta-hide-on-mobile a::text").extract()
    # drop the "N 評論" (comment count) link that sits among the tags
    tag_list = [element for element in tag_list if not element.strip().endswith("評論")]
    tags = ",".join(tag_list)
    pass
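
The bookmark and comment counts in parse_detail are parsed with the same match-or-zero pattern, so it could be factored into one small function. The helper name extract_number and the sample strings below are hypothetical; this is a sketch of one possible refactoring, not part of the original code.

```python
import re


def extract_number(text, default=0):
    """Return the first run of digits in `text` as an int, else `default`.

    Hypothetical helper; it uses the same regex as parse_detail.
    """
    match = re.match(r".*?(\d+).*", text)
    return int(match.group(1)) if match else default


# Made-up sample strings standing in for the extracted button text.
with_count = extract_number(" 3 bookmarks")
without_count = extract_number(" bookmarks")
two_digits = extract_number("12 comments")
```

With such a helper, each count becomes a single call, which also removes the risk of matching against the wrong variable when the block is copied.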