Compared with XPath selectors, CSS selectors feel more familiar, because they are written essentially the same way as selectors in a CSS stylesheet. Only the syntax for extracting content differs from XPath.
Here's how to extract data from an article with CSS selectors.
The extracted data is the same as in the XPath article.
Where the XPath version selected the element's text node, with CSS we get the text through .entry-header h1::text, and to get an attribute we use .entry-header a::attr(href)
A commonly used function here is extract_first()
It is equivalent to extract()[0], except that extract()[0] raises an error when the list has no elements, that is, when no data was matched. extract_first() returns None in that case, and it also accepts the value to return when nothing is found, such as an empty string: extract_first("")
title = response.css(".entry-header h1::text").extract_first()
# the leading p in the selector is optional
create_date = response.css("p.entry-meta-hide-on-mobile::text").extract()[0].strip().replace("•", "").strip()
# get the number of upvotes
praise_nums = response.css("#110287votetotal::text").extract()[0]
# get the number of bookmarks
fav_nums = response.css(".btn-bluet-bigger.href-style.bookmark-btn.register-user-only::text").extract()[0].strip()
match_re = re.match(r".*?(\d+).*", fav_nums)
if match_re:
    fav_nums = match_re.group(1)
# get the number of comments
comment_nums = response.css(".btn-bluet-bigger.href-style.hide-on-480::text").extract()[0].strip()
match_re = re.match(r".*?(\d+).*", comment_nums)
if match_re:
    comment_nums = match_re.group(1)
tag_list = response.css(".entry-meta-hide-on-mobile a::text").extract()
content = response.css("div.entry").extract()[0]
# drop the comment-count link, which is not a tag
tag_list = [element for element in tag_list if not element.strip().endswith("comment")]
tag = ",".join(tag_list)
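The bookmark and comment counts both go through the same regular expression, so the step can be pulled into one helper. This is only a sketch: the name extract_count and its default argument are hypothetical, and the sample strings are invented rather than copied from the site.

```python
import re


def extract_count(text, default=0):
    """Return the first run of digits in text as an int, or default if there is none."""
    # .*? is lazy, so (\d+) captures the first complete run of digits
    match_re = re.match(r".*?(\d+).*", text)
    if match_re:
        return int(match_re.group(1))
    return default


print(extract_count(" 2 bookmarks"))  # 2
print(extract_count("12 comments"))   # 12
print(extract_count(" bookmark"))     # 0, no digits present
```

Returning an int with a fallback of 0 keeps the field the same type whether or not the page shows a number.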
When the element we want to select has several names in its class attribute, the selector should be written like this:
post_urls = response.css("#archive .post.floated-thumb .post-thumb a::attr(href)").extract()
That is, .post.floated-thumb must be written joined together with no space between the class names, or you can write only .floated-thumb
Complete code (approximate)
import re  # needed for re.match below

def parse_detail(self, response):
    title = response.css(".entry-header h1::text").extract_first()
    create_date = response.css("p.entry-meta-hide-on-mobile::text").extract()[0].strip().replace("•", "").strip()
    praise_nums = response.css(".vote-post-up h10::text").extract()[0]
    fav_nums = response.css(".bookmark-btn::text").extract()[0]
    match_re = re.match(r".*?(\d+).*", fav_nums)
    if match_re:
        fav_nums = int(match_re.group(1))
    else:
        fav_nums = 0
    comment_nums = response.css("a[href='#article-comment'] span::text").extract()[0]
    match_re = re.match(r".*?(\d+).*", comment_nums)
    if match_re:
        comment_nums = int(match_re.group(1))
    else:
        comment_nums = 0
    content = response.css("div.entry").extract()[0]
    tag_list = response.css("p.entry-meta-hide-on-mobile a::text").extract()
    # drop the comment-count link, which is not a tag
    tag_list = [element for element in tag_list if not element.strip().endswith("comments")]
    tags = ",".join(tag_list)
    pass