1. First, import the Selector tool
from scrapy.selector import Selector
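The Selector class can also be instantiated directly from an HTML string, which is handy for trying the expressions below outside of a running spider. A minimal sketch; the sample markup is made up for illustration:

from scrapy.selector import Selector

body = '<html><body><span>Hello, Scrapy</span></body></html>'
sel = Selector(text=body)
print(sel.xpath('//span/text()').extract())  # ['Hello, Scrapy']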
2. Use of selectors
Example: response.selector.xpath('//span/text()').extract()
(1) Select text content in the title tag
response.selector.xpath('//title/text()')
Two simpler shortcut methods are also available:
response.xpath('//title/text()')
response.css('title::text')
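In a real spider these shortcuts are called on the response object passed to a callback. A minimal sketch, in which the spider name and start URL are placeholders rather than anything from the original text:

import scrapy

class TitleSpider(scrapy.Spider):
    name = 'title_demo'
    start_urls = ['http://example.com']

    def parse(self, response):
        # response.xpath()/response.css() are shorthand for response.selector.xpath()/.css()
        title_via_xpath = response.xpath('//title/text()').extract_first()
        title_via_css = response.css('title::text').extract_first()
        yield {'xpath': title_via_xpath, 'css': title_via_css}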
More examples:
response.css('img').xpath('@src').extract()
response.xpath('//div[@id="images"]/a/text()').extract_first()
response.xpath('//div[@id="not-exists"]/text()').extract_first(default='not-found')
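The difference between extract() and extract_first() is easy to see on a standalone Selector. A small sketch, with the inline HTML invented for illustration:

from scrapy.selector import Selector

sel = Selector(text='<div id="images"><a href="a.html">link one</a></div>')
print(sel.xpath('//div[@id="images"]/a/text()').extract())        # ['link one'] - always a list
print(sel.xpath('//div[@id="images"]/a/text()').extract_first())  # 'link one' - first match, or None
print(sel.xpath('//div[@id="not-exists"]/text()').extract_first(default='not-found'))  # 'not-found'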
(2) Using regular expressions
response.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')
response.xpath('//a[contains(@href, "image")]/text()').re_first(r'Name:\s*(.*)')
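.re() runs the regular expression over each matched text node and returns plain strings instead of selectors; .re_first() returns only the first match. A minimal sketch with invented anchor markup that mirrors the "Name: ..." pattern above:

from scrapy.selector import Selector

html = '<a href="image1.html">Name: My image 1</a><a href="image2.html">Name: My image 2</a>'
sel = Selector(text=html)
print(sel.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)'))
# ['My image 1', 'My image 2']
print(sel.xpath('//a[contains(@href, "image")]/text()').re_first(r'Name:\s*(.*)'))
# 'My image 1'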
(3) Working with relative XPaths
divs = response.xpath('//div')

# './/p' selects every <p> that is a descendant of each selected <div>
for p in divs.xpath('.//p'):
    print(p.extract())

# 'p' (no leading './/') selects only <p> elements that are direct children of each <div>
for p in divs.xpath('p'):
    print(p.extract())
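The leading dot is what keeps the query relative: './/p' and 'p' both search inside each selected <div>, whereas a path starting with '//p' would ignore the div context and scan the whole document again. A minimal standalone sketch; the HTML snippet is made up for illustration:

from scrapy.selector import Selector

html = '''
<div><p>inside div</p><section><p>nested in div</p></section></div>
<p>outside any div</p>
'''
divs = Selector(text=html).xpath('//div')
print(divs.xpath('.//p/text()').extract())  # descendants of the divs: ['inside div', 'nested in div']
print(divs.xpath('p/text()').extract())     # direct children only: ['inside div']
print(divs.xpath('//p/text()').extract())   # whole document again: includes 'outside any div'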
(4)
(5)
Official example (from the Scrapy documentation):
>>> links = response.xpath('//a[contains(@href, "image")]')
>>> links.extract()
[u'<a href="image1.html">Name: My image 1 <br><img src="image1_thumb.jpg"></a>',
 u'<a href="image2.html">Name: My image 2 <br><img src="image2_thumb.jpg"></a>',
 u'<a href="image3.html">Name: My image 3 <br><img src="image3_thumb.jpg"></a>',
 u'<a href="image4.html">Name: My image 4 <br><img src="image4_thumb.jpg"></a>',
 u'<a href="image5.html">Name: My image 5 <br><img src="image5_thumb.jpg"></a>']
>>> for index, link in enumerate(links):
...     args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
...     print('Link number %d points to url %s and image %s' % args)
Link number 0 points to url [u'image1.html'] and image [u'image1_thumb.jpg']
Link number 1 points to url [u'image2.html'] and image [u'image2_thumb.jpg']
Link number 2 points to url [u'image3.html'] and image [u'image3_thumb.jpg']
Link number 3 points to url [u'image4.html'] and image [u'image4_thumb.jpg']
Link number 4 points to url [u'image5.html'] and image [u'image5_thumb.jpg']