It's all very regular, isn't it? In Scrapy we can define rules that match these regular URLs, and Scrapy will then follow them and process the page information for us. Let's look at the goal: in this section we only want to output the movie names on the leaderboard.
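To make "regular" concrete, here is a minimal standard-library sketch. It assumes the leaderboard is paginated as https://movie.douban.com/top250?start=N&filter= with 25 movies per page, so you should check this in your browser's address bar. Under that assumption, every page URL can be generated by stepping N, and a single regular expression, like the ones Scrapy rules accept, matches all of them. The pattern and the detail-page URL are illustrative assumptions.

```python
import re

# Assumed URL layout of the Douban Top 250 list: 25 movies per page,
# with the page offset passed in the "start" query parameter.
BASE_URL = "https://movie.douban.com/top250?start={start}&filter="
PAGE_SIZE = 25
TOTAL = 250

# Hypothetical pattern of the kind a Scrapy rule's "allow" argument takes.
ALLOW_PATTERN = re.compile(r"top250\?start=\d+")


def page_urls(total=TOTAL, page_size=PAGE_SIZE):
    """Build every leaderboard page URL by stepping the offset."""
    return [BASE_URL.format(start=start) for start in range(0, total, page_size)]


urls = page_urls()
print(len(urls))  # 10
print(urls[1])    # https://movie.douban.com/top250?start=25&filter=
# Every generated page URL matches the pattern...
print(all(ALLOW_PATTERN.search(u) for u in urls))  # True
# ...while an unrelated link, such as a (hypothetical) movie detail page, does not.
print(bool(ALLOW_PATTERN.search("https://movie.douban.com/subject/123456/")))  # False
```

Because the offsets follow a fixed step, one pattern is enough to describe the whole leaderboard. This is exactly the kind of regularity that Scrapy's rules rely on.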
And we can actually do better.
Open this folder as a project in PyCharm. Then, under douban/db, create a file named run.py so the crawler can be started conveniently from the IDE. Its content is:
from scrapy.cmdline import execute
# Same as running "scrapy crawl douban" in a terminal; 'douban' must match the spider's name attribute
execute(['scrapy', 'crawl', 'douban'])
Open douban/db/items.py. As the file name suggests, this is where we declare the "goods" we take out of the Douban "warehouse". Which goods do we want? The movie name.
import scrapy

class DbItem(scrapy.Item):
    name = scrapy.Field()  # the movie name
In douban/db/db/spiders, create spider.py. This file crawls the web pages and handles the URLs. In it we tell the crawler how to get to the "warehouse" and how to crawl it.
Getting the "goods" information
Now it's parse_item's turn to work. It has to find the "goods" we want at the specified location, and XPath is how we locate them. Right-click the page and view its source: the information is wrapped in tags, and XPath uses these tags to pinpoint specific pieces of information.
Press F12 in the browser to open the developer tools. Point at the information you want, and the panel on the right shows the corresponding code. Right-click that element and the context menu offers something handy: Copy > Copy XPath.
The copied XPath can be used directly. Unfortunately, Scrapy often fails to match it, so we also need a little XPath syntax (ten minutes of reading is enough) and should adapt the copied expression ourselves. The Chrome extension XPath Helper is recommended: it shows whether the XPath you are writing is correct.
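You can check what an expression like //*[@class="title"][1] is doing without hitting the site. The small offline sketch below uses Python's built-in xml.etree.ElementTree on a simplified, hypothetical fragment of two leaderboard entries. In this fragment each entry has two title elements, a local name and a foreign name, which is why the [1] matters. ElementTree only understands a subset of XPath, whereas Scrapy uses a full XPath 1.0 engine. For that reason the sketch takes the first match per entry with find() instead of a positional predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for two leaderboard entries; the real page is
# much larger HTML, but XPath navigates the same tag/class layout.
SNIPPET = """
<ol>
  <li><div class="hd"><a href="#">
    <span class="title">Movie A</span><span class="title"> / Film A</span>
  </a></div></li>
  <li><div class="hd"><a href="#">
    <span class="title">Movie B</span><span class="title"> / Film B</span>
  </a></div></li>
</ol>
"""


def all_titles(root):
    # Like //*[@class="title"]: every title element, both names of each movie.
    return [el.text for el in root.iterfind(".//*[@class='title']")]


def first_titles(root):
    # Like //*[@class="title"][1] on this markup: only the first title per entry.
    names = []
    for entry in root.iterfind(".//div[@class='hd']"):
        first = entry.find(".//*[@class='title']")
        if first is not None:
            names.append(first.text)
    return names


root = ET.fromstring(SNIPPET.strip())
print(all_titles(root))    # ['Movie A', ' / Film A', 'Movie B', ' / Film B']
print(first_titles(root))  # ['Movie A', 'Movie B']
```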
def parse_item(self, response):
    # First element with class "title" under each entry = the movie name; take its text
    name = response.xpath('//*[@class="title"][1]/text()').getall()
    print(name)
OK, let's test it. Running run.py runs into a problem: 403 Forbidden. The site suspects that our requests come from a robot, so in settings.py we disguise the crawler as an ordinary browser by adding a browser-style User-Agent.
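Here is a minimal sketch of that settings.py change. It assumes the disguise is done through Scrapy's built-in USER_AGENT setting, which otherwise defaults to a string identifying the client as Scrapy. The header value below is only an example. Copying your own browser's value from the Network tab of the developer tools works just as well.

```python
# settings.py (fragment)
# Send a browser-style User-Agent header instead of Scrapy's default one,
# so the site sees what looks like a normal browser request.
# Example value only: replace it with your own browser's User-Agent if you like.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)
```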
Summary
Scrapy can crawl URLs that follow specific rules and process them. Parameters such as allow and follow tell the program how to follow the trail from one page to the next, and XPath makes it easy to locate information in a web page. The example in this article only extracts the movie names. As the picture at the beginning of the article suggests, we could do much more: extract the score, actors, director and so on, add some judgment on them, and pick out the movies we want.
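Here is a hedged sketch of the "judgment" idea. The code extracts both the name and the score from a simplified, hypothetical entry layout and keeps only movies at or above a threshold. The rating_num class and the 9.0 cutoff are assumptions for illustration, so check the real markup with the developer tools before relying on them. The sketch uses the standard library's ElementTree so it runs offline. In the spider you would express the same paths with response.xpath and declare the extra fields in items.py.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified entries: name in span.title, score in span.rating_num.
SNIPPET = """
<ol>
  <li><span class="title">Movie A</span><span class="rating_num">9.7</span></li>
  <li><span class="title">Movie B</span><span class="rating_num">8.8</span></li>
  <li><span class="title">Movie C</span><span class="rating_num">9.1</span></li>
</ol>
"""


def parse_entries(xml_text):
    """Return (name, score) pairs, one per <li> entry."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    for li in root.iterfind("li"):
        name = li.find("span[@class='title']").text
        score = float(li.find("span[@class='rating_num']").text)
        entries.append((name, score))
    return entries


def pick_movies(entries, min_score=9.0):
    """Keep only the movies whose score reaches the threshold."""
    return [name for name, score in entries if score >= min_score]


entries = parse_entries(SNIPPET)
print(pick_movies(entries))  # ['Movie A', 'Movie C']
```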