Create a Python "script" for automatically downloading movies! That kind of movie can also be Oh!

Source: Internet
Author: User
Tags: xpath

The URLs are all very regular, aren't they? In Scrapy we can define the rules we want, and Scrapy will handle these regular URLs and their page content for us. Let's look at the result first; in this section the goal is simply to output the movie names from the leaderboard:

And we can actually do better.
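What "regular" means here, concretely: assuming the leaderboard in question is the Douban Top 250 (the article does not name it explicitly), its pages differ only in a start query parameter, so a single small pattern matches every page. The same pattern is reused in the spider sketch further down.

# The leaderboard pages step through ?start=0, ?start=25, ?start=50, ... (assumed Douban Top 250 layout)
import re

page_pattern = re.compile(r'\?start=\d+')
print(bool(page_pattern.search('https://movie.douban.com/top250?start=25&filter=')))  # prints True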

Open the folder as a project in PyCharm, and under douban/db create a run.py file so the crawler can be launched conveniently from the IDE.

Enter:

# run.py: launch the spider as if "scrapy crawl douban" were typed in the terminal
from scrapy.cmdline import execute

execute(['scrapy', 'crawl', 'douban'])

Open douban/db/items.py. As the file name suggests, this is the "warehouse" for the "goods" we take out of Douban. What "goods" do we want? The movie name.

import scrapy


class DbItem(scrapy.Item):
    # the only "goods" we collect for now: the movie name
    name = scrapy.Field()

Under douban/db/db/spiders, create spider.py. This file crawls the web pages and processes the URLs; we have to tell it which pages to visit and how to crawl them on the way to the "warehouse".
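The article does not reproduce the spider's code, so here is a minimal sketch of what such a CrawlSpider could look like, assuming the Douban Top 250 as the target; the allow and follow parameters mentioned in the summary appear in the rule, and the exact URLs, import path and class names are illustrative guesses rather than the author's original code.

# spider.py: a minimal CrawlSpider sketch (illustrative, not the original source)
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

from db.items import DbItem  # import path assumes the douban/db project layout described above


class DoubanSpider(CrawlSpider):
    name = 'douban'                       # the name run.py passes to "scrapy crawl"
    allowed_domains = ['movie.douban.com']
    start_urls = ['https://movie.douban.com/top250']

    # allow: which links to extract (the ?start=N pagination URLs)
    # follow: keep following matching links found on every crawled page
    rules = (
        Rule(LinkExtractor(allow=r'\?start=\d+'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # filled in below, once we have the right XPath for the movie name
        pass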

Access to "goods" information

Now it is parse_item's turn to work: it has to find the "goods" we want at the designated location, and XPath is how we describe that location. Right-click the page and view the source code; everything is wrapped in HTML tags, and XPath's job is to use those tags to locate specific information.

Press F12 in the browser to open the developer console.

Point at a piece of information on the page and the corresponding code is highlighted on the right.

Right-click the element you want and copy its XPath.

That copied expression can be used. Unfortunately, Scrapy often does not recognize it as-is, so a little XPath syntax is needed too (ten minutes of reading is enough) to adjust the copied expression by hand. Chrome's XPath Helper plugin is recommended here: it shows you whether the XPath you are writing is correct.
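Another quick way to check an XPath before putting it into the spider is to test it locally with parsel, the selector library Scrapy itself uses. The sample HTML below is a made-up fragment imitating a leaderboard entry, and the expression is the class-based rewrite of the long absolute path the browser copies.

# Try an XPath against a small HTML sample before using it in parse_item
from parsel import Selector

sample = '<ol><li><div class="hd"><a><span class="title">The Shawshank Redemption</span></a></div></li></ol>'
sel = Selector(text=sample)

# the browser-copied absolute path (/html/body/div[...]/...) is brittle;
# a class-based expression like this is what you usually rewrite it into
print(sel.xpath('//*[@class="title"][1]/text()').get())  # The Shawshank Redemption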

def parse_item(self, response):
    # select the first element whose class attribute is "title"
    # (append /text() and .extract() to get plain strings instead of selector objects)
    name = response.xpath('//*[@class="title"][1]')
    print(name)
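Printing is enough for a quick check, but to hand the names back to Scrapy's pipeline, parse_item would normally fill in the DbItem defined earlier and yield it. A possible version (the /text() step and the item population are additions for illustration, not shown in the article):

def parse_item(self, response):
    # one DbItem per primary title found on the page
    for title in response.xpath('//*[@class="title"][1]/text()').extract():
        item = DbItem()
        item['name'] = title
        yield item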

OK, test it. Running run.py hits the following problem: 403 Forbidden. The site suspects we are a robot,

so in settings.py we disguise the crawler as a normal browser.

Add a browser User-Agent:
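The exact header the article used is not shown, so the value below is just one common desktop Chrome string; any realistic browser User-Agent will do.

# settings.py: identify as a regular desktop browser instead of the default Scrapy user agent
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36')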

Summary

Scrapy can crawl and process URLs that follow specific rules; the allow, follow and other parameters tell the program how to move through the site, and XPath makes it easy to pull information out of a web page. The example in this article only extracts the movie name, but as the picture at the beginning shows, we could do something much richer: extract the score, actors, director and so on, add some conditions on them, and pick out exactly the movies we want.
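As a sketch of that richer version (the extra field, the XPath expressions and the 9.0 threshold are assumptions about Douban's markup and our own taste, not taken from the article), the item and the parse callback might grow like this:

# items.py: collect the score as well as the name
class DbItem(scrapy.Item):
    name = scrapy.Field()
    score = scrapy.Field()

# in the spider class: keep only well-rated movies
def parse_item(self, response):
    for movie in response.xpath('//div[@class="info"]'):
        score = float(movie.xpath('.//span[@class="rating_num"]/text()').get('0'))
        if score >= 9.0:
            item = DbItem()
            item['name'] = movie.xpath('.//span[@class="title"][1]/text()').get()
            item['score'] = score
            yield item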

Join the group 125240963 to get the source code!

Create a Python "script" for automatic download of movies! That kind of movie can also be Oh!
