Create a Scrapy Project

Source: Internet
Author: User

This project captures the information in Douban's Top 250 movie list.

First, open PyCharm.

Enter scrapy startproject Douban in the terminal at the bottom of the PyCharm window.

Scrapy then generates the following files: __init__.py, items.py, middlewares.py, pipelines.py, and settings.py, plus a spiders directory that contains its own __init__.py.

From the first article, we know that we only need to work with three parts of the Scrapy project: items.py, settings.py, and a spider file that we create ourselves.

First, open items.py.

items.py is where we define the data structure, that is, the fields that will be stored later.

We need the serial number, movie name, movie introduction, star rating, movie comments, and movie description.

The generated file contains a commented-out example, # name = scrapy.Field(). Define each field you want in that same format.

Next, we edit settings.py.

settings.py contains many options. First, find ROBOTSTXT_OBEY = True.

This crawl does not follow the site's robots.txt rules, so the first change is to set True to False.
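
To illustrate what kind of rule this setting refers to, here is a small sketch using Python's standard urllib.robotparser module. The robots.txt lines, the crawler name, and the example.com URLs are made up for the example; they are not Douban's actual rules, and this is not how Scrapy itself implements the check.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, given as a list of lines.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

# A crawler that obeys robots.txt would skip the first URL and fetch the second.
blocked = parser.can_fetch("my-crawler", "https://example.com/private/page")
allowed = parser.can_fetch("my-crawler", "https://example.com/public/page")
print(blocked, allowed)
```

With ROBOTSTXT_OBEY = True, Scrapy performs a comparable check before requesting a page; with False, it skips the check.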

The second change is to set DOWNLOAD_DELAY = 3 to DOWNLOAD_DELAY = 0.5.

A shorter delay between requests makes the crawl faster.

The most important setting is USER_AGENT.

Go to the target website: https://movie.douban.com/top250

Press F12 to open the browser's developer tools, then press F5 to refresh the page. Find the top250 entry for the page's HTML among the loaded requests.

Click top250 and scroll down through its request headers to find User-Agent.

Copy that value into USER_AGENT in settings.py. With that, settings.py is complete.
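
Taken together, the three edits to settings.py might look like the fragment below. The USER_AGENT value is only a placeholder, not a real browser string; replace it with the one copied from your own browser.

```python
# settings.py (only the three edited options are shown)

# Do not check robots.txt before each request.
ROBOTSTXT_OBEY = False

# Wait 0.5 seconds between requests instead of 3.
DOWNLOAD_DELAY = 0.5

# Placeholder: paste the User-Agent string copied from the developer tools.
USER_AGENT = "Mozilla/5.0 (replace with the string copied from your browser)"
```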

Create the spider file with the following command:

scrapy genspider <spider name> <domain name>

This generates a spider file in the spiders directory.

 
