Python Scrapy Crawler Learning (1): Generating a Crawler with the Scrapy Framework

Source: Internet
Author: User

Demo Address: http://python123.io/ws/demo.html

File name: demo.html

Steps to build a crawler with the framework:

1. Create a Scrapy crawler project

2. Generate a Scrapy spider in the project

3. Configure the generated spider

4. Run the crawler and fetch the web page

Specific steps:

1. Create the project

Define a project named python123demo.

Method:

In cmd, enter D: to switch to drive D, then cd pycodes to enter the pycodes directory.

Then enter:

scrapy startproject python123demo

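A project directory named python123demo is generated in pycodes. Scrapy typically lays it out like this:

python123demo/              outer project directory
    scrapy.cfg              deployment configuration
    python123demo/          the project's Python package
        __init__.py         package initialization file
        items.py            item definitions
        middlewares.py      middleware definitions
        pipelines.py        pipeline definitions
        settings.py         project settings
        spiders/            directory that holds the spider code
            __init__.py

__init__.py does not need to be written by the user.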




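2. Generate a Scrapy spider in the project

Run a command that gives the spider's name and the website to crawl.

To generate the spider, enter the project directory and run:

scrapy genspider demo python123.io

This generates a spider called demo.

The command only creates demo.py (in the spiders directory), whose main contents are: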


name = 'demo': the name of the current spider is demo.

allowed_domains = []: the spider only crawls links under this domain name, which is the one entered on the cmd command line.

start_urls = []: the initial page(s) to crawl.


parse(): processes the response, parses the content into dictionaries, and discovers new URLs to request.



3. Configure the generated spider to meet our needs

Goal: save the fetched page as a file.

Modify the demo.py file:
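
Before touching the spider itself, it may help to see the saving logic in isolation. The sketch below uses only the standard library; filename_from_url and save_page are hypothetical helper names chosen for illustration, not part of Scrapy. It assumes the file name is simply the last segment of the URL, which gives demo.html for the demo address.

```python
import os


def filename_from_url(url):
    """Return the last path segment of a URL, e.g. 'demo.html'."""
    return url.split("/")[-1]


def save_page(url, body, directory="."):
    """Write the raw page bytes to a file named after the URL's last segment."""
    path = os.path.join(directory, filename_from_url(url))
    with open(path, "wb") as f:  # binary mode, because the body is bytes
        f.write(body)
    return path
```

Inside a spider's parse() method, the same two steps would be applied to response.url and response.body.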



4. Run the crawler and fetch the web page

Open cmd and, from the project directory, enter the command that runs the spider:

scrapy crawl demo



Then an error occurred on my computer.


On Windows, fixing this problem requires installing the pywin32 module. Installing the EXE directly from the official website link produced hundreds of errors, so a more convenient way is:


pip3 install pypiwin32

This is the solution for Python 3.

Note: with Python 3, installing pypiwin32 with pip instead of pip3 will cause an error.

After the installation is complete, run the crawler again and it succeeds. Hooray!


The fetched page is stored in the demo.html file.










The complete code of demo.py:


The following two versions are equivalent:



