Demo Address: http://python123.io/ws/demo.html
File name: demo.html
Steps to build a crawler with the Scrapy framework:
1, create a Scrapy crawler project
2, generate a Scrapy spider in the project
3, configure the generated spider
4, run the crawler and fetch the web page
Specific steps:
1, create the project
Define a project named Python123demo
Method:
In cmd, type D: to switch to the D drive, then cd Pycodes to enter the Pycodes directory
Then enter
scrapy startproject Python123demo
A project directory is generated in Pycodes:
__init__.py does not need to be written by the user
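As a quick way to confirm that the project was generated, the sketch below checks for the files that scrapy startproject normally creates. The file list reflects the usual Scrapy project template and may differ between Scrapy versions; the helper name missing_project_files and the path used at the bottom are assumptions made for illustration.

```python
from pathlib import Path

# Files that `scrapy startproject` normally creates, relative to the
# outer project directory. {name} stands for the project name.
# Assumption: the list matches the usual template; versions may differ.
EXPECTED_FILES = [
    "scrapy.cfg",
    "{name}/__init__.py",
    "{name}/items.py",
    "{name}/middlewares.py",
    "{name}/pipelines.py",
    "{name}/settings.py",
    "{name}/spiders/__init__.py",
]


def missing_project_files(project_dir, name):
    """Return the expected files that are absent under project_dir."""
    root = Path(project_dir)
    expected = [template.format(name=name) for template in EXPECTED_FILES]
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Hypothetical location, following the D:\Pycodes example above.
    print(missing_project_files(r"D:\Pycodes\Python123demo", "Python123demo"))
```

An empty list means every expected file was found.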
2, generate a Scrapy spider in the project
Run a command that gives the spider's name and the website to crawl
To generate the spider:
Generate a spider called demo, for example with scrapy genspider demo python123.io
This only creates demo.py, whose contents are:
name = 'demo': the name of the current spider is demo
allowed_domains: the domain name entered on the cmd command line; only links under this domain are crawled
start_urls = []: the initial page(s) to crawl
parse(): processes the response, parses the content into dictionaries, and discovers new URLs to request
3, configure the generated spider to meet our needs
The goal is to save the fetched page as a file
To do this, modify the demo.py file
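The core of such a change can be sketched in plain Python, independent of Scrapy: take the last segment of the URL as the file name and write the page bytes to it. The helper names filename_from_url and save_page are hypothetical; inside a spider the same logic would presumably sit in parse(), using response.url and response.body.

```python
def filename_from_url(url):
    """Use the last path segment of the URL as the local file name."""
    return url.split("/")[-1]


def save_page(url, body):
    """Write the raw page bytes to a file named after the URL."""
    fname = filename_from_url(url)
    with open(fname, "wb") as f:
        f.write(body)
    return fname


print(filename_from_url("http://python123.io/ws/demo.html"))  # demo.html
```

Note that a URL ending in a slash yields an empty file name with this approach, so it only suits URLs that end in a file-like segment such as demo.html.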
4, run the crawler and fetch the web page
Open cmd and run the crawler from the command line (scrapy crawl demo)
At this point an error occurred on my computer.
On Windows, fixing this requires installing the pywin32 module. Installing it with the EXE from the official website link produced hundreds of errors, so a more convenient way is
pip3 install pypiwin32
This is the solution for Python 3.
Note: under Python 3, installing pypiwin32 with pip instead of pip3 causes an error
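To check whether the installation took effect before running the crawler again, a small sketch like the following can help. It assumes the missing module was win32api, which pywin32 provides; the text above does not show the exact error, so that module name is a guess, and has_module is a hypothetical helper.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Assumption: win32api is the module the crawler failed to import.
    if has_module("win32api"):
        print("win32api is available")
    else:
        print("win32api is missing; try: pip3 install pypiwin32")
```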
After the installation completes, run the crawler again and it succeeds. Hooray!
The fetched page is stored in the demo.html file
The complete code for demo.py:
Two versions are equivalent: