1. Introduction
Scrapy has a clear framework structure, and its asynchronous architecture, built on Twisted, makes full use of the machine's resources, which makes it an essential foundation for writing crawlers. This article describes how to install Scrapy.
2. Install lxml
2.1 Download link: https://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted (select the lxml wheel file that matches Python 3.5)
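To decide which wheel matches the interpreter, it helps to know the interpreter's version tag and whether it is 32-bit or 64-bit. The following is a minimal sketch using only the standard library; the helper names cpython_tag and pointer_bits are made up for this illustration.

```python
import struct
import sys


def cpython_tag(version_info=None):
    """Return the CPython tag used in wheel filenames, e.g. 'cp35' for Python 3.5."""
    if version_info is None:
        version_info = sys.version_info
    return "cp{}{}".format(version_info[0], version_info[1])


def pointer_bits():
    """Return 32 or 64, depending on the pointer size of the running interpreter."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python tag:", cpython_tag())
    print("Interpreter is {}-bit".format(pointer_bits()))
```

On a 64-bit Python 3.5 this should report cp35 and 64-bit, which corresponds to the cp35 ... win_amd64 wheels used in this article.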
2.2 If the pip version is too old, upgrade pip first:
python -m pip install -U pip
2.3 Install the lxml library (copy the downloaded wheel file to the Python installation directory, hold down the SHIFT key, right-click, and select "Open command window here"):
pip install lxml-4.1.1-cp35-cp35m-win_amd64.whl
If "Successfully installed" appears in the output, the installation succeeded.
3. Install the Twisted library
3.1 Download link: https://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted (select the wheel file that matches Python 3.5)
3.2 Installation:
pip install Twisted-17.9.0-cp35-cp35m-win_amd64.whl
If "Successfully installed" appears in the output, the installation succeeded.
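The wheel filenames used above follow the standard wheel naming scheme (distribution, version, Python tag, ABI tag, platform tag). As a rough sketch, a filename can be split into these parts to double-check that the Python tag is cp35 and the platform is win_amd64 before installing. The function name parse_wheel_filename is hypothetical, and the sketch ignores the optional build tag beyond tolerating its presence.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: {}".format(filename))
    parts = filename[:-len(".whl")].split("-")
    # 5 parts normally, 6 when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: {}".format(filename))
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


if __name__ == "__main__":
    info = parse_wheel_filename("Twisted-17.9.0-cp35-cp35m-win_amd64.whl")
    print(info["distribution"], info["version"], info["python"], info["platform"])
```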
4. Install Scrapy
Once the Twisted library is installed successfully, installing Scrapy is simple. Enter the following command directly in the Command Prompt window:
pip install scrapy
If "Successfully installed" appears in the output, the installation succeeded.
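Besides reading the pip output, the three installations can be checked from Python itself. The sketch below simply tries to import each package and reports its version string if the module exposes one; module_version is a helper name invented for this example, and a module without a __version__ attribute is reported as "unknown".

```python
import importlib


def module_version(module_name):
    """Return the module's version string, 'unknown' if it has none, or None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in ("lxml", "twisted", "scrapy"):
        version = module_version(name)
        if version is None:
            print("{}: NOT installed".format(name))
        else:
            print("{}: {}".format(name, version))
```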
5. Test Scrapy
5.1 Create a new project
To create a new Scrapy crawler project, choose a Python working directory (mine is H:\PycharmProjects), hold down the SHIFT key, right-click, select "Open command window here", and enter the command:
scrapy startproject allister
This generates an allister folder in that directory with the following structure:
└── allister
    ├── allister
    │   ├── __init__.py
    │   ├── items.py
    │   ├── pipelines.py
    │   ├── settings.py
    │   └── spiders
    └── scrapy.cfg

A brief introduction to the role of each file:
scrapy.cfg: the project's configuration file
allister/: the project's Python module; the code is imported from here
allister/items.py: the project's items file
allister/pipelines.py: the project's pipelines file
allister/settings.py: the project's settings file
allister/spiders/: the directory where spiders are stored
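If it is unclear whether the project was generated correctly, the layout listed above can be checked with a few lines of Python. This is only a sketch: missing_project_files is a hypothetical helper, and it checks only the entries named in this article.

```python
import os


def missing_project_files(root, project="allister"):
    """Return the expected project entries that do not exist under root.

    root is the outer project folder (the one that contains scrapy.cfg).
    """
    expected = [
        "scrapy.cfg",
        os.path.join(project, "__init__.py"),
        os.path.join(project, "items.py"),
        os.path.join(project, "pipelines.py"),
        os.path.join(project, "settings.py"),
        os.path.join(project, "spiders"),
    ]
    return [path for path in expected
            if not os.path.exists(os.path.join(root, path))]


if __name__ == "__main__":
    # Run from the directory in which "scrapy startproject allister" was executed.
    missing = missing_project_files("allister")
    print("missing:", missing if missing else "nothing")
```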
5.2 Modify the allister/items.py file:
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class AllisterItem(scrapy.Item):
    name = scrapy.Field()
    level = scrapy.Field()
    info = scrapy.Field()
5.3 Write the spider file allisterspider.py:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @File  : itcastspider.py
# @Author: allister.liu
# @Date  : 2018/1/18
# @Desc  :

import scrapy
from allister.items import AllisterItem


class ItcastSpider(scrapy.Spider):
    # Spider name, used on the command line: scrapy crawl ic2c
    name = "ic2c"
    # Domain names only, without the "http://" scheme
    allowed_domains = ["www.itcast.cn"]
    start_urls = ["http://www.itcast.cn/channel/teacher.shtml#ac"]

    def parse(self, response):
        items = []
        # Each teacher entry sits in a <div class="li_txt"> block
        for site in response.xpath('//div[@class="li_txt"]'):
            item = AllisterItem()
            t_name = site.xpath('h3/text()')
            t_level = site.xpath('h4/text()')
            t_desc = site.xpath('p/text()')
            unicode_teacher_name = t_name.extract_first().strip()
            unicode_teacher_level = t_level.extract_first().strip()
            unicode_teacher_info = t_desc.extract_first().strip()
            item["name"] = unicode_teacher_name
            item["level"] = unicode_teacher_level
            item["info"] = unicode_teacher_info
            items.append(item)
        return items
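The extraction logic in parse can be tried out without running a crawl. The sketch below is a stand-in, not Scrapy itself: it uses the standard library's xml.etree.ElementTree, which supports only a small XPath subset, and the sample markup and the teacher names in it are invented for illustration. It mirrors the same steps: select every div with class li_txt, then read and strip the h3, h4, and p text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup; the real page is larger and may differ.
SAMPLE = """
<html><body>
  <div class="li_txt"><h3> Teacher A </h3><h4> Senior Lecturer </h4><p> Teaches Python. </p></div>
  <div class="li_txt"><h3> Teacher B </h3><h4> Lecturer </h4><p> Teaches Java. </p></div>
</body></html>
"""


def extract_teachers(markup):
    """Return one dict per <div class="li_txt"> block with name, level, and info."""
    root = ET.fromstring(markup)
    teachers = []
    for site in root.findall(".//div[@class='li_txt']"):
        teachers.append({
            # findtext returns None when the child is missing, hence the "or ''"
            "name": (site.findtext("h3") or "").strip(),
            "level": (site.findtext("h4") or "").strip(),
            "info": (site.findtext("p") or "").strip(),
        })
    return teachers


if __name__ == "__main__":
    for teacher in extract_teachers(SAMPLE):
        print(teacher["name"], "|", teacher["level"], "|", teacher["info"])
```

Unlike this stand-in, the spider calls strip() directly on the result of extract_first(), which raises an error if an entry lacks one of the three elements.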
After writing the file, copy it to the project's \allister\spiders directory, open cmd in the project root directory, and enter the following command:
scrapy crawl ic2c -o ic2c_infos.json -t json
The crawled data is stored in the ic2c_infos.json file in JSON format.
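To inspect the result, the JSON file can be read back with the standard library. This is a small sketch that assumes the file holds a JSON array of objects with the name, level, and info fields defined in items.py; load_items is a helper name chosen for this example.

```python
import json
import os


def load_items(path):
    """Load the exported items from a JSON feed file and return them as a list of dicts."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    feed = "ic2c_infos.json"
    if os.path.exists(feed):
        items = load_items(feed)
        print("{} items loaded".format(len(items)))
        for item in items:
            print(item["name"], "|", item["level"])
    else:
        print("{} not found; run the crawl command first".format(feed))
```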
If the following error appears, please see the corresponding workaround:
Python3.5 Installation & Test Scrapy