Installing and Testing Scrapy on Python 3.5

Last Update: 2018-01-18 Source: Internet

Author: User

Tags: xpath


1. Introduction

Scrapy has a clear framework structure and is built on Twisted's asynchronous architecture, which lets it make full use of the machine's resources. It is an essential foundation for writing crawlers. This article walks through installing Scrapy for Python 3.5 on Windows and running a first test crawl.

2. Install lxml

2.1 Download link: https://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted Select the lxml library file that corresponds to Python 3.5

2.2 If the pip version is too old, upgrade pip first:

python -m pip install -U pip

2.3 Install the lxml library (copy the downloaded library file to the Python installation directory, hold down the SHIFT key, right-click, and select "Open command window here")

pip install lxml-4.1.1-cp35-cp35m-win_amd64.whl

If the output contains the word "Successfully", the installation succeeded.
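The wheel file names used in this article encode the interpreter version (cp35) and the platform (win_amd64), so the downloaded file has to match the local Python. The following is a minimal sketch, using only the standard library, for checking which values apply to the interpreter that will run pip. The function name interpreter_tags is made up for this illustration.

```python
import struct
import sys


def interpreter_tags():
    """Return the CPython tag (for example "cp35") and the pointer width in bits."""
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # The size of a C pointer tells us whether the interpreter is 32-bit or 64-bit.
    bits = struct.calcsize("P") * 8
    return python_tag, bits


if __name__ == "__main__":
    tag, bits = interpreter_tags()
    print("Python tag:", tag)
    print("Interpreter width:", bits, "bit")
```

On a 64-bit Python 3.5 this should report cp35 and 64, which corresponds to the cp35-cp35m-win_amd64 wheels. A 32-bit interpreter would need the win32 wheels instead.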

3. Install Twisted

3.1 Download link: https://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted Select the library file that corresponds to Python 3.5

3.2 Installation

pip install Twisted-17.9.0-cp35-cp35m-win_amd64.whl

If the output contains the word "Successfully", the installation succeeded.

4. Install Scrapy

Once Twisted is installed successfully, installing Scrapy is simple. Enter the command directly in the Command Prompt window:

pip install scrapy

If the output contains the word "Successfully", the installation succeeded.
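Besides reading pip's output, the three installations can be confirmed from Python itself. The sketch below uses only the standard library and simply asks whether each package can be located, without importing it. The helper name check_installed is an assumption for this example.

```python
import importlib.util


def check_installed(module_names):
    """Map each top-level module name to True if Python can locate it."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


if __name__ == "__main__":
    # The three packages installed in sections 2 to 4.
    for name, found in check_installed(["lxml", "twisted", "scrapy"]).items():
        print(name, "OK" if found else "NOT FOUND")
```

If any entry is reported as NOT FOUND, pip most likely installed into a different Python than the one running the script.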

5. Scrapy Test

5.1 New Project

Create a new Scrapy crawler project. Choose a working directory for Python (mine is H:\PycharmProjects), hold down the SHIFT key, right-click, select "Open command window here", and enter the command:

scrapy startproject allister

This generates an allister folder in that directory with the following structure:

└── allister
    ├── allister
    │   ├── __init__.py
    │   ├── items.py
    │   ├── pipelines.py
    │   ├── settings.py
    │   └── spiders
    └── scrapy.cfg

A brief introduction to the role of each file:

scrapy.cfg: the project's configuration file
allister/: the project's Python module; the code is imported from here
allister/items.py: the project's items file
allister/pipelines.py: the project's pipelines file
allister/settings.py: the project's settings file
allister/spiders/: the directory where the spiders are stored
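To confirm that the project was generated as described, the layout can be checked with a few lines of standard-library Python. This is only a sketch: the EXPECTED list mirrors the structure shown in this article, and the function name missing_entries is made up for the example.

```python
from pathlib import Path

# Paths relative to the project root, taken from the structure listed in this article.
EXPECTED = [
    "scrapy.cfg",
    "allister/__init__.py",
    "allister/items.py",
    "allister/pipelines.py",
    "allister/settings.py",
    "allister/spiders",
]


def missing_entries(project_root):
    """Return the expected files or directories that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the directory that contains the generated "allister" project folder.
    missing = missing_entries("allister")
    print("Missing:", missing if missing else "nothing, the layout looks complete")
```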

5.2 Modify the allister/items.py file:

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class AllisterItem(scrapy.Item):
    name = scrapy.Field()
    level = scrapy.Field()
    info = scrapy.Field()

5.3 Write the file allisterspider.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @File  : itcastspider.py
# @Author: allister.liu
# @Date  : 2018/1/18
# @Desc  :

import scrapy
from allister.items import AllisterItem


class ItcastSpider(scrapy.Spider):
    name = "ic2c"
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ["www.itcast.cn"]
    start_urls = ["http://www.itcast.cn/channel/teacher.shtml#ac"]

    def parse(self, response):
        items = []
        # Each teacher entry is a <div class="li_txt"> block
        for site in response.xpath('//div[@class="li_txt"]'):
            item = AllisterItem()

            t_name = site.xpath('h3/text()')
            t_level = site.xpath('h4/text()')
            t_desc = site.xpath('p/text()')

            unicode_teacher_name = t_name.extract_first().strip()
            unicode_teacher_level = t_level.extract_first().strip()
            unicode_teacher_info = t_desc.extract_first().strip()

            item["name"] = unicode_teacher_name
            item["level"] = unicode_teacher_level
            item["info"] = unicode_teacher_info

            items.append(item)

        return items
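The extraction logic in parse can be tried offline before running a real crawl. The sketch below is not Scrapy code: it swaps in the standard library's xml.etree.ElementTree, which supports a limited XPath subset, and runs the same div/h3/h4/p lookups against a small hand-written fragment. The SAMPLE markup and the function name extract_teachers are assumptions for illustration; the real page may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the teacher list markup.
SAMPLE = """
<html><body>
  <div class="li_txt">
    <h3> Teacher A </h3>
    <h4> Senior Lecturer </h4>
    <p> Teaches Python. </p>
  </div>
  <div class="li_txt">
    <h3>Teacher B</h3>
    <h4>Lecturer</h4>
    <p>Teaches Java.</p>
  </div>
  <div class="other"><h3>ignored</h3></div>
</body></html>
"""


def extract_teachers(markup):
    """Collect name, level and info from every div whose class is li_txt."""
    root = ET.fromstring(markup.strip())
    teachers = []
    for site in root.findall(".//div[@class='li_txt']"):
        teachers.append({
            # findtext returns None when the child is missing, so fall back to "".
            "name": (site.findtext("h3") or "").strip(),
            "level": (site.findtext("h4") or "").strip(),
            "info": (site.findtext("p") or "").strip(),
        })
    return teachers


if __name__ == "__main__":
    for teacher in extract_teachers(SAMPLE):
        print(teacher)
```

Unlike this sketch, the spider calls strip() directly on the result of extract_first(), which would raise an error if an entry had no h3, h4 or p text.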

Once the file is written, copy it to the project's \allister\spiders directory, open a command prompt in the project root directory, and enter the following command:

scrapy crawl ic2c -o ic2c_infos.json -t json

The crawled data is stored in the ic2c_infos.json file in JSON format.
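The exported file can then be read back with the standard json module. This is a minimal sketch that assumes the file holds a single JSON array of objects with the name, level and info fields defined in items.py. The function names parse_feed and load_feed are made up for the example.

```python
import json


def parse_feed(text):
    """Parse exported JSON text and make sure it is a list of items."""
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    return items


def load_feed(path):
    """Read an exported feed file such as ic2c_infos.json."""
    with open(path, encoding="utf-8") as feed_file:
        return parse_feed(feed_file.read())


if __name__ == "__main__":
    for item in load_feed("ic2c_infos.json"):
        print(item.get("name"), "|", item.get("level"), "|", item.get("info"))
```

Non-ASCII text may appear in the file as \uXXXX escapes; json.loads turns these back into the original characters.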

If the following error appears, please see the corresponding workaround:

