Installing and Testing Scrapy on Python 3.5

Source: Internet
Author: User
Tags: xpath

1. Introduction

  Scrapy has a clear framework structure, and its asynchronous architecture, built on Twisted, makes full use of the machine's resources. It is an essential foundation for writing crawlers. This article describes how to install Scrapy on Windows with Python 3.5 and how to test the installation with a small crawler.

2. Installing lxml

2.1 Download link: https://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted (select the lxml library file that matches Python 3.5)
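The wheel file names used in this article contain tags such as cp35 and win_amd64, which have to match your interpreter. As a rough aid for picking the right download, the following sketch prints the tags for the Python that runs it. The function name windows_wheel_tags is made up for this illustration, and the sketch assumes a CPython build on Windows.

```python
import struct
import sys


def windows_wheel_tags(version_info=sys.version_info, pointer_bits=None):
    """Return (python_tag, platform_tag) for a CPython build on Windows.

    Hypothetical helper for this article, not part of pip or Scrapy.
    """
    if pointer_bits is None:
        # Size of a C pointer in bits: 64 on 64-bit Python, 32 on 32-bit Python.
        pointer_bits = struct.calcsize("P") * 8
    python_tag = "cp{}{}".format(version_info[0], version_info[1])
    platform_tag = "win_amd64" if pointer_bits == 64 else "win32"
    return python_tag, platform_tag


if __name__ == "__main__":
    # Prints the tags of the interpreter that runs this script.
    print(windows_wheel_tags())
```

On a 64-bit Python 3.5 this should report cp35 and win_amd64, which are the tags in the file names used below.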

2.2 If your pip version is too old, upgrade pip first:

python -m pip install -U pip

2.3 Install the lxml library. Copy the downloaded library file to the Python installation directory, hold down the SHIFT key, right-click in that folder and select "Open command window here", then run:

pip install lxml-4.1.1-cp35-cp35m-win_amd64.whl

If the output contains the word "Successfully", the installation succeeded.

3. Installing the Twisted library

3.1 Download link: https://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted (select the library file that matches Python 3.5)

3.2 Installation

pip install Twisted-17.9.0-cp35-cp35m-win_amd64.whl

If the output contains the word "Successfully", the installation succeeded.

4. Installing Scrapy

Once the Twisted library is installed, installing Scrapy is simple. Enter the command directly in the Command Prompt window:

pip install scrapy

If the output contains the word "Successfully", the installation succeeded.
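Besides reading pip's output, you can check from Python whether the three packages are importable. The sketch below is one possible way to do that. The helper name installed_versions is hypothetical, and it assumes each package exposes a __version__ attribute, falling back to "unknown" when it does not.

```python
import importlib


def installed_versions(module_names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Not every package defines __version__, so fall back to a placeholder.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    report = installed_versions(["lxml", "twisted", "scrapy"])
    for name in sorted(report):
        version = report[name]
        print(name, version if version is not None else "NOT INSTALLED")
```

Any package reported as NOT INSTALLED points back to the corresponding installation step above.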

5. Scrapy Test

5.1 New Project

Create a new Scrapy crawler project. Go to your Python working directory (mine is H:\PycharmProjects), hold down the SHIFT key, right-click and select "Open command window here", then enter the command:

scrapy startproject allister

This creates an allister folder in that directory with the following structure:

allister
├── allister
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
└── scrapy.cfg

A brief introduction to the role of each file:

scrapy.cfg: the project's configuration file
allister/: the project's Python module; your code is imported from here
allister/items.py: the project's items file
allister/pipelines.py: the project's pipelines file
allister/settings.py: the project's settings file
allister/spiders/: the directory where the crawlers are stored
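To confirm that the project was generated as listed, a short script can compare the folder against the expected entries. This is only a sketch: EXPECTED_ENTRIES and missing_entries are names invented here, and the list contains just the files named in this article (your Scrapy version may generate additional files).

```python
from pathlib import Path

# Entries listed in this article, relative to the outer allister folder.
EXPECTED_ENTRIES = [
    "scrapy.cfg",
    "allister/__init__.py",
    "allister/items.py",
    "allister/pipelines.py",
    "allister/settings.py",
    "allister/spiders",
]


def missing_entries(project_root):
    """Return the expected entries that do not exist under project_root."""
    root = Path(project_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    # Run from the directory that contains the generated allister folder.
    missing = missing_entries("allister")
    print("missing:", missing if missing else "nothing")
```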

5.2 Modify the allister/items.py file:

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class AllisterItem(scrapy.Item):
    name = scrapy.Field()
    level = scrapy.Field()
    info = scrapy.Field()

  

5.3 Write the spider file allisterspider.py:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @File  : itcastspider.py
# @Author: allister.liu
# @Date  : 2018/1/18
# @Desc  :

import scrapy

from allister.items import AllisterItem


class ItcastSpider(scrapy.Spider):
    name = "ic2c"
    # allowed_domains takes bare host names, not full URLs
    allowed_domains = ["www.itcast.cn"]
    start_urls = ["http://www.itcast.cn/channel/teacher.shtml#ac"]

    def parse(self, response):
        items = []

        # Each teacher entry sits in a <div class="li_txt"> block
        for site in response.xpath('//div[@class="li_txt"]'):
            item = AllisterItem()

            t_name = site.xpath('h3/text()')
            t_level = site.xpath('h4/text()')
            t_desc = site.xpath('p/text()')

            # default="" avoids calling strip() on None when a node is missing
            unicode_teacher_name = t_name.extract_first(default="").strip()
            unicode_teacher_level = t_level.extract_first(default="").strip()
            unicode_teacher_info = t_desc.extract_first(default="").strip()

            item["name"] = unicode_teacher_name
            item["level"] = unicode_teacher_level
            item["info"] = unicode_teacher_info

            items.append(item)

        return items
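To see what the XPath expressions in the spider select without running a crawl, the same logic can be tried on a small fragment. The sketch below is an illustration under assumptions: the SAMPLE markup is invented (it is not the real page), and the standard library's xml.etree.ElementTree stands in for Scrapy's selectors, so only the simple subset of XPath it supports is used.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample that mimics the structure the spider expects.
SAMPLE = """
<html><body>
  <div class="li_txt">
    <h3> Teacher A </h3>
    <h4> Senior Lecturer </h4>
    <p> Ten years of teaching experience. </p>
  </div>
  <div class="li_txt">
    <h3>Teacher B</h3>
    <h4>Lecturer</h4>
    <p>Focuses on web development.</p>
  </div>
</body></html>
"""


def extract_teachers(markup):
    """Collect name, level and info from every div with class li_txt."""
    root = ET.fromstring(markup.strip())
    teachers = []
    for site in root.findall(".//div[@class='li_txt']"):
        teachers.append({
            # findtext returns None when the child is missing, hence the "or".
            "name": (site.findtext("h3") or "").strip(),
            "level": (site.findtext("h4") or "").strip(),
            "info": (site.findtext("p") or "").strip(),
        })
    return teachers


if __name__ == "__main__":
    for teacher in extract_teachers(SAMPLE):
        print(teacher["name"], "|", teacher["level"], "|", teacher["info"])
```

Real pages are rarely well-formed XML, which is one reason the spider relies on Scrapy's lxml-based selectors rather than ElementTree.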

  

After writing the file, copy it to the project's \allister\spiders directory, open a command prompt in the project root directory, and enter the following command:

scrapy crawl ic2c -o ic2c_infos.json -t json

The crawled data is stored in the ic2c_infos.json file in JSON format.
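Once the crawl has finished, the output file can be read back with the standard json module. The following is a minimal sketch, assuming the file holds a JSON array of objects with the name, level and info fields defined in items.py. The helpers load_items and format_item are hypothetical names used only here.

```python
import json


def load_items(path):
    """Read the exported file and return its content as a list of dicts."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def format_item(item):
    """Render one scraped item as a single line of text."""
    return "{name} ({level}): {info}".format(
        name=item.get("name", ""),
        level=item.get("level", ""),
        info=item.get("info", ""),
    )
```

For example, calling load_items("ic2c_infos.json") from the project root and printing format_item(item) for each entry gives a quick overview of what was scraped.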

If the following error appears, please see the corresponding workaround:
