Python Crawler Framework Scrapy: Installation Steps

Source: Internet
Author: User
Tags: curl, openssl, python

First, Introduction to the Scrapy Crawler Framework
Scrapy is a fast, high-level screen-scraping and web-crawling framework for crawling websites and extracting structured data from their pages. It has a wide range of uses, from data mining to monitoring and automated testing. Scrapy is fully implemented in Python and fully open source, with its code hosted on GitHub, and it runs on Linux, Windows, Mac, and BSD. It handles network communication through the Twisted asynchronous networking library, so users only need to customize a few modules to implement a crawler that fetches web content and all kinds of images.

Second, Scrapy Installation Guide

These installation steps assume you have already installed: <1> Python 2.7 <2> lxml <3> OpenSSL. We use the Python package-management tools pip or easy_install to install Scrapy.
Installing with pip:

pip install scrapy

Installing with easy_install:

easy_install scrapy
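As a quick sanity check (a sketch, not part of the official steps), you can confirm from Python that the package imported correctly after either install command:

```python
# Sanity check: confirm a package is importable after installation.
def installed_version(module_name):
    """Return the module's __version__ if importable, None if missing."""
    try:
        module = __import__(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")

# After "pip install scrapy" succeeds, this prints the installed version
# (or "unknown" for modules that do not expose __version__).
print(installed_version("scrapy"))
```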

Third, Environment Configuration on the Ubuntu Platform

1. Python's package-management tools
The current package-management tool chain is easy_install/pip + distribute/setuptools:
distutils: Python's built-in basic installation tool, suitable for very simple scenarios.
setuptools: extends distutils in many ways, most notably with a package-dependency mechanism; in part of the Python community it is already the de facto standard.
distribute: because setuptools development was slow, it lacked Python 3 support, and its code was messy, a group of programmers forked it, cleaned up the code, and added features, hoping to replace setuptools and be accepted into the official standard library. They worked very hard, and in a short time the community accepted distribute. Both setuptools and distribute are only extensions of distutils.
easy_install: the installation script that ships with setuptools and distribute; once either of them is installed, easy_install is available. Its biggest feature is that it automatically searches PyPI, the package index officially maintained by Python, which makes installing third-party Python packages very convenient.
pip: pip's goal is explicit: replace easy_install. easy_install has many disadvantages: installation is not atomic, only SVN is supported for version control, no uninstall command is provided, and installing a series of packages requires a script. pip solves these problems and has become the new de facto standard; virtualenv and pip have become a pair of good partners.
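The dependency mechanism mentioned above can be seen in a project's setup.py. The following is a minimal sketch (the project name "mycrawler" and its dependency list are hypothetical, for illustration only): easy_install and pip read the install_requires list and fetch those packages from PyPI automatically, which plain distutils cannot do.

```python
# A minimal setup.py sketch (hypothetical project "mycrawler") showing the
# install_requires dependency mechanism that setuptools/distribute add on
# top of plain distutils.
from setuptools import setup

setup(
    name="mycrawler",        # hypothetical project name
    version="0.1.0",
    py_modules=["mycrawler"],
    install_requires=[       # resolved automatically at install time
        "Scrapy>=0.22",
        "lxml",
        "pyOpenSSL",
    ],
)
```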

Installation process:
Install distribute:

$ curl -O http://python-distribute.org/distribute_setup.py
$ python distribute_setup.py

Install pip:

$ curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py
$ [sudo] python get-pip.py

2. Installing Scrapy
On the Windows platform, you can install the various dependencies as binaries, either through a package-management tool or manually: pywin32, Twisted, zope.interface, lxml, and pyOpenSSL. On Ubuntu, the official recommendation is not to use the python-scrapy package provided by Ubuntu itself: those packages are too old and too slow to keep up with the latest Scrapy. The solution is to use the official Ubuntu packages published by the Scrapy project, which provide all the dependent libraries, receive continuous updates with fixes for the latest bugs, and are built continuously from the GitHub repository (master and stable branches). On Ubuntu versions after 9.10, install Scrapy as follows:
<1> Import the GPG key:

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 627220E7

<2> Create the /etc/apt/sources.list.d/scrapy.list file:

echo 'deb http://archive.scrapy.org/ubuntu scrapy main' | sudo tee /etc/apt/sources.list.d/scrapy.list

<3> Update the package list and install Scrapy, replacing "version" with the actual version, e.g. scrapy-0.22:

sudo apt-get update && sudo apt-get install scrapy-version

3. Installing Scrapy's dependent libraries
Fixes for missing dependencies under Ubuntu 12.04:

ImportError: No module named w3lib.http

pip install w3lib

ImportError: No module named twisted

pip install twisted

ImportError: No module named lxml.html

pip install lxml

error: libxml/xmlversion.h: No such file or directory

apt-get install libxml2-dev libxslt-dev
apt-get install python-lxml

ImportError: No module named cssselect

pip install cssselect

ImportError: No module named OpenSSL

pip install pyopenssl
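The fixes above can be checked in one pass. Here is a small helper (a sketch, not part of Scrapy) that reports which of the modules from this section still fail to import, so you know which install command to run:

```python
# Report which of Scrapy's dependencies are missing, so you know which
# install command from this section still needs to be run.
import importlib

SCRAPY_DEPS = ["w3lib.http", "twisted", "lxml.html", "cssselect", "OpenSSL"]

def missing_deps(module_names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# An empty list means every dependency from this section is present.
print(missing_deps(SCRAPY_DEPS))
```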

4. Developing your own crawler
Switch to your working directory and create a new project:

scrapy startproject test
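After the project is created, you add a spider module under test/spiders/. The sketch below shows the shape of such a spider with hypothetical names (TestSpider, example.com); in a real project the class subclasses scrapy.Spider, but it is written here as a plain class so the shape can be run and inspected without Scrapy installed:

```python
# Shape of a Scrapy spider (sketch; real code subclasses scrapy.Spider).
class TestSpider(object):
    name = "test_spider"                  # used by "scrapy crawl test_spider"
    allowed_domains = ["example.com"]     # requests outside these are dropped
    start_urls = ["http://example.com/"]  # crawl entry points

    def parse(self, response):
        # Scrapy calls parse() once per downloaded response; return or yield
        # structured items (dicts) and/or follow-up requests here.
        return {"url": getattr(response, "url", None)}
```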
