Python Crawler Framework Scrapy: Installation Steps

Source: Internet
Author: User
Tags: curl, openssl, python

First, Introduction to the Scrapy Crawler Framework
Scrapy is a fast, high-level screen-scraping and web-crawling framework for crawling websites and extracting structured data from their pages. It has a wide range of uses, from data mining to monitoring and automated testing. Scrapy is fully implemented in Python and fully open source, with its code hosted on GitHub, and it runs on Linux, Windows, Mac, and BSD. It handles network communication through the Twisted asynchronous networking library, so users only need to customize a few modules to implement a crawler that fetches web content and all kinds of images.

Second, Scrapy Installation Guide

These installation steps assume you have already installed: <1> Python 2.7 <2> lxml <3> OpenSSL. We use the Python package-management tools pip or easy_install to install Scrapy.
Installing with pip:

pip install scrapy

Installing with easy_install:

easy_install scrapy
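As a quick sanity check (a sketch, not part of the official steps), you can confirm from Python that the package imported correctly after either install command:

```python
# Sanity check: confirm a package is importable after installation.
def installed_version(module_name):
    """Return the module's __version__ if importable, None if missing."""
    try:
        module = __import__(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")

# After "pip install scrapy" succeeds, this prints the installed version
# (or "unknown" for modules that do not expose __version__).
print(installed_version("scrapy"))
```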

Third, Environment Configuration on the Ubuntu Platform

1. Python's package-management tools
The current package-management tool chain is easy_install/pip + distribute/setuptools:
distutils: Python's built-in basic installation tool, suitable for very simple scenarios.
setuptools: extends distutils in many ways, most notably with a package-dependency mechanism; in part of the Python community it is already the de facto standard.
distribute: because setuptools development was slow, it lacked Python 3 support, and its code was messy, a group of programmers forked it, cleaned up the code, and added features, hoping to replace setuptools and be accepted into the official standard library. They worked very hard, and in a short time the community accepted distribute. Both setuptools and distribute are only extensions of distutils.
easy_install: the installation script that ships with setuptools and distribute; once either of them is installed, easy_install is available. Its biggest feature is that it automatically searches PyPI, the package index officially maintained by Python, which makes installing third-party Python packages very convenient.
pip: pip's goal is explicit: replace easy_install. easy_install has many disadvantages: installation is not atomic, only SVN is supported for version control, no uninstall command is provided, and installing a series of packages requires a script. pip solves these problems and has become the new de facto standard; virtualenv and pip have become a pair of good partners.
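The dependency mechanism mentioned above can be seen in a project's setup.py. The following is a minimal sketch (the project name "mycrawler" and its dependency list are hypothetical, for illustration only): easy_install and pip read the install_requires list and fetch those packages from PyPI automatically, which plain distutils cannot do.

```python
# A minimal setup.py sketch (hypothetical project "mycrawler") showing the
# install_requires dependency mechanism that setuptools/distribute add on
# top of plain distutils.
from setuptools import setup

setup(
    name="mycrawler",        # hypothetical project name
    version="0.1.0",
    py_modules=["mycrawler"],
    install_requires=[       # resolved automatically at install time
        "Scrapy>=0.22",
        "lxml",
        "pyOpenSSL",
    ],
)
```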

Installation process:
Install distribute:

$ curl -O http://python-distribute.org/distribute_setup.py
$ python distribute_setup.py

Install pip:

$ curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py
$ [sudo] python get-pip.py

2. Installing Scrapy
On the Windows platform, you can install the various dependencies as binaries, either through a package-management tool or manually: pywin32, Twisted, zope.interface, lxml, and pyOpenSSL. On Ubuntu, the official recommendation is not to use the python-scrapy package provided by Ubuntu itself: those packages are too old and too slow to keep up with the latest Scrapy. The solution is to use the official Ubuntu packages published by the Scrapy project, which provide all the dependent libraries, receive continuous updates with fixes for the latest bugs, and are built continuously from the GitHub repository (master and stable branches). On Ubuntu versions after 9.10, install Scrapy as follows:
<1> Import the GPG key:

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 627220E7

<2> Create the /etc/apt/sources.list.d/scrapy.list file:

echo 'deb http://archive.scrapy.org/ubuntu scrapy main' | sudo tee /etc/apt/sources.list.d/scrapy.list

<3> Update the package list and install Scrapy, replacing "version" with the actual version, e.g. scrapy-0.22:

sudo apt-get update && sudo apt-get install scrapy-version

3. Installing Scrapy's dependent libraries
Fixes for missing dependencies under Ubuntu 12.04:

ImportError: No module named w3lib.http

pip install w3lib

ImportError: No module named twisted

pip install twisted

ImportError: No module named lxml.html

pip install lxml

error: libxml/xmlversion.h: No such file or directory

apt-get install libxml2-dev libxslt-dev
apt-get install python-lxml

ImportError: No module named cssselect

pip install cssselect

ImportError: No module named OpenSSL

pip install pyopenssl
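The fixes above can be checked in one pass. Here is a small helper (a sketch, not part of Scrapy) that reports which of the modules from this section still fail to import, so you know which install command to run:

```python
# Report which of Scrapy's dependencies are missing, so you know which
# install command from this section still needs to be run.
import importlib

SCRAPY_DEPS = ["w3lib.http", "twisted", "lxml.html", "cssselect", "OpenSSL"]

def missing_deps(module_names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# An empty list means every dependency from this section is present.
print(missing_deps(SCRAPY_DEPS))
```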

4. Developing your own crawler
Switch to your working directory and create a new project:

scrapy startproject test
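After the project is created, you add a spider module under test/spiders/. The sketch below shows the shape of such a spider with hypothetical names (TestSpider, example.com); in a real project the class subclasses scrapy.Spider, but it is written here as a plain class so the shape can be run and inspected without Scrapy installed:

```python
# Shape of a Scrapy spider (sketch; real code subclasses scrapy.Spider).
class TestSpider(object):
    name = "test_spider"                  # used by "scrapy crawl test_spider"
    allowed_domains = ["example.com"]     # requests outside these are dropped
    start_urls = ["http://example.com/"]  # crawl entry points

    def parse(self, response):
        # Scrapy calls parse() once per downloaded response; return or yield
        # structured items (dicts) and/or follow-up requests here.
        return {"url": getattr(response, "url", None)}
```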
