First, an introduction to the Scrapy crawler framework
Scrapy is a fast, high-level screen-scraping and web-crawling framework. It crawls web sites and extracts structured data from web pages, and it has a wide range of uses, from data mining to monitoring and automated testing. Scrapy is implemented entirely in Python, is fully open source, and its code is hosted on GitHub. It runs on Linux, Windows, Mac, and BSD, and it handles network communication with the Twisted asynchronous networking library. Users only need to customize a few modules to easily implement a crawler that scrapes web content and images.
Second, scrapy Installation Guide
These installation steps assume that you have already installed: <1> Python 2.7 <2> lxml <3> OpenSSL. We use Python's package management tools pip or easy_install to install Scrapy.
Installing with pip:
pip install Scrapy
Installing with easy_install:
easy_install Scrapy
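After either command finishes, you can confirm the install worked with a short check (a minimal sketch; the `scrapy_version` helper name is my own, not part of Scrapy):

```python
def scrapy_version():
    """Return the installed Scrapy version string, or None if Scrapy is missing."""
    try:
        import scrapy
        return scrapy.__version__
    except ImportError:
        return None

print(scrapy_version() or "Scrapy is not installed")
```

If this prints "Scrapy is not installed", revisit the steps above before moving on.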
Third, environment configuration on the Ubuntu platform
1. Python's package management tools
The current package-management tool chain is easy_install/pip + distribute/setuptools:
distutils: Python's built-in basic installation tool, suitable only for very simple scenarios;
setuptools: extends distutils considerably, most notably with a package-dependency mechanism; in part of the Python community it is already the de facto standard;
distribute: because setuptools developed slowly, did not support Python 3, and had messy code, a group of programmers reinvented the wheel: they refactored the code and added features, hoping to replace setuptools and be accepted as the official standard library. They worked very hard, and in a short time the community accepted distribute. Both setuptools and distribute are just extensions of distutils;
easy_install: setuptools and distribute ship their own install script, so once setuptools or distribute is installed, easy_install is available. Its biggest feature is automatic discovery of PyPI, Python's officially maintained package index, which makes installing third-party Python packages very convenient;
pip: pip's goal is very clear – to replace easy_install. easy_install has many shortcomings: installation is not an atomic transaction, only SVN is supported, there is no uninstall command, and installing a series of packages requires writing a script. pip solves all of these problems and has become the new de facto standard; virtualenv and pip have become a great pair of partners.
Installation process:
Install distribute:
$ curl -O http://python-distribute.org/distribute_setup.py
$ python distribute_setup.py
Install pip:
$ curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py
$ [sudo] python get-pip.py
2. Installing Scrapy
On the Windows platform, you can download the various dependencies as binary packages, either via a package manager or manually: pywin32, Twisted, zope.interface, lxml, pyOpenSSL.
On Ubuntu 9.10 and later, the Scrapy project officially recommends against the python-scrapy package that Ubuntu provides: those packages are either too old or too slow to keep up with the latest Scrapy. The solution is to use the official Ubuntu packages published by the Scrapy project, which provide all of the dependent libraries, receive ongoing updates with the latest bug fixes, and are more stable; they are built continuously from the GitHub repository (master and stable branches). On Ubuntu 9.10 and later, Scrapy is installed as follows:
<1> Import the GPG key
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 627220E7
<2> Create the /etc/apt/sources.list.d/scrapy.list file
echo 'deb http://archive.scrapy.org/ubuntu scrapy main' | sudo tee /etc/apt/sources.list.d/scrapy.list
<3> Update the package list and install scrapy-VERSION, replacing VERSION with the actual version, for example scrapy-0.22:
sudo apt-get update && sudo apt-get install scrapy-VERSION
3. Installing Scrapy's dependent libraries
Installing Scrapy's dependent libraries under Ubuntu 12.04. Each error below is followed by the command that fixes it:
ImportError: No module named w3lib.http
pip install w3lib
ImportError: No module named twisted
pip install twisted
ImportError: No module named lxml.html
pip install lxml
Error: libxml/xmlversion.h: No such file or directory
apt-get install libxml2-dev libxslt-dev
apt-get install python-lxml
ImportError: No module named cssselect
pip install cssselect
ImportError: No module named OpenSSL
pip install pyopenssl
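All of the ImportError cases above can be checked in one pass before running Scrapy. The sketch below uses only the standard library; the `is_installed` helper is illustrative, not part of Scrapy:

```python
import importlib.util

def is_installed(module_name):
    """Return True if the named module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None

# The top-level dependencies covered by the fixes above
for name in ("w3lib", "twisted", "lxml", "cssselect", "OpenSSL"):
    print(name, "OK" if is_installed(name) else "missing -> see the fix above")
```

Any module reported as missing maps directly to one of the pip/apt-get commands listed above.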
4. Developing your own crawler
Switch to your working directory and create a new project:
scrapy startproject test