Install Scrapy on CentOS

Last Update:2014-09-07 Source: Internet

Author: User


Scrapy is an open-source, single-machine crawler written in Python and built on the Twisted framework. It bundles the tools most web crawlers need to download pages and extract data from them.

Installation environment:

CentOS 5.4, Python 2.7.3

Installation steps:

1. Download and build Python 2.7: http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz

    # wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz -P /opt
    # cd /opt
    # tar xvf Python-2.7.3.tgz
    # cd Python-2.7.3
    # ./configure
    # make && make install

Verify the Python 2.7 installation:

    # python2.7
    Python 2.7.3 (default, Feb 28 2013, 03:08:43)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> exit()
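The interpreter check above can also be scripted. The following is a minimal sketch using only the standard library; the helper name `meets_minimum` and the (2, 7) floor are illustrative assumptions based on this article's environment, not a requirement stated by Scrapy.

```python
from __future__ import print_function

import sys


def meets_minimum(version_info, minimum=(2, 7)):
    """Return True if the (major, minor) pair is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Report the interpreter that is actually running this script.
    print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    print("At least 2.7:", meets_minimum(sys.version_info))
```

Run it with the freshly built interpreter (python2.7 followed by the script name) so that the check applies to the new installation rather than the system Python.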

2. Install setuptools: http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz

    # wget http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz -P /opt/
    # cd /opt
    # tar zxvf setuptools-0.6c11.tar.gz
    # cd setuptools-0.6c11
    # python2.7 setup.py install

3. Install Twisted

    # easy_install Twisted
    ......
    Installed /usr/local/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg
    ......
    Installed /usr/local/lib/python2.7/site-packages/zope.interface-4.0.4-py2.7-linux-x86_64.egg

Twisted depends on zope.interface, which easy_install installs along with it, as the output above shows. To install either package manually, download it from the addresses below.

zope.interface: http://pypi.python.org/packages/source/z/zope.interface/zope.interface-4.0.1.tar.gz

Twisted: http://twistedmatrix.com/Releases/Twisted/12.1/Twisted-12.1.0.tar.bz2

5. Install w3lib

    # easy_install -U w3lib
    Searching for w3lib
    Reading http://pypi.python.org/simple/w3lib/
    Reading http://github.com/scrapy/w3lib
    Best match: w3lib 1.2
    Downloading http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz#md5=f929d5973a9fda59587b09a72f185a9e
    Processing w3lib-1.2.tar.gz
    Running w3lib-1.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-wm_1BB/w3lib-1.2/egg-dist-tmp-2DQHY_
    zip_safe flag not set; analyzing archive contents...
    Adding w3lib 1.2 to easy-install.pth file
    Installed /usr/local/lib/python2.7/site-packages/w3lib-1.2-py2.7.egg
    Processing dependencies for w3lib
    Finished processing dependencies for w3lib

W3lib: http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz

6. Install lxml with easy_install (or install libxml2 instead)

    # easy_install lxml

Verify the lxml installation:

    # python2.7
    Python 2.7.3 (default, Feb 28 2013, 03:08:43)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import lxml
    >>> exit()
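The interactive import test can be extended to every package installed so far. This is a sketch only: `check_imports` is a hypothetical helper, and the module names passed to it (twisted, zope.interface, w3lib, lxml) are assumed to be the import names of the packages from the preceding steps.

```python
from __future__ import print_function

import importlib


def check_imports(names):
    """Try to import each module name; map it to True if the import succeeds."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    # Import names assumed for the packages installed in the steps above.
    wanted = ["twisted", "zope.interface", "w3lib", "lxml"]
    for name, ok in sorted(check_imports(wanted).items()):
        print("%-16s %s" % (name, "OK" if ok else "MISSING"))
```

A package reported as MISSING was either not installed or was installed for a different interpreter than the one running the script.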

You can also install libxml2 instead. Version 2.6.28 or later is recommended, but I could not find that version on the official website. I first installed version 2.6.9, and running scrapy then failed with the following error:

    Traceback (most recent call last):
      File "/usr/local/bin/scrapy", line 5, in <module>
        pkg_resources.run_script('Scrapy==0.14.4', 'scrapy')
      File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 489, in run_script
      File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1207, in run_script
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
        execute()
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 112, in execute
        cmds = _get_commands_dict(inproject)
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 37, in _get_commands_dict
        cmds = _get_commands_from_module('scrapy.commands', inproject)
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 30, in _get_commands_from_module
        for cmd in _iter_command_classes(module):
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 21, in _iter_command_classes
        for module in walk_modules(module_name):
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/utils/misc.py", line 65, in walk_modules
        submod = __import__(fullpath, {}, {}, [''])
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/commands/shell.py", line 8, in <module>
        from scrapy.shell import Shell
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/shell.py", line 14, in <module>
        from scrapy.selector import XPathSelector, XmlXPathSelector, HtmlXPathSelector
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/__init__.py", line 30, in <module>
        from scrapy.selector.libxml2sel import *
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/libxml2sel.py", line 12, in <module>
        from .factories import xmlDoc_from_html, xmlDoc_from_xml
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/factories.py", line 14, in <module>
        libxml2.HTML_PARSE_NOERROR +
    AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

To fix this, upgrade to version 2.6.21.

libxml2-python 2.6.21: ftp://xmlsoft.org/libxml2/python/libxml2-python-2.6.21.tar.gz
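The AttributeError in the traceback comes from a constant that the older libxml2 bindings do not define. A rough way to test an installed binding before running scrapy is sketched below. The helper `missing_attributes` is hypothetical, and the list of constants is limited to the two names visible in the traceback; Scrapy may rely on others.

```python
from __future__ import print_function

# Constants named in the traceback; this list is an assumption and
# may not cover everything Scrapy reads from the libxml2 module.
REQUIRED = ["HTML_PARSE_RECOVER", "HTML_PARSE_NOERROR"]


def missing_attributes(module, names):
    """Return the names from `names` that `module` does not define."""
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    try:
        import libxml2
    except ImportError:
        print("libxml2 Python bindings are not installed")
    else:
        missing = missing_attributes(libxml2, REQUIRED)
        print("missing constants:", missing or "none")
```

If any constant is reported as missing, the installed binding is too old for this Scrapy release and should be upgraded as described above.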

7. Install pyOpenSSL (optional; it mainly enables Scrapy to support HTTPS)

I first tried easy_install pyopenssl, which selected pyOpenSSL 0.13, but that installation failed, so I downloaded version 0.11 manually and installed it.

    # wget http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz -P /opt
    # cd /opt
    # tar zxvf pyOpenSSL-0.11.tar.gz
    # cd pyOpenSSL-0.11
    # python2.7 setup.py install

pyOpenSSL: http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz

8. Install Scrapy

    # easy_install -U Scrapy

Verify the installation:

    # scrapy
    Scrapy 0.16.4 - no active project

    Usage:
      scrapy <command> [options] [args]

    Available commands:
      fetch         Fetch a URL using the Scrapy downloader
      runspider     Run a self-contained spider (without creating a project)
      settings      Get settings values
      shell         Interactive scraping console
      startproject  Create new project
      version       Print Scrapy version
      view          Open URL in browser, as seen by Scrapy

      [ more ]      More commands available when run from project directory

    Use "scrapy <command> -h" to see more info about a command
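If the shell reports that the scrapy command cannot be found, it helps to check whether the script is on PATH at all. The sketch below uses only the standard library and works on both Python 2.7 and Python 3; `find_on_path` is a hypothetical helper written for this article, not part of Scrapy.

```python
from __future__ import print_function

import os


def find_on_path(command, path=None):
    """Return the first executable file named `command` on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("scrapy")
    print("scrapy found at:", location or "not found on PATH")
```

In the traceback shown earlier the script was installed as /usr/local/bin/scrapy, so that directory needs to be on PATH for the command to run.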

Scrapy: http://pypi.python.org/packages/source/S/Scrapy/Scrapy-0.14.4.tar.gz

Summary:

pyOpenSSL could not be installed with easy_install on its own. Download pyOpenSSL 0.11 and install it manually, then run easy_install -U Scrapy to complete the installation.

Original article: http://www.cnblogs.com/xiaoruoen/archive/2013/02/27/2933854.html


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
