Install Scrapy on CentOS

Source: Internet
Author: User

Scrapy is an open-source web crawling framework written in Python and built on the Twisted networking framework. It bundles most of the tools a crawler needs to download pages and extract data from them.

Installation environment:

CentOS 5.4, Python 2.7.3


Installation steps:

1. Download and install Python 2.7: http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz

    # wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz -P /opt
    # cd /opt
    # tar xvf Python-2.7.3.tgz
    # cd Python-2.7.3
    # ./configure
    # make && make install

Verify the Python 2.7 installation:

    # python2.7
    Python 2.7.3 (default, Feb 28 2013, 03:08:43)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> exit()
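The same check can be scripted instead of read off the interpreter banner. The following is a minimal sketch, not part of the original article; the helper name meets_minimum is made up for illustration, and the (2, 7) threshold is simply the version this guide installs.

```python
import sys


def meets_minimum(version_info, minimum=(2, 7)):
    """Return True if the (major, minor) part of version_info is at least `minimum`.

    `meets_minimum` is a hypothetical helper; the default threshold is the
    Python version this guide builds from source.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Report the version of whichever interpreter runs this script.
    print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    if meets_minimum(sys.version_info):
        print("Version is new enough for this guide")
    else:
        print("Version is older than 2.7; run the script with python2.7 instead")
```

Save it to a file and run it with python2.7 so that the freshly built interpreter is the one being checked, not the system default.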

2. Install setuptools: http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz

    # wget http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz -P /opt/
    # cd /opt
    # tar zxvf setuptools-0.6c11.tar.gz
    # cd setuptools-0.6c11
    # python2.7 setup.py install
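The following steps rely on the easy_install command that setuptools provides, so it can be worth confirming that the command is reachable before continuing. Below is a small sketch that searches PATH by hand; the function name find_on_path is an assumption made for this example.

```python
import os


def find_on_path(program, path=None):
    """Return the full path of the executable `program` found on PATH, or None.

    `find_on_path` is a hypothetical helper. `path` defaults to the PATH
    environment variable and uses the platform's separator.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, program)
        # Accept only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("easy_install")
    if location:
        print("easy_install found at %s" % location)
    else:
        print("easy_install is not on PATH")
```

If the command is not found, the directory that setup.py installed its scripts into is probably missing from PATH.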

 

3. Install Twisted

    # easy_install Twisted
    ......
    Installed /usr/local/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg
    ......
    Installed /usr/local/lib/python2.7/site-packages/zope.interface-4.0.4-py2.7-linux-x86_64.egg

Twisted requires zope.interface, which easy_install installs along with it, as the output above shows. To install the two packages manually, download them from the addresses below.

zope.interface: http://pypi.python.org/packages/source/z/zope.interface/zope.interface-4.0.1.tar.gz

Twisted: http://twistedmatrix.com/Releases/Twisted/12.1/Twisted-12.1.0.tar.bz2

5. Install w3lib

    # easy_install -U w3lib
    Searching for w3lib
    Reading http://pypi.python.org/simple/w3lib/
    Reading http://github.com/scrapy/w3lib
    Best match: w3lib 1.2
    Downloading http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz#md5=f929d5973a9fda59587b09a72f185a9e
    Processing w3lib-1.2.tar.gz
    Running w3lib-1.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-wm_1BB/w3lib-1.2/egg-dist-tmp-2DQHY_
    zip_safe flag not set; analyzing archive contents...
    Adding w3lib 1.2 to easy-install.pth file
    Installed /usr/local/lib/python2.7/site-packages/w3lib-1.2-py2.7.egg
    Processing dependencies for w3lib
    Finished processing dependencies for w3lib

w3lib: http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz

6. Install lxml with easy_install, or install libxml2

    # easy_install lxml

Verify the lxml installation:

    # python2.7
    Python 2.7.3 (default, Feb 28 2013, 03:08:43)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import lxml
    >>> exit()
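Rather than importing each package by hand, the dependencies installed so far can be checked in one pass. This is a rough sketch; check_modules is a name invented for this example, and the module list assumes the usual import names (twisted, zope.interface, w3lib, lxml) for the packages installed above.

```python
import importlib


def check_modules(names):
    """Map each module name to its version string, "unknown", or None if missing.

    `check_modules` is a hypothetical helper. Not every package exposes a
    __version__ attribute, hence the "unknown" fallback.
    """
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
            continue
        results[name] = str(getattr(module, "__version__", "unknown"))
    return results


if __name__ == "__main__":
    # Import names assumed for the packages installed in the steps above.
    wanted = ["twisted", "zope.interface", "w3lib", "lxml"]
    for name, version in sorted(check_modules(wanted).items()):
        print("%-16s %s" % (name, version if version else "MISSING"))
```

Run it with python2.7; any line reporting MISSING points at a step that needs to be repeated.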

Alternatively, you can install libxml2. The recommended version is 2.6.28 or later, but I could not find it on the official website. I first installed version 2.6.9, and running scrapy failed with the following error:

    Traceback (most recent call last):
      File "/usr/local/bin/scrapy", line 5, in <module>
        pkg_resources.run_script('Scrapy==0.14.4', 'scrapy')
      File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 489, in run_script
      File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1207, in run_script
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
        execute()
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 112, in execute
        cmds = _get_commands_dict(inproject)
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 37, in _get_commands_dict
        cmds = _get_commands_from_module('scrapy.commands', inproject)
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 30, in _get_commands_from_module
        for cmd in _iter_command_classes(module):
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 21, in _iter_command_classes
        for module in walk_modules(module_name):
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/utils/misc.py", line 65, in walk_modules
        submod = __import__(fullpath, {}, {}, [''])
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/commands/shell.py", line 8, in <module>
        from scrapy.shell import Shell
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/shell.py", line 14, in <module>
        from scrapy.selector import XPathSelector, XmlXPathSelector, HtmlXPathSelector
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/__init__.py", line 30, in <module>
        from scrapy.selector.libxml2sel import *
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/libxml2sel.py", line 12, in <module>
        from .factories import xmlDoc_from_html, xmlDoc_from_xml
      File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/factories.py", line 14, in <module>
        libxml2.HTML_PARSE_NOERROR +
    AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

I then upgraded to version 2.6.21.

libxml2 2.6.21: ftp://xmlsoft.org/libxml2/python/libxml2-python-2.6.21.tar.gz
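The traceback above comes down to the libxml2 Python bindings lacking a constant that Scrapy expects. A quick way to test a given libxml2 build before running scrapy is to look for those constants directly. The sketch below is illustrative only: missing_attributes is a made-up helper, and the two constant names are the ones that appear in the traceback, not a complete list of what Scrapy needs.

```python
def missing_attributes(module, required=("HTML_PARSE_RECOVER", "HTML_PARSE_NOERROR")):
    """Return the names in `required` that `module` does not define.

    Hypothetical helper; the default names are taken from the traceback
    in this article and may not cover everything Scrapy uses.
    """
    return [name for name in required if not hasattr(module, name)]


if __name__ == "__main__":
    try:
        import libxml2
    except ImportError:
        print("libxml2 Python bindings are not installed")
    else:
        missing = missing_attributes(libxml2)
        if missing:
            print("libxml2 bindings are missing: %s" % ", ".join(missing))
        else:
            print("libxml2 bindings define the constants from the traceback")
```

An empty result does not prove the bindings are fully compatible; it only rules out the specific failure shown above.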

7. Install pyOpenSSL (optional; mainly needed for Scrapy to support HTTPS)

Running easy_install pyopenssl tried to install pyOpenSSL 0.13, but the installation failed, so I manually downloaded version 0.11 and installed it.

    # wget http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz -P /opt
    # tar zxvf pyOpenSSL-0.11.tar.gz
    # cd pyOpenSSL-0.11
    # python2.7 setup.py install

pyOpenSSL: http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz

8. Install Scrapy

    # easy_install -U Scrapy

Verify the installation:

    # scrapy
    Scrapy 0.16.4 - no active project

    Usage:
      scrapy <command> [options] [args]

    Available commands:
      fetch         Fetch a URL using the Scrapy downloader
      runspider     Run a self-contained spider (without creating a project)
      settings      Get settings values
      shell         Interactive scraping console
      startproject  Create new project
      version       Print Scrapy version
      view          Open URL in browser, as seen by Scrapy

      [ more ]      More commands available when run from project directory

    Use "scrapy <command> -h" to see more info about a command

Scrapy: http://pypi.python.org/packages/source/S/Scrapy/Scrapy-0.14.4.tar.gz
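The verification can also be automated using the version command listed in the output above. This is a minimal sketch under the assumption that the scrapy script is on PATH; the function name scrapy_version is invented for this example.

```python
import subprocess


def scrapy_version(command=("scrapy", "version")):
    """Run `command` and return its trimmed output, or None if it cannot be run.

    Hypothetical helper. The default uses the `version` subcommand that
    appears in the scrapy command listing.
    """
    try:
        output = subprocess.check_output(list(command), stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        # OSError: the command was not found; CalledProcessError: non-zero exit.
        return None
    return output.decode("utf-8", "replace").strip()


if __name__ == "__main__":
    version = scrapy_version()
    if version is None:
        print("scrapy could not be run; check the installation and PATH")
    else:
        print(version)
```

Returning None instead of raising keeps the check easy to reuse in a larger setup script.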

Summary:

pyOpenSSL could not be installed directly with easy_install. A shorter route is to download and install pyOpenSSL 0.11 manually first, and then run easy_install -U Scrapy to complete the rest of the installation.

 

 

Original article: http://www.cnblogs.com/xiaoruoen/archive/2013/02/27/2933854.html
