Installing Scrapy on CentOS



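Scrapy is an open-source, single-machine web crawler written in Python and built on the Twisted framework. It bundles most of the tools needed to scrape web pages, covering both the downloading side and the extraction side of a crawler.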

Installation environment:

CentOS 5.4, Python 2.7.3

Installation steps:

1. Download and install Python 2.7: http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz

# wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz -P /opt
# cd /opt
# tar xvf Python-2.7.3.tgz
# cd Python-2.7.3
# ./configure
# make && make install

Verify the Python 2.7 installation:

# python2.7
Python 2.7.3 (default, Feb 28 2013, 03:08:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
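The interactive check above can also be scripted, which helps when the same setup is repeated on several machines. The following is a minimal sketch that is not part of the original walkthrough; the function name python_version_ok is hypothetical. It uses only the standard library and is written to run on both Python 2 and Python 3, so it can also be run with the system python to see which interpreter that command resolves to.

```python
import sys


def python_version_ok(minimum=(2, 7)):
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    return tuple(sys.version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Show which interpreter actually ran this script and its version.
    print("Interpreter: %s" % sys.executable)
    print("Version: %s" % sys.version.split()[0])
    if not python_version_ok():
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("Python 2.7 or newer is required")
```

For example: python2.7 check_python.py (the file name is only an example).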

2. Install setuptools: http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz

# wget http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz -P /opt/
# cd /opt
# tar zxvf setuptools-0.6c11.tar.gz
# cd setuptools-0.6c11
# python2.7 setup.py install
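The later steps call plain easy_install. On CentOS 5 the system python is the distribution's own Python 2.4, which yum depends on, so it is worth confirming that the easy_install found on PATH was installed by setuptools for python2.7 and not for another interpreter. A minimal sketch that reads the script's #! line; the helper names are hypothetical, and the default path /usr/local/bin/easy_install is an assumption based on the default /usr/local prefix that ./configure used in step 1.

```python
import sys


def parse_shebang(first_line):
    """Return the interpreter command from a '#!' line, or None if there is none."""
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip()


def script_interpreter(path):
    """Read the first line of an installed script and return its interpreter."""
    with open(path, "rb") as handle:
        # Decode explicitly so this behaves the same on Python 2 and 3.
        return parse_shebang(handle.readline().decode("utf-8", "replace"))


if __name__ == "__main__":
    # Assumed default location; pass another script path as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/bin/easy_install"
    try:
        print("%s -> %s" % (target, script_interpreter(target)))
    except (IOError, OSError) as exc:
        print("cannot read %s: %s" % (target, exc))
```

If the printed interpreter is not the python2.7 built in step 1, packages installed with that easy_install land in another interpreter's site-packages and python2.7 will not see them.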


3. Install Twisted

# easy_install Twisted
......
Installed /usr/local/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg
......
Installed /usr/local/lib/python2.7/site-packages/zope.interface-4.0.4-py2.7-linux-x86_64.egg

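Twisted depends on zope.interface. Both can also be downloaded manually from the addresses below (these are older releases than the Twisted 12.3.0 and zope.interface 4.0.4 that easy_install fetched above):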

zope.interface: http://pypi.python.org/packages/source/z/zope.interface/zope.interface-4.0.1.tar.gz

Twisted: http://twistedmatrix.com/Releases/Twisted/12.1/Twisted-12.1.0.tar.bz2
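If Twisted and zope.interface were installed manually from these tarballs rather than by easy_install, an import check confirms that python2.7 can actually see them. The sketch below (check_modules is a hypothetical helper) covers the packages from this and the following steps and prints each module's __version__ where the module defines one. The import names in MODULES are assumptions to adjust; OpenSSL is the module that pyOpenSSL provides.

```python
import sys

# Import names of the packages installed in this walkthrough (adjust as needed).
# "OpenSSL" is the top-level module provided by pyOpenSSL.
MODULES = ["twisted", "zope.interface", "w3lib", "lxml", "OpenSSL", "scrapy"]


def check_modules(names):
    """Map each module name to its __version__, 'installed', 'missing', or an error."""
    report = {}
    for name in names:
        try:
            __import__(name)
        except ImportError:
            report[name] = "missing"
            continue
        except Exception as exc:  # e.g. an AttributeError raised during import
            report[name] = "error: %s" % exc.__class__.__name__
            continue
        # __import__("a.b") returns "a", so look the submodule up directly.
        module = sys.modules[name]
        report[name] = str(getattr(module, "__version__", "installed"))
    return report


if __name__ == "__main__":
    for name, status in sorted(check_modules(MODULES).items()):
        print("%-16s %s" % (name, status))
```

Run it with python2.7; any entry reported as missing or error still needs attention before scrapy will start.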

5. Install w3lib

# easy_install -U w3lib
Searching for w3lib
Reading http://pypi.python.org/simple/w3lib/
Reading http://github.com/scrapy/w3lib
Best match: w3lib 1.2
Downloading http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz#md5=f929d5973a9fda59587b09a72f185a9e
Processing w3lib-1.2.tar.gz
Running w3lib-1.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-wm_1BB/w3lib-1.2/egg-dist-tmp-2DQHY_
zip_safe flag not set; analyzing archive contents...
Adding w3lib 1.2 to easy-install.pth file
Installed /usr/local/lib/python2.7/site-packages/w3lib-1.2-py2.7.egg
Processing dependencies for w3lib
Finished processing dependencies for w3lib

w3lib: http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz
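Each easy_install transcript above ends a package with an Installed <path>.egg line. When installing several packages in a row, a short script can pull those lines out of a saved log and flag any egg that did not land in python2.7's site-packages. A sketch under assumptions: the function names are hypothetical, and the expected prefix /usr/local/lib/python2.7/site-packages/ is taken from the output above.

```python
import re

# easy_install reports each finished egg on a line such as
# "Installed /usr/local/lib/python2.7/site-packages/w3lib-1.2-py2.7.egg".
INSTALLED_RE = re.compile(r"^Installed (\S+\.egg)\s*$", re.MULTILINE)


def installed_eggs(log_text):
    """Return the egg paths reported on 'Installed ...' lines of easy_install output."""
    return INSTALLED_RE.findall(log_text)


def outside_prefix(paths, prefix="/usr/local/lib/python2.7/site-packages/"):
    """Return the paths that were not installed under the expected site-packages."""
    return [path for path in paths if not path.startswith(prefix)]
```

For example, capture the output with easy_install -U w3lib 2>&1 | tee w3lib.log, then pass the file's contents to installed_eggs and the result to outside_prefix.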

6. Install libxml2, or install lxml with easy_install

# easy_install lxml

Verify the lxml installation:

# python2.7
Python 2.7.3 (default, Feb 28 2013, 03:08:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
>>> exit()
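import lxml only shows that the package directory is importable; it does not necessarily load the compiled lxml.etree extension, which is the part that links against the C libraries. A slightly stronger smoke test, sketched here with a made-up sample document, parses some HTML and runs an XPath query. The helper name extract_hrefs is hypothetical.

```python
try:
    from lxml import html  # loads the compiled lxml.etree extension
except ImportError:  # lxml is not installed for this interpreter
    html = None

SAMPLE = '<html><body><a href="/page1">One</a><a href="/page2">Two</a></body></html>'


def extract_hrefs(markup):
    """Parse HTML with lxml and return every <a href> value, or None without lxml."""
    if html is None:
        return None
    document = html.fromstring(markup)
    return document.xpath("//a/@href")


if __name__ == "__main__":
    print(extract_hrefs(SAMPLE))
```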

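Alternatively, you can install libxml2 with its Python bindings. The official site recommends version 2.6.28 or later, but I could not find that version there. I first installed version 2.6.9, and running scrapy then failed with the following error: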

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 5, in <module>
    pkg_resources.run_script('Scrapy==0.14.4', 'scrapy')
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 489, in run_script
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1207, in run_script
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 112, in execute
    cmds = _get_commands_dict(inproject)
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 37, in _get_commands_dict
    cmds = _get_commands_from_module('scrapy.commands', inproject)
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 30, in _get_commands_from_module
    for cmd in _iter_command_classes(module):
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 21, in _iter_command_classes
    for module in walk_modules(module_name):
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/utils/misc.py", line 65, in walk_modules
    submod = __import__(fullpath, {}, {}, [''])
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/commands/shell.py", line 8, in <module>
    from scrapy.shell import Shell
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/shell.py", line 14, in <module>
    from scrapy.selector import XPathSelector, XmlXPathSelector, HtmlXPathSelector
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/__init__.py", line 30, in <module>
    from scrapy.selector.libxml2sel import *
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/libxml2sel.py", line 12, in <module>
    from .factories import xmlDoc_from_html, xmlDoc_from_xml
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/factories.py", line 14, in <module>
    libxml2.HTML_PARSE_NOERROR + 
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

Upgrading to version 2.6.21 resolved the error.

libxml2-python 2.6.21: ftp://xmlsoft.org/libxml2/python/libxml2-python-2.6.21.tar.gz
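According to the traceback, the libxml2-based selector in Scrapy 0.14.4 reads parser-option constants from the libxml2 module at import time, so bindings that lack HTML_PARSE_RECOVER break the scrapy command itself before any spider runs. Before running scrapy, the attributes can be checked directly. A sketch: the flag list is taken from the traceback above and may not be complete, and missing_flags is a hypothetical helper.

```python
# Constants named in the traceback above; the list may not be exhaustive.
REQUIRED_FLAGS = ("HTML_PARSE_RECOVER", "HTML_PARSE_NOERROR")


def missing_flags(module, flags=REQUIRED_FLAGS):
    """Return the names in `flags` that `module` does not define."""
    return [name for name in flags if not hasattr(module, name)]


if __name__ == "__main__":
    try:
        import libxml2  # Python bindings installed from libxml2-python
    except ImportError:
        print("libxml2 bindings are not installed for this interpreter")
    else:
        missing = missing_flags(libxml2)
        if missing:
            print("libxml2 bindings too old, missing: %s" % ", ".join(missing))
        else:
            print("libxml2 bindings define the checked parser flags")
```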

7. Install pyOpenSSL (optional; it is mainly needed so that Scrapy can handle HTTPS)

Running easy_install pyOpenSSL picked pyOpenSSL 0.13, which failed to install, so I downloaded version 0.11 manually and installed it:

# wget http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz -P /opt
# cd /opt
# tar zxvf pyOpenSSL-0.11.tar.gz
# cd pyOpenSSL-0.11
# python2.7 setup.py install

pyOpenSSL: http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz
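Because pyOpenSSL is optional, it is worth confirming that it imports under python2.7 before relying on HTTPS crawling. The sketch below also reports the OpenSSL version used by the standard-library ssl module, which is a separate component: it is missing entirely if Python was compiled without the OpenSSL development headers. https_support is a hypothetical helper name.

```python
def https_support():
    """Report pyOpenSSL's version and the OpenSSL library the stdlib ssl module uses."""
    report = {}
    try:
        import OpenSSL  # top-level module provided by pyOpenSSL
        report["pyOpenSSL"] = str(getattr(OpenSSL, "__version__", "installed"))
    except ImportError:
        report["pyOpenSSL"] = None
    except Exception as exc:  # installed but broken, e.g. a failing C extension
        report["pyOpenSSL"] = "error: %s" % exc.__class__.__name__
    try:
        import ssl  # ImportError if Python was built without OpenSSL support
        report["stdlib ssl"] = ssl.OPENSSL_VERSION
    except ImportError:
        report["stdlib ssl"] = None
    return report


if __name__ == "__main__":
    for name, value in sorted(https_support().items()):
        print("%-11s %s" % (name, value or "not available"))
```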

8. Install Scrapy

# easy_install -U Scrapy

Verify the installation:

# scrapy
Scrapy 0.16.4 - no active project
Usage:
  scrapy <command> [options] [args]
Available commands:
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy
  [ more ]      More commands available when run from project directory
Use "scrapy <command> -h" to see more info about a command

Scrapy: http://pypi.python.org/packages/source/S/Scrapy/Scrapy-0.14.4.tar.gz
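The banner above reports 0.16.4, whereas the traceback in step 6 and the tarball link above refer to 0.14.4: easy_install -U fetches the newest release on PyPI at install time. To record which version actually got installed, a script can call the scrapy version command listed in the help output and parse what it prints. A sketch; the helper names are hypothetical and it assumes scrapy is on PATH.

```python
import re
import subprocess

# Matches output such as "Scrapy 0.16.4 - no active project" or "Scrapy 0.16.4".
BANNER_RE = re.compile(r"^Scrapy (\S+)", re.MULTILINE)


def parse_scrapy_version(banner):
    """Extract the version number from scrapy's banner, or None if absent."""
    match = BANNER_RE.search(banner)
    return match.group(1) if match else None


def installed_scrapy_version(executable="scrapy"):
    """Run `scrapy version` and parse its output; None if the command cannot run."""
    try:
        proc = subprocess.Popen([executable, "version"],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError:  # executable not found on PATH
        return None
    output = proc.communicate()[0].decode("utf-8", "replace")
    return parse_scrapy_version(output)


if __name__ == "__main__":
    print(installed_scrapy_version())
```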

Summary:

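If pyOpenSSL fails to install on its own (as happened with 0.13 above), first download and install pyOpenSSL 0.11 manually, then run easy_install -U Scrapy to complete the rest of the installation.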


Original article: http://www.cnblogs.com/xiaoruoen/archive/2013/02/27/2933854.html
