I. Introduction to Scrapy
Scrapy is a fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
Official homepage: http://www.scrapy.org/
II. Installing Python 2.7
Official homepage: http://www.python.org/
Download: http://www.python.org/ftp/python/2.7.3/python-2.7.3.msi
1) Install Python
Installation directory: D:\Python27
2) Add environment variables
System Properties -> Advanced -> Environment Variables -> System Variables -> Path -> Edit, then append D:\Python27 to the value.
3) Verify the environment variables
T:\>set Path
Path=C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\system32\wbem;d:\rational\common;d:\rational\clearcase\bin;d:\python27;d:\python27\scripts
PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH
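The same check can also be done from Python once the interpreter runs. The following is a minimal sketch, not part of the original steps: the helper name path_contains is hypothetical, and it assumes a Windows-style Path value whose entries are separated by semicolons and compared case-insensitively.

```python
import os


def path_contains(path_value, directory):
    """Return True if directory is one of the entries of a Windows-style Path string."""
    # Normalize: trim spaces, drop a trailing slash, ignore case (Windows paths are case-insensitive).
    wanted = directory.strip().rstrip("\\/").lower()
    for entry in path_value.split(";"):
        if entry.strip().rstrip("\\/").lower() == wanted:
            return True
    return False


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    # Directories taken from this tutorial's installation layout.
    for needed in (r"D:\Python27", r"D:\Python27\Scripts"):
        status = "found" if path_contains(current, needed) else "missing"
        print("%s: %s" % (needed, status))
```

If either directory is reported as missing, open a new command prompt after editing Path, since an already open prompt keeps the old value.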
4) Verify Python
T:\>python
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
T:\>
III. Installing Twisted
Twisted is an event-driven networking engine written in Python and licensed under the open source MIT license.
1) Install setuptools
Download, build, install, upgrade, and uninstall Python packages--easily!
Official homepage: http://pypi.python.org/pypi/setuptools
Download: http://pypi.python.org/packages/2.7/s/setuptools/setuptools-0.6c11.win32-py2.7.exe
Installation process: omitted
2) Install zope.interface
Official homepage: http://pypi.python.org/pypi/zope.interface/
Download: http://pypi.python.org/packages/2.7/z/zope.interface/zope.interface-4.0.1-py2.7-win32.egg
Installation process:
T:\>D:
D:\>cd D:\Python27\Scripts
D:\Python27\Scripts>easy_install.exe zope.interface-4.0.1-py2.7-win32.egg
Processing zope.interface-4.0.1-py2.7-win32.egg
creating d:\python27\lib\site-packages\zope.interface-4.0.1-py2.7-win32.egg
Extracting zope.interface-4.0.1-py2.7-win32.egg to d:\python27\lib\site-packages
Adding zope.interface 4.0.1 to easy-install.pth file
Installed d:\python27\lib\site-packages\zope.interface-4.0.1-py2.7-win32.egg
Processing dependencies for zope.interface==4.0.1
Finished processing dependencies for zope.interface==4.0.1
D:\Python27\Scripts>
To verify the installation:
D:\Python27\Scripts>python
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import zope.interface
>>>
3) Install Twisted
Official homepage: http://twistedmatrix.com/trac/wiki/TwistedProject
Download: http://pypi.python.org/packages/2.7/T/Twisted/Twisted-12.1.0.win32-py2.7.msi
Installation process: omitted
IV. Installing w3lib
Official homepage: http://pypi.python.org/pypi/w3lib
Download: http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz
Decompression process: omitted
Installation process:
T:\w3lib-1.2>python setup.py install
running install
running build
running build_py
creating build
creating build\lib
creating build\lib\w3lib
copying w3lib\encoding.py -> build\lib\w3lib
copying w3lib\form.py -> build\lib\w3lib
copying w3lib\html.py -> build\lib\w3lib
copying w3lib\http.py -> build\lib\w3lib
copying w3lib\url.py -> build\lib\w3lib
copying w3lib\util.py -> build\lib\w3lib
copying w3lib\__init__.py -> build\lib\w3lib
running install_lib
creating D:\Python27\Lib\site-packages\w3lib
copying build\lib\w3lib\encoding.py -> D:\Python27\Lib\site-packages\w3lib
copying build\lib\w3lib\form.py -> D:\Python27\Lib\site-packages\w3lib
copying build\lib\w3lib\html.py -> D:\Python27\Lib\site-packages\w3lib
copying build\lib\w3lib\http.py -> D:\Python27\Lib\site-packages\w3lib
copying build\lib\w3lib\url.py -> D:\Python27\Lib\site-packages\w3lib
copying build\lib\w3lib\util.py -> D:\Python27\Lib\site-packages\w3lib
copying build\lib\w3lib\__init__.py -> D:\Python27\Lib\site-packages\w3lib
byte-compiling D:\Python27\Lib\site-packages\w3lib\encoding.py to encoding.pyc
byte-compiling D:\Python27\Lib\site-packages\w3lib\form.py to form.pyc
byte-compiling D:\Python27\Lib\site-packages\w3lib\html.py to html.pyc
byte-compiling D:\Python27\Lib\site-packages\w3lib\http.py to http.pyc
byte-compiling D:\Python27\Lib\site-packages\w3lib\url.py to url.pyc
byte-compiling D:\Python27\Lib\site-packages\w3lib\util.py to util.pyc
byte-compiling D:\Python27\Lib\site-packages\w3lib\__init__.py to __init__.pyc
running install_egg_info
Writing D:\Python27\Lib\site-packages\w3lib-1.2-py2.7.egg-info
T:\w3lib-1.2>
To verify the installation:
V. Installing libxml2
Official homepage: http://users.skynet.be/sbi/libxml-python/
Download: http://users.skynet.be/sbi/libxml-python/binaries/libxml2-python-2.7.7.win32-py2.7.exe
Installation process: omitted
To verify the installation:
VI. Installing pyOpenSSL
Official homepage: http://pypi.python.org/pypi/pyOpenSSL
Download: http://pypi.python.org/packages/2.7/p/pyOpenSSL/pyOpenSSL-0.13.winxp32-py2.7.msi
Installation process: omitted
To verify the installation:
T:\>python
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import OpenSSL
>>>
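Before moving on to Scrapy itself, the individual import checks can be collected into one script. This is only a sketch: the helper name missing_modules is hypothetical, and the import names twisted, w3lib and libxml2 are assumptions based on how those packages are normally imported (the steps above only show zope.interface and OpenSSL being imported).

```python
import importlib

# Import names for the packages installed so far.
# Assumption: twisted, w3lib and libxml2 are imported under these names;
# pyOpenSSL is imported as "OpenSSL" (case-sensitive).
REQUIRED_MODULES = ["zope.interface", "twisted", "w3lib", "libxml2", "OpenSSL"]


def missing_modules(names):
    """Return the names that cannot be imported, in their original order."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All listed dependencies can be imported.")
```

Any name reported as missing points back to the installation step for that package.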
VII. Installing Scrapy
Official homepage: http://scrapy.org/
Download: http://pypi.python.org/packages/source/S/Scrapy/Scrapy-0.14.4.tar.gz
Decompression process: omitted
Installation process:
T:\scrapy-0.14.4>python setup.py install
...
Installing easy_install-2.7-script.py script to D:\Python27\Scripts
Installing easy_install-2.7.exe script to D:\Python27\Scripts
Installing easy_install-2.7.exe.manifest script to D:\Python27\Scripts
Using d:\python27\lib\site-packages
Finished processing dependencies for scrapy==0.14.4
T:\scrapy-0.14.4>
Make sure that D:\Python27\Scripts is included in the Path environment variable.
To verify the installation:
T:\>scrapy
Scrapy 0.14.4 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command
T:\>
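The final check can also be scripted, for example as part of a setup test. The sketch below is an illustration rather than part of the original procedure: the helper name command_output is hypothetical, and it assumes the scrapy command is reachable through Path. It runs the version command from the list above through the system shell so that the launcher script in D:\Python27\Scripts is found.

```python
import subprocess


def command_output(command):
    """Run a shell command and return its output as text, or None if it fails."""
    try:
        # shell=True lets the shell resolve the command through Path;
        # only fixed, trusted command strings should be passed here.
        raw = subprocess.check_output(command, shell=True, stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        return None
    return raw.decode("utf-8", "replace").strip()


if __name__ == "__main__":
    version = command_output("scrapy version")
    if version is None:
        print("scrapy command not found; check the Path environment variable.")
    else:
        print("scrapy reports: " + version)
```

A result of None usually means that D:\Python27\Scripts is not yet on Path in the current command prompt.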