I recently needed to implement a web crawler, and Python came to mind immediately, since there seems to be an especially large amount of crawler-related material for Python. So I decided to write the crawler in Python, and found that Python has an open source library, Scrapy, which is a framework for implementing crawlers, so I adopted it without hesitation. The steps below install Scrapy; I chose to install it on Windows.
Scrapy Introduction
Scrapy is a fast, efficient web crawling framework for Python. It is mainly used to crawl websites, extract information, and format the extracted data. It is often used for data mining, monitoring, automated testing, and so on.
Install the required software
1, install Python. Download Python from the official website (http://www.python.org/ftp/python/2.7.5/python-2.7.5.msi) and double-click the MSI file to install it directly. Then add the Python paths (d:\python27;d:\python27\scripts;) to the PATH environment variable.
Verify that the installation is OK
c:\users\admin> python
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
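If the python command is not found at this point, the PATH entries from step 1 are the usual suspect. The following is a minimal sketch for checking them, assuming the d:\python27 install location used above; the names REQUIRED_DIRS and missing_path_entries are made up for this illustration and are not part of Python or Scrapy.

```python
from __future__ import print_function

import ntpath
import os
import sys

# Directories from step 1 (assumed install location; adjust if yours differs).
REQUIRED_DIRS = [r"d:\python27", r"d:\python27\scripts"]


def _normalize(path):
    # Windows paths are case-insensitive and may carry a trailing backslash,
    # so compare them in a normalized form.
    return ntpath.normcase(ntpath.normpath(path.strip()))


def missing_path_entries(required_dirs, path_value):
    """Return the required directories that do not appear in a PATH string."""
    present = set(_normalize(p) for p in path_value.split(";") if p.strip())
    return [d for d in required_dirs if _normalize(d) not in present]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    missing = missing_path_entries(REQUIRED_DIRS, os.environ.get("PATH", ""))
    if missing:
        print("Not on PATH: " + "; ".join(missing))
    else:
        print("All required directories are on PATH.")
```

Save it under any file name and run it with the full path to the interpreter (for example d:\python27\python.exe) if the plain python command does not work yet.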
2, install setuptools. Download setuptools from the official website (http://pypi.python.org/pypi/setuptools): get the ez_setup.py file, then run it to complete the installation automatically: python ez_setup.py
3, install zope.interface. Download zope.interface (http://pypi.python.org/pypi/zope.interface/): pick the MSI installer that corresponds to your Python version and double-click it to complete the installation automatically. Verify that the installation is OK
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
>>> import zope.interface
>>>
4, install Twisted. Download Twisted from the official website (http://twistedmatrix.com/trac/wiki/Downloads): get the MSI file for the corresponding version and double-click it to install.
5, install w3lib. Download w3lib from the official website (http://pypi.python.org/pypi/w3lib): get the w3lib-1.9.0.tar.gz file, unzip it, then install it:
# Enter the plugin directory and run the install command
d:\python-plugin\w3lib-1.3>python setup.py install
Verify
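D:\python-plugin\w3lib-1.3> python
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import w3lib
>>>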
6, install libxml2. Download libxml2 from the official website (http://users.skynet.be/sbi/libxml-python/): get the exe file for the corresponding Python version and double-click it to install.
7, install pyOpenSSL. Download pyOpenSSL from the official website (https://pypi.python.org/pypi/pyOpenSSL) and run the installer, accepting the defaults.
8, install Scrapy. Download Scrapy from the official website (https://pypi.python.org/pypi/scrapy) and install it:
# Enter the scrapy directory and run the install
d:\python-plugin\scrapy-0.16.5>python setup.py install
Verify
D:\python-plugin\scrapy-0.16.5>scrapy
Scrapy 0.16.5 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available if run from project directory

Use "scrapy <command> -h" to see more info about a command

D:\python-plugin\scrapy-0.16.5>
Installation complete.
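The per-package import checks used in the steps above can also be run in one pass. This is only a rough sketch: the helper name find_missing and the REQUIRED_MODULES list are made up here, and the list assumes the usual import names for the packages from steps 2 to 8 (pyOpenSSL is imported as OpenSSL, and the libxml2 bindings as libxml2).

```python
from __future__ import print_function

import importlib

# Import names assumed for the packages installed in steps 2 to 8.
REQUIRED_MODULES = [
    "setuptools",
    "zope.interface",
    "twisted",
    "w3lib",
    "libxml2",
    "OpenSSL",
    "scrapy",
]


def find_missing(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules can be imported.")
```

Any name it reports as missing points back to the step whose installer needs to be run again.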