Scrapy Windows Installation Tutorial (Python Crawler Framework)

Source: Internet
Author: User

This installation was done on Windows XP. The detailed procedure follows; if you follow each step, the installation should succeed.

1. Install Python 2.6. Why choose this version?

First of all, the Scrapy official website states its requirements explicitly:

Python 2.5, 2.6, 2.7 (3.x is not yet supported)

That is, Scrapy currently supports only Python 2.5, 2.6, and 2.7; Python 3.x is not supported. In my earlier development work with Scrapy I also found that 2.5 still has some bugs, which I will not detail here.

Download: http://www.codepub.com/software/Python-12776.html. Because the official Python website occasionally cannot be opened, I give a domestic download link instead. This link may also stop working some day, in which case you will need to find another one yourself.

To install Python, unzip the download to get the installer and double-click it. It succeeds with almost no settings to change. If the Python environment is not installed, there is no point in reading further, so I will not describe the Python installation in detail.

I do want to mention the environment variable setting, though. In My Computer -> Advanced -> Environment Variables, edit Path and add the root directory where Python was installed; here that means adding C:\Python26. This completes the Python installation. Open a cmd window and run python; if the interactive interpreter starts and shows its version banner, the Python installation was successful.
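The two checks described in this step (a supported interpreter version and the Python directory on Path) can also be scripted. The following is only a minimal sketch: the helper names is_supported and dir_on_path are made up for illustration, and the version list simply repeats the requirement quoted above.

```python
import os
import sys

# Versions quoted from the Scrapy requirements cited in this tutorial.
SUPPORTED = [(2, 5), (2, 6), (2, 7)]


def is_supported(version_info):
    """Return True if (major, minor) is one of the versions listed above."""
    return tuple(version_info[:2]) in SUPPORTED


def dir_on_path(directory, path_value):
    """Return True if directory is an entry of a Windows-style Path string."""
    wanted = directory.rstrip("\\/").lower()
    entries = [p.strip().rstrip("\\/").lower() for p in path_value.split(";")]
    return wanted in entries


if __name__ == "__main__":
    print("Interpreter supported: %s" % is_supported(sys.version_info))
    print("C:\\Python26 on Path: %s"
          % dir_on_path("C:\\Python26", os.environ.get("PATH", "")))
```

The comparison ignores case and trailing slashes, since Windows treats C:\Python26 and c:\python26\ as the same directory.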

  

2. Install Twisted, following the official website.

Installing Twisted first requires two third-party packages, zope.interface and pyOpenSSL. On the official Twisted site these are offered as egg files, so we first need the setuptools tools.

1. Download setuptools here: http://pypi.python.org/packages/2.6/s/setuptools/setuptools-0.6c11.win32-py2.6.exe. The links I give are simply ones I found to work. If one does not work for you, try another, keeping to one principle: it must be setuptools, and it must be for Python 2.6. After downloading, double-click the icon to run it. When it finishes, the Scripts folder in the Python root directory contains easy_install.py and other files with easy_install in their names. The easy_install tool is now installed.

2. Install zope.interface. From the Twisted download page, http://twistedmatrix.com/trac/wiki/Downloads, follow the zope.interface link to pypi.python.org/pypi/zope.interface#download and select the egg that matches your environment; here we choose zope.interface-3.6.3-py2.6-win32.egg. After downloading, copy the egg file into the Scripts directory under the Python root mentioned above, the same directory that holds easy_install and the related files. Then open cmd, change to this Scripts directory, and run easy_install.py followed by the egg file name to install the egg.

To check whether zope.interface installed successfully, run import zope.interface in the Python interpreter. If no error is reported, zope.interface is installed correctly.

3. Install pyOpenSSL in the same way. http://pypi.python.org/pypi/pyOpenSSL lists the versions of pyOpenSSL available for download. Here we choose pyOpenSSL-0.12-py2.6-win32.egg. Following the method just used for zope.interface, copy the downloaded egg file to the Scripts folder, open cmd, change to the Scripts folder, and run easy_install.exe with the name of the egg you downloaded, for example easy_install.exe pyopenssl-0.12-py2.6-win-amd64.egg for the win-amd64 build. The file name must match the egg you chose, so it ends in win32.egg for the 32-bit egg selected above.

To verify the installation, run import OpenSSL in the Python interpreter. If no error is reported, the installation is correct.
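The egg file names in these steps encode the Python version and the platform, which is what "available in the current environment" refers to. As a rough illustration, the sketch below splits such a name into its parts; parse_egg_name and egg_matches are hypothetical helpers, and the pattern only assumes the name-version-pyX.Y-platform.egg layout seen in the file names above.

```python
import re

# Layout assumed from the names in this tutorial:
# <name>-<version>-py<X.Y>[-<platform>].egg
EGG_RE = re.compile(
    r"^(?P<name>.+?)-(?P<version>[^-]+)-py(?P<py>\d+\.\d+)"
    r"(?:-(?P<platform>.+))?\.egg$"
)


def parse_egg_name(filename):
    """Split an egg file name into (name, version, python_version, platform)."""
    match = EGG_RE.match(filename)
    if match is None:
        raise ValueError("not a recognised egg file name: %r" % filename)
    return (match.group("name"), match.group("version"),
            match.group("py"), match.group("platform"))


def egg_matches(filename, python_version, platform):
    """Return True if the egg targets this Python version and platform."""
    _, _, py, plat = parse_egg_name(filename)
    # An egg without a platform tag is treated as platform independent.
    return py == python_version and plat in (None, platform)


if __name__ == "__main__":
    for egg in ("zope.interface-3.6.3-py2.6-win32.egg",
                "pyopenssl-0.12-py2.6-win-amd64.egg"):
        print("%s -> %s" % (egg, egg_matches(egg, "2.6", "win32")))
```

With a 32-bit Python 2.6, the win32 egg matches and the win-amd64 egg does not, so the egg passed to easy_install should be the one whose tags agree with the installed interpreter.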

4. Install Twisted. Go back to the Twisted download page, http://twistedmatrix.com/trac/wiki/Downloads. We need the Twisted build that corresponds to Python 2.6, so here we select the second EXE version. After downloading, double-click to install; the installation runs automatically, so it needs little explanation. The most likely error is a version mismatch, which means you did not select the Twisted build for your current Python version.

Twisted is now installed, but we should not rush to conclude that everything is fine. So far we have installed four supporting packages: setuptools, zope.interface, pyOpenSSL, and Twisted. The Twisted page also lists PyCrypto 2.0.1 for Python 2.5, which we have not dealt with. Because I am using Python 2.6, I ignore it for now, but can it be ignored entirely? We do not know what this package does, whether it is already included with Python 2.6, whether the Twisted build for Python 2.6 bundles PyCrypto 2.0.1, or whether another package takes over its role. For the time being we can only keep it in mind in case problems appear during actual development.
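The import checks used in the previous steps can be gathered into one small script that reports every supporting package at once. This is a sketch only: check_imports is a made-up helper, and the module names are the import names used above (zope.interface, OpenSSL) plus setuptools and twisted.

```python
def check_imports(module_names):
    """Try to import each module and return {name: True or False}."""
    results = {}
    for name in module_names:
        try:
            __import__(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    # Import names of the packages installed so far in this tutorial.
    packages = ["setuptools", "zope.interface", "OpenSSL", "twisted"]
    for name, ok in sorted(check_imports(packages).items()):
        print("%-16s %s" % (name, "OK" if ok else "MISSING"))
```

Only ImportError is caught, so a package that is present but broken in some other way would still raise and show its own traceback.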

3. Install lxml, following the Scrapy website. The bottom section of http://doc.scrapy.org/intro/install.html#intro-install covers installation on Windows. Follow the lxml option there to http://users.skynet.be/sbi/libxml-python/ and select the second entry, the one whose name contains keywords such as libxml and python2.6. After installation, run import libxml2 in the Python interpreter; if no error is reported, the installation is correct.

  

4. Install Scrapy. On the Scrapy official website, http://scrapy.org/download/, click Scrapy 0.12 on PyPI. Note the parenthetical after it (include Windows installers), which means this download can also be installed on Windows. On the page http://pypi.python.org/pypi/Scrapy, download the exe installer, then double-click it to run. Afterwards, check that a scrapy folder exists in Python's third-party directory (that is, site-packages). Then run scrapy from any directory in cmd. If this reports an error, add the Scripts directory under the Python root to the Path environment variable, open a new cmd window, and run the scrapy command again from anywhere. If Scrapy prints its usage information, the environment is configured successfully.
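Whether cmd can find the scrapy command depends on whether some directory on Path contains it. The sketch below performs that lookup by hand; find_on_path is a hypothetical helper, and the list of extensions it tries is an assumption about how the command is installed, not something stated by Scrapy.

```python
import os


def find_on_path(command, path_value, extensions=(".exe", ".bat", ".cmd", "")):
    """Return the first file named command+extension in a Path-style string."""
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        for ext in extensions:
            candidate = os.path.join(directory, command + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    found = find_on_path("scrapy", os.environ.get("PATH", ""))
    print(found or "scrapy not found; add the Scripts directory to Path")
```

If the script prints nothing but the fallback message, the Scripts directory is probably missing from Path, which is the situation described above.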

5. A sample project: crawling the result list of the Baidu search engine.

1. Create the project.

A. In the cmd window, change to a working directory; here I chose F:\workspace. Create a new project there with: scrapy startproject mobile. This creates a project whose root directory is named mobile. If no error is reported, the project was created successfully. In the file manager you can see that a new directory tree has been generated, with the corresponding folders and files.
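Instead of browsing in the file manager, the generated tree can be listed with a few lines of Python. This is a generic sketch (list_tree is a made-up helper) and makes no claim about which files scrapy startproject creates; it only prints whatever is found under the project directory used above.

```python
import os


def list_tree(root):
    """Return sorted relative paths of all files below root, using '/'."""
    paths = []
    for current, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(current, name)
            paths.append(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(paths)


if __name__ == "__main__":
    # F:\workspace\mobile is the project location chosen in this tutorial.
    for path in list_tree(r"F:\workspace\mobile"):
        print(path)
```

If the directory does not exist, the listing is simply empty.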

2. Preliminary application

Here we write only the simplest possible crawler. If you run into a difficult problem, you can contact me and I will try my best to help.

1. Create a new file named baidu.py under the spiders folder, with the following contents:

from scrapy.spider import BaseSpider


class BaiduSpider(BaseSpider):
    # The spider name is what "scrapy crawl" expects on the command line.
    name = "baidu.com"
    allowed_domains = ["baidu.com"]
    start_urls = ["http://www.baidu.com/s?wd=%CA%D6%BB%FA&inputT=2110"]

    def parse(self, response):
        # Save the page under the host name, e.g. www.baidu.com.html.
        filename = response.url.split("/")[-2] + '.html'
        open(filename, 'wb').write(response.body)

Then, in cmd, change to the project root directory (the directory that contains scrapy.cfg) and run scrapy crawl baidu.com. Note that baidu.com here is the value of the name attribute of the BaiduSpider class.

When the crawl finishes, you will find the file www.baidu.com.html in the mobile root directory, containing the HTML of the fetched page. The configuration of the Linux environment is a topic for next time.
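The output file name comes from the way the spider splits the response URL. The short sketch below repeats that rule outside Scrapy so it can be tried in a plain interpreter; page_filename is a name chosen for illustration.

```python
def page_filename(url):
    """Mirror the spider's rule: second-to-last '/' piece plus '.html'."""
    return url.split("/")[-2] + ".html"


start_url = "http://www.baidu.com/s?wd=%CA%D6%BB%FA&inputT=2110"

# Splitting on "/" gives: 'http:', '', 'www.baidu.com', 's?wd=...'
print(start_url.split("/"))
print(page_filename(start_url))
```

For the start URL this yields www.baidu.com.html. The rule depends on the shape of the URL: a URL with a deeper path, such as http://www.baidu.com/a/b, would give a.html, so a spider with several start URLs would need a different naming scheme.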

