How to install scrapy

Source: Internet
Author: User

Currently, scrapy is rarely used in China. In addition to its newer version, scrapy also has many drawbacks. For example, it requires many support packages, these support packages are dependent on each other, causing people to vomit blood when installing him, and vomiting blood is not necessarily the correct result! So today, I need to change my work environment and record some information.

The system environment used for this installation is Windows XP. The procedure is as follows. I think it will be successful if you do it.

1. Install python2.6. here we select python2.6. Why should we choose this version? First, the official scrapy website clearly states: requirements:

Python 2.5, 2.6, 2.7 (3.x is not yet supported), that is, currently only Python 2.6, and 2.7.3 versions are supported. in my previous development process using scrapy, I found that there are still some bugs in 2.5. I will not talk about these bugs for the moment. Http://www.codepub.com/software/Python-12776.html because Python official website occasionally cannot open (cannot open I want you to understand !), Therefore, a Chinese download link may be unavailable for another day. So you need to do it yourself. Install python, decompress the package, and double-click the icon on the right to install python. The installation is almost successful without any setup. That is to say, if your python environment is not installed here, you do not need to check the subsequent installation, so I really want to be lazy about installing python. But I still want to talk about environment variable settings. In my computer-> advanced environment variables, set the path type to the pyton root file directory I just installed. Here we will set C: \ python26 is added to the environment variable: Here Python installation is complete. Input and execute python in cmd mode. A similar image below indicates that python installation is successful.

2. Install twisted on the python official website.

Twisted installation method. To install twisted, Zope. Interface and pyopenssl are required first. These two third-party packages. On the twisted official website, we can see that all the files downloaded are Zope. Interface and pyopenssl are egg files. Here we need setuptools first.

1. here download: http://pypi.python.org/packages/2.6/s/setuptools/setuptools-0.6c11.win32-py2.6.exe these links I give are just what I found currently I can use, add you don't make, can try to change one, adhering to the principle that the setuptools tool is for py2.6 ., Double-click this icon. After execution, there will be easy_install.py and other files in the scripts folder under the python root directory, all with the word easy_install. The easy_install tool is installed successfully.

2. zope_interface installation. download page via twised: http://twistedmatrix.com/trac/wiki/Downloads click to perform Zope. interface, go to the http://pypi.python.org/pypi/zope.interface#download, select the egg that fits the current environment for download, here we choose Zope. interface-3.6.3-py2.6-win32.egg (MD5), download is such a file, then copy this egg file into the scripts directory under the python root directory we just said, the same as easy_instils and other files a directory location. In cmd mode, enter the script directory, execute the easy_install.py egg file name, and install the egg file.

Check whether Zope. interface is successfully installed. Run import Zope. Interface in the python environment. If an error is added, the Zope. interface is correctly installed.

3. Run the install pyopenssl. Here in the http://pypi.python.org/pypi/pyOpenSSL, there are these versions of pyopenssl for you to choose from. Renew the pyOpenSSL-0.12-py2.6-win-amd64.egg for installation. Is

Verify whether the installation is successful: In the python environment, run import OpenSSL to check whether the installation is normal. If no installation error is reported, the installation is correct.

4. Install twisted. Go back to the download link of twisted: idea. Here we select the second EXE version. After downloading the SDK, double-click it to install it. The installation process is automatically executed. Therefore, we will not describe it too much, and the possible error is that the version is inconsistent, because you have not selected the current version of twisted corresponding to your python. the twisted installation is complete here, but whether there are still problems, we cannot rush to the conclusion, because there are already four types of support packages, namely setuptools and Zope. interface, pyopenssl, and twisted. Isn't there another pycrypto 2.0.1 for python 2.5 in twisted? We are not rational about him. I am here using python2.6, so I will ignore him for the moment, but can I ignore him completely? This is because we are not sure about the role of this package, or, in python.26, or whether the twisted corresponding to python26 has pycrypto 2.0.1. Or is used to replace another package. So it can only be said that for the time being, if there are any problems in the actual development process, consider it.

3. Install lxml on scrapy's official website. Install lxml at scrapy's http://doc.scrapy.org/intro/install.html?intro-installat the bottom of the page. Click here on the lxml option, enter: http://users.skynet.be/sbi/libxml-python/, here we selected: Second, and libxml for python2.6 keywords. after installation, execute import libxml2 In the python environment. If no error is reported, it indicates that it is correct.

4. install scrapy. go to scrapy Official Website: http://scrapy.org/download/ this link, click scrapy 0.12 on pypi, pay attention to his back but there are brackets, (include Windows installers), said click here can also be installed in windows. Go to the http://pypi.python.org/pypi/Scrapy page and click here to download the EXE format. After downloading the file, double-click it to execute it. In this case, check whether there are scrapy folders in the third-party directory (site-Package) under the python directory. Then, enter scrapy in any directory in cmd mode, when an error is prompted, you need to set the script directory under the python root directory to the environment variable ., Open a new cmd window and execute the scrapy command at any location. The following page is displayed, indicating that the environment is configured successfully.

5. About projects, such as capturing list information on Baidu search engine.

1. Create a project.

A. Select a path in the CMD window. Here I select F: \ workspace. Here I create a host project: scrapy startproject mobile to create a project. The root directory is mobile ., if no error message is reported, the project is created successfully. Through file management, we can clearly see that another file system has been generated and corresponding files under the corresponding folder.

2. Preliminary Application

The initial crawler only writes one simple crawler here. If you encounter a difficult problem, you can communicate with me and I will do my best to help you.

1. Create a new file in the spider folder named "Baidu. py", and the content in the file is:

from scrapy.spider import BaseSpider

class BaiduSpider(BaseSpider):
name = "baidu.com"
allowed_domains = ["baidu.com"]
start_urls = ["http://www.baidu.com/s?wd=%CA%D6%BB%FA&inputT=2110"]

def parse(self, response):
filename = response.url.split("/")[-2] + '.html'
open(filename, 'wb').write(response.body)

In this case, an HTML file named www.baidu.com.html will be generated in the project root directory. In cmd mode, enter the project root directory, that is, the file with scrapy. run scrapy crawl Baidu.com in the same directory as cfg. Note that Baidu.com corresponds to the name attribute value of the baiduspider class. the final result is as follows:

Finally, we will find the www.baidu.com.html file in the mobileroot Directory, which contains the corresponding HTML content. This is the first time. Let's talk about the configuration in the Linux environment in another day.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.