For simple crawlers we can get by with the urllib and urllib2 libraries plus regular expressions, but there is a more powerful tool: the crawler framework Scrapy. Its installation can be painstaking, so I have organized the process below.
Windows Platform:
My system is Win7. First of all you need Python; I use version 2.7.7. Python 3 is similar, only some of the source files differ.
The official documentation, http://doc.scrapy.org/en/latest/intro/install.html, is the most authoritative reference; what follows is my personal experience of the process.
1. Install Python
I won't say much about the installation process itself; I already have Python 2.7.7 installed. After installing, remember to configure the environment variables. For example, I installed to the D drive, D:\python2.7.7, so the following two paths need to be added to the PATH variable:
D:\python2.7.7;D:\python2.7.7\Scripts
Once that is configured, enter python --version at the command line; if there is no error, the installation succeeded.
2. Install pywin32
Under Windows you must install pywin32. Download address: http://sourceforge.net/projects/pywin32/
Download the pywin32 build corresponding to your Python version and simply double-click to install it. After installation, verify it:
At the Python interactive prompt, enter
import win32com
If no error is reported, the installation succeeded.
3. Install pip
pip is the tool used to install the other necessary packages. First download get-pip.py. After downloading, change to the directory containing the file and execute the following command:
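The command block itself was lost from the original post; presumably it is the standard pip bootstrap invocation:

```shell
python get-pip.py
```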
Once the command finishes, pip is installed, and it installs setuptools for you at the same time.
After the installation finishes, execute pip on the command line. If you see its usage message, the installation succeeded; if you are told that pip is not recognized as an internal or external command, check whether both of the environment-variable paths above are configured.
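The exact prompt was lost in the original; a simple check (assuming the Scripts directory is on PATH) is to ask pip for its version:

```shell
pip --version
```

It prints a line beginning with pip followed by the version number and install path; the exact version will differ per machine.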
4. Install pyOpenSSL
Under Windows, pyOpenSSL is not preinstalled; on Linux it already is.
Download address: https://launchpad.net/pyopenssl
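The download page offers an installer; alternatively (my suggestion, not from the original post), once pip is working you can install it with pip:

```shell
pip install pyOpenSSL
```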
5. Install lxml
A brief introduction to lxml: it is a library written for Python that lets you process XML quickly and flexibly.
Execute the following command directly to complete the installation. If you are told that the Microsoft Visual C++ library is missing, download and install that supporting library first.
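The command block was lost from the original; presumably it is the usual pip install:

```shell
pip install lxml
```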
6. Install Scrapy
Finally, the exciting moment. With all the groundwork above laid, we can at last enjoy the fruits of victory!
Execute the following command. pip will download the remaining dependent packages, so none of them need to be installed manually. Wait a while, and we are done!
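The command block was lost from the original; presumably it is the standard install command:

```shell
pip install Scrapy
```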
7. Verify the Installation
Enter scrapy at the command line. If its usage prompt appears, the installation succeeded; if it fails, check the steps above for any omissions.
Linux Ubuntu Platform:
Installation on Linux is very simple; only a few commands need to be executed.
1. Install Python
sudo apt-get install python2.7 python2.7-dev
2. Install pip
First download get-pip.py.
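The original does not show how to download it. One way, assuming the current pip bootstrap URL still serves the script, is:

```shell
wget https://bootstrap.pypa.io/get-pip.py
```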
After downloading, change to the directory containing the file and execute the following command:
sudo python get-pip.py
3. Install Scrapy directly
Because lxml and OpenSSL are already preinstalled on Linux, Scrapy can be installed straight away.
If you want to verify lxml, you can try installing it again with pip.
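Judging from the "Requirement already satisfied" output below, the lost command was presumably a repeated pip install:

```shell
sudo pip install lxml
```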
If the following prompt appears, it proves lxml is already installed:
Requirement already satisfied (use --upgrade to upgrade): lxml in /usr/lib/python2.7/dist-packages
If you want to verify OpenSSL, enter openssl directly; if you land in the OpenSSL command prompt, it is installed.
Next, you can install scrapy directly
sudo pip install scrapy
After the installation completes, enter scrapy. Note that it is scrapy, not Scrapy: Linux is strictly case-sensitive. Thanks to Kamen for the reminder.
If the following prompt appears, this proves that the installation was successful
Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory
If you have any questions, please leave a message. I wish everyone a smooth installation!
Reprinted from: Jingmi (静觅), "Python Crawler Advanced, Part 3: Scrapy Framework Installation and Configuration"