I. Introduction to Scrapy
Scrapy is an application framework written for crawling websites and extracting structured data. It can be used in a range of programs, such as data mining, information processing, or archiving historical data. Scrapy was originally designed for web scraping. It has now delivered its long-promised Python 3.x support.
Why learn Scrapy? It helps us complete crawling tasks more effectively: a crawler we write ourselves in plain Python is like a lone fighter, while Scrapy puts a mighty army under our command. Scrapy can make our work far more efficient, even several times over. That is why learning Scrapy is well worth it.
II. Installing Scrapy
1. Running the command pip3 install scrapy directly produced many errors:
- Failed building wheel for lxml
- Microsoft Visual C++ 10.0 is required
- Failed building Twisted
- Unable to find vcvarsall.bat
These errors appear because pip tries to compile lxml and Twisted from source, which requires a C++ compiler on Windows.
2. Workaround
The site http://www.lfd.uci.edu/~gohlke/pythonlibs/ hosts many precompiled third-party Python libraries for Windows, so we can download the builds that match our own Python version instead of compiling them.
(1) Enter the command python in cmd to check the Python version.
As you can see, my Python version is Python 3.5.2, 64-bit.
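The same information can also be read from inside Python. This is a minimal sketch, assuming CPython on Windows: the helper names `describe_interpreter` and `expected_windows_tags` are hypothetical, and the `cp`/`win_amd64`/`win32` tags it derives only apply to CPython Windows wheels.

```python
import platform
import struct
import sys


def describe_interpreter():
    """Return (major, minor, bits) for the running Python interpreter."""
    major, minor = sys.version_info[:2]
    # A C pointer is 8 bytes on a 64-bit build and 4 bytes on a 32-bit build.
    bits = struct.calcsize("P") * 8
    return major, minor, bits


def expected_windows_tags(major, minor, bits):
    """Hypothetical helper: tags a matching CPython-on-Windows wheel should carry."""
    python_tag = "cp{}{}".format(major, minor)  # e.g. cp35 for CPython 3.5
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return python_tag, platform_tag


if __name__ == "__main__":
    major, minor, bits = describe_interpreter()
    # str.format instead of f-strings so this also runs on Python 3.5.
    print("Python {} ({}-bit)".format(platform.python_version(), bits))
    print("Look for wheels tagged {} / {}".format(*expected_windows_tags(major, minor, bits)))
```

On the author's Python 3.5.2 64-bit, this would point to wheels tagged cp35 and win_amd64.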
(2) Open http://www.lfd.uci.edu/~gohlke/pythonlibs/, press Ctrl+F to search for lxml, Twisted, and Scrapy, and download the versions that match your Python. For example, lxml-3.7.3-cp35-cp35m-win_amd64.whl is lxml 3.7.3 built for Python 3.5, 64-bit. The versions I downloaded are lxml 3.7.3, Twisted 17.1.0, and Scrapy 1.3.2.
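Wheel filenames follow a fixed layout (defined in PEP 427): name, version, optional build tag, Python tag, ABI tag, and platform tag, separated by hyphens. The sketch below splits a filename into those fields; `parse_wheel_filename` and `WheelInfo` are hypothetical names used only for illustration, and the function does no further validation of each field.

```python
from collections import namedtuple

# Hypothetical record type holding the PEP 427 filename fields.
WheelInfo = namedtuple(
    "WheelInfo", ["name", "version", "build", "python", "abi", "platform"]
)


def parse_wheel_filename(filename):
    """Split {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl."""
    if not filename.lower().endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-4].split("-")
    if len(parts) == 5:
        name, version, python, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, plat = parts
    else:
        raise ValueError("unexpected wheel filename: " + filename)
    return WheelInfo(name, version, build, python, abi, plat)


if __name__ == "__main__":
    info = parse_wheel_filename("lxml-3.7.3-cp35-cp35m-win_amd64.whl")
    print(info.name, info.version, info.python, info.abi, info.platform)
    # prints: lxml 3.7.3 cp35 cp35m win_amd64
```

A name such as Scrapy-1.3.2-py2.py3-none-any.whl parses to the tags py2.py3, none, and any, meaning a pure-Python wheel that works on any platform. That is why only lxml and Twisted need a build matching your exact Python version.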
(3) In cmd, use DOS commands to change to the folder containing the downloaded .whl files. For example, my three .whl files are in the Scrapy folder.
(4) Run the following commands in order:
a. pip3 install wheel
b. pip3 install lxml-3.7.3-cp35-cp35m-win_amd64.whl
c. pip3 install Twisted-17.1.0-cp35-cp35m-win_amd64.whl
d. pip3 install Scrapy-1.3.2-py2.py3-none-any.whl
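The same sequence can be scripted. This is a sketch, not the author's method: it calls pip as `python -m pip` through `sys.executable` so the packages go into the interpreter running the script, and the helper names `build_install_commands` and `install_all` plus the `dry_run` switch are hypothetical. The filenames are the ones from this article; substitute the files you actually downloaded.

```python
import subprocess
import sys

# Example filenames from this article; replace them with your own downloads.
WHEELS = [
    "lxml-3.7.3-cp35-cp35m-win_amd64.whl",
    "Twisted-17.1.0-cp35-cp35m-win_amd64.whl",
    "Scrapy-1.3.2-py2.py3-none-any.whl",
]


def build_install_commands(wheels):
    """Return pip commands in install order: the wheel package first, then each file."""
    pip = [sys.executable, "-m", "pip", "install"]
    commands = [pip + ["wheel"]]
    commands.extend(pip + [w] for w in wheels)
    return commands


def install_all(wheels, dry_run=True):
    """Print each command; actually run it only when dry_run is False."""
    for cmd in build_install_commands(wheels):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Run from the folder holding the .whl files; set dry_run=False to install.
    install_all(WHEELS, dry_run=True)
```

Keeping lxml and Twisted before Scrapy means pip finds those dependencies already installed from the prebuilt wheels and does not try to compile them from source.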
Scrapy is now installed; please ignore the last two lines of output, which only suggest upgrading pip.
(5) Scrapy itself is installed, but you also need pywin32: find the build that matches your Python version, download it, and install it. Once that is done, Scrapy can be used normally.
URL: https://sourceforge.net/projects/pywin32/files/pywin32/build%20220/
That's it: we can now enjoy using Scrapy.
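To confirm everything is in place, running scrapy version in cmd should print the installed Scrapy version. The sketch below instead checks from Python that each package can be found, using `importlib.util.find_spec`. The helper name `check_modules` is hypothetical, and `win32api` is assumed as a representative module that pywin32 installs.

```python
import importlib.util

# Top-level modules for lxml, Twisted, Scrapy, and pywin32 (win32api is
# assumed here as one of the modules pywin32 provides).
REQUIRED = ["lxml", "twisted", "scrapy", "win32api"]


def check_modules(names):
    """Map each top-level module name to True if this interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED).items():
        print("{:10} {}".format(name, "OK" if ok else "MISSING"))
```

find_spec only locates a module without importing it, so a missing package shows up as MISSING instead of raising ImportError.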
Python3 Web Crawler (V): Installing Scrapy on Python 3