1. What is Scrapy?
A: Scrapy is an open-source web crawling framework written in Python (built on the Twisted framework). It has a clear structure, loose coupling between modules, and strong extensibility, so it can meet a wide variety of needs. (The tools we introduced earlier, such as requests, BeautifulSoup, and Selenium, are like writing an essay from scratch and are mainly suited to personal crawlers. Scrapy gives us a convenient, flexible crawler architecture: we only need to modify its components to build a complete web crawler, which is more like filling in the blanks!)
Because Scrapy is easiest to use on Linux, all of the Scrapy programs that follow will be run on a Linux system.
2. Installing the Scrapy framework (I am using a VMware virtual machine with an Ubuntu 16.04 image)
Open a terminal: sudo apt-get install python-dev python-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev (installs some dependency packages)
If you have not installed Python 3, run: sudo apt-get install python3 python3-dev
At this point you can first create a virtual environment (pip3 install virtualenv) and then install Scrapy inside it (all the programs you write will run in the virtual environment)
The Ubuntu 16.04 release I'm using comes with two Python versions: python2.7.14 and python3.5.2
Let's first solve the problem of several versions coexisting.
When you type python, the system points to Python 2 by default, but all our programs are based on Python 3, which is the mainstream of the future. (We want typing python to take us straight to Python 3.)
Let's fix this: sudo su (enter the user password you set to gain superuser privileges), then look at the figure:
Analysis: (we'll talk about Linux commands in more detail later)
Likewise, typing python2 points to the Python 2 environment, and typing python3 points to the Python 3 environment
whereis python lists the paths of Python's executable files
which python shows the path of the file that actually runs when we type python
We first used rm to remove that path, then used ln -s arg1 arg2, which creates arg2 as a soft link pointing to arg1. A soft link works much like a hyperlink: when you type python, the system follows the link to the Python 3 executable and runs it. That gives us the result we wanted.
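The rm and ln -s steps described above can be sketched in Python. This is only an illustration under stated assumptions: it works inside a temporary directory with empty placeholder files named python2.7 and python3.5 (hypothetical stand-ins for the real executables), and the helper name make_soft_link is made up for this example. It does not touch /usr/bin.

```python
import os
import tempfile


def make_soft_link(target: str, link_path: str) -> str:
    """Point link_path at target, replacing any existing entry.

    Mirrors the two shell steps: rm link_path, then ln -s target link_path.
    Returns the fully resolved path that link_path now leads to.
    """
    if os.path.lexists(link_path):
        os.remove(link_path)       # the rm step
    os.symlink(target, link_path)  # the ln -s step
    return os.path.realpath(link_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as sandbox:
        # Hypothetical placeholders, not real interpreters.
        py2 = os.path.join(sandbox, "python2.7")
        py3 = os.path.join(sandbox, "python3.5")
        for path in (py2, py3):
            open(path, "w").close()

        link = os.path.join(sandbox, "python")
        os.symlink(py2, link)  # starting state: python -> python2.7

        resolved = make_soft_link(py3, link)
        print(os.path.basename(resolved))  # prints python3.5
```

Running the sketch repoints the python link from the Python 2 placeholder to the Python 3 placeholder, which is the same effect the shell commands have on the real link.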
3. How to use multiple versions of Python and multiple versions of libraries at the same time
Answer: install the virtualenv virtual environment tool
Open a terminal: sudo pip3 install virtualenv
If the following error occurs, use vi /usr/bin/pip3 to edit the file (this happens because the system did not update this file when pip for Python 2 was upgraded; don't worry, we can modify it ourselves)
vim is a powerful text editor on Linux; we'll explain how to use it next time
Change the configuration file as follows:
Type again: sudo pip3 install virtualenv (Success)
Then:
Create a python3.5 virtual environment called Course-python3.5-env:
Activate and exit the virtual environment with the source and deactivate commands
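As a rough sketch of what creating the environment involves, the example below uses Python's standard-library venv module. Note that this is a swapped-in technique, not the virtualenv tool installed above; the two produce a similar directory layout on Linux. The helper name create_env is hypothetical, and the environment is created in a temporary directory so nothing is left behind.

```python
import os
import tempfile
import venv


def create_env(env_dir: str) -> str:
    """Create a virtual environment and return the path to its activate script."""
    # with_pip=False keeps this sketch fast and offline;
    # use with_pip=True if you want to pip install Scrapy into the environment.
    venv.create(env_dir, with_pip=False)
    # POSIX layout (Linux, as used in this article).
    return os.path.join(env_dir, "bin", "activate")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        env_dir = os.path.join(base, "Course-python3.5-env")
        activate = create_env(env_dir)
        print("activate with: source", activate)
        print("leave with: deactivate")
```

The source command runs the activate script in the current shell, which puts the environment's bin directory first on PATH; deactivate undoes that change.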
Finally, following the steps above, first activate the virtual environment and then install Scrapy
Verification: in the terminal, type scrapy --version to view the installed version of Scrapy; if no error appears, the installation worked!
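The same check can be made from inside Python. The sketch below is a minimal example that assumes Python 3.8 or newer (for importlib.metadata), which is newer than the python3.5.2 mentioned in this article; the helper name installed_version is made up for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("scrapy")
    if version is None:
        print("Scrapy is not installed in this environment")
    else:
        print("Scrapy", version)
```

Run it with the virtual environment activated, since the result depends on which interpreter executes the script.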
From now on, all of our Scrapy crawler projects will run in a virtual environment!
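If you are ever unsure whether a script is running inside a virtual environment, a small check like the following can help. It is a sketch, not part of Scrapy: it compares sys.prefix with the base interpreter's prefix, and also looks for sys.real_prefix, which older virtualenv releases set. The function name in_virtual_env is hypothetical.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the current interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases set sys.base_prefix instead.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


if __name__ == "__main__":
    if in_virtual_env():
        print("running inside a virtual environment:", sys.prefix)
    else:
        print("running with the system interpreter:", sys.prefix)
```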
Scrapy crawler framework, part one (Linux environment)