Scrapy Crawler Framework, Part 1 (Linux Environment)

Source: Internet
Author: User
Tags: virtual environment, virtualenv

1. What is Scrapy?

A: Scrapy is an open-source web crawler framework written in Python and built on the Twisted framework. It has a clear structure, loose coupling between modules, and strong extensibility, so it can meet a wide variety of needs. (The tools we introduced earlier, such as requests, BeautifulSoup, and Selenium, are like answering an essay question: you write everything yourself, which mainly suits personal crawlers. The Scrapy framework gives us a convenient and flexible crawler architecture in which we only need to modify the components it provides to build a complete web crawler, which is more like filling in the blanks!)

Because Scrapy is easiest to work with on Linux, all of the Scrapy programs that follow will be run on a Linux system.

2. Installing the Scrapy framework (I am using a VMware virtual machine with an Ubuntu 16.04 image)

Open a terminal and install the dependency packages: sudo apt-get install python-dev python-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev

If you have not installed Python 3 yet, also run: sudo apt-get install python3 python3-dev

At this point you can first create a virtual environment (pip3 install virtualenv, covered in section 3) and then install Scrapy inside it, so that all the programs you write run in the virtual environment.

The Ubuntu 16.04 system I'm using comes with two Python versions: Python 2.7.14 and Python 3.5.2.

Let's first solve the problem of multiple Python versions coexisting.

When you type python, the system points to Python 2 by default, but all our programs are based on Python 3, which is the mainstream going forward. (We want typing python to launch Python 3 directly.)

To fix this, run sudo su (enter your user password to obtain superuser privileges), then follow the steps in the screenshot:

Analysis (we'll cover the Linux commands themselves in a later post):

Likewise, typing python2 makes the system point to the Python 2 environment, and typing python3 makes it point to the Python 3 environment.

whereis python finds the paths of Python's executable files.

which python shows the path of the file that is executed when we type python.
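The lookup that which performs can also be sketched from Python itself. The example below is only an illustration: it uses the standard library's shutil.which, and the helper name resolve_commands and the list of command names are assumptions made for this sketch, not something from the original setup.

```python
import shutil


def resolve_commands(names=("python", "python2", "python3")):
    """Map each command name to the executable found on PATH, or None.

    shutil.which searches PATH much like the shell's `which` command.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in sorted(resolve_commands().items()):
        # str.format keeps the sketch compatible with Python 3.5.
        print("{:8} -> {}".format(name, path or "not found on PATH"))
```

The paths printed depend entirely on the machine, so compare them with what which python reports in your own terminal.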

We first use rm to remove the existing python path, then run ln -s <parameter 1> <parameter 2>, where parameter 1 is the existing Python 3 executable and parameter 2 is the python path to create. This generates a soft link, which works on the same principle as a hyperlink: when you type python, the system follows the soft link to the Python 3 executable and runs it. That achieves the goal we wanted.
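To see what the rm and ln -s steps do without touching the real system files, here is a minimal sketch that repeats them inside a temporary directory. The file names python2.7 and python3.5 are empty stand-ins created for the demonstration, and repoint_link is a hypothetical helper name.

```python
import os
import tempfile


def repoint_link(link_path, target_path):
    """Replace link_path with a soft link to target_path and return the new target."""
    if os.path.lexists(link_path):
        os.remove(link_path)             # like: rm <parameter 2>
    os.symlink(target_path, link_path)   # like: ln -s <parameter 1> <parameter 2>
    return os.readlink(link_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as sandbox:
        fake_py2 = os.path.join(sandbox, "python2.7")
        fake_py3 = os.path.join(sandbox, "python3.5")
        for path in (fake_py2, fake_py3):
            open(path, "w").close()      # empty placeholder files, not interpreters

        link = os.path.join(sandbox, "python")
        os.symlink(fake_py2, link)       # initial state: python -> python2.7
        print("before:", os.readlink(link))
        print("after: ", repoint_link(link, fake_py3))
```

Removing the link with os.remove deletes only the link itself, never the file it points to, which is why the original Python 2 executable stays available under its own name.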

3. How to use multiple Python versions and multiple library versions side by side

Answer: install the virtualenv virtual environment tool.

Open Terminal: sudo pip3 install virtualenv

If the following error occurs, edit the configuration file with vi /usr/bin/pip3 (the error appears because this file was not updated when the Python 2 pip was upgraded; don't worry, we can modify it ourselves).

We use vim here, a powerful text editor on Linux; we'll explain how to use it next time.

Change the configuration file as follows:

Run the command again: sudo pip3 install virtualenv (this time it succeeds)

Then:

Create a python3.5 virtual environment called Course-python3.5-env:
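As a rough illustration of what creating an isolated environment involves, the sketch below uses the standard library's venv module as a stand-in; it is not the virtualenv command used in this article. The helper name create_environment is hypothetical, the environment is created in a temporary directory, and it uses whichever Python 3 interpreter runs the script rather than specifically Python 3.5.

```python
import os
import tempfile
import venv


def create_environment(parent_dir, name="Course-python3.5-env"):
    """Create an isolated environment under parent_dir and return its path."""
    env_dir = os.path.join(parent_dir, name)
    # with_pip=False keeps the sketch independent of ensurepip being available.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as sandbox:
        env_dir = create_environment(sandbox)
        # The environment is a directory with its own interpreter and a pyvenv.cfg file.
        print(sorted(os.listdir(env_dir)))
```

Because the environment is just a directory, deleting that directory is enough to discard it along with every library installed into it.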

Activate and exit the virtual environment with the source and deactivate commands:
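If you are unsure whether an environment is currently active, a script can check for itself. The following is a small sketch of a commonly used idiom; the function name in_virtual_environment is an assumption for this example. Older virtualenv releases set sys.real_prefix, while the standard venv module and newer virtualenv releases make sys.base_prefix differ from sys.prefix.

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    if hasattr(sys, "real_prefix"):
        # Set by older virtualenv releases.
        return True
    # venv and newer virtualenv: base_prefix points at the system installation.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtual_environment())
    print("interpreter prefix:", sys.prefix)
```

Running it once after source and once after deactivate should give different answers.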

Finally, as described above, first activate the virtual environment and then install Scrapy inside it.

Verification: in the terminal, type scrapy --version to view the installed version of Scrapy. If no error is reported, the installation worked!
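The same check can be made from inside Python, which also confirms that the interpreter you are running is the one that has Scrapy installed. This is a minimal sketch; scrapy_version is a hypothetical helper name, and it relies only on the standard importlib module and the __version__ attribute of the scrapy package.

```python
import importlib.util


def scrapy_version():
    """Return the installed Scrapy version string, or None if Scrapy is not importable."""
    if importlib.util.find_spec("scrapy") is None:
        return None
    import scrapy  # imported lazily so the sketch also runs where Scrapy is missing
    return scrapy.__version__


if __name__ == "__main__":
    version = scrapy_version()
    if version is None:
        print("Scrapy is not installed for this interpreter")
    else:
        print("Scrapy {} is installed".format(version))
```

If this prints that Scrapy is missing while the terminal command works, the script is probably being run outside the activated virtual environment.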

From now on, all of our Scrapy crawler projects will run in a virtual environment!
