Python crawler framework Scrapy: installation and configuration


The previous ten chapters of these crawler notes covered some basic Python crawling techniques,
enough for simple, small-scale download tasks.
But if you want to download a large amount of content in batches, such as all the questions and answers on Zhihu, those simple techniques are no longer up to the job.
That is where Scrapy comes in.
The name Scrapy combines "scrape" (to extract data from web pages) with Python.

Scrapy's official website: https://scrapy.org

The following is a simple walkthrough of the Scrapy installation process.
For details, see: http://www.bkjia.com/article/48607.htm
Reminder: each package you download must match your Python version, or the installer will report that Python cannot be found. The 32-bit builds are recommended, because 64-bit builds of some of these packages are hard to find.

1. Install Python (32-bit is recommended)

We recommend installing Python 2.7.x; Python 3.x is not supported by Scrapy yet.
After installation, remember to configure the environment: add the Python directory and its Scripts subdirectory to the system Path environment variable.
Enter python in cmd. If version information is displayed, the configuration is complete.
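You can also confirm the interpreter version from inside Python itself; a minimal sketch:

```python
import sys

# Print the full interpreter version string, e.g. "2.7.18 ..." on a 2.7.x install.
print(sys.version)

# Programmatic check: warn if this is not a 2.7.x interpreter,
# since Scrapy did not support Python 3 at the time this guide was written.
if sys.version_info[:2] != (2, 7):
    print("Warning: this guide targets Python 2.7.x")
```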

2. Install lxml

lxml is a library written for Python that can process XML quickly and flexibly. Download the installer matching your Python version.
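lxml's etree module largely follows the ElementTree API from the standard library, so the same parsing code works with either. A minimal sketch using the stdlib module (with lxml installed you would swap the import for `from lxml import etree`):

```python
# lxml.etree is broadly API-compatible with the standard library's
# ElementTree, so this sketch uses the stdlib module as a stand-in.
import xml.etree.ElementTree as ElementTree

xml_doc = "<questions><q id='1'>What is Scrapy?</q><q id='2'>Why lxml?</q></questions>"
root = ElementTree.fromstring(xml_doc)

# Iterate over the <q> elements and collect their text content.
texts = [q.text for q in root.findall("q")]
print(texts)  # ['What is Scrapy?', 'Why lxml?']
```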

3. Install setuptools

setuptools is needed to install the egg files used in the following steps. Download the setuptools build for Python 2.7.

4. Install zope.interface

You can install the egg file using the setuptools from step 3; an exe installer is now also available for download.

5. Install Twisted

Twisted is an event-driven network engine framework implemented in Python. Download the build matching your Python version.
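"Event-driven" means that instead of blocking on each network operation, you register callbacks that the framework's reactor invokes when events occur. A toy sketch of the idea in plain Python (the names here are illustrative only, not Twisted's actual API):

```python
# A toy "reactor" illustrating the event-driven style Twisted uses:
# callers register callbacks, and the loop dispatches events to them.
# ToyReactor/register/fire are hypothetical names, not Twisted's API.
class ToyReactor(object):
    def __init__(self):
        self.callbacks = []

    def register(self, callback):
        self.callbacks.append(callback)

    def fire(self, event):
        # Dispatch the event to every registered callback.
        for callback in self.callbacks:
            callback(event)

received = []
reactor = ToyReactor()
reactor.register(lambda event: received.append(event))
reactor.fire("page downloaded")
print(received)  # ['page downloaded']
```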

6. Install pyOpenSSL

pyOpenSSL is the Python wrapper around the OpenSSL API. Download the installer for your Python version.

7. Install pywin32

pywin32 provides the win32api bindings required on Windows. Download the build matching your Python version.

8. Install Scrapy

The exciting part at last! With all the prerequisites installed, it is time for the main attraction.
In cmd, enter easy_install scrapy and press Enter.

9. Check Installation

Open a cmd window and run the scrapy command from any directory. If Scrapy prints its usage information, the environment is configured successfully.


How to write crawler programs in Python

A detailed introduction is available at:

blog.csdn.net/column/details/why-bug.html

In the Scrapy framework, how do you make a crawler automatically follow to the next page and capture its content?

A crawler can follow pagination by extracting the next-page link and yielding a new request for it. For example:

    item1 = Item()
    yield item1
    item2 = Item()
    yield item2
    req = Request(url='Next page link', callback=self.parse)
    yield req

Do not use a return statement with a value in a callback that uses yield.
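The answer above can be sketched as a plain generator: a Scrapy parse() callback is just a generator that yields a mix of items and requests, and Scrapy decides what to do with each yielded object. A simplified, self-contained stand-in (Item and Request here are hypothetical placeholders for scrapy.Item and scrapy.Request, and response is a plain dict rather than a real Response object):

```python
# Simplified stand-ins for scrapy.Item and scrapy.Request, to illustrate
# how a parse() callback yields both scraped items and follow-up requests.
class Item(dict):
    pass

class Request(object):
    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback

def parse(response):
    # Yield one item per scraped entry...
    for title in response["titles"]:
        yield Item(title=title)
    # ...then yield a Request for the next page, if there is one.
    if response.get("next_page"):
        yield Request(url=response["next_page"], callback=parse)

response = {"titles": ["Q1", "Q2"], "next_page": "http://example.com/page/2"}
results = list(parse(response))
print([type(r).__name__ for r in results])  # ['Item', 'Item', 'Request']
```

In real Scrapy the engine consumes this generator for you: yielded items go to the item pipeline, and yielded requests are scheduled for download with their callback invoked on the response.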

