Python Web Crawler Tips, a Short Summary: Easily Crawl Data from Static and Dynamic Web Pages

Source: Internet
Author: User
Tags python web crawler

Many people who learn Python end up writing all kinds of crawler scripts: scripts that collect proxies and verify them locally, scripts that receive mail automatically, and simple CAPTCHA recognition scripts. Here we summarize some practical techniques for crawling with Python.

Static Web pages

Little needs to be said about crawling static web pages, because it is very simple: fetch the HTML directly with requests, then pull out the data you need with regular expression matching.
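As a minimal sketch of that fetch-then-match approach: the function names, the link pattern, and the sample HTML below are all assumptions made for illustration, not taken from the original article. The pattern only handles simple, regular markup with double-quoted attributes.

```python
import re

# Matches the href value of <a> tags. Adequate for simple, regular markup;
# it is not a general-purpose HTML parser.
LINK_RE = re.compile(r'<a\s+[^>]*href="([^"]+)"', re.IGNORECASE)


def fetch_html(url):
    """Download a static page and return its HTML as text."""
    # Imported here so extract_links() can be used without the third-party
    # requests package installed.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_links(html):
    """Return every href found in the HTML, in document order."""
    return LINK_RE.findall(html)


# Hypothetical markup standing in for a downloaded page.
sample_html = '<p><a href="/page/1">One</a> <a class="x" href="/page/2">Two</a></p>'
print(extract_links(sample_html))
```

In a real run, the output of fetch_html(url) would be passed to extract_links() in place of the sample string.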

Dynamic Web pages

Compared with static pages, dynamic web pages are more complex. Given how quickly the Internet is developing, most pages today are dynamic and static pages are relatively few. Still, as the saying goes, for every clever scheme there is a ladder over the wall: whatever a site does, there is a way to crawl it.

HTTP requests for dynamic web pages take two forms:

GET method and POST method

    • GET method: for example, when we enter a web address in the browser, the browser initiates a GET request. That web address is the URL.
    • POST method: not common in crawlers, so it is not covered in detail here.
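To make the difference between the two methods concrete, here is a small sketch that builds one request of each kind without sending anything. It uses the standard library's urllib rather than requests so that it runs offline. The example.com URLs and the parameter names are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get(base_url, params):
    """GET: the parameters travel in the URL's query string."""
    return Request(f"{base_url}?{urlencode(params)}")


def build_post(url, form):
    """POST: the parameters travel in the request body instead."""
    return Request(url, data=urlencode(form).encode("utf-8"))


# Hypothetical URLs and parameters; nothing is sent over the network here.
get_req = build_get("https://example.com/search", {"q": "python", "page": 2})
post_req = build_post("https://example.com/login", {"user": "alice"})

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

With a GET request, everything the server needs is visible in the URL, which is why copying a URL out of the browser is often enough for a crawler.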

To find out which form of request a website uses, get comfortable with the F12 developer tools and check the Network panel.

Take a look at an example.
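The original case is not preserved in this copy of the article. As one possible illustration, suppose the Network panel shows that the page loads its data from a GET endpoint that returns JSON. The endpoint, the "items" and "title" field names, and the sample body below are all hypothetical; a real site's response will have its own shape.

```python
import json


def fetch_json_text(api_url, params):
    """Call the endpoint seen in the Network panel and return the raw body."""
    # Imported here so parse_titles() works without requests installed.
    import requests

    headers = {"User-Agent": "Mozilla/5.0"}  # some sites reject the default UA
    response = requests.get(api_url, params=params, headers=headers, timeout=10)
    response.raise_for_status()
    return response.text


def parse_titles(body):
    """Pull the title of each item out of a JSON body shaped like the sample."""
    payload = json.loads(body)
    return [item["title"] for item in payload.get("items", [])]


# Hypothetical response body, standing in for what the endpoint would return.
sample_body = '{"items": [{"id": 1, "title": "First"}, {"id": 2, "title": "Second"}]}'
print(parse_titles(sample_body))
```

Requesting the data endpoint directly is usually faster than rendering the page, because the response already contains structured data and no HTML parsing is needed.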

Of course, not every page gets its data by sending a request; there are also dynamic web pages that do not send a request for their data.

For such sites, we generally use selenium to simulate browser behavior, which gives us the result of the browser's rendering directly. The drawback is that selenium is relatively slow.

The specific cases are as follows:
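The original case is not preserved in this copy of the article. The following is a rough sketch of the selenium approach, assuming Chrome and a matching driver are available on the machine. The CSS selector, the "price" class, and the sample markup are hypothetical and chosen only for illustration.

```python
import re


def fetch_rendered_html(url, wait_css, timeout=10):
    """Load url in headless Chrome and return the HTML after rendering."""
    # Imported here so the module loads even where selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the element filled in by the page's JavaScript appears.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process


def extract_prices(html):
    """Return the text of every <span class="price"> in the rendered HTML."""
    return re.findall(r'<span class="price">([^<]*)</span>', html)


# Hypothetical rendered markup; a real run would use fetch_rendered_html() output.
rendered = '<div><span class="price">9.99</span><span class="price">12.50</span></div>'
print(extract_prices(rendered))
```

Starting a browser for every page is what makes this approach slow, so it is usually worth checking the Network panel for a data endpoint first and falling back to selenium only when none exists.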

So whether a page is static or dynamic, there is a method for crawling it. Of course, many sites also require logging in, recognizing CAPTCHAs, getting past anti-crawling measures, and so on. Whatever measures a site takes, there is a way to deal with them; the key is whether you know how.
