Python Crawler Introduction, Part 4: Advanced Usage of the Urllib Library

Source: Internet
Author: User


1. Set headers

Some sites do not accept requests that come directly from a program. If a request cannot be identified properly, the site may simply not respond. To fully simulate how a browser works, we therefore need to set some header properties.
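Here is a minimal sketch, assuming Python 3's `urllib.request`, of how a script identifies itself when you set no headers at all. It builds the same kind of opener that `urlopen()` uses by default and reads that opener's built-in headers. It sends no request. A server that checks this value can easily tell that the request did not come from a browser.

```python
import urllib.request

# build_opener() creates the same kind of opener urlopen() uses by default.
opener = urllib.request.build_opener()

# addheaders lists headers the opener adds to every request that does not
# already set them; by default it contains only a User-agent entry.
default_headers = dict(opener.addheaders)

# The value has the form 'Python-urllib/<major>.<minor>', not a browser string.
print(default_headers["User-agent"])
```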

First, open your browser's developer tools with F12 (I use Chrome) and switch to the Network panel to monitor requests, as shown in the screenshot. Take logging in as an example. After you log in, the browser shows a new page. That page contains a lot of content, and the content is not loaded all at once. Behind the scenes the browser sends many requests: usually the first one fetches the HTML file, and later ones load the JS, CSS and other resources. Only after all of these requests finish are the skeleton and muscle of the page in place and the whole page displayed.
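To make the "one page, many requests" idea concrete, here is a rough sketch that parses an HTML document and lists the script, stylesheet and image URLs a browser would request next. The HTML string and the `https://example.com/` base URL are hypothetical. `SubresourceCollector` is a name made up for this example. It ignores many other resource types, such as fonts, iframes and anything fetched later by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubresourceCollector(HTMLParser):
    """Collects absolute URLs of scripts, stylesheets and images in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "img") and attrs.get("src"):
            # Resolve relative paths against the page's own URL.
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # rel may hold several space-separated values.
            rel_values = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel_values:
                self.urls.append(urljoin(self.base_url, attrs["href"]))


# Hypothetical HTML returned by the first request for the page.
html = """
<html><head>
  <link rel="stylesheet" href="/static/site.css">
  <script src="/static/app.js"></script>
</head><body><img src="logo.png"></body></html>
"""

collector = SubresourceCollector("https://example.com/index.html")
collector.feed(html)
for url in collector.urls:
    print(url)  # each of these would be a separate follow-up request
```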

If we look at these requests one by one and open only the first, we can see the request URL and the request headers, followed by the response. The screenshot does not show everything, so try it yourself. The headers carry a lot of information, such as the file encoding, the compression method and the agent (User-Agent) making the request.
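Two of those header fields, compression and encoding, directly affect how you read a response. The following sketch assumes Python 3. `urlopen()` does not decompress gzip bodies for you, so if you send `Accept-Encoding: gzip` you have to check `Content-Encoding` yourself and then decode the bytes using the charset from `Content-Type`. The helper `decode_body` and the sample headers are hypothetical, and the demonstration runs offline.

```python
import gzip
from email.message import Message


def decode_body(raw: bytes, headers: Message) -> str:
    """Undo gzip Content-Encoding if present, then decode using the charset."""
    # urllib leaves compressed bodies as-is; only gzip is handled here
    # (deflate, br, etc. would need their own handling).
    if headers.get("Content-Encoding", "").lower() == "gzip":
        raw = gzip.decompress(raw)
    # The text encoding is normally the charset parameter of Content-Type;
    # fall back to UTF-8 when the server does not state one.
    charset = headers.get_content_charset() or "utf-8"
    return raw.decode(charset)


# Offline demonstration with made-up response headers and body.
headers = Message()
headers["Content-Type"] = "text/html; charset=utf-8"
headers["Content-Encoding"] = "gzip"
body = gzip.compress("<p>hello</p>".encode("utf-8"))

print(decode_body(body, headers))
```

With a live request you would pass `response.read()` and `response.headers`. The latter is an `http.client.HTTPMessage`, a subclass of `email.message.Message`, so the same helper applies.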

The agent (User-Agent) is the identity of the request. If a request does not declare an identity, the server will not necessarily respond, so you can set the agent in the headers. The point here is only how to set headers, so focus on the format.
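Here is a minimal sketch of setting the User-Agent with Python 3's `urllib.request`. The login URL, the form field names and values, and the User-Agent string are all placeholders and do not come from any real site. The request object is built and inspected offline. The `fetch` helper is a hypothetical name that shows where the actual network call would go.

```python
import urllib.parse
import urllib.request

# Hypothetical login URL and form fields, for illustration only.
url = "http://www.example.com/login"
values = {"username": "alice", "password": "secret"}

# A browser-like identity; substitute a real browser's User-Agent string.
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
headers = {"User-Agent": user_agent}

# In Python 3 a POST body must be bytes.
data = urllib.parse.urlencode(values).encode("utf-8")
request = urllib.request.Request(url, data=data, headers=headers)


def fetch(req: urllib.request.Request) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(req, timeout=10) as response:
        return response.read()


print(request.get_method())  # "POST", because a body (data) is attached
# urllib stores header names via str.capitalize(), hence "User-agent".
print(request.get_header("User-agent"))
# page = fetch(request)  # uncomment to actually send the request
```

You can also add headers after the request is constructed with `request.add_header(name, value)`. This is handy when you want to add a field such as `Referer` later.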

