Basic use of the urllib library for Python crawlers

Source: Internet
Author: User

import urllib2

response = urllib2.urlopen("http://www.baidu.com")
print response.read()
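The snippet above is Python 2 code (urllib2 and the print statement). Python 3 merged urllib2 into urllib.request, so the sketch below is a rough Python 3 counterpart. The helper name fetch is hypothetical. Decoding with the charset declared in the Content-Type header, and falling back to UTF-8 when none is declared, is an assumption about the pages being fetched.

```python
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download url and return the body as text (hypothetical helper)."""
    # urlopen returns a file-like response; "with" closes it afterwards.
    with urlopen(url, timeout=timeout) as response:
        body = response.read()  # raw bytes
        # Use the charset from Content-Type, assuming UTF-8 if none is declared.
        charset = response.headers.get_content_charset() or "utf-8"
    return body.decode(charset, errors="replace")


if __name__ == "__main__":
    print(fetch("http://www.baidu.com"))
```

Python 3's urlopen also understands data: URLs, so fetch("data:,hello") works as a quick offline check and returns 'hello'.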

In fact, the argument to urllib2.urlopen does not have to be a URL string. It can also be a Request object, that is, an instance of the urllib2.Request class. When you construct it, you pass in the URL, the data to send, and so on. The code above can therefore be rewritten like this:

# -*- coding: utf-8 -*-
"""
Created on Fri Apr 11:23:04 2017
@author: zeze
"""
import urllib2

request = urllib2.Request("http://www.baidu.com")
response = urllib2.urlopen(request)
print response.read()


The result is exactly the same. The only difference is the Request object in the middle. Writing the code this way is recommended, because building a request often involves adding more content to it. The logic of "build a request, then get the server's response to it" is also clearer.
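This Python 3 sketch illustrates the idea of building a request and adding content to it. In Python 3 the class is urllib.request.Request. The code only builds and inspects the object, so it runs without network access. The User-Agent and Referer values are placeholders chosen for this example, not values the site requires.

```python
from urllib.request import Request

# Placeholder header values, for illustration only.
headers = {"User-Agent": "Mozilla/5.0 (example crawler)"}

# Build the request object first; nothing is sent until urlopen(request).
request = Request("http://www.baidu.com", headers=headers)
request.add_header("Referer", "http://www.baidu.com/")  # headers can also be added later

print(request.full_url)      # http://www.baidu.com
print(request.host)          # www.baidu.com
print(request.get_method())  # GET, since no data was supplied

# Request stores header names via str.capitalize(),
# so "User-Agent" is looked up as "User-agent".
print(request.get_header("User-agent"))  # Mozilla/5.0 (example crawler)
```

Passing this object to urllib.request.urlopen sends it with those headers attached, just as urllib2.urlopen(request) does in the Python 2 code.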

POST and GET data transfer

The program above demonstrates the most basic kind of web crawling. However, most websites today serve dynamic pages that expect you to pass them parameters, and they respond according to those parameters. So when we visit such a page, we need to send data to it. What is the most common case? Logging in or signing up.

Suppose you want to send a username and password to a URL and then read the response after the server has processed them. How do you do that? Read on.

There are two ways to transmit data: POST and GET. What is the difference between them?

The most important difference is that GET accesses the URL directly as a link, with all the parameters included in it. This is an unsafe choice if one of the parameters is a password, but it does let you see at a glance what you submitted. POST does not show the parameters in the URL, which makes it less convenient if you want to see directly what is being submitted. Choose whichever suits the situation.
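To make the difference concrete, this Python 3 sketch builds the same parameters both ways without sending anything. The URL and field values are placeholders. With GET the parameters become part of the URL. With POST the URL stays clean and the encoded parameters travel in the request body.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "http://example.com/login"  # placeholder URL
params = urlencode({"username": "alice", "password": "XXXX"})

# GET: the parameters become part of the URL itself.
get_request = Request(url + "?" + params)

# POST: the URL stays clean; the parameters go in the body (bytes in Python 3).
post_request = Request(url, data=params.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
# GET http://example.com/login?username=alice&password=XXXX
print(post_request.get_method(), post_request.full_url, post_request.data)
# POST http://example.com/login b'username=alice&password=XXXX'
```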

POST mode:

Remember the data parameter mentioned earlier? This is where it is used: the data we transmit is passed as that parameter. The following code shows the POST method.

import urllib
import urllib2

values = {"username": "[email protected]", "password": "XXXX"}
data = urllib.urlencode(values)  # encode the dict as "key=value&key=value"
url = "https://passport.csdn.net/account/login?from=http://my.csdn.net/my/mycsdn"
request = urllib2.Request(url, data)  # passing data makes this a POST request
response = urllib2.urlopen(request)
print response.read()

Here we import the urllib library and simulate logging in to CSDN. The code above will probably not actually log you in, because CSDN's login also requires a serial-number field that we have not set. Handling that field is more complicated, so it is not covered here. The goal is only to illustrate how a login works, and typical login sites are handled in this way.

We define a dictionary named values whose entries are the username and password parameters. We then use urllib's urlencode function to encode the dictionary into a string named data. When building the Request we pass it two arguments, url and data. Running the program returns the content of the page rendered after the POST.
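It helps to see once what urlencode actually produces. In Python 3 the function is urllib.parse.urlencode, the successor of urllib.urlencode. This sketch shows that it joins key=value pairs with & and percent-encodes characters such as @ and spaces, so the result is safe to place in a request body or a URL. The field values are placeholders.

```python
from urllib.parse import parse_qs, urlencode

values = {"username": "user@example.com", "password": "my secret"}
data = urlencode(values)
print(data)  # username=user%40example.com&password=my+secret

# The encoding is reversible: parse_qs recovers the original fields.
print(parse_qs(data))  # {'username': ['user@example.com'], 'password': ['my secret']}
```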

Note that the dictionary above can also be built another way. The following code is equivalent:

import urllib
import urllib2

values = {}
values['username'] = "[email protected]"
values['password'] = "XXXX"
data = urllib.urlencode(values)
url = "http://passport.csdn.net/account/login?from=http://my.csdn.net/my/mycsdn"
request = urllib2.Request(url, data)  # data is present, so this is a POST
response = urllib2.urlopen(request)
print response.read()

Either version sends the data using POST.
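The sketch below is a Python 3 version of the same POST request. There are two changes from Python 2. First, urllib.urlencode became urllib.parse.urlencode. Second, Python 3 sends bytes, so the encoded string must be converted with .encode() before it is passed as data; urlopen raises a TypeError if the body is a str. When data is given, urllib also adds the Content-Type header application/x-www-form-urlencoded on its own. The helper name build_post_request and the placeholder credentials are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_post_request(url, values):
    """Encode a dict of form fields as a POST Request (hypothetical helper)."""
    # urlencode returns str; urllib in Python 3 needs the body as bytes.
    data = urlencode(values).encode("utf-8")
    # Supplying data makes get_method() return "POST".
    return Request(url, data=data)


if __name__ == "__main__":
    # Placeholder credentials; a real site may require extra fields.
    values = {"username": "user@example.com", "password": "XXXX"}
    url = "https://passport.csdn.net/account/login?from=http://my.csdn.net/my/mycsdn"
    with urlopen(build_post_request(url, values)) as response:
        print(response.read().decode("utf-8", errors="replace"))
```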

GET mode:

For GET, we can write the parameters directly into the URL, that is, build a URL that carries the parameters.

import urllib
import urllib2

values = {}
values['username'] = "[email protected]"
values['password'] = "XXXX"
data = urllib.urlencode(values)
url = "http://passport.csdn.net/account/login"
geturl = url + "?" + data  # append the encoded parameters as the query string
request = urllib2.Request(geturl)  # no data argument, so this is a GET request
response = urllib2.urlopen(request)
print response.read()
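This self-contained Python 3 sketch shows what the server actually receives in each mode. It starts a throwaway local HTTP server with http.server on 127.0.0.1, on a port chosen by the OS, and sends the same fields once with GET and once with POST. The echo handler, the demo function and the field values are invented for this demonstration; a real login server does far more than echo the request back. The opener is built with an empty ProxyHandler, on the assumption that any proxy settings in the environment should not intercept localhost traffic.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlencode
from urllib.request import ProxyHandler, Request, build_opener


class EchoHandler(BaseHTTPRequestHandler):
    """Reply with the method, the path (including any query string) and the body."""

    def _reply(self, body=b""):
        text = f"{self.command} {self.path} body={body.decode('ascii')!r}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(text)))
        self.end_headers()
        self.wfile.write(text)

    def do_GET(self):
        self._reply()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._reply(self.rfile.read(length))

    def log_message(self, *args):
        pass  # silence the default per-request logging


def demo():
    """Send the same fields with GET and with POST; return what the server saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    opener = build_opener(ProxyHandler({}))  # bypass any proxy settings for localhost
    url = f"http://127.0.0.1:{server.server_port}/account/login"
    data = urlencode({"username": "alice", "password": "XXXX"})
    try:
        with opener.open(Request(url + "?" + data)) as r:  # GET: fields in the URL
            get_seen = r.read().decode("ascii")
        with opener.open(Request(url, data.encode("ascii"))) as r:  # POST: fields in the body
            post_seen = r.read().decode("ascii")
    finally:
        server.shutdown()
        server.server_close()
    return get_seen, post_seen


if __name__ == "__main__":
    for line in demo():
        print(line)
```

Running it prints `GET /account/login?username=alice&password=XXXX body=''` followed by `POST /account/login body='username=alice&password=XXXX'`. With GET the server finds the fields in the path, and with POST it finds them in the body.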

http://cuiqingcai.com/1052.html
