Fetching web page content with urllib2 (1)
1. Sending a request with urllib2
    import urllib2

    url = 'http://www.baidu.com'
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    print response.read()
    print response.geturl()
    print response.info()
urllib2 represents an HTTP request as a Request object. Passing that object to urlopen() returns a response object.
Request => Response: HTTP is built on this request/response mechanism.
The response is a file-like object. You can call methods such as read(), info(), and geturl() on it:
response.read() reads the returned content
response.info() returns the response headers
response.geturl() returns the URL actually retrieved, which may differ from the requested URL after a redirect
urllib2 uses the same interface for all URL schemes. For example, you can create an FTP request the same way:
    req = urllib2.Request('ftp://duote.com')
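To see read(), info(), and geturl() without depending on a network connection, the same interface can be pointed at a file: URL. The following is only a sketch: the helper names fetch and demo are made up for illustration, the temporary file stands in for a real server, and the try/except import is an assumption added so the code also runs on Python 3, where urllib2 was split into urllib.request.

```python
import os
import tempfile

# Assumption: fall back to Python 3's urllib.request when urllib2 is absent.
try:
    import urllib2
    from urllib import pathname2url
except ImportError:
    import urllib.request as urllib2
    from urllib.request import pathname2url


def fetch(url):
    """Open a URL of any supported scheme through Request and urlopen()."""
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    try:
        body = response.read()         # the returned content
        headers = response.info()      # header-like metadata
        final_url = response.geturl()  # the URL actually retrieved
    finally:
        response.close()
    return body, headers, final_url


def demo():
    # Write a throwaway local file so the example needs no network access.
    handle, path = tempfile.mkstemp(suffix='.txt')
    try:
        with os.fdopen(handle, 'wb') as f:
            f.write(b'hello from a local file')
        return fetch('file:' + pathname2url(path))
    finally:
        os.remove(path)


body, headers, final_url = demo()
print(body)
print(final_url)
print(headers['Content-length'])
```

Replacing the file: URL with an http: or ftp: URL leaves fetch unchanged, which is the point of the shared interface.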
2. POST requests
    import urllib
    import urllib2

    url = "http://www.duote.com/index?php"
    data = {"softname": "quicktime.exe", "size": "18763", "md5": "HEN35FLK3WP"}
    # The POST body must be a URL-encoded string, not a dict
    req = urllib2.Request(url, urllib.urlencode(data))
    response = urllib2.urlopen(req)
    print response.read()
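Request and urlopen() have the following signatures. For HTTP URLs, supplying data makes the request a POST; leaving it as None makes it a GET.
Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False)
urlopen(url, data=None, timeout=<object>, cafile=None, capath=None, cadefault=False, context=None)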
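The effect of the data and headers parameters can be checked without sending anything, because a Request object reports the HTTP method it would use. This is a minimal sketch under stated assumptions: build_post_request is a hypothetical helper, the User-Agent value is invented, and the try/except import is there only so the code also runs on Python 3.

```python
# Assumption: fall back to the Python 3 module names when urllib2 is absent.
try:
    import urllib2
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2
    from urllib.parse import urlencode


def build_post_request(url, fields, headers=None):
    """Return a Request whose body is the URL-encoded form of `fields`."""
    # Encoding to bytes is required on Python 3 and harmless on Python 2.
    body = urlencode(fields).encode('ascii')
    return urllib2.Request(url, data=body, headers=headers or {})


fields = [('softname', 'quicktime.exe'), ('size', '18763')]
post_req = build_post_request(
    'http://www.duote.com/index?php',
    fields,
    headers={'User-Agent': 'example-client/0.1'},  # hypothetical value
)
get_req = urllib2.Request('http://www.duote.com/index?php')

print(post_req.get_method())  # POST, because data was supplied
print(get_req.get_method())   # GET, because data is None
```

Nothing is transmitted until the Request is passed to urlopen(), so this kind of check is safe to run offline.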
3. GET request
    import urllib
    import urllib2

    url = "http://www.2345.com"
    data = {'name': 'Tom', 'age': '18', 'studynum': '002195'}
    urlvalue = urllib.urlencode(data)
    print urlvalue
    r_url = url + '?' + urlvalue
    data = urllib2.urlopen(r_url)
HTML form data generally has to be sent in a standard encoded format. Call urllib.urlencode() to URL-encode the data, then append the encoded string to the URL after a question mark.
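The encoding step can be tried on its own, without opening the URL. In this sketch build_get_url is a hypothetical helper, and the parameters are passed as a list of pairs rather than a dict so that their order in the query string is predictable (a choice made for the example). The try/except import is an assumption so the code also runs on Python 3.

```python
# Assumption: fall back to urllib.parse on Python 3.
try:
    from urllib import urlencode
except ImportError:
    from urllib.parse import urlencode


def build_get_url(base_url, params):
    """Append URL-encoded `params` to `base_url` after a question mark."""
    return base_url + '?' + urlencode(params)


params = [('name', 'Tom'), ('age', '18'), ('studynum', '002195')]
r_url = build_get_url('http://www.2345.com', params)
print(r_url)  # http://www.2345.com?name=Tom&age=18&studynum=002195

# Characters with special meaning in a query string are escaped.
escaped = urlencode([('q', 'a b&c')])
print(escaped)  # q=a+b%26c
```

The second call shows why the encoding matters: a space becomes + and a literal & becomes %26, so a value cannot be mistaken for a separator between parameters.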