Python implementations of HTTP requests (urlopen, header processing, cookie handling, setting timeouts, redirection, proxy settings)

Source: Internet
Author: User
Tags: urlencode


# Python implements HTTP requests in three ways: urllib2/urllib, httplib/urllib, and requests

## urllib2/urllib implementation

urllib2 and urllib are two built-in Python modules that together provide HTTP functionality; urllib2 does most of the work, with urllib as a supplement.

1 First, implement a complete request-and-response model
    • urllib2 provides the basic function urlopen:
import urllib2
response = urllib2.urlopen('http://www.cnblogs.com/guguobao')
html = response.read()
print html
    • An improved, two-step version: build the request, then fetch the response
#!coding:utf-8
import urllib2
#request
request = urllib2.Request('http://www.cnblogs.com/guguobao')
#response
response = urllib2.urlopen(request)
html = response.read()
print html
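For reference (an assumption beyond the original text, which targets Python 2), the same two-step pattern on Python 3 lives in urllib.request; a minimal sketch that builds the request without sending it:

```python
import urllib.request

# On Python 3, urllib2's Request/urlopen moved into urllib.request.
request = urllib.request.Request('http://www.cnblogs.com/guguobao')
print(request.get_method())  # a Request with no body defaults to GET
# response = urllib.request.urlopen(request)  # performs the actual fetch
# html = response.read()
```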
    • The example above uses a GET request; below it is changed to a POST request, with urllib supplying the encoding.
#!coding:utf-8
import urllib
import urllib2
url = 'http://www.cnblogs.com/login'
postdata = {'username' : 'qiye',
            'password' : 'qiye_pass'}
#info needs to be encoded into a format urllib2 understands; urllib does this
data = urllib.urlencode(postdata)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
html = response.read()
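The encoding step can be checked in isolation. On Python 3 the same helper is urllib.parse.urlencode (a detail beyond the original Python 2 text):

```python
from urllib.parse import urlencode

postdata = {'username': 'qiye', 'password': 'qiye_pass'}
# Dicts become application/x-www-form-urlencoded key=value pairs joined by '&'.
body = urlencode(postdata)
print(body)  # username=qiye&password=qiye_pass
```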
      • However, this run produces no output, because the server denies the access: it inspects the request header information to decide whether the request comes from a browser.
2 Request header processing
    • Add User-Agent and Referer information to the request above:

      • User-Agent: some servers or proxies check this value to decide whether the request was issued by a browser
      • Content-Type: with a REST interface, the server checks this value to decide how to parse the HTTP body; otherwise it refuses to respond with an error. Value details: http://www.runoob.com/http/http-content-type.html
      • Referer: the server checks it for anti-hotlinking
#coding:utf-8
#Request header processing: set the User-Agent and Referer fields in the request header
import urllib
import urllib2
url = 'http://www.xxxxxx.com/login'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
referer = 'http://www.xxxxxx.com/'
postdata = {'username' : 'qiye',
            'password' : 'qiye_pass'}
# Write user_agent and referer into the header information
headers = {'User-Agent':user_agent, 'Referer':referer}
data = urllib.urlencode(postdata)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
html = response.read()
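The headers set this way can be inspected before anything is sent. A sketch on Python 3's urllib.request (the URL and values below are the same illustrative placeholders as above; no request goes out):

```python
import urllib.request

headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
           'Referer': 'http://www.xxxxxx.com/'}
req = urllib.request.Request('http://www.xxxxxx.com/login',
                             data=b'username=qiye', headers=headers)
# Header names are stored in capitalized form; get_header reads them back.
print(req.get_header('User-agent'))
print(req.get_method())  # a Request carrying data defaults to POST
```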
3 Cookie processing
    • urllib2 handles cookies automatically, using the CookieJar class to manage them. If you need the value of a cookie entry, you can do:
import urllib2, cookielib

cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('http://www.zhihu.com')
for item in cookie:
    print item.name+':'+item.value
    • Sometimes, however, we do not want urllib2 to handle cookies automatically and prefer to add them ourselves, which can be done by setting the Cookie field in the request header.
import urllib2, cookielib

opener = urllib2.build_opener()
opener.addheaders.append(('Cookie', 'email=' + 'helloguguobao@gmail.com')) # the cookie name and value can be anything, but the header must be present
req = urllib2.Request('http://www.zhihu.com')
response = opener.open(req)
print response.headers
retdata = response.read()
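The manual-header approach carries over to Python 3's urllib.request (shown here as an assumption beyond the Python 2 text; the opener is only built, nothing is fetched):

```python
import urllib.request

opener = urllib.request.build_opener()
# Attach a Cookie header to every request this opener sends;
# the name and value here are illustrative only.
opener.addheaders.append(('Cookie', 'email=helloguguobao@gmail.com'))
print(opener.addheaders)
```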
4 Setting timeouts
    • In Python 2.6 and newer, the urlopen function provides a timeout parameter:
import urllib2
request = urllib2.Request('http://www.zhihu.com')
response = urllib2.urlopen(request, timeout=2)
html = response.read()
print html
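A process-wide default can complement the per-call parameter; a small sketch using the socket module (no request is sent here):

```python
import socket

# Global fallback: urlopen calls without an explicit timeout use this value.
socket.setdefaulttimeout(5)
print(socket.getdefaulttimeout())  # 5.0

# A per-call timeout (as in the example above) overrides this default
# and raises a timeout/URLError when it expires.
```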
5 Getting HTTP response codes
    • For a successful request, just call the getcode() method of the response object returned by urlopen; for non-2xx codes, catch HTTPError and read its code attribute.
import urllib2
try:
    response = urllib2.urlopen('http://www.google.com')
    print response.getcode()
except urllib2.HTTPError as e:
    if hasattr(e, 'code'):
        print 'Error code:', e.code
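As a side note beyond the original text, the standard library also carries the reason phrase for each status code, which is handy when logging getcode() results (Python 3's http.client shown):

```python
import http.client

# Map numeric status codes to their standard reason phrases.
for code in (200, 301, 404):
    print(code, http.client.responses[code])
```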
6 Redirection
    • By default, urllib2 automatically follows redirects on HTTP 3xx return codes. To detect whether a redirect happened, just check whether the response URL and the request URL differ:
import urllib2
response = urllib2.urlopen('http://www.zhihu.cn')
isRedirected = response.geturl() != 'http://www.zhihu.cn'
    • If you do not want automatic redirects, you can customize the HTTPRedirectHandler class:
import urllib2
class RedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_301(self, req, fp, code, msg, headers):
        pass
    def http_error_302(self, req, fp, code, msg, headers):
        result = urllib2.HTTPRedirectHandler.http_error_301(self, req, fp, code, msg, headers)
        result.status = code
        result.newurl = result.geturl()
        return result

opener = urllib2.build_opener(RedirectHandler)
opener.open('http://www.zhihu.cn')
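A simpler way to suppress redirects entirely, sketched with Python 3's urllib.request (an assumption beyond the original), is to override redirect_request to return None, which makes the opener surface any 3xx as an HTTPError:

```python
import urllib.request

class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    # Returning None tells the opener not to follow the redirect,
    # so the 3xx response is raised as an HTTPError instead.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

handler = NoRedirectHandler()
print(handler.redirect_request(None, None, 301, '', {}, 'http://example.com/'))  # None
```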
7 Proxy settings
    • Crawler development may require proxies. By default, urllib2 uses the environment variable http_proxy to set the HTTP proxy. Instead of that approach, we use ProxyHandler to set the proxy dynamically in the program:
import urllib2
proxy = urllib2.ProxyHandler({'http': '127.0.0.1:1080'}) # while running, disable the SOCKS client's system proxy and use port 1080, or quit the SOCKS client entirely
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.zhihu.com/')
print response.read()

It is important to note that urllib2.install_opener() sets the global opener for urllib2, after which every HTTP access goes through the proxy. This is convenient, but if a program needs two different proxies, do not change the global setting with install_opener; instead call the opener's open() method directly:

import urllib2
proxy = urllib2.ProxyHandler({'http': '127.0.0.1:1080'})
opener = urllib2.build_opener(proxy)
response = opener.open("http://www.google.com/")
print response.read()

When running this, shut down the SOCKS client's system-wide proxy first.
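The per-opener pattern carries over unchanged to Python 3's urllib.request (address and port below are placeholders; nothing is fetched here):

```python
import urllib.request

# A proxy bound to this opener only; the global opener is untouched.
proxy = urllib.request.ProxyHandler({'http': '127.0.0.1:1080'})
opener = urllib.request.build_opener(proxy)
print(type(opener).__name__)  # OpenerDirector
# response = opener.open('http://www.google.com/')  # would route through the proxy
```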

