Python implementations of HTTP requests (urlopen, header processing, cookie handling, setting timeouts, redirection, proxy settings)

Source: Internet
Author: User
Tags: urlencode


# Python implements HTTP requests in three ways: urllib2/urllib, httplib/urllib, and requests

## urllib2/urllib implementation

urllib2 and urllib are two built-in Python modules that together provide HTTP functionality; urllib2 does most of the work, with urllib as a supplement.

1 First, implement a complete request-and-response model
    • urllib2 provides the basic function urlopen:
import urllib2
response = urllib2.urlopen('http://www.cnblogs.com/guguobao')
html = response.read()
print html
    • An improved, two-step version: build the request, then fetch the response
#!coding:utf-8
import urllib2
#request
request = urllib2.Request('http://www.cnblogs.com/guguobao')
#response
response = urllib2.urlopen(request)
html = response.read()
print html
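For reference (an assumption beyond the original text, which targets Python 2), the same two-step pattern on Python 3 lives in urllib.request; a minimal sketch that builds the request without sending it:

```python
import urllib.request

# On Python 3, urllib2's Request/urlopen moved into urllib.request.
request = urllib.request.Request('http://www.cnblogs.com/guguobao')
print(request.get_method())  # a Request with no body defaults to GET
# response = urllib.request.urlopen(request)  # performs the actual fetch
# html = response.read()
```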
    • The example above uses a GET request; below it is changed to a POST request, with urllib supplying the encoding.
#!coding:utf-8
import urllib
import urllib2
url = 'http://www.cnblogs.com/login'
postdata = {'username' : 'qiye',
            'password' : 'qiye_pass'}
#info needs to be encoded into a format urllib2 understands; urllib does this
data = urllib.urlencode(postdata)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
html = response.read()
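The encoding step can be checked in isolation. On Python 3 the same helper is urllib.parse.urlencode (a detail beyond the original Python 2 text):

```python
from urllib.parse import urlencode

postdata = {'username': 'qiye', 'password': 'qiye_pass'}
# Dicts become application/x-www-form-urlencoded key=value pairs joined by '&'.
body = urlencode(postdata)
print(body)  # username=qiye&password=qiye_pass
```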
      • However, this run produces no output, because the server denies the access: it inspects the request header information to decide whether the request comes from a browser.
2 Request header processing
    • Add User-Agent and Referer information to the request above:

      • User-Agent: some servers or proxies check this value to decide whether the request was issued by a browser
      • Content-Type: with a REST interface, the server checks this value to decide how to parse the HTTP body; otherwise it refuses to respond with an error. Value details: http://www.runoob.com/http/http-content-type.html
      • Referer: the server checks it for anti-hotlinking
#coding:utf-8
#Request header processing: set the User-Agent and Referer fields in the request header
import urllib
import urllib2
url = 'http://www.xxxxxx.com/login'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
referer = 'http://www.xxxxxx.com/'
postdata = {'username' : 'qiye',
            'password' : 'qiye_pass'}
# Write user_agent and referer into the header information
headers = {'User-Agent':user_agent, 'Referer':referer}
data = urllib.urlencode(postdata)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
html = response.read()
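The headers set this way can be inspected before anything is sent. A sketch on Python 3's urllib.request (the URL and values below are the same illustrative placeholders as above; no request goes out):

```python
import urllib.request

headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
           'Referer': 'http://www.xxxxxx.com/'}
req = urllib.request.Request('http://www.xxxxxx.com/login',
                             data=b'username=qiye', headers=headers)
# Header names are stored in capitalized form; get_header reads them back.
print(req.get_header('User-agent'))
print(req.get_method())  # a Request carrying data defaults to POST
```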
3 Cookie processing
    • urllib2 handles cookies automatically, using the CookieJar class to manage them. If you need the value of a cookie entry, you can do:
import urllib2, cookielib

cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('http://www.zhihu.com')
for item in cookie:
    print item.name+':'+item.value
    • Sometimes, however, we do not want urllib2 to handle cookies automatically and prefer to add them ourselves, which can be done by setting the Cookie field in the request header.
import urllib2, cookielib

opener = urllib2.build_opener()
opener.addheaders.append(('Cookie', 'email=' + 'helloguguobao@gmail.com')) # the cookie name and value can be anything, but the header must be present
req = urllib2.Request('http://www.zhihu.com')
response = opener.open(req)
print response.headers
retdata = response.read()
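The manual-header approach carries over to Python 3's urllib.request (shown here as an assumption beyond the Python 2 text; the opener is only built, nothing is fetched):

```python
import urllib.request

opener = urllib.request.build_opener()
# Attach a Cookie header to every request this opener sends;
# the name and value here are illustrative only.
opener.addheaders.append(('Cookie', 'email=helloguguobao@gmail.com'))
print(opener.addheaders)
```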
4 Setting timeouts
    • In Python 2.6 and newer, the urlopen function provides a timeout parameter:
import urllib2
request = urllib2.Request('http://www.zhihu.com')
response = urllib2.urlopen(request, timeout=2)
html = response.read()
print html
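A process-wide default can complement the per-call parameter; a small sketch using the socket module (no request is sent here):

```python
import socket

# Global fallback: urlopen calls without an explicit timeout use this value.
socket.setdefaulttimeout(5)
print(socket.getdefaulttimeout())  # 5.0

# A per-call timeout (as in the example above) overrides this default
# and raises a timeout/URLError when it expires.
```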
5 Getting HTTP response codes
    • For a successful request, just call the getcode() method of the response object returned by urlopen; for non-2xx codes, catch HTTPError and read its code attribute.
import urllib2
try:
    response = urllib2.urlopen('http://www.google.com')
    print response.getcode()
except urllib2.HTTPError as e:
    if hasattr(e, 'code'):
        print 'Error code:', e.code
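As a side note beyond the original text, the standard library also carries the reason phrase for each status code, which is handy when logging getcode() results (Python 3's http.client shown):

```python
import http.client

# Map numeric status codes to their standard reason phrases.
for code in (200, 301, 404):
    print(code, http.client.responses[code])
```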
6 Redirection
    • By default, urllib2 automatically follows redirects on HTTP 3xx return codes. To detect whether a redirect happened, just check whether the response URL and the request URL differ:
import urllib2
response = urllib2.urlopen('http://www.zhihu.cn')
isRedirected = response.geturl() != 'http://www.zhihu.cn'
    • If you do not want automatic redirects, you can customize the HTTPRedirectHandler class:
import urllib2
class RedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_301(self, req, fp, code, msg, headers):
        pass
    def http_error_302(self, req, fp, code, msg, headers):
        result = urllib2.HTTPRedirectHandler.http_error_301(self, req, fp, code, msg, headers)
        result.status = code
        result.newurl = result.geturl()
        return result

opener = urllib2.build_opener(RedirectHandler)
opener.open('http://www.zhihu.cn')
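A simpler way to suppress redirects entirely, sketched with Python 3's urllib.request (an assumption beyond the original), is to override redirect_request to return None, which makes the opener surface any 3xx as an HTTPError:

```python
import urllib.request

class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    # Returning None tells the opener not to follow the redirect,
    # so the 3xx response is raised as an HTTPError instead.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

handler = NoRedirectHandler()
print(handler.redirect_request(None, None, 301, '', {}, 'http://example.com/'))  # None
```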
7 Proxy settings
    • Crawler development may require proxies. By default, urllib2 uses the environment variable http_proxy to set the HTTP proxy. Instead of that approach, we use ProxyHandler to set the proxy dynamically in the program:
import urllib2
proxy = urllib2.ProxyHandler({'http': '127.0.0.1:1080'}) # while running, disable the SOCKS client's system proxy and use port 1080, or quit the SOCKS client entirely
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.zhihu.com/')
print response.read()

It is important to note that urllib2.install_opener() sets the global opener for urllib2, after which every HTTP access goes through the proxy. This is convenient, but if a program needs two different proxies, do not change the global setting with install_opener; instead call the opener's open() method directly:

import urllib2
proxy = urllib2.ProxyHandler({'http': '127.0.0.1:1080'})
opener = urllib2.build_opener(proxy)
response = opener.open("http://www.google.com/")
print response.read()

When running this, shut down the SOCKS client's system-wide proxy first.
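The per-opener pattern carries over unchanged to Python 3's urllib.request (address and port below are placeholders; nothing is fetched here):

```python
import urllib.request

# A proxy bound to this opener only; the global opener is untouched.
proxy = urllib.request.ProxyHandler({'http': '127.0.0.1:1080'})
opener = urllib.request.build_opener(proxy)
print(type(opener).__name__)  # OpenerDirector
# response = opener.open('http://www.google.com/')  # would route through the proxy
```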

