Summary of methods for simulating HTTP requests in Python
Python is a powerful tool for web crawling. This article summarizes methods and techniques for simulating HTTP requests with Python.
Python 2 ships two request-related libraries: urllib and urllib2. These are not two versions of the same library. urllib mainly handles URL-related work; when sending a request with it, the request can only be given as a URL string. urllib2 can build a request from a Request object, which makes it possible to forge headers, set a proxy, and issue HTTP GET, HTTP POST, and other methods.
Before reading this article, you should know the basics of HTTP requests, such as:
- What are HTTP requests and responses?
- What are GET and POST?
- What is a cookie?
This article describes the following techniques for simulating requests:
- Set a proxy
- Forge header information
- Enable cookies
- Process URL parameters
Send a request directly with urllib2.urlopen
import urllib2

url = 'http://www.baidu.com/'
# urlopen accepts either a URL string or a Request object as input
response = urllib2.urlopen(url)
response_text = response.read()
Send a request with urllib2.build_opener
import urllib2

url = 'http://www.baidu.com/'
opener = urllib2.build_opener()
response = opener.open(url)
response_text = response.read()
Access a site through a proxy
import urllib2

url = 'http://www.baidu.com/'
proxy_handler = urllib2.ProxyHandler({'http': 'http://localhost:8888'})
opener = urllib2.build_opener(proxy_handler)
response = opener.open(url)
response_text = response.read()
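The snippets in this article use Python 2's urllib2. For readers on Python 3, where urllib2 was folded into urllib.request, here is a minimal sketch of the equivalent proxy setup; no request is actually sent, and the localhost address is the same placeholder used above:

```python
from urllib import request

# register a local proxy for plain-HTTP traffic, as in the Python 2 example
proxy_handler = request.ProxyHandler({'http': 'http://localhost:8888'})
opener = request.build_opener(proxy_handler)

# the opener now carries the proxy handler alongside the default handlers
print(any(isinstance(h, request.ProxyHandler) for h in opener.handlers))  # True
```

Calling `opener.open(url)` would then route the request through the proxy.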
Attach a request body (HTTP POST)
import urllib2

url = 'http://www.baidu.com/'
opener = urllib2.build_opener()
# passing a second argument makes this an HTTP POST with that body
response = opener.open(url, 'request body')
response_text = response.read()
If the body is in key-value format, refer to the URL-parameter processing section below.
Enable cookies
import cookielib
import urllib2

url = 'http://www.baidu.com/'
cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open(url)
response_text = response.read()
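In Python 3, cookielib was renamed to http.cookiejar; a minimal sketch of the same cookie-enabled opener (no request is sent here):

```python
import http.cookiejar
from urllib import request

# http.cookiejar is the Python 3 name for cookielib
jar = http.cookiejar.CookieJar()
opener = request.build_opener(request.HTTPCookieProcessor(jar))

# the jar stays empty until a response's Set-Cookie headers are processed
print(len(jar))  # 0
```

After `opener.open(url)`, any cookies the server sets are stored in `jar` and sent back automatically on subsequent requests through the same opener.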
Use urllib2.Request to add custom header information
import urllib2

url = 'http://www.baidu.com/'
request = urllib2.Request(url)
request.add_data('1234567')  # attaching data turns the request into a POST
request.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(request)
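Python 3's urllib.request.Request dropped add_data; the body and headers are passed to the constructor instead, and the body must be bytes. A sketch of the equivalent request object (not actually sent):

```python
from urllib import request

url = 'http://www.baidu.com/'
# data must be bytes in Python 3; supplying it makes the request a POST
req = request.Request(url, data=b'1234567',
                      headers={'User-Agent': 'fake-client'})

print(req.get_method())  # POST
print(req.data)          # b'1234567'
```

`urllib.request.urlopen(req)` would then send it, just as urllib2.urlopen does above.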
Process parameter information in a URL
Whether you use GET or POST, you will often need to pass parameters. The following library calls can handle them.
Convert a parameter dictionary to a query string
import urllib

para = {'111': '222', 'aaa': 'bbb'}
encodeurl = urllib.urlencode(para)
Output (pair order may vary with dict ordering): aaa=bbb&111=222
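In Python 3, urlencode lives in urllib.parse. A sketch of the same conversion, with a round-trip check through parse_qs so the result does not depend on pair order:

```python
from urllib import parse

para = {'111': '222', 'aaa': 'bbb'}
encoded = parse.urlencode(para)
print(encoded)  # e.g. 111=222&aaa=bbb (pair order follows the dict)

# round-trip check that is independent of pair order
print(parse.parse_qs(encoded) == {'111': ['222'], 'aaa': ['bbb']})  # True
```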
Convert URL parameters to a dictionary
import urlparse

url = 'https://www.baidu.com/s?wd=python%20url%20querystring&pn=10&oq=python%20url%20querystring&tn=baiduhome_pg&ie=utf-8&usm=1&rsv_idx=2&rsv_pq=d09af93600035cb8&rsv_t=d151qRmNNdybGINHcKbyO360E2%2Fg%2FUs2t0MiKqRQXwhHZuNF3IlKyyStzYuofVZczQA3'
splitresult_instance = urlparse.urlsplit(url)
Output object:
SplitResult(scheme='https', netloc='www.baidu.com', path='/s', query='wd=python%20url%20querystring&pn=10&oq=python%20url%20querystring&tn=baiduhome_pg&ie=utf-8&usm=1&rsv_idx=2&rsv_pq=d09af93600035cb8&rsv_t=d151qRmNNdybGINHcKbyO360E2%2Fg%2FUs2t0MiKqRQXwhHZuNF3IlKyyStzYuofVZczQA3', fragment='')
If you want to convert the query string to a dictionary:

result_dic = urlparse.parse_qs(splitresult_instance.query)
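In Python 3, urlsplit and parse_qs both live in urllib.parse. A sketch of the same two steps, using a shortened stand-in URL (the article's full Baidu URL works the same way):

```python
from urllib import parse

# shortened stand-in for the article's longer Baidu search URL
url = 'https://www.baidu.com/s?wd=python%20url%20querystring&pn=10'
split = parse.urlsplit(url)
result_dic = parse.parse_qs(split.query)

print(split.netloc)  # www.baidu.com
print(result_dic)    # {'wd': ['python url querystring'], 'pn': ['10']}
```

Note that parse_qs decodes the %20 escapes back to spaces and returns each value as a list, since a key may appear more than once in a query string.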
In this way, putting the encoded parameters in the URL implements HTTP GET, and putting them in the request body implements HTTP POST.
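That GET/POST distinction can be sketched with Python 3's urllib.request: the same encoded parameter string goes into the URL's query string for GET, or into the body for POST. Nothing is sent over the network here; the requests are only constructed:

```python
from urllib import request, parse

params = parse.urlencode({'wd': 'python'})

# HTTP GET: parameters travel in the URL's query string
get_req = request.Request('http://www.baidu.com/s?' + params)

# HTTP POST: the same parameters travel in the request body, as bytes
post_req = request.Request('http://www.baidu.com/s', data=params.encode('utf-8'))

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```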
This article is also hosted at http://simmon.club/blog/Python-HttpRequest/