Python notes: using the urllib3 HTTP connection pool

Source: Internet
Author: User

You can download the library and related materials from http://code.google.com/p/urllib3.

First, here is how it is used:

    # coding=utf8
    import time

    import urllib3

    # create a connection pool bound to one specific host
    http_pool = urllib3.HTTPConnectionPool('ent.qq.com')

    # record the start time (the format string was garbled in the source;
    # '%X %x %Z' is an assumption)
    str_start = time.strftime('%X %x %Z')

    for i in range(0, 100, 1):
        print(i)
        # build the URL string (the numeric format specifier was garbled
        # in the source; '%06d' is an assumption)
        url = 'http://ent.qq.com/a/20151116/%06d.htm' % i
        print(url)
        # fetch the content synchronously
        r = http_pool.urlopen('GET', url, redirect=False)
        print(r.status, r.headers, len(r.data))

    # print the timing
    print('start time:', str_start)
    print('end time:', time.strftime('%X %x %Z'))

This is fairly simple: first create the connection pool http_pool, then repeatedly fetch URLs from the same host ('ent.qq.com').
Capturing the packets with Wireshark shows the following:

All requests for the http://ent.qq.com/a/20151116/ pages use source port 13136, which indicates that the same connection is being reused.
According to the urllib3 documentation this relies on HTTP keep-alive, and the Connection header of every response is indeed keep-alive.
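
The port reuse can also be checked without Wireshark. The following is a minimal sketch, not the original test: it uses only the standard library, and the local test server, `KeepAliveHandler`, and `source_ports` are names invented for this illustration. It records the client source port of each request, once over a single keep-alive connection and once with a new connection per request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 connections are persistent unless a side asks to close
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def source_ports(reuse, count=5):
    """Return the client-side source port used by each of `count` requests."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    ports = []
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        for _ in range(count):
            conn.request("GET", "/")       # connects first if no socket is open
            ports.append(conn.sock.getsockname()[1])
            conn.getresponse().read()      # drain the body before the next request
            if not reuse:
                conn.close()               # next request opens a new TCP connection
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports


reused = source_ports(reuse=True)
fresh = source_ports(reuse=False)
print("keep-alive:", reused)
print("new connection each time:", fresh)
```

With keep-alive every entry in the first list should be the same port, matching what the capture shows, while the second list should contain several different ports.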

How can this connection pool be implemented?

    def urlopen(self, method, url, body=None, headers=None, retries=3,
                redirect=True, assert_same_host=True):
        # many condition checks removed
        try:
            # obtain a connection
            conn = self._get_conn()
            # build and send the request
            self.num_requests += 1
            conn.request(method, url, body=body, headers=headers)
            # set the timeout
            conn.sock.settimeout(self.timeout)
            httplib_response = conn.getresponse()
            # ......
            # parse the HTTP response
            response = HTTPResponse.from_httplib(httplib_response)
            # put the current connection back into the queue for reuse
            self._put_conn(conn)
        except Exception:
            # error handling (exception types and handling are elided
            # in this excerpt)
            ...
        # redirect handling; this is recursive
        if (redirect and
                response.status in [301, 302, 303, 307] and
                'location' in response.headers):  # redirect, retry
            log.info("Redirecting %s -> %s" %
                     (url, response.headers.get('location')))
            return self.urlopen(method, response.headers.get('location'),
                                body, headers, retries - 1, redirect,
                                assert_same_host)
        # return the response
        return response

As the simplified code above shows, urlopen first obtains a connection, then builds and sends the request, and finally reads the response.

Note that every connection is obtained by calling _get_conn.

Once the response has been received, _put_conn is called to put the connection back into the pool. The related code is as follows:

    def _new_conn(self):
        # create a connection
        return HTTPConnection(host=self.host, port=self.port)

    def _get_conn(self, timeout=None):
        # try to get a connection from the pool
        conn = None
        try:
            conn = self.pool.get(block=self.block, timeout=timeout)
            # check whether the pooled connection is still usable
            if conn and conn.sock and select([conn.sock], [], [], 0.0)[0]:
                # Either data is buffered (bad), or the connection is dropped.
                log.warning("Connection pool detected dropped "
                            "connection, resetting: %s" % self.host)
                conn.close()
        except Empty:
            pass  # Oh well, we'll create a new connection then
        # if the queue is empty, create a new connection; a dropped
        # connection has been closed above and is handed back as is
        return conn or self._new_conn()

    def _put_conn(self, conn):
        # Put the connection back into the queue. The default maximum size
        # of this queue is 1; a connection that does not fit is discarded.
        try:
            self.pool.put(conn, block=False)
        except Full:
            # This should never happen if self.block == True
            log.warning("HttpConnectionPool is full, discarding connection: %s"
                        % self.host)
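
The get/put pattern above can be reduced to a few lines. This is a minimal sketch using only the standard library `queue` module; `TinyPool`, its `factory` argument, and the two counters are hypothetical names for illustration and are not part of urllib3.

```python
import queue


class TinyPool:
    """Hypothetical queue-backed pool mirroring the _get_conn/_put_conn idea."""

    def __init__(self, factory, maxsize=1):
        self._factory = factory            # callable that builds a new connection
        self._queue = queue.Queue(maxsize)
        self.created = 0
        self.discarded = 0

    def get(self):
        try:
            return self._queue.get(block=False)
        except queue.Empty:
            # nothing idle in the pool, so build a new connection
            self.created += 1
            return self._factory()

    def put(self, conn):
        try:
            self._queue.put(conn, block=False)
        except queue.Full:
            # the pool already holds maxsize idle connections
            self.discarded += 1


pool = TinyPool(factory=object)   # plain objects stand in for connections
first = pool.get()                # pool is empty, so a connection is created
pool.put(first)                   # returned for reuse
second = pool.get()               # the same object comes back
extra = pool.get()                # pool is empty again, a second one is created
pool.put(second)
pool.put(extra)                   # the queue is full, so this one is dropped
print(second is first, pool.created, pool.discarded)
```

Because the queue holds at most one idle connection, a sequential client keeps getting the same connection back, which is consistent with the single source port seen in the capture.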

I compared this pool with the plain urllib library by repeatedly fetching different pages from the same domain. The speed did not improve noticeably, probably because the server is relatively close to my machine: the main optimization of the pool is to reduce the number of TCP handshakes and slow starts, and that saving does not show up well over such a short distance.

Does anyone have suggestions for a good way to test the performance?
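
One possible starting point is to time the same number of requests with and without connection reuse and to count how many connections each run opens. The sketch below is only an illustration under assumptions: it uses the standard library `http.client` instead of urllib3, runs against a local test server, and `time_requests` and `_Handler` are names invented here.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def time_requests(host, port, path="/", count=100, reuse=True):
    """Return (elapsed seconds, connections opened here) for `count` GETs."""
    opened = 0
    conn = None
    start = time.perf_counter()
    for _ in range(count):
        if conn is None:
            conn = http.client.HTTPConnection(host, port, timeout=10)
            opened += 1
        conn.request("GET", path)
        conn.getresponse().read()   # drain the body before the next request
        if not reuse:
            conn.close()            # force a new TCP handshake next time
            conn = None
    if conn is not None:
        conn.close()
    return time.perf_counter() - start, opened


class _Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # keep connections open between requests

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), _Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address
fresh = time_requests(host, port, count=50, reuse=False)
pooled = time_requests(host, port, count=50, reuse=True)
server.shutdown()
server.server_close()
print("new connection per request: %.4fs (%d connections)" % fresh)
print("one reused connection:      %.4fs (%d connections)" % pooled)
```

On loopback the two timings will probably be close, for the same reason given above. Pointing `time_requests` at a host with a higher round-trip time and repeating the runs several times should make the handshake cost easier to see.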

The urllib3 project also mentions the idea of a pool of connection pools that would automatically create a pool for each host when different websites are accessed, namely HTTPOcean.
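
The idea of one pool per host can be sketched as a registry keyed by scheme, host, and port. This is only an illustration of the concept, not urllib3 code: `PoolRegistry` and `pool_for` are hypothetical names, the URLs are made up, and a plain `queue.Queue` stands in for a real connection pool.

```python
import queue
from urllib.parse import urlsplit


class PoolRegistry:
    """Hypothetical registry that lazily creates one pool per host."""

    def __init__(self, maxsize=1):
        self._maxsize = maxsize
        self._pools = {}   # (scheme, host, port) -> queue of idle connections

    def pool_for(self, url):
        parts = urlsplit(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
        key = (parts.scheme, parts.hostname, port)
        if key not in self._pools:
            # first request to this host: create its pool on demand
            self._pools[key] = queue.Queue(self._maxsize)
        return self._pools[key]


registry = PoolRegistry()
a = registry.pool_for("http://ent.qq.com/a/first.htm")
b = registry.pool_for("http://ent.qq.com/a/second.htm")
c = registry.pool_for("http://example.com/")
print(a is b, a is c)
```

Two URLs on the same host share one pool, while a different host gets its own.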
