Please credit the source when reprinting. Thank you.
http://blog.csdn.net/shootyou/archive/2011/05/12/6415248.aspx
While troubleshooting a server exception (I will cover the troubleshooting process itself separately), we decided to replace HttpClient 3.x and HttpConnection with HttpClient 4.x.
Why HttpClient 4? Mainly because HttpConnection has no concept of a connection pool: it opens a new I/O connection for every request, so under heavy traffic the server's I/O resources can be exhausted.
HttpClient 3 also offers connection pooling, through MultiThreadedHttpConnectionManager. The general usage is as follows:
    MultiThreadedHttpConnectionManager connectionManager = new MultiThreadedHttpConnectionManager();
    HttpClient client = new HttpClient(connectionManager);
    ...
    // In a thread:
    GetMethod get = new GetMethod("http://jakarta.apache.org/");
    try {
        client.executeMethod(get);
        // print response to stdout
        System.out.println(get.getResponseBodyAsStream());
    } finally {
        // be sure the connection is released back to the connection manager
        get.releaseConnection();
    }
As you can see, this resembles the way a JDBC connection pool is used. What I find awkward is that the connection has to be released manually by calling releaseConnection(): every HttpClient.executeMethod() call must be paired with a method.releaseConnection() call.
HttpClient 4 improves on this. The familiar InputStream.close() is what signals that the connection can be released (before version 4.1, entity.consumeContent() was used to confirm that the content had been consumed so the connection could be released). The usage is as follows:
    ...
    HttpClient client = null;
    InputStream in = null;
    try {
        client = HttpConnectionManager.getHttpClient();
        HttpGet get = new HttpGet();
        get.setURI(new URI(urlPath));
        HttpResponse response = client.execute(get);
        HttpEntity entity = response.getEntity();
        if (entity != null) {
            in = entity.getContent();
            ....
        }
    } catch (Exception e) {
        ....
    } finally {
        // Closing the stream releases the connection back to the connection manager
        if (in != null) {
            try {
                in.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
Updated on 2012-03-06:
A reader asked whether calling in.close() also closes the underlying socket. My reply was as follows:
Reply to kangkang203: Thank you for the question. First, calling in.close() triggers the release of the connection, and the connection is handed back to the connection manager. The official documentation puts it this way: "Closing the input stream will trigger connection release... the underlying connection gets released back to the connection manager". Whether the underlying socket is closed is a separate matter. I read some of the source code (EofSensorInputStream) and found that in most cases the socket is not closed; whether to close it is decided by a watcher. So calling in.close() does not by itself cause the socket to be closed. In addition, since we treat HTTP as a "short connection", keeping the socket open after a request has completed is of limited value; unlike a persistent connection, it does not carry many exchanges after being established. The main value of using a connection manager lies in how it manages connections.
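The difference between releasing a connection and closing its socket can be modeled without HttpClient at all. The following is a minimal sketch in plain Java, not HttpClient's actual implementation: ReleaseWatcher, WatchedInputStream and TrackedStream are hypothetical names, loosely inspired by the watcher idea mentioned above. Closing the wrapper always notifies the watcher (the "release"), and the watcher's return value decides whether the wrapped stream is really closed.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class WatchedStreamDemo {

    /** Hypothetical callback: returns true if the wrapped stream should really be closed. */
    interface ReleaseWatcher {
        boolean streamClosed(InputStream wrapped) throws IOException;
    }

    /** Wrapper whose close() hands the decision to the watcher. */
    static class WatchedInputStream extends FilterInputStream {
        private final ReleaseWatcher watcher;
        private boolean released = false;

        WatchedInputStream(InputStream in, ReleaseWatcher watcher) {
            super(in);
            this.watcher = watcher;
        }

        @Override
        public void close() throws IOException {
            if (released) {
                return; // closing twice must not release twice
            }
            released = true;
            if (watcher.streamClosed(in)) {
                in.close(); // close the underlying stream only if the watcher asks for it
            }
        }
    }

    /** Stand-in for a socket stream that records whether it was closed. */
    static class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackedStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Number of connections handed back to the imaginary pool. */
    static int releasedToPool = 0;

    /** Reads the body, closes the wrapper, and reports whether the "socket" was closed. */
    static boolean readAndClose(boolean watcherWantsClose) throws IOException {
        TrackedStream socketStream = new TrackedStream("hello".getBytes());
        InputStream in = new WatchedInputStream(socketStream, wrapped -> {
            releasedToPool++;          // the connection goes back to the pool either way
            return watcherWantsClose;  // false models a reusable connection left open
        });
        try {
            while (in.read() != -1) {
                // consume the body
            }
        } finally {
            in.close();
        }
        return socketStream.closed;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("socket closed (reusable): " + readAndClose(false));
        System.out.println("socket closed (not reusable): " + readAndClose(true));
        System.out.println("released to pool: " + releasedToPool);
    }
}
```

In both calls the connection is counted as released, but only the second one closes the stand-in socket. This mirrors the point of the reply: in.close() is about returning the connection, and closing the socket is a separate decision.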
That covers how the connection pool is used. Next, let's look at the parameters that matter most when configuring it. I implemented a simple HttpConnectionManager using version 4.1. The code is as follows:
    public class HttpConnectionManager {

        private static HttpParams httpParams;
        private static ClientConnectionManager connectionManager;

        /** Maximum number of connections */
        public final static int MAX_TOTAL_CONNECTIONS = 800;
        /** Maximum wait time for obtaining a connection */
        public final static int WAIT_TIMEOUT = 60000;
        /** Maximum number of connections per route */
        public final static int MAX_ROUTE_CONNECTIONS = 400;
        /** Connection timeout */
        public final static int CONNECT_TIMEOUT = 10000;
        /** Read timeout */
        public final static int READ_TIMEOUT = 10000;

        static {
            httpParams = new BasicHttpParams();
            // Set the maximum number of connections
            ConnManagerParams.setMaxTotalConnections(httpParams, MAX_TOTAL_CONNECTIONS);
            // Set the maximum wait time for obtaining a connection
            ConnManagerParams.setTimeout(httpParams, WAIT_TIMEOUT);
            // Set the maximum number of connections per route
            ConnPerRouteBean connPerRoute = new ConnPerRouteBean(MAX_ROUTE_CONNECTIONS);
            ConnManagerParams.setMaxConnectionsPerRoute(httpParams, connPerRoute);
            // Set the connection timeout
            HttpConnectionParams.setConnectionTimeout(httpParams, CONNECT_TIMEOUT);
            // Set the read timeout
            HttpConnectionParams.setSoTimeout(httpParams, READ_TIMEOUT);

            SchemeRegistry registry = new SchemeRegistry();
            registry.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
            registry.register(new Scheme("https", SSLSocketFactory.getSocketFactory(), 443));

            connectionManager = new ThreadSafeClientConnManager(httpParams, registry);
        }

        public static HttpClient getHttpClient() {
            return new DefaultHttpClient(connectionManager, httpParams);
        }
    }
The maximum number of connections, the maximum wait time for obtaining a connection, and the read timeout should be easy to understand; most connection pools have these settings. The unusual one is the maximum number of connections per route.
What is a route?
A route can be understood as a path from the machine running the client to a target machine. For example, if we use HttpClient to request resources from www.baidu.com and from www.bing.com, two routes are created.
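To make the idea concrete, here is a small sketch in plain Java of how requests could be grouped by route. It is only an illustration under simplifying assumptions: routeKey and countRoutes are hypothetical helpers, and the key is reduced to scheme, host and port, whereas a real route can also carry details such as proxies, which this sketch ignores.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;

public class RouteKeyDemo {

    /** Hypothetical helper: reduces a URL to "scheme://host:port", ignoring path and query. */
    static String routeKey(String url) {
        URI uri = URI.create(url);
        String scheme = uri.getScheme().toLowerCase();
        int port = uri.getPort();
        if (port == -1) {
            // No explicit port: fall back to the scheme's default
            port = scheme.equals("https") ? 443 : 80;
        }
        return scheme + "://" + uri.getHost().toLowerCase() + ":" + port;
    }

    /** Counts the distinct routes touched by a list of request URLs. */
    static int countRoutes(String... urls) {
        Set<String> routes = new LinkedHashSet<>();
        for (String url : urls) {
            routes.add(routeKey(url));
        }
        return routes.size();
    }

    public static void main(String[] args) {
        System.out.println(routeKey("http://www.baidu.com/index.html"));
        System.out.println(countRoutes(
                "http://www.baidu.com/a",
                "http://www.baidu.com/b?x=1",
                "http://www.bing.com/"));
    }
}
```

Two different pages on www.baidu.com share one route, while www.bing.com gets its own, so the three URLs in main map to two routes.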
Why single out the maximum number of connections per route? Because its default value is 2. If you do not set it, at most 2 concurrent connections can be open to the same target machine. So if you are running a crawling task against a single target machine, then even with the pool's maximum number of connections set to 200, only two connections are actually working; the other 198 are held in reserve for other target machines.
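The effect of the per-route cap can be reproduced with a toy model. The sketch below is plain Java and does not use HttpClient; ToyPool, tryLease and the host target.example are hypothetical, and the pool is modeled as one global semaphore plus one semaphore per route. It only demonstrates how the two limits interact, assuming the default of 2 per route described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

public class PerRouteLimitDemo {

    /** Toy pool: one global semaphore plus one semaphore per route. */
    static class ToyPool {
        private final Semaphore total;
        private final int maxPerRoute;
        private final Map<String, Semaphore> perRoute = new HashMap<>();

        ToyPool(int maxTotal, int maxPerRoute) {
            this.total = new Semaphore(maxTotal);
            this.maxPerRoute = maxPerRoute;
        }

        /** Tries to lease a connection for the route without waiting. */
        synchronized boolean tryLease(String route) {
            Semaphore routeSlots =
                    perRoute.computeIfAbsent(route, r -> new Semaphore(maxPerRoute));
            if (!routeSlots.tryAcquire()) {
                return false; // per-route cap reached
            }
            if (!total.tryAcquire()) {
                routeSlots.release(); // global cap reached: undo the route lease
                return false;
            }
            return true;
        }
    }

    /** Counts how many of `attempts` simultaneous leases succeed for a single route. */
    static int concurrentLeases(int maxTotal, int maxPerRoute, int attempts) {
        ToyPool pool = new ToyPool(maxTotal, maxPerRoute);
        int granted = 0;
        for (int i = 0; i < attempts; i++) {
            // Hypothetical single target, as in a crawl against one machine
            if (pool.tryLease("http://target.example:80")) {
                granted++;
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        System.out.println(concurrentLeases(200, 2, 200));  // per-route cap of 2
        System.out.println(concurrentLeases(200, 20, 200)); // per-route cap raised to 20
    }
}
```

With a total of 200 and a per-route cap of 2, only 2 of the 200 lease attempts succeed; raising the per-route cap to 20 lets 20 through. The total only becomes the binding limit once the per-route cap is at least as large.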
I learned this the hard way. I did not notice this setting when I switched to HttpClient 4.1, and the service ended up handling load worse than before, so I want to point it out here.
Download the HttpClient 4.x tutorial:
http://svn.apache.org/repos/asf/httpcomponents/httpclient/trunk/httpclient-contrib/docs/translated-tutorial/httpclient-tutorial-simplified-chinese.pdf
Version note:
The user w2449008821 pointed out that ConnManagerParams has been deprecated in HttpClient 4.1 and later.
When I wrote this post, the HttpClient version was 4.0.3. ConnManagerParams was deprecated after version 4.0. I did not expect such a big change in a minor version upgrade.
The official website provides an example of the new connection pool settings:
    SchemeRegistry schemeRegistry = new SchemeRegistry();
    schemeRegistry.register(
            new Scheme("http", 80, PlainSocketFactory.getSocketFactory()));
    schemeRegistry.register(
            new Scheme("https", 443, SSLSocketFactory.getSocketFactory()));

    ThreadSafeClientConnManager cm = new ThreadSafeClientConnManager(schemeRegistry);
    // Increase max total connection to 200
    cm.setMaxTotal(200);
    // Increase default max connection per route to 20
    cm.setDefaultMaxPerRoute(20);
    // Increase max connections for localhost:80 to 50
    HttpHost localhost = new HttpHost("localhost", 80);
    cm.setMaxForRoute(new HttpRoute(localhost), 50);

    HttpClient httpClient = new DefaultHttpClient(cm);
The functions of ConnManagerParams have moved to ThreadSafeClientConnManager and HttpConnectionParams:
| Return type | Deprecated ConnManagerParams method | Use instead |
| --- | --- | --- |
| static ConnPerRoute | getMaxConnectionsPerRoute(HttpParams params) | ThreadSafeClientConnManager.getMaxForRoute(org.apache.http.conn.routing.HttpRoute) |
| static int | getMaxTotalConnections(HttpParams params) | ThreadSafeClientConnManager.getMaxTotal() |
| static long | getTimeout(HttpParams params) | HttpConnectionParams.getConnectionTimeout(HttpParams) |
| static void | setMaxConnectionsPerRoute(HttpParams params, ConnPerRoute connPerRoute) | ThreadSafeClientConnManager.setMaxForRoute(org.apache.http.conn.routing.HttpRoute, int) |
| static void | setMaxTotalConnections(HttpParams params, int maxTotalConnections) | ThreadSafeClientConnManager.setMaxTotal(int) |
| static void | setTimeout(HttpParams params, long timeout) | HttpConnectionParams.setConnectionTimeout(HttpParams, int) |
Reference: http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/conn/params/ConnManagerParams.html
http://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html#d4e638