Concurrent connectivity issues with HttpClient

Source: Internet
Author: User
Tags: rfc, solr

Yesterday the search system slowed to a crawl again when several libraries rebuilt their indexes at the same time. After a morning spent reproducing and analyzing the problem, I traced it to the way I was using HttpClient (I use 3.1, the widely used legacy version).

When the search system rebuilds an index, multiple threads (8 by default) fetch data concurrently from the PHP side. (Here the search system is the client and the PHP side is the server.) The fetched data is placed in a queue, and one or more separate threads consume it to update the index. Reproducing the problem in the test environment, I found that for a single request the PHP side logged 1-2 seconds, while the search side logged 4-6 seconds. There are two possible explanations for the difference: either the response takes too long to travel from the PHP side back to the search side, or the search side waits a long time before it actually sends the request to PHP.

Because of my earlier experience with jetty7, I initially suspected a data transmission problem. But the request code simply uses HttpClient, so HttpClient was the only place to start the analysis. My plan was to compare the request start and end times on the PHP side and the search side, but before doing so I set the number of concurrent requests on the search side to 1 to see how the single-threaded case behaved. To my surprise, the times on the two sides were then close. So the problem was most likely in how HttpClient handles concurrent connections. I opened the HttpClient API and its configuration-related classes and found the following two methods in the HttpConnectionManagerParams class:

    public void setDefaultMaxConnectionsPerHost(int maxHostConnections);
    public void setMaxTotalConnections(int maxTotalConnections);

HttpClient uses connection pooling to handle requests, and there are actually two levels of pool: a global ConnectionPool and a per-host HostConnectionPool. The maxHostConnections parameter is the maximum number of connections that can be kept open to each host, and maxTotalConnections is the maximum number of connections that can be held in the global ConnectionPool. Each host is described by a HostConfiguration. HttpClient has a method int executeMethod(final HostConfiguration hostConfiguration, final HttpMethod method) that lets you specify which HostConfiguration to use, but in most cases none is specified explicitly, so HttpClient uses the default (hostConfiguration = null). This means that all requests can be treated as belonging to the same host. If you do not set these two parameters, HttpClient uses its defaults, which are defined in MultiThreadedHttpConnectionManager:

    public static final int DEFAULT_MAX_HOST_CONNECTIONS = 2;   // Per RFC 2616 sec 8.1.4
    public static final int DEFAULT_MAX_TOTAL_CONNECTIONS = 20;

The default maxHostConnections is only 2, which means that when I request data concurrently with 8 threads, 6 of them are actually waiting for a connection at any given moment. That explains the behavior described above. Following the comment in the code, I looked up the last paragraph of RFC 2616 sec 8.1.4, Practical Considerations (http://www.faqs.org/rfcs/rfc2616.html):

Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy. A proxy SHOULD use up to 2*N connections to another server or proxy, where N is the number of simultaneously active users. These guidelines are intended to improve HTTP response times and avoid congestion.

The RFC passage shows that HttpClient's choice of 2 for maxHostConnections is well founded. That setting clearly suits clients such as browsers, but I believe most people who use HttpClient as a library do not want this default limit. The default maxTotalConnections of only 20 is also too stingy. Later I browsed the code of Solr's client class CommonsHttpSolrServer and found the following snippet. Solr evidently understood HttpClient better than I did:

    _httpClient = (client == null) ? new HttpClient(new MultiThreadedHttpConnectionManager()) : client;
    if (client == null) {
        // set some better defaults if we created a new connection manager and client
        // increase the default connections
        this.setDefaultMaxConnectionsPerHost(32);  // 2
        this.setMaxTotalConnections(128);          // 20
    }
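
For code that builds its own client instead of going through Solr, the same adjustment might look like the sketch below. It is written against the Commons HttpClient 3.1 API; the class name PooledClientFactory is hypothetical, and the values 32 and 128 are simply borrowed from the Solr snippet for illustration, not a recommendation for every workload.

```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.params.HttpConnectionManagerParams;

// Hypothetical helper; requires commons-httpclient 3.1 on the classpath.
public class PooledClientFactory {

    /** Builds a client whose pool limits are raised above the defaults of 2 and 20. */
    public static HttpClient create(int maxPerHost, int maxTotal) {
        MultiThreadedHttpConnectionManager manager = new MultiThreadedHttpConnectionManager();
        HttpConnectionManagerParams params = manager.getParams();
        params.setDefaultMaxConnectionsPerHost(maxPerHost); // default is 2
        params.setMaxTotalConnections(maxTotal);            // default is 20
        return new HttpClient(manager);
    }

    public static void main(String[] args) {
        // Illustrative values taken from the Solr snippet.
        HttpClient client = create(32, 128);
        System.out.println("client created: " + (client != null));
    }
}
```

With 8 fetching threads, any per-host limit of 8 or more should be enough to stop threads from queueing for a connection.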

A special note on HttpClient's MultiThreadedHttpConnectionManager: the name suggests that it issues concurrent requests on multiple threads, but it does not. Threads are involved in only one respect: when no connection is available, the calling thread waits for a signal that one has been released. The name is better read as "a thread-safe HttpConnectionManager for multithreaded use".
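
The waiting behavior can be illustrated without HttpClient at all. The sketch below uses a java.util.concurrent.Semaphore as a stand-in for the per-host pool (an assumption made for illustration; it is not how HttpClient is implemented internally) and records the largest number of simulated requests that are in flight at the same time. The class and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PerHostLimitDemo {

    /** Runs `threads` workers that each need one of `maxPerHost` slots; returns the peak number in flight. */
    public static int peakInFlight(int threads, int maxPerHost, long workMillis) {
        Semaphore slots = new Semaphore(maxPerHost);     // stands in for the per-host pool
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            workers.execute(() -> {
                try {
                    slots.acquire();                     // blocks while all "connections" are in use
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(workMillis);        // simulated request
                    } finally {
                        inFlight.decrementAndGet();
                        slots.release();                 // hand the slot to a waiting thread
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        workers.shutdown();
        try {
            if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in flight with limit 2:  " + peakInFlight(8, 2, 50));
        System.out.println("peak in flight with limit 32: " + peakInFlight(8, 32, 50));
    }
}
```

With a limit of 2 the peak can never exceed 2, no matter how many worker threads are started; the remaining threads sit in acquire(), which is the extra time the search side observed.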
Two lessons from using HttpClient. First, create the MultiThreadedHttpConnectionManager instance once, preferably as a global; otherwise you end up with multiple connection pools. HttpClient itself is thread-safe, and there can be multiple HttpClient instances. Second, call method.releaseConnection() when you finish processing a request, ideally in a finally block; otherwise connections are never returned to the pool and the pool may be exhausted.
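
Both lessons can be combined in a small helper. This is a minimal sketch against the Commons HttpClient 3.1 API; the class name SharedClient and the fetch method are hypothetical, and error handling is reduced to the bare minimum.

```java
import java.io.IOException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.methods.GetMethod;

// Hypothetical helper; requires commons-httpclient 3.1 on the classpath.
public class SharedClient {

    // One connection manager and one client shared by the whole application,
    // so there is exactly one connection pool.
    private static final HttpClient CLIENT =
            new HttpClient(new MultiThreadedHttpConnectionManager());

    /** Fetches a URL and always returns the connection to the pool. */
    public static String fetch(String url) throws IOException {
        GetMethod method = new GetMethod(url);
        try {
            int status = CLIENT.executeMethod(method);
            if (status != HttpStatus.SC_OK) {
                throw new IOException("Unexpected status " + status + " for " + url);
            }
            // Buffers the whole body in memory; fine for small responses.
            return method.getResponseBodyAsString();
        } finally {
            // Runs on success and on failure, so the connection is never leaked.
            method.releaseConnection();
        }
    }
}
```

Because the release happens in finally, an exception thrown while reading the response still frees the connection for the next waiting thread.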

Addendum: after writing this up I lay in bed and thought of a few more questions, which I add here:
1. I vaguely remember that index rebuilds used to be reasonably fast, so why is the problem so obvious now? There are two reasons. First, the original system fetched data on a single thread (I later found that single-threaded fetching could not keep up with index updates, so I changed it to multiple threads). Second, in the past several libraries were not rebuilt at the same time. So even with the same code, a change in the environment can change the behavior. When such a change happens quietly and the programmer does not notice it, the first instinct is to feel that the problem is weird.
2. When a connection cannot be obtained for a long time, does HttpClient log a warning? Because I used HttpClient's getResponseBodyAsStream method, which logs at WARN level, I had turned off WARN logging for HttpClient. So I checked the HttpClient code, but did not find any such warning. This is something HttpClient could improve. However, HttpClient is now at version 4, I am using 3.1, and 3.x is no longer updated, so anyone adopting HttpClient should consider version 4, even though almost all the code you can find today still uses the 3.x series.
3. Does the HttpClient documentation specifically mention configuring the number of connections? I looked it over, and it is indeed mentioned on a page about threading. When I first used the library, I obviously had not read the documentation completely. Perhaps if HttpClient published clear best practices it would get users' attention; otherwise this kind of misuse will keep happening from time to time. If you don't believe me, Google it.
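
As a rough illustration of the warning wished for in point 2, the sketch below waits for a pool slot with a timeout and reports when the wait runs out. It uses a plain java.util.concurrent.Semaphore as a stand-in for the pool, and the class and method names are hypothetical; this is not HttpClient code. HttpClient 3.x itself offers a related knob, HttpClientParams.setConnectionManagerTimeout(long), which bounds how long a thread waits for a pooled connection instead of logging.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class WarnOnSlowAcquire {

    /**
     * Tries to take a slot within timeoutMillis. Returns true on success;
     * otherwise prints a warning to stderr and returns false.
     */
    public static boolean acquireOrWarn(Semaphore slots, long timeoutMillis) {
        try {
            if (slots.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        System.err.println("WARN: no pooled connection available after " + timeoutMillis + " ms");
        return false;
    }

    public static void main(String[] args) {
        Semaphore slots = new Semaphore(1);           // a pool with a single slot
        System.out.println(acquireOrWarn(slots, 100)); // slot is free
        System.out.println(acquireOrWarn(slots, 100)); // slot is taken, so this waits and warns
    }
}
```

A message like this in the log would have pointed at the pool limit right away, without the morning of analysis.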
