Ultra-High-Performance Pipelined HTTP Requests (Practice, Principle, Implementation)

Tags: ACK, keep-alive

The "high performance" in the title refers to how fast the network card can push requests out — fast enough that a single client can put visible latency on a server. This article introduces the principle of HTTP pipelining: first the high-performance test practice, then an analysis of the data flow and the mechanism behind it, and finally a simple implementation.

Practice

First, the comparison of test methods. For a single client sending HTTP requests to a server, the usual approaches are:

1. A single process or thread polling requests (naturally very slow, so it is not tested here).
2. Multiple threads that prepare the data in advance and wait for a signal (this puts high demands on the client).
3. A group of threads prepared in advance, all polling requests at the same time.
4. The system/platform's own asynchronous send mechanism (in effect the platform thread pool, sending and receiving on different threads of the pool).

Scenario 1 is not tested, and scenario 2 performs too poorly to be comparable, so neither appears in the results. The tests below compare the remaining two methods against the pipelined approach:
    • First, the pipelining test plan (the principle is described in the next section). The test uses 100 pipelines; in fact far fewer — even a single pipeline — can reach similar performance, but most servers (nginx in particular) limit how many requests a single connection may send (usually 100, sometimes 200 or higher), so each pipeline sends 100 requests.
    • Second, the thread group: 100 threads are prepared in advance (100 threads is not enough to noticeably affect the system itself), and each thread polls 100 requests. A minimal sketch of this method follows the list.
    • Third, the asynchronous mode: 10,000 requests are committed to the sending side and received under thread-pool control.
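
As a point of reference, here is a minimal sketch of the thread-group method — not the code behind the published numbers, just an illustration of the setup: 100 threads created up front, released by one signal, each issuing 100 sequential requests (the target URL is the Baidu one from the test data below):

```csharp
// Minimal sketch of the "thread group" baseline: 100 threads x 100 sequential
// requests, all released at the same moment. Illustrative only.
using System;
using System.Diagnostics;
using System.Net;
using System.Net.Http;
using System.Threading;

class ThreadGroupBench
{
    static void Main()
    {
        ServicePointManager.DefaultConnectionLimit = 100; // .NET Framework defaults to 2
        var client = new HttpClient();                    // thread-safe for concurrent sends
        var start = new ManualResetEventSlim(false);      // one signal releases every thread
        var done = new CountdownEvent(100);

        for (int i = 0; i < 100; i++)
        {
            new Thread(() =>
            {
                start.Wait();
                for (int j = 0; j < 100; j++)
                    client.GetStringAsync("http://www.baidu.com/").GetAwaiter().GetResult();
                done.Signal();
            }) { IsBackground = true }.Start();
        }

        var sw = Stopwatch.StartNew();
        start.Set();                                      // fire: all 10,000 requests begin
        done.Wait();
        Console.WriteLine($"10,000 requests in {sw.Elapsed}");
    }
}
```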
Test environment: an ordinary home PC — i5 4-core, 12 GB RAM, 100 Mb telecom bandwidth.

Test data:

    GET http://www.baidu.com HTTP/1.1
    Content-Type: application/x-www-form-urlencoded
    Host: www.baidu.com
    Connection: keep-alive

The target is Baidu, the most commonly used choice here. If the interface under test performs poorly, most requests simply queue up inside the application server and it is hard to see the advantage of the pipe (not because the pipe cannot deliver, but because the server is the bottleneck first). All pipelined tests below use PipeHttpRuner (see http://www.cnblogs.com/lulianqi/p/8167843.html for the test tool, its usage and an introduction). Let's look at the raw performance first (all screenshots are from the Windows Task Manager and Resource Monitor):

First, how to read the charts (the later ones follow the same convention). The first pair comes from Task Manager: the solid line is data received, the dashed line is data sent; sampling is every 0.5 s and each grid square spans 1.5 s (the rate rises too fast and too high for Task Manager's drawing strategy to display properly, but the timeline is still readable). The second pair comes from Resource Monitor with three samplers added: red is CPU usage, blue is the network receive rate, green is the network send rate.

During the test the original request is about 130 bytes; adding the TCP and IP headers, 10,000 requests amount to only about 1.5 MB (header overhead stays small because the pipeline packs multiple requests into one packet; in practice, though, most servers cannot respond that fast, so heavy retransmission can push the real traffic well above the theoretical value). The returned responses come to roughly 60 MB (some connections are interrupted midway, so not every test run gets all 10,000 complete replies).

The pipelined numbers are striking: the whole test takes only about 5 s. Sending itself is cheap — the send rate peaks within 0.5 s, by which point essentially all 10,000 requests have already left the client; the traffic after that comes mainly from retransmissions while the server's buffers are saturated and cannot keep up (TCP window full), which is discussed later. On the receive side the peak is likewise reached in about 0.5 s, and the full receive completes in about 5 s. CPU usage barely rises during the test, and since responses are read straight out of the TCP buffer with no disk operations involved, this test does not actually push the pipe to its limit — the bottleneck is mainly network bandwidth.

Next, the thread-group results (100 threads × 100 requests each), followed below by the asynchronous-receive results.

The difference is obvious: the thread group needs about 25 s, and asynchronous receive takes over a minute. (Asynchronous receive is the platform's recommended send mode and performs very well in normal applications, but under high pressure it loses to a custom thread group, mainly because it relies on the default thread pool: the pool will not spin up 100 threads at short notice to receive data, so replies queue behind thread construction and there is a great deal of switching. Raising the default thread-pool size improves its numbers in this test.) More importantly, with either of these two methods the CPU is nearly saturated throughout the test — the machine is already working at full load just to complete it, so there is little room for improvement.

The same tests were later run against JD, Taobao, Youku and the company's own servers, with similar results: as long as the server itself holds up, there is at least a 10× gap (and the gap grows if the client has more bandwidth).

Next is a simple test of an HTTP API, using a NetEase e-commerce interface (e-commerce interfaces can generally withstand comparatively high pressure, and it was confirmed beforehand that the test would not materially affect normal use): http://you.163.com/xhr/globalinfo/querytop.json?__timestamp=1514784144074 (an interface that returns a product list). The test data settings are shown below.

The request count is again 10,000; about 326 MB of response data comes back, completing in roughly 30 s. That is essentially the limit of the network, and the CPU is under almost no pressure at this point (100 pipelines × 100 requests each). Note that the request carries a timestamp parameter and the test reuses the same timestamp for every request, so the real impact on the application server is small; each request could be given a different timestamp instead (and since this demonstration uses a public online service, use a test service when running such tests yourself). Also note that if you pick a low-performance test target, most of the traffic will simply queue on the server side and throughput will look poor — that is slow server-side processing, and has little to do with the client. In general, an ordinary PC testing with a pipe can impose noticeable latency on a server.
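
If you do want every request to exercise the application logic rather than a cache, varying the timestamp per request is trivial. A sketch (the helper name is hypothetical; the URL is the one from the test above):

```csharp
using System;

static class UrlHelper
{
    // Hypothetical helper: give each of the 10,000 pipelined requests its own
    // __timestamp so the server cannot answer them all from one cached entry.
    public static string MakeUrl(int i) =>
        "http://you.163.com/xhr/globalinfo/querytop.json?__timestamp="
        + (DateTimeOffset.UtcNow.ToUnixTimeMilliseconds() + i);
}
```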

Principle

In the normal HTTP implementation, once the connection is established (TCP handshake) the request flows to the server and the client waits; the exchange ends when the response is received (as in the figure).

HTTP/1.1 does of course support keep-alive: after one send/receive completes, the connection is not closed and the next request reuses the same connection (as in the figure). The gain was considerable, especially in the early years when server performance was limited, network resources were scarce and RTTs (network latency) were large. Under today's conditions, however, these are no longer the main problems. The pattern above makes one thing clear: the client must wait for the response to arrive before it can issue the next request. If the application server needs time to process, every subsequent request waits; and even if the server could reply without any processing at all, the request and reply still have to pay their time on the wire, once per request. Worse, because of how TCP transmission works the rate ramps up gradually, so an intermittent send/receive pattern badly hurts TCP's ability to reach the line's full throughput.

Pipelining (the pipe) avoids these problems: the client simply sends the next request without waiting for the reply. (In fact the HTTP/1.1 protocol never says a client must wait for the response before sending its next request; most HTTP libraries just implement it that way because it simplifies application-layer logic — and note how many HTTP servers nowadays support the pipe by default.) Sending and receiving are thus decoupled (as in the figure).

In practice things can happen even faster than the figure suggests: requests 1, 2, 3 and 4 may well be packed into a single TCP segment and sent out all at once (a behavior that also brings the application layer some trouble, discussed later).
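
To make this concrete, below is a deliberately minimal sketch of pipelined sending over a raw socket — this is not the MyPipeHttpHelper implementation, just the core idea: concatenate all the requests, write them before reading anything, and mark the last request Connection: close so the read loop has a natural end.

```csharp
// Minimal pipelining sketch (not the MyPipeHttpHelper implementation):
// write all requests up front, then drain the responses.
using System;
using System.Net.Sockets;
using System.Text;

class PipelineSketch
{
    static void Main()
    {
        const string host = "www.baidu.com";
        const int requestCount = 100;        // many servers cap pipelined requests at ~100

        using (var tcp = new TcpClient(host, 80))
        using (NetworkStream stream = tcp.GetStream())
        {
            tcp.NoDelay = true;              // disable Nagle on the client side

            var sb = new StringBuilder();
            for (int i = 0; i < requestCount; i++)
            {
                bool last = i == requestCount - 1;
                sb.Append("GET / HTTP/1.1\r\nHost: ").Append(host)
                  .Append("\r\nConnection: ").Append(last ? "close" : "keep-alive")
                  .Append("\r\n\r\n");
            }

            byte[] payload = Encoding.ASCII.GetBytes(sb.ToString());
            stream.Write(payload, 0, payload.Length);   // all requests, no waiting

            // Drain everything until the server closes; a real client must parse
            // Content-Length / chunked encoding to split this back into responses.
            byte[] buffer = new byte[64 * 1024];
            long total = 0; int n;
            while ((n = stream.Read(buffer, 0, buffer.Length)) > 0)
                total += n;
            Console.WriteLine($"received {total} bytes for {requestCount} pipelined requests");
        }
    }
}
```

Because every request lands in the send buffer at once, the OS coalesces them into large segments — which is exactly why the capture below shows a dozen requests inside a single TCP packet.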

A more realistic picture of a pipeline looks like the figure: multiple requests are packed together and sent, and sometimes the server only begins returning the first response after every request has already been sent.

The normal keep-alive pattern, by contrast, looks like the figure: each line represents one request — the connection is reused rather than re-created each time, but each response must be received before the next request can be sent. Now let's see what the pipelined pattern looks like in an actual test capture.

Once the handshake completes (the handshake itself took only 4 ms), the requests start going out immediately — the TCP segment below contains 12 complete requests. With no reply received yet, every request that is ready can be sent up front (Nagle's algorithm has been disabled on the server).

Because sending is so fast, by the time more than half — nearly 70 — of the requests have gone out, the server's first pure TCP acknowledgment appears, for sequence number 353 (an ACK only, not a response; frame 327). The server then quickly notices a problem with a subsequent segment and raises TCP Dup ACKs (see https://ask.wireshark.org/questions/29216/why-are-duplicate-tcp-acks-being-seen-in-wireshark-capture for the causes): "A TCP Dup ACK occurs when the receiver notices a gap in the packet sequence (packets out of order); it sends a duplicate ACK, which is used not only for fast retransmission but also triggers recovery mechanisms faster than fast retransmit alone. If you see duplicate ACKs but no gap in the stream, it means you are capturing at the data source (not the receiver), and the data was lost on the way to the receiver — you should then see a retransmission packet." Here the server issued the Dup ACK for 353 three times (or four), so the client quickly retransmitted the missing segment (frame 362, seen below). Further on you can also see that the high send rate produced some loss and out-of-order segments. It is important to note that these errors are routine in TCP transmission, and TCP has an efficient built-in mechanism for recovering from them; their presence does not materially hurt the pipe's performance. Only if the server fails to recover promptly can exponential backoff set in, as in the figure below.

High-speed sending and receiving brings not only drops, reordering and retransmission: both the client and the server can also exhaust their receive windows. When a receiver's window is exhausted you will see TCP ZeroWindow / Window Full. So both the client and the server need to read data out of the TCP buffer quickly.
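
Since send and receive are decoupled, the receive path has to keep up on its own. A dedicated reader thread that does nothing but drain the socket keeps the client's receive window from filling; a sketch (the names are illustrative, and the response parsing is assumed to happen elsewhere):

```csharp
// Sketch: drain the socket on a dedicated thread so the client's TCP receive
// window never fills while the sender keeps writing.
using System;
using System.Net.Sockets;
using System.Threading;

static class ReceivePump
{
    public static void Start(NetworkStream stream, Action<byte[], int> onData)
    {
        new Thread(() =>
        {
            byte[] buffer = new byte[64 * 1024];
            int n;
            while ((n = stream.Read(buffer, 0, buffer.Length)) > 0)
                onData(buffer, n);   // hand the bytes off quickly; parse elsewhere
        }) { IsBackground = true }.Start();
    }
}
```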

By following the TCP stream we can confirm that in this test all 100 requests of this particular pipeline went out, while the responses were returned by the server gradually. Now let's look at the responses.

The responses themselves are large, while the client's MSS is only 1460. (The 1506 seen above does not exceed the MSS: the actual payload is only 1424 bytes; adding a 48-byte TCP header, a 20-byte IP header and a 14-byte Ethernet header gives 1506. A TCP header is normally 20 bytes, but the segments here carry an extra 28 bytes of options.) A single response is therefore split across multiple packets.

It is also easy to see from the capture that a response spends less than 1 ms on the network (about 730 microseconds) — and since this capture filters on a single port (one pipe), other pipelines may well be receiving data within that same millisecond.

Why can the pipe outperform the normal request mode by so much? Mainly for three reasons:
1. Pipelined sending: each request goes out without waiting for the previous response (the point is not just sharing a connection, but not waiting for replies).
2. Packed sending: when network conditions allow, one packet can carry multiple requests.
3. Minimal TCP connections: as long as the server permits, only a handful of connections are created (non-LAN TCP connections go through slow start, so a connection normally needs some time to reach peak efficiency).

Now for the drawbacks. Pipelining has long been supported by HTTP/1.1, and most nginx servers support it with the feature enabled. Compared with ordinary HTTP keep-alive transport, the pipe addresses head-of-line blocking (HOL blocking), but it abandons the strict one-question-one-answer rhythm, so the application layer can no longer pair each request with its reply directly — unfriendly for requests, such as a POST, that submit data and must distinguish the returned result. The solution is simple: have the application service tag each response with a unique marker matching its request (a FIFO-based sketch of the pairing appears after the comparison table below), or use HTTP/2.0 (https://tools.ietf.org/pdf/rfc7540.pdf) — one of its major improvements is precisely this: HTTP/2.0 distinguishes streams in a similar way, by giving every frame an identifier for the stream it belongs to.

Below is a simple comparison between pipelined and regular HTTP.
Pipelined HTTP                                        | Normal HTTP/1.1
Uses one and the same TCP connection                  | Uses separate connections (keep-alive lets a connection be reused)
Can send the next request without waiting for a reply | The same connection must receive the reply before the next request can be sent
One send (one packet) can carry multiple requests     | Only one request can be sent at a time
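
On the request/response pairing problem mentioned above: HTTP/1.1 requires a server to return responses in the order the requests arrived, so on a single pipelined connection a plain FIFO queue is enough to match each parsed response to its request, with no explicit tag. A sketch (the surrounding sender and response parser are assumed to exist):

```csharp
// Sketch: pipelined responses arrive in request order, so a FIFO of pending
// completions pairs each parsed response with its request. Illustrative only.
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ResponseMatcher
{
    private readonly ConcurrentQueue<TaskCompletionSource<string>> pending =
        new ConcurrentQueue<TaskCompletionSource<string>>();

    // Called by the sender, once per request written to the shared connection.
    public Task<string> Register()
    {
        var tcs = new TaskCompletionSource<string>();
        pending.Enqueue(tcs);
        return tcs.Task;
    }

    // Called by the receive loop each time one complete response has been parsed.
    public void OnResponse(string response)
    {
        if (pending.TryDequeue(out var tcs))
            tcs.SetResult(response);
    }
}
```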
Implementation

A simple .NET class library implementing the pipe, together with a demo test tool built on that library, is available; the implementation is fairly simple and can be consulted directly in the GitHub project. MyPipeHttpHelper is the utility class implementing the pipe (with detailed comments in the code); PipeHttpRuner is the test tool written with it.
https://github.com/lulianqi/PipeHttp/ (project address)
https://github.com/lulianqi/PipeHttp/tree/master/MyPipeHttpHelper (class library)
https://github.com/lulianqi/PipeHttp/tree/master/PipeHttpRuner (test demo)
