Troubleshooting memcached Performance Issues: A Local Variable Leads to a Bloodbath

Tags: memcached, set, keep-alive

Today I received a temporary task: troubleshoot a bizarre problem on another team's site. The site has very heavy traffic, and in one module the page server sends an HTTP request to read data provided by a separate Java web site. After going live, it turned out that as soon as there was any real concurrency, or somewhat more traffic, the HTTP requests started failing, and eventually the server could not open any page at all, even though the server itself could still be pinged and could ping other hosts. (I did not see the situation first-hand; this is only what I was told.)

Issuing HTTP requests from within a high-concurrency web site has always made me nervous, and I had avoided it in previous projects (preferring to have the client fetch the data via Ajax). My first suspicion was a thread-pool problem. We know that an ASP.NET request is handled by a worker thread from the thread pool; if the HTTP request is made synchronously on that worker thread, then processing that normally runs at memory speed is throttled down to network speed (because the thread must wait on the network). If the site itself has heavy traffic (say the worker-thread pool allows at most 800 threads and the current concurrency occupies 200), then once requests slow down (say from 100 ms to 1 s of processing time), the thread pool fills up (200 × 10 = 2,000 > 800). So I recommended asynchronous pages and an asynchronous HttpWebRequest; the simplest way is WebClient's asynchronous pattern, DownloadStringAsync(). The developer changed the code as I suggested, but the problem was not resolved. I was baffled and could not think of another cause (I later verified offline that with this approach even a large number of requests does not keep worker threads occupied, though more IOCP threads are used).
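The asynchronous pattern recommended above can be sketched like this (a minimal sketch; the class name, URL handling, and completion logic are illustrative, not the project's actual code):

```csharp
using System;
using System.Net;

public static class AsyncFetchSketch
{
    public static void FetchUpstream(string url)
    {
        var client = new WebClient();
        // The completion callback runs on an I/O-completion (IOCP) thread,
        // so the ASP.NET worker thread is released while the request is
        // in flight instead of blocking at network speed.
        client.DownloadStringCompleted += (sender, e) =>
        {
            if (e.Error == null)
                Console.WriteLine("Received {0} chars", e.Result.Length);
        };
        client.DownloadStringAsync(new Uri(url));
    }
}
```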

After I got the chance to debug on the live servers, I modified the page code to plant a timer that periodically logged the thread pool's available worker threads and available IOCP threads. When requests came in, only about 30 worker threads were occupied (the site runs at roughly 30 concurrent requests), and no IOCP threads were in use (the code had been rolled back by then, so the asynchronous HTTP request was no longer in play). Next I wanted to look at the TCP connections. I typed netstat -s and was stunned to see more than 4,000 TCP connections; my first reaction was that traffic was simply too high. I remembered from an earlier project that Windows Server needs a registry change before it will use more than the default range of ephemeral TCP ports (the default dynamic port range is roughly 1025 to 5000, which matches the 4,000-plus connections seen earlier), and that by default TCP waits 4 minutes after a port is closed (TIME_WAIT) before it can be reused. So I looked up the MaxUserPort and TcpTimedWaitDelay parameters and, following MSDN, set them to the maximum of 65534 and the minimum of 30 seconds respectively.

Do the math: if every request to the other site establishes a new HTTP connection, each request occupies two or more ports. With up to roughly 60,000 ports available, all released after 30 seconds, that is about 30,000 requests per 30 seconds, a capacity of about 1,000 requests per second; our site generally has no more than 50 concurrent requests per server, so this should be far more than enough. After modifying the registry and rebooting, the change took effect, but under load the number of established TCP connections still quickly climbed past 60,000 and the server again lost access to the outside network. At that point the root cause was clear: there were so many TCP connections that no new connection (and hence no new HTTP connection) could be established. Pages already open still worked, but new pages would not load, and that suddenly made me think of HTTP keep-alive.
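The two registry values mentioned above live under the TCP/IP parameters key; the change can be applied like this (a config sketch for Windows Server, run from an elevated prompt; a reboot is required for the values to take effect):

```shell
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v MaxUserPort /t REG_DWORD /d 65534 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v TcpTimedWaitDelay /t REG_DWORD /d 30 /f
```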

With this direction in mind I modified the code to enable keep-alive on the HttpWebRequest (and went through a lot of trouble just to verify that Connection: keep-alive actually appeared in the HTTP headers), enabled keep-alive on the target web server as well, and messed with a whole pile of ServicePointManager properties. After several hours of assorted attempts it still did not work: the server still raced to 60,000-plus connections (in under a minute), the CPU spiked, and then everything collapsed like a torn net. I stayed stuck on why keep-alive was not taking effect, why the TCP connections were not being reused. Later I set up a separate test site and found that it did not occupy anywhere near that many ports, meaning some other service or module was opening a huge number of TCP connections.
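For reference, the keep-alive attempts described above involve settings along these lines (a sketch using real .NET Framework properties, not the project's exact code or values):

```csharp
using System;
using System.Net;

public static class KeepAliveSketch
{
    public static HttpWebRequest CreateRequest(string url)
    {
        // Raise the default per-host connection cap (2 in .NET Framework)
        // so pooled, persistent connections can actually serve concurrency.
        ServicePointManager.DefaultConnectionLimit = 100;

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.KeepAlive = true;                        // request a persistent, reusable connection
        request.ProtocolVersion = HttpVersion.Version11; // HTTP/1.1 connections are persistent by default
        return request;
    }
}
```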

Then it suddenly occurred to me to simply look at which connections were being established. I entered netstat -an -p TCP > C:\a.txt, opened a.txt, and was instantly dumbfounded: 99% of the connections were to the memcached server. My first reaction was that the code simply made heavy use of memcached. Checking the code showed that a single request might indeed make 10 or so memcached calls (not a small number, but nowhere near enough to explain this), and the client we use has a connection pool capped at 500 TCP connections, so no amount of concurrency should create this many. So I kept reading the code, and when I reached the MemcachedClient initialization I was instantly petrified: incredibly, the code declared a local variable and constructed a new MemcachedClient every single time, instead of saving one MemcachedClient instance in a static variable.

While researching a connection-pool bug in the memcached client some time earlier, I had learned that every time a new MemcachedClient object is instantiated, it creates a brand-new connection pool with a configured minimum number of connections; that is, the minimum connections are opened as soon as the pool is initialized. For better warm-up performance we had configured this minimum as 10, which meant every request was creating 10 new TCP connections. I wrote a test loop: 100 iterations established 1,000 TCP connections (and each pool initialization took 300-plus milliseconds with the CPU pegged at 100%, so creating that many connections clearly burns real performance). This matched my guess exactly, and it also explained the earlier observation that each page refresh added 10-plus connections. After changing the local variable to a global static variable, the site ran fine even at 100 concurrent requests, well above the expected load (50 concurrent).
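Reconstructed from the description above, the anti-pattern looked roughly like this (a sketch, not the project's actual source; the pool minimum of 10 connections is configured elsewhere, in the enyim.caching section of app.config/web.config):

```csharp
using Enyim.Caching;
using Enyim.Caching.Memcached;

public class DataAccessSketch
{
    public void ProcessRequest()
    {
        // Anti-pattern: a new client per request. Each construction builds
        // a fresh socket pool and immediately opens the configured minimum
        // (10 here) of TCP connections to the memcached server.
        var mc = new MemcachedClient();
        mc.Store(StoreMode.Set, "key", "value");
    }   // the client and its sockets are then abandoned; the sockets sit in
        // TIME_WAIT while the next request opens 10 more
}
```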

From suspecting the thread pool, to suspecting port exhaustion, to suspecting keep-alive, it took a lot of effort to finally pin down the real cause (which had nothing to do with the new HttpWebRequest module at all). The fix itself was simple: audit the rest of the project for the same memcached misuse, then modify the memcached client (we use enyim.caching) to make the MemcachedClient constructor private and expose a singleton entry point.
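The singleton approach can be sketched like this (a wrapper variant with illustrative names; the fix described above instead made the constructor private inside the client library itself, which enforces the same thing):

```csharp
using Enyim.Caching;

public static class CacheClient
{
    // One client, and therefore one connection pool, per process.
    // Enyim's MemcachedClient is safe to share across threads.
    private static readonly MemcachedClient instance = new MemcachedClient();

    public static MemcachedClient Instance
    {
        get { return instance; }
    }
}
```

All call sites then use CacheClient.Instance.Store(...) rather than constructing their own client.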

My experience with performance optimization is that most performance problems trace back to one or two root causes; find and fix those, and the results are dramatic. I hope this article is helpful.
Author: Lovecindywang
