Stress Testing and System Optimization Tips

Source: Internet
Author: User

Yesterday a friend asked me whether I had any material on optimizing Mina. A system on his side tops out at about 500 concurrent requests under load and will not go any higher. He had already started looking for material himself, and I did not think much about it; I just said I had none. Later, during a break, I realized that answer was not a good one. What I should have told him was how to use stress testing to find the bottleneck, because only then do you know what the problem is. I gave him a brief explanation last night. Today I am writing down some of my earlier experience in the hope that it helps newcomers.

This post is about experience rather than specific tools; most of the tools have already been mentioned in my earlier blog posts.

First, decide when a stress test result calls for investigation: as the number of concurrent users increases, TPS stops growing or even drops, and RT stays flat or even starts to rise. (The two sometimes change together and sometimes do not.) Then begin to analyze the problem.

The first thing to do is confirm that your own stress testing method and tools (LR, AB, or a hand-written multi-threaded client) are correct. Several times I have found that the problem was on the testing side, which is a painful lesson. How do you detect this kind of problem? Test against a constant benchmark, such as an empty web container or a mock object with a fixed RT.

The second thing is to use the operating system's resource monitoring commands (look up the Linux commands for details) to watch CPU utilization, load, memory usage (cache, swap, and application usage), context switching, I/O wait, network traffic, packet loss, file handle configuration, and so on. From these indicators you can tell which ones change significantly as the number of concurrent users increases, or have even become a bottleneck. For a Java application, also check JVM GC activity and thread dumps to see whether there is heavy lock contention or whether a single thread is consuming a constant share of CPU.

The third thing is to work out what caused these basic resources to become bottlenecks. Start by ruling out problems in the systems you depend on, such as web containers, centralized caches, and the DB. You usually cannot debug these systems, so the most important thing is to read their logs, paying special attention to warnings and errors. For example, nginx has configuration for buffering upstream and downstream data.
Beyond a certain size, nginx starts to use the disk to relieve memory pressure; if you do not pay attention to this during a stress test, disk I/O will be heavy.

Next, take your application apart module by module. Mocking removes the cost of interface dependencies, so you can test repeatedly and narrow down the problem. When you identify a module that has an impact, keep in mind that it is only a suspect. If you have programmed in a language with pointers, you know that the root cause is often not at the point where things blow up.

Also note that after you fix one bottleneck, overall performance may improve less than expected or even get worse. For example, take modules A and B, where A is the bottleneck at 30 TPS and B can handle 35 TPS. After you optimize A to 50 TPS, the pressure on B increases noticeably, and B's TPS may start to fall under resource pressure. (This is why, as mentioned above, TPS may not just flatten but actually drop as the number of concurrent users rises.)

Many people may say that this is all common sense with no real substance. In fact, the first thing is to find the problem; after that, you rely on your familiarity with the language and with the business design to solve it.

Finally, here are the so-called cure-all optimizations that are used most often:

1. Reduce the total RT of key business paths. Compared with so-called code-level savings, it is often better to optimize the business logic directly (basic infrastructure systems excepted).

2. Relieve bottleneck resources: all the resources measured in the second step above, plus the resources of the systems you depend on (such as the number of threads in the web container). There are four approaches:

   A. Use them efficiently. For example, batch DB operations, fetch from the cache in batches, and flush to disk in batches.

   B. Use less. If the data consistency requirements are not high, cache locally.

   C. Release quickly.
   Anyone who has written event-driven code will understand quick release well. Once business processing is broken into finer steps, the resources that used to be held throughout are held only in short slices. (The evolution of computer CPUs shows the same principle.) Releasing quickly means that the same resource can serve more requesters.

   D. Exchange resources. Trade disk for memory, or multi-core CPU for storage. If you are interested, you can see the design details in the top streaming computing framework that has been in use for more than two years.

3. Change dependencies, including web containers, centralized caches, disks, and so on. This does not necessarily mean the current ones are bad. An unsupported data structure may force multiple operations (centralized cache), some systems have better private interaction protocols between them (reverse proxy and web container), or the technology itself is being revolutionized (solid-state disks).

I will stop here for now. I hope it helps, or at least resonates a little when you run into similar problems. If yours is a business system, the order of the cure-alls above is also the best order in which to apply them. To sum up in one sentence, optimization takes four steps: 1. Find the problem. 2. Locate it. 3. Analyze it. 4. Iteratively balance the bottlenecks.
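
As a rough illustration of iteratively balancing bottlenecks, the A and B example from earlier can be modeled in a few lines of Python. This is only a sketch under a simplifying assumption that is not in the original text: every request passes through all modules in series, so system TPS is capped by the slowest module. It also ignores the extra degradation B may suffer under pressure. The function names `find_bottleneck` and `balance` are hypothetical, chosen for this example.

```python
def find_bottleneck(capacity_tps):
    """Return (module, tps) for the module with the lowest capacity.

    Assumption: modules run in series, so the lowest capacity is also
    the TPS of the whole system.
    """
    module = min(capacity_tps, key=capacity_tps.get)
    return module, capacity_tps[module]


def balance(capacity_tps, improvements):
    """Apply improvements one at a time and re-measure after each one.

    `improvements` is a list of (module, new_tps) pairs.
    Returns a list of (bottleneck_module, system_tps) tuples:
    the initial state followed by one entry per improvement.
    """
    current = dict(capacity_tps)  # copy so the caller's dict is not mutated
    history = [find_bottleneck(current)]
    for module, new_tps in improvements:
        current[module] = new_tps
        history.append(find_bottleneck(current))
    return history


if __name__ == "__main__":
    # Numbers from the text: A is the bottleneck at 30 TPS, B can do 35 TPS.
    modules = {"A": 30, "B": 35}
    # Optimize A to 50 TPS and see where the bottleneck moves.
    for bottleneck, tps in balance(modules, [("A", 50)]):
        print(f"bottleneck={bottleneck} system_tps={tps}")
```

Run as a script, this prints `bottleneck=A system_tps=30` and then `bottleneck=B system_tps=35`. Raising A by 20 TPS gains the system only 5 TPS, and B becomes the next module to investigate.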
