Performance Tuning Basics
Load and CPU utilization are the key indicators of how busy a CPU is. Load is the length of the run queue: the number of processes currently running plus those waiting to run. The higher the load, the more contention there is for CPU resources; a machine with more processor cores can sustain a higher load and offers more processing capacity. CPU utilization is the fraction of a given period during which the CPU is actually in use.

CPU utilization depends mainly on how the software is implemented, in particular on its degree of concurrency (number of threads, serial vs. parallel sections) and on its programming model (synchronous vs. asynchronous). In the synchronous model, a large proportion of threads are blocked; blocked threads are not scheduled by the operating system and do not occupy the CPU, so CPU usage per unit time drops. This is why I/O-intensive applications generally show low CPU utilization. The synchronous and asynchronous models do not differ in the total CPU time a request consumes, however. The traditional one-thread-per-request model can raise CPU utilization by adding threads, and when the thread count is not too large it can even outperform the asynchronous model. The advantage of the asynchronous model is that threads are never blocked waiting for I/O (which would raise the proportion of blocked threads), so high throughput can be achieved with far fewer threads; too many threads eventually become a bottleneck because of context-switch and memory overhead.

QPS and RT
QPS: the number of requests the system processes per second, a measure of system throughput. RT: the response time of a single request, or its average over a period. For a given system, the lower the RT, the higher the QPS.
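The claim that blocked threads do not occupy the CPU can be seen directly: a small sketch (thread count and sleep duration are illustrative, standing in for I/O waits) that compares wall-clock time with actual CPU time while many threads sit blocked.

```python
import threading
import time

def io_bound():
    # time.sleep stands in for an I/O wait: the thread is blocked,
    # not scheduled by the OS, and consumes no CPU while sleeping.
    time.sleep(0.2)

start_wall = time.perf_counter()
start_cpu = time.process_time()

threads = [threading.Thread(target=io_bound) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
# 50 "busy" threads, yet almost no CPU time is consumed:
print(f"wall {wall:.2f}s, cpu {cpu:.3f}s, utilization {cpu / wall:.1%}")
```

Despite 50 concurrent threads, the process's CPU time stays near zero, which is exactly the low-utilization profile of an I/O-intensive, synchronous application.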
To raise QPS, the simplest way is to add processing threads and increase the program's concurrency. Beyond a certain point, however, context-switch overhead grows, thread stacks consume more memory, and the CPU cache hit rate drops, so QPS stops rising while RT keeps increasing until business requirements can no longer be met. There is therefore an optimal number of threads: the critical thread count at which the server's bottleneck resource is fully consumed and QPS peaks.

QPS, RT and the optimal number of threads
RT = CPU Time + Wait Time
Optimal number of threads = ((Wait Time + CPU Time) / CPU Time) × CPU cores × CPU utilization
Single-thread model: QPS = 1000 / RT (RT in milliseconds)
Multi-thread model: QPS = optimal number of threads × 1000 / RT = (1000 / CPU Time) × CPU cores × CPU utilization

How to find the optimal number of threads: increase the thread count step by step under a stress test. When server resources are exhausted, the optimal thread count has been reached, recognizable by these signs: as the thread count keeps growing, QPS stays flat and then begins to drop, while RT keeps rising. The resource bottleneck may be CPU, memory, I/O, or a synchronization lock.

The formulas look neat, but a real operating system is not that simple; otherwise system engineers would not have to work so hard to find bottlenecks to optimize. As the thread count grows, context-switch overhead, thread-stack memory, and a falling CPU cache hit rate mean that even at the optimal thread count, RT may fail to meet business expectations. It only makes sense to talk about the QPS achievable at a given RT.
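The formulas above can be put into a short sketch; the input numbers below are assumptions chosen only to illustrate the arithmetic.

```python
def optimal_threads(cpu_time_ms, wait_time_ms, cores, cpu_utilization=1.0):
    """Optimal threads = ((Wait Time + CPU Time) / CPU Time) * cores * utilization."""
    rt = cpu_time_ms + wait_time_ms  # RT = CPU Time + Wait Time
    return (rt / cpu_time_ms) * cores * cpu_utilization

def max_qps(cpu_time_ms, wait_time_ms, cores, cpu_utilization=1.0):
    """Multi-thread QPS = optimal threads * 1000 / RT
                        = (1000 / CPU Time) * cores * utilization."""
    rt = cpu_time_ms + wait_time_ms
    n = optimal_threads(cpu_time_ms, wait_time_ms, cores, cpu_utilization)
    return n * 1000.0 / rt

# Assumed example: each request needs 5 ms of CPU and waits 45 ms on I/O,
# on an 8-core machine at (idealized) 100% CPU utilization.
print(optimal_threads(5, 45, 8))  # 80.0 threads
print(max_qps(5, 45, 8))          # 1600.0 QPS
```

Note how the multi-thread QPS, 80 × 1000 / 50 = 1600, equals (1000 / 5) × 8, confirming that the two forms of the formula agree.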
Thread pools and QPS
As mentioned above, an ever-growing thread count increases context-switch overhead and thread-stack memory and lowers the CPU cache hit rate, hurting overall system performance. A thread pool (one thread serving many requests) effectively reduces the number of threads and alleviates these problems, which is why both DB servers (such as MySQL) and application servers (such as Nginx) use thread pools internally. Thread pools are especially significant in high-concurrency scenarios because they keep the thread count low: fewer threads serve the requests, improving CPU utilization and limiting the growth of RT. (Figure: comparison of MySQL system throughput with the thread pool enabled vs. disabled, in pure-read and read/write scenarios.)

CPU-intensive applications
These mainly consume CPU. To raise QPS, reduce the CPU time of a single request (for example, by reducing the number of function calls) and increase concurrency (multithreading, making full use of multiple cores) so that more requests are handled per unit time. Once the thread count grows past the number of CPU cores, heavy context switching sets in and RT increases.

I/O-intensive applications
1. For applications such as databases, most of the time is spent on I/O and CPU utilization is far from its bottleneck. To process more requests per unit time: (1) reduce I/O operations (fewer read/write I/Os), which lowers RT and raises QPS; (2) reduce CPU work (fewer function calls), which lowers CPU utilization and also lowers RT, though less noticeably than reducing I/O.
2. Judge QPS only when RT meets the application's requirements: for example, with 1,000 requests handled concurrently and an RT of 10 ms, QPS is 1,000 × (1000 / 10) / 1,000 = 100,000 (10w).
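A minimal sketch of the one-thread-serves-many-requests idea, using Python's standard thread pool (the handler and its 50 ms "I/O wait" are illustrative assumptions, not part of the original text):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def handle_request(i):
    time.sleep(0.05)  # simulate an I/O wait inside request handling
    return i * 2

# 4 pooled threads serve 40 requests: each thread handles many requests
# in turn, instead of spawning one thread per request.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, range(40)))

print(results[:5])  # [0, 2, 4, 6, 8]
```

The pool caps the thread count at 4 regardless of how many requests arrive, which is precisely how it avoids the context-switch, stack-memory, and cache-hit-rate costs described above.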
Meaningful QPS must be tied to RT. In theory the request rate can keep rising, but RT rises with it; once RT exceeds, say, 1 s, adding more requests no longer raises QPS, because a request can no longer be processed within 1 s.

Soft interrupts and hard interrupts
Interrupts are how the system handles asynchronous signals from peripheral hardware (peripheral relative to the CPU and memory) or synchronous signals from software. A hard interrupt is an asynchronous signal sent by peripheral hardware to the CPU; a soft interrupt is a signal sent by software to the operating system kernel, usually raised by a hard-interrupt handler or a system call. A key difference between the two is whether an interrupt controller is involved: a peripheral detects a change and interrupts the CPU through the interrupt controller, which is a random (asynchronous) event, whereas a soft interrupt invokes its handler directly via a CPU instruction and is under the program's control. The completion of both disk I/O and NIC I/O is delivered via soft interrupts. Take the NIC as an example: when the NIC receives a packet, the NIC (hard) interrupt handler copies the data into a buffer, sets a flag to tell the operating system there is work to do, and then notifies the NIC that it can continue receiving. Each time the kernel returns from an interrupt, it checks the flag; if it is set, the soft-interrupt handler is called.

NIC throughput optimization
Since CPUs are now multi-core, the NIC can be configured with multiple queues to improve its processing efficiency. When a queue receives a packet, it raises the corresponding interrupt, and the core that receives the interrupt processes it.
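The "handler sets a flag, the heavier work runs later" pattern can be sketched in userland with a POSIX signal standing in for the hard interrupt; this is only an analogy (Unix-only, and all names here are illustrative), not kernel code.

```python
import signal
import time

work_pending = False  # flag set by the "hard interrupt" handler

def top_half(signum, frame):
    # Like a hard-interrupt handler: do the bare minimum, just set a flag
    # so the expensive work is deferred out of interrupt context.
    global work_pending
    work_pending = True

signal.signal(signal.SIGALRM, top_half)
signal.setitimer(signal.ITIMER_REAL, 0.1)  # "device" fires in 100 ms

# The main loop plays the kernel's role: on each pass it checks the flag
# and, if set, runs the deferred "soft interrupt" (bottom half) work.
handled = 0
while handled == 0:
    if work_pending:
        work_pending = False
        handled += 1  # bottom half: the heavier, deferred processing
    time.sleep(0.01)

print("bottom half ran:", handled)
```

Splitting the work this way keeps the interrupt path short, so the device (here, the timer) can be re-armed quickly, mirroring why the NIC handler only copies data and sets a flag.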
To avoid the contention that would arise if different cores processed packets from the same queue, each queue is bound to a unique core.
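On Linux this binding is typically done by writing a hexadecimal CPU bitmask to `/proc/irq/<n>/smp_affinity`. A small sketch (the round-robin assignment policy is an assumption) that computes a one-hot mask per queue:

```python
def queue_affinity_masks(num_queues, num_cores):
    """One-hot CPU bitmask per NIC queue, in the hex format used by
    /proc/irq/<n>/smp_affinity, so each queue's interrupt is handled
    by exactly one core (round-robin over the available cores)."""
    return [format(1 << (q % num_cores), "x") for q in range(num_queues)]

# 4 RX queues on an 8-core machine: queue 0 -> core 0, queue 1 -> core 1, ...
print(queue_affinity_masks(4, 8))  # ['1', '2', '4', '8']
```

Each mask has a single bit set, which is what pins a queue's interrupt to one core; tools such as irqbalance would otherwise move interrupts between cores.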