CHAPTER 16 Performance Tuning and Architecture: Basic Theory and Tools

16.1 Performance Tuning Theory

16.1.1 Basic Concepts

Resource: the functional components of the physical server. Some software resources can also be measured, such as thread pools and the number of processes. A running system requires many kinds of resources; to determine the list of resources to measure, we can rely on our understanding of the system, or draw the system's functional block diagram.

The common physical resources include: CPUs, CPU cores, hardware threads (virtual threads), memory, network interfaces, storage devices, storage and network controllers, and internal high-speed interconnects.

Load: how many tasks are being applied to the system, that is, the system's input, the requests to be processed. For a database, the load includes the commands and queries that clients send over.

If the load exceeds the designed capacity, performance problems often follow. An application may also lose performance because of its software configuration or architecture: for example, a single-threaded application is inherently limited by that architecture, because only one core can be used and subsequent requests must queue rather than use the other cores. But a performance drop may also simply be due to excessive load, which causes queuing and high latency. In a multithreaded application, for example, you may find all the CPUs busy working on tasks while requests still queue; this situation is most likely caused by too much load.

If you are in the cloud, you may be able to simply add more nodes to handle the excess load. For typical production applications, however, adding nodes alone sometimes does not solve the problem, and you need tuning and architectural iteration.

Load can be divided into two types: CPU-intensive (CPU-bound) and I/O-intensive (I/O-bound). CPU-intensive refers to applications that require large amounts of computation; they are limited by CPU resources and are also called compute-intensive or CPU-bottlenecked. I/O-intensive refers to applications that perform many I/O operations, such as file servers, databases, and interactive shells; they expect small response times and are limited by the I/O subsystem or network resources.

For CPU-intensive workloads, you can examine and profile the code that performs CPU operations; for I/O-intensive workloads, you can examine and profile the code that performs the most I/O operations. This allows more targeted tuning. You can also use the tools that come with the system, or the application's own performance instrumentation, for statistics and analysis, as in the sketch below.
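As a rough illustration (not from the original text), the following minimal sketch is Linux-only and assumes the standard /proc/stat layout. It samples aggregate CPU time twice and reports how busy time splits between computation and I/O wait, a quick hint of whether a host leans CPU-bound or I/O-bound:

    import time

    def read_cpu_times():
        # First line of /proc/stat: "cpu  user nice system idle iowait irq softirq ..."
        with open("/proc/stat") as f:
            fields = f.readline().split()
        return [int(x) for x in fields[1:]]

    def sample(interval=1.0):
        before = read_cpu_times()
        time.sleep(interval)
        after = read_cpu_times()
        delta = [b - a for a, b in zip(before, after)]
        total = sum(delta)
        user, nice, system, idle, iowait = delta[:5]
        print(f"busy (non-idle, non-iowait): {100 * (total - idle - iowait) / total:5.1f}%")
        print(f"iowait:                      {100 * iowait / total:5.1f}%")

    if __name__ == "__main__":
        sample()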

Throughput also depends on the type of load: a database clearly sustains a much higher throughput for simple queries than for complex ones, and other application servers are similar, since simple operations execute faster. So when discussing throughput, we also need to define what kind of load the system is expected to handle.

Utilization: utilization measures how busy the resource providing the service is, as the percentage of time over an interval that the resource spends actually performing work. That is:

Utilization = busy time / total time

Utilization can be time-based, such as CPU utilization: the utilization of one CPU, or of the whole system. For disk utilization, for example, we can use the iostat command and check %util.
Utilization can also be capacity-based, indicating how much of the disk, memory, or network is in use, for example: 90% of disk space used, 80% of memory used, 80% of network bandwidth used.
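A toy illustration of the time-based formula above, with made-up busy periods:

    # Time-based utilization over a 60-second window; the busy
    # periods are hypothetical (start, end) timestamps in seconds.
    busy_periods = [(0, 12), (30, 45), (50, 58)]
    window = 60.0

    busy = sum(end - start for start, end in busy_periods)
    print(f"utilization = {busy / window:.0%}")  # (12 + 15 + 8) / 60 ~= 58%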
A highway toll station makes a good analogy. Utilization is how many toll booths are currently busy serving. 100% utilization means every booth is handling a charge and you cannot find a free booth, so you must queue. At peak times utilization may be 100%, but the figure for the whole day might be only 40%; focusing on whole-day utilization data can therefore hide problems.
High utilization often leads to resource saturation. 100% utilization usually means the system has a bottleneck, which can be confirmed by checking resource saturation. The extent to which a resource has work it cannot service is its saturation; resource saturation is explained in detail later.
If the measurement granularity is coarse, it is likely to hide occasional spikes to 100%. For some resources, such as disks, performance starts to degrade at around 60% utilization.
Response Time: also called latency, the time required to perform an operation. It includes wait time and execution time. Optimizing execution time is relatively simple; optimizing wait time is much more complicated, because of the effects of other tasks and the competition for resources. For a database query, the response time includes everything from the client issuing the query, through the database processing it, to transmitting the results back to the client. Latency can be measured at several levels: for example, the time to access a website includes DNS latency, TCP connection latency, and TCP data transfer time. Latency can also be understood at a higher level still, covering the time from the user clicking a link until the page content is transferred and rendered on the user's screen. Because latency is measured in time, it is easy to compare, whereas some other metrics are harder to measure and compare, such as IOPS; where possible you can convert them into latency for comparison.
In general, we measure performance mainly through response time, not resource consumption. Optimization is essentially about reducing response time as much as possible under a given load, not about reducing resource usage such as CPU utilization; resource consumption is only a symptom, not the goal of our optimization.
If we can record the time MySQL spends in each step, we can tune in a targeted way. If we can subdivide a task into subtasks, we can optimize MySQL by eliminating subtasks, reducing the number of subtasks, or making subtasks execute more efficiently, as in the sketch below.
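A minimal sketch of this idea, with hypothetical stage names standing in for the real steps of a query round trip (the stage helper and the sleeps are ours, for illustration only):

    import time
    from contextlib import contextmanager

    timings = {}

    @contextmanager
    def stage(name):
        # Accumulate wall-clock time spent inside the named stage.
        start = time.perf_counter()
        try:
            yield
        finally:
            timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

    # Hypothetical stages of a query round trip, simulated with sleeps.
    with stage("connect"):
        time.sleep(0.01)
    with stage("execute"):
        time.sleep(0.05)
    with stage("fetch"):
        time.sleep(0.02)

    # Report the slowest stages first; these are the tuning targets.
    for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
        print(f"{name:8s} {seconds * 1000:7.1f} ms")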
Scalability: scalability has two meanings. The first concerns the relationship between response time and resource utilization as utilization grows: if response time remains stable as utilization rises, we say the system is scalable; if response time starts to degrade as utilization rises, we consider it not scalable. The second characterizes the system's ability to handle a growing load by continually adding nodes or resources while still maintaining a reasonable response time.
Throughput Rate: the rate at which tasks are processed. For network transmission, throughput generally means the number of bytes transmitted per second; for a database, it is the number of queries per second (QPS) or transactions per second (TPS).
Concurrency: the ability of a system to perform multiple operations in parallel. If a database can take full advantage of the CPU's multiple cores, it generally has higher concurrent processing capability.
Capacity: the capacity of the system to handle load. An important part of day-to-day operations is capacity planning, which ensures that as load grows the system can still handle it and deliver good, stable service. Capacity also refers to the usage limit of a resource, such as disk space: once disk consumption reaches a certain threshold, we may need to consider expansion.
Saturation: when the load exceeds a resource's service capacity, the resource is said to be saturated. Saturation can be measured by the length of the wait queue, or by the time spent waiting in the queue. Tasks that exceed capacity are typically queued or rejected with errors. For example, CPU saturation can be measured by the average run queue length (runq-sz); disk saturation by the avgqu-sz metric in iostat output; and memory saturation by swap-partition metrics.
When resource utilization is high, saturation may appear. Figure 16-1 shows the relationship between utilization, load, and saturation: once utilization reaches 100%, tasks can no longer be processed immediately and must queue, and saturation then grows linearly with load. Saturation causes performance problems because new requests must queue and wait. Saturation does not necessarily begin exactly at 100% utilization; it depends on the degree of parallelism of the resource's operation. A minimal run-queue check is sketched below.
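A minimal sketch of such a check (not from the original text; Linux-only, assuming the standard /proc/loadavg format, and the threshold is only illustrative):

    import os

    # /proc/loadavg looks like: "0.42 0.30 0.25 2/611 12345"
    with open("/proc/loadavg") as f:
        fields = f.read().split()

    load1 = float(fields[0])                  # 1-minute load average
    runnable = int(fields[3].split("/")[0])   # currently runnable tasks
    cpus = os.cpu_count()

    print(f"load(1m)={load1}, runnable={runnable}, cpus={cpus}")
    if load1 > cpus:
        print("1-minute load exceeds CPU count: possible CPU saturation")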

Saturation may go undetected. A common mistake in production monitoring systems and scripts is sampling at too coarse a granularity: if you sample every few minutes, you may miss a problem that occurs within a ten-second burst. Sudden spikes in utilization can easily lead to resource saturation and performance problems.
Generally speaking, as load increases, throughput increases. The throughput curve is linear at first, and response time remains stable during this initial phase; but beyond a certain point performance starts to degrade, response time grows longer, and as load keeps increasing, throughput stops growing and may even fall, while response times may become unacceptable. An exception is an application server returning error status codes, such as a web server returning 503 errors: the throughput curve for error responses can remain linear, because serving an error consumes essentially no resources and so rarely hits a limit. A simple queueing model below illustrates why response time grows so sharply near capacity.
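As an aside not in the original text, the simplest textbook queueing model, M/M/1, already shows this behavior: for arrival rate lambda below service rate mu, the mean response time is 1/(mu - lambda), which explodes as load approaches capacity. The service rate below is assumed for illustration:

    # M/M/1: with arrival rate lam below service rate mu, the mean
    # response time is 1 / (mu - lam); it explodes near saturation.
    mu = 100.0  # assumed service capacity: 100 requests/second

    for lam in (10, 50, 80, 90, 95, 99):
        t_ms = 1000.0 / (mu - lam)
        print(f"load {lam:3d}/s  utilization {lam / mu:4.0%}  mean response {t_ms:7.1f} ms")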
The perception of performance is subjective; whether a performance metric is good or bad may depend on the expectations of developers and end users. So before deciding whether to tune, we need to quantify these metrics. Once the metrics are quantified and performance goals are set, performance tuning becomes more scientific and easier to understand and communicate.
The sections below briefly describe three basic theories: Amdahl's law, the Universal Scalability Law, and queueing theory.
16.1.2 Amdahl's Law

Amdahl's law is a rule of thumb in computer science, named after IBM computer architect Gene Amdahl, who stated this important law in a paper published in 1967.
Amdahl's law is primarily used to find the maximum improvement possible in the overall system when only part of the system is improved. It is often used in the parallel computing domain to predict the theoretical maximum speedup when multiple processors are applied. In the area of performance tuning, we use this law to help us resolve or mitigate performance bottlenecks.
The model behind Amdahl's law illustrates the serial resource contention seen in real production systems. Figure 16-2 shows linear scaling alongside the speedup curve predicted by Amdahl's law. In any system, some resources must inevitably be accessed serially, which limits the speedup: even as we increase the degree of parallelism (the horizontal axis), the results fall short, and linear scaling (the straight line in Figure 16-2) is difficult to achieve.
In the discussion below, systems, algorithms, and programs are all treated as the object of optimization, without distinction; each has a serial part and a parallelizable part.

In parallel computing, the speedup a program gains from multiple processors is limited by the execution time of its serial part. For example, suppose a program takes 20 hours on one CPU core, of which 1 hour of code can only run serially while the other 19 hours of work can run in parallel. Then no matter how many CPUs run the program in parallel, the minimum execution time cannot drop below 1 hour (the serial part), so the speedup is limited to 20x (20/1).
The higher the speedup, the more effective the optimization. Amdahl's law can be expressed by the following formula:

S(n) = 1 / (B + (1 - B) / n)
where:
S(n): the theoretical speedup under a fixed load.
B: the proportion of serial work, with a value in the range 0 to 1.
n: the number of parallel threads or parallel processing nodes.
The formula can be derived as follows:
Speedup = time before improvement T(1) / time after improvement T(n).
Assume the total execution time before improvement is 1 (one unit). After improvement, the elapsed time is the serial part's time (B) plus the parallel part's time (1 - B)/n, since the parallel part can run on n CPU cores. So T(n) = B + (1 - B)/n, which gives the formula above.
According to this formula, as the number of parallel threads (which we can take as the number of CPU cores) tends to infinity, the speedup approaches the inverse of the system's serial fraction: if 50% of the code must execute serially, the maximum speedup of the system is 2. That is, to speed up a system, simply adding processors is not necessarily effective; you need to increase the proportion of the system that can be parallelized, and on that basis add parallel processors judiciously, obtaining the maximum speedup with minimal investment. The sketch below illustrates the limit.
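A minimal sketch of the formula (the 50% serial fraction matches the example above):

    def amdahl_speedup(serial_fraction, n):
        # S(n) = 1 / (B + (1 - B) / n)
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

    # With 50% serial work, the speedup approaches but never reaches 2.
    for n in (1, 2, 4, 16, 1024):
        print(f"n={n:5d}  speedup={amdahl_speedup(0.5, n):.3f}")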
To explain Amdahl's law further: the model defines, under fixed load, the speedup of a parallel implementation of an algorithm relative to a sequential implementation. For example, if 12% of an algorithm's operations can execute in parallel and the remaining 88% cannot, Amdahl's law states that the maximum speedup is 1/(1 - 0.12) = 1.136: as n in the formula above tends to infinity, the speedup S = 1/B = 1/(1 - 0.12).
As another example, suppose the fraction of an algorithm that can be parallelized is p, and that part can be sped up by a factor of s (which can be understood as the number of CPU cores), meaning the new code runs in 1/s of the original time. If 30% of the algorithm's code can be accelerated in parallel (p = 0.3), and that part can be made twice as fast (s = 2), then by Amdahl's law the speedup of the whole algorithm is:

S = 1 / ((1 - p) + p/s) = 1 / (0.7 + 0.3/2) ≈ 1.18

This formula is equivalent to the previous one, except that the earlier denominator is expressed in terms of the serial fraction B.
For example, suppose a task can be broken into four steps P1, P2, P3, and P4, taking 11%, 18%, 23%, and 48% of the total elapsed time respectively. We optimize it so that P1 is not improved, P2 is sped up 5x, P3 is sped up 20x, and P4 is sped up 1.6x. The improved execution time is then:

T = 0.11/1 + 0.18/5 + 0.23/20 + 0.48/1.6 = 0.4575

so the overall speedup is 1/0.4575 ≈ 2.19. The sketch below computes this generalized form.
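A sketch of the generalized form, reproducing the P1..P4 example (the helper name overall_speedup is ours):

    def overall_speedup(parts):
        # parts: (fraction of original time, speedup factor) per step;
        # overall speedup = 1 / sum(f_i / s_i).
        return 1.0 / sum(fraction / speedup for fraction, speedup in parts)

    # P1..P4: 11%, 18%, 23%, 48% of the time, sped up 1x, 5x, 20x, 1.6x.
    parts = [(0.11, 1.0), (0.18, 5.0), (0.23, 20.0), (0.48, 1.6)]
    print(f"overall speedup = {overall_speedup(parts):.2f}")  # ~2.19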
Figure 16-3 shows the speedup curves for different proportions of parallelizable work. We can see that the speedup is bounded by the serial fraction: even when 95% of the code can be parallelized, the theoretical maximum speedup is never more than 20x.

Amdahl's law also guides the scalable design of CPUs. CPU development can go in two directions: faster cores, or more cores. The current focus is on more cores; as technology develops, core counts keep rising, and quad-core and six-core database servers are now common. Yet sometimes, even with many cores, when we run several programs at once, only a few threads are actually working while the others sit idle. In practice, running multiple threads in parallel often fails to improve performance significantly, because programs often cannot use multiple cores effectively. The speedup on a multi-core processor is an important measure of parallel program performance; the keys are to partition tasks sensibly, effectively reduce the proportion of serial computation, and reduce interaction costs and communication between cores.
16.1.3 The Universal Scalability Law

Scalability is the ability to meet growing load by continually adding nodes. Many people mention scalability without giving it a clear definition or quantitative criteria. In fact, system scalability can be quantified, and if you cannot quantify it, you cannot be sure it meets your needs. The Universal Scalability Law (USL) gives us a way to quantify a system's scalability.
USL, the Universal Scalability Law, was proposed by Dr. Neil Gunther. Compared with Amdahl's law, USL adds a parameter β representing coherency delay. Figure 16-4 is its model diagram: the vertical axis represents capacity and the horizontal axis represents concurrency.

USL can be defined by the following formula:

C(N) = N / (1 + α(N - 1) + βN(N - 1))
where:
C(N): capacity.
0 ≤ α, β < 1.
α (contention): the degree of contention, caused by waiting or queuing for shared resources, which prevents linear scaling.
β (coherency): the degree of coherency delay, caused by the interaction between nodes needed to keep data consistent. Maintaining data consistency degrades performance: as N increases, system throughput will eventually decline. When β is 0, the model reduces to Amdahl's law.
N (concurrency): the concurrency; in the ideal case the system scales linearly in N. If you are measuring software scalability, N can be the number of concurrent clients/users: with a fixed hardware configuration (the CPU count unchanged), keep adding clients/users to obtain a throughput model; stress-testing tools such as LoadRunner and sysbench work this way. If you are measuring hardware scalability, N can be the number of CPUs: keep adding CPUs while holding the load per CPU constant. For instance, if each CPU carries a load of 100 users, then each added CPU adds 100 users, so a 32-CPU machine requires a concurrent load of 3,200 users.
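A small sketch of the formula; the α and β values below are invented for illustration. With β > 0, USL predicts that capacity peaks near N* = sqrt((1 - α)/β) and then declines:

    import math

    def usl_capacity(n, alpha, beta):
        # C(N) = N / (1 + alpha*(N - 1) + beta*N*(N - 1))
        return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))

    alpha, beta = 0.05, 0.001  # hypothetical contention / coherency factors
    for n in (1, 8, 16, 32, 64, 128):
        print(f"N={n:4d}  relative capacity={usl_capacity(n, alpha, beta):6.2f}")

    # With beta > 0, capacity peaks near N* = sqrt((1 - alpha) / beta), then falls.
    print(f"peak near N* = {math.sqrt((1 - alpha) / beta):.1f}")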
Now consider the four graphs shown in Figures 16-5 to 16-8, which show how capacity (throughput) changes under different loads.

In Figure 16-5, α = 0 and β = 0: as load increases, system throughput rises linearly, which we call linear scaling. This is a highly idealized situation in which every unit of input yields an equivalent return, but it cannot continue indefinitely; a real performance model may look linear only in its early portion.
In Figure 16-6,
