The principle of performance analysis and tuning

Source: Internet
Author: User
Tags: websphere application server

Recently I have been tangled over where performance analysis and tuning should start: with the hardware first, or with the code or the database? With the operating system (CPU scheduling, memory management, process scheduling, disk I/O), the network and its protocols (HTTP, TCP/IP), or with the application code, database tuning, middleware configuration, and so on?

Middleware alone divides into web middleware (Apache, IIS), application middleware (Tomcat, WebLogic, WebSphere), and so on. Although these are all middleware, studying any one of them in depth is no overnight job. And tuning each of them demands more than just "knowing it" or "being able to use it"; you need at least to reach "knowing how to use it well."

Performance testing books often say that performance testing is not the business of the performance test engineer alone; it requires the cooperation of DBAs, developers, and operations staff. In practice, though, performance testing is often carried out by the performance tester alone. Even leaving aside help from others, understanding the architecture of each module of the system is a great help to your own growth, and the more deeply you understand it, the more you will be respected.

Before getting to performance tuning, we need to revisit the purpose of testing: what is the original intent of our performance testing?

  Capability verification: verify that the system has a certain capability under given conditions.

  Capacity planning: determine how to make the system provide the performance capability we require.

  Application diagnostics: memory leaks, for example, are difficult to find through functional testing but easy to discover through performance testing.

  Performance tuning: to meet user needs, analyze the system further, identify bottlenecks, optimize them, and improve the system's overall performance.

  

  Common bottlenecks in a system

  Performance tuning first requires identifying bottlenecks. So what are the common bottlenecks in a system?

  Performance bottlenecks in hardware:

Generally CPU, memory, and disk I/O problems. Bottlenecks divide into server hardware bottlenecks, network bottlenecks (usually negligible on a LAN), server operating system bottlenecks (parameter configuration), middleware bottlenecks (parameter configuration, database, web server, etc.), and application bottlenecks (SQL statements, database design, business logic, algorithms, etc.).

  Performance bottlenecks in application software:

Generally application servers, web servers, and other application software, including database systems.

For example, an unreasonable JDBC connection-pool configuration on the WebLogic middleware platform can cause a bottleneck.

  Performance bottlenecks in the application itself:

Generally the application newly developed by the development team.

For example, an unreasonable program architecture, or problems in the program itself (serialized processing, too few request-handling threads), can make the system perform poorly under a large number of concurrent users and become the bottleneck.

  Performance bottlenecks in the operating system:

Generally Windows, UNIX, Linux, and other operating systems.

For example, if during a performance test physical memory is insufficient and the virtual memory settings are unreasonable, virtual-memory swapping efficiency drops sharply and response times rise significantly; the operating system is then considered the performance bottleneck.

  Performance bottlenecks in network devices:

Generally firewalls, dynamic load balancers, switches, and other devices.

For example, a dynamic load balancer is set up to distribute load dynamically: when it finds that the hardware resources of one application server have reached their limit, it sends subsequent transaction requests to other, lightly loaded application servers. If testing shows that the load balancer is not actually playing this role, the network devices can be considered the bottleneck.

  The causes of performance problems, and locating them, are very complex; this is only a brief introduction to the common bottleneck types and their characteristics. What performance testing must do is weigh all of these factors, and then work with the developers, DBAs, and operations staff to locate the performance bottlenecks.

General Performance Tuning Steps

The general steps for tuning performance problems:

  Step One : Identify the problem

Application code: in general, many performance problems are written into the code, so for a module found to be a bottleneck, check the code first.

  Database configuration: often causes the entire system to run slowly. Some large databases, such as Oracle, require a DBA to set the parameters correctly before going into production.

Operating system configuration: unreasonable settings can cause system bottlenecks.

Hardware settings: disk speed, memory size, and the like readily cause bottlenecks, so they are a focus of analysis.

Network: network overload leads to contention and latency.

  Step two : Determine the cause

Once the problem area is identified, we need to know: does the problem affect response time, throughput, or something else? Do most users experience it, or only a few? If only a few, how does their usage differ from everyone else's? Are the system resource monitoring results normal? Has CPU usage hit its limit? What is the I/O situation? Is the problem concentrated in one class of modules? Is it on the client side or the server side? Is the hardware configuration sufficient? Does the actual load exceed the system's capacity? Or has the system simply not been optimized?

By working through these questions and related system clues, you gain a deeper understanding of the bottleneck and can then analyze its real cause.

  Step three : Determine tuning goals and a solution

For example: higher system throughput, shorter response times, and better support for concurrency.

  Step four : Test the solution

Benchmark the system after applying the solution's tuning. (Benchmarking means designing scientific test methods, test tools, and test systems so that a class of test objects can be measured quantitatively and comparably against given performance indicators.)
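
As a toy illustration of what a quantitative, comparable measurement can look like (the URL, request count, and sequential-loop design here are placeholder assumptions, not a recommended load-testing tool), a minimal benchmark loop in Java might be:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    /** Toy benchmark: fires N sequential requests and reports the average latency. */
    public class MiniBench {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest req = HttpRequest.newBuilder(
                    URI.create("http://localhost:8080/")).build(); // placeholder URL
            int n = 100;
            long totalMs = 0;
            for (int i = 0; i < n; i++) {
                long start = System.nanoTime();
                client.send(req, HttpResponse.BodyHandlers.discarding());
                totalMs += (System.nanoTime() - start) / 1_000_000;
            }
            System.out.printf("average latency over %d requests: %d ms%n", n, totalMs / n);
            // Run the same loop before and after tuning to get comparable numbers.
        }
    }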

  Step five : analyze the tuning results

Did the tuning meet or exceed the intended target? Did the system's performance improve as a whole, or did improving one part merely shift the problem elsewhere? Can the tuning end here?

Finally, if the desired goal is met, the tuning work is essentially over.

Here is a handy technique: if an interviewer poses a hypothetical performance problem and you have no idea where the bottleneck is, you can answer along the following lines ^_^

• Find bottlenecks in the following order, from easy to difficult.

Server hardware bottleneck --- network bottleneck (usually negligible on a LAN) --- server operating system bottleneck (parameter configuration) --- middleware bottleneck (parameter configuration, database, Web server, etc.) --- application bottleneck (SQL statements, database design, business logic, algorithms, etc.)

Note: not every analysis needs to go through the whole process; the depth of analysis depends on the purpose of the test and its requirements. For low-demand cases, it is enough to analyze the hardware bottlenecks of the application system under the expected future load (number of concurrent users, data volume).

• Segment-by-segment elimination is effective

Key points to note in performance tuning:


Point 1: Throughout the design and development of an application system, performance should always be kept in scope.
Point 2: Establishing clear performance goals is key.
Point 3: You must ensure the program still runs correctly after tuning.
Point 4: System performance depends more on good design; tuning techniques are only an aid.
Point 5: Tuning is an iterative process, and each round's results should feed back into subsequent code development.
Point 6: Performance tuning must not come at the cost of code readability and maintainability.

This article only describes what to focus on in performance tuning and its general steps. It does not explain how to tune each part of the system; that could not be covered clearly even in one or two books, it demands very deep knowledge, and it is beyond my current ability.

  Here's a summary:

" performance test know how much" series is basically finished, although the time pulled longer, but I did not give it to the eunuch. While the content is talking about the theory of performance testing, I think these things are essential to your performance testing work. Of course, I explained the use of two performance test tools in the "JMeter Basics" and "LoadRunner techniques".

I would be very happy if these articles brought even a little help to those who want to learn about performance testing. I am not a master, just a beginner who, like you, loves testing technology; I just like summing things up, and I am often confused about the future, but I know that as long as I keep learning, the road lies ahead. I will tidy up more articles on performance tuning later.

Dynamic Web applications can store large amounts of information and let users access it immediately through a familiar interface. But as your application becomes more popular, you may find that responses are not as fast as they used to be. Developers should understand how Web applications handle requests and know what they can and cannot do in Web application development; this helps reduce future headaches.

A static Web request (like the one shown in Figure 1) is easy to understand. The client connects to the server (typically on TCP port 80) and makes a simple request using the HTTP protocol.


Figure 1. Client requests a static file over HTTP

The server parses the request and maps it to a file on the file system. The server then sends a response header to the client that describes the payload (such as a Web page or image) and finally sends the file to the client.

Several bottlenecks can occur in this scenario. If requests vary so much that the operating system's disk cache cannot be used effectively, the server's disk stays busy and, at some point, slows the whole process down. If the network channel delivering data to clients becomes saturated, all clients are affected. Apart from these conditions, though, the "receive request, send file" process is quite efficient.

The performance of a static server can be estimated with a few assumptions. If the service time for a request is 10 ms (limited mainly by head seek time), then about 100 requests per second will bring the disk close to saturation (1 second / 10 ms per request = 100 requests per second). If you are sending 10 KB documents, this generates roughly 8 Mbit/s of Web traffic (100 requests/second * 10 Kbytes/request * 8 bits/byte = 8 Mbit/second). If files can be served from an in-memory cache, the average service time drops, and the number of connections the server can handle per second rises. If you have real figures for disk service time or average request latency, you can plug them into the same arithmetic for a more accurate estimate.

Since the server's processing capacity is the reciprocal of the average request service time, the server's processing capacity (the number of connections processed per second) is halved if the service time is doubled. Keep this in mind and take a look at the dynamic application scenario below.
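
A small sketch of this back-of-the-envelope arithmetic, using the assumed figures from above (10 ms service time, 10 KB documents):

    /** Back-of-the-envelope capacity estimate for a static file server. */
    public class StaticCapacity {
        public static void main(String[] args) {
            double serviceTimeSec = 0.010; // assumed: 10 ms per request (seek-bound)
            double docSizeKBytes = 10.0;   // assumed: 10 KB per document

            double requestsPerSec = 1.0 / serviceTimeSec;  // capacity = 1 / service time
            double mbitPerSec = requestsPerSec * docSizeKBytes * 8 / 1000;

            System.out.printf("capacity  = %.0f requests/second%n", requestsPerSec); // 100
            System.out.printf("bandwidth = %.1f Mbit/second%n", mbitPerSec);         // 8.0
        }
    }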

The flow of a dynamic request depends on the specifics of the application, but in general it resembles Figure 2.


Figure 2. Clients request dynamic pages over HTTP

As in the previous example, the client in Figure 2 first makes a request. From the client's point of view there is no real difference between a static request and a dynamic one (extensions such as .php or .cgi sometimes suggest a dynamic request, but they can be misleading). How the request is processed is up to the Web server.

In Figure 2, the request is sent to an application server, such as a Solaris system running a Java™ application. The application server performs some processing, then queries the database for more information. Once it has that information, the application server generates an HTML page, which the Web server forwards to the client. The service time for the request is therefore the sum of several parts: if the database access takes 7 ms, the application server 13 ms, and the Web server 5 ms, the page's service time is 25 ms. By the reciprocal rule described earlier, the capacities of the components are roughly 142, 77, and 200 requests per second, respectively. The bottleneck is therefore the application server, which limits the system to 77 connections per second; beyond that rate, the Web server is forced to wait and connections begin to queue.

Note, however, that just because the system can dispatch 77 connections per second, with each connection needing 25 ms of processing, it does not follow that every user's request is served within 25 ms. Each component can handle only one connection at a time, so under peak load requests must wait for CPU time. In the example above, once queuing time is added to the 25 ms of processing, the average request service time eventually exceeds 1.1 seconds. For more information on solving these queuing problems, see Resources.
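
The same reciprocal reasoning can be sketched in code, using the example's per-tier service times:

    /** Finds the bottleneck tier from per-tier service times (example figures). */
    public class TierBottleneck {
        public static void main(String[] args) {
            String[] tiers = {"web server", "app server", "database"};
            double[] serviceMs = {5, 13, 7}; // assumed service time per tier, in ms

            double totalMs = 0;
            double worstCapacity = Double.MAX_VALUE;
            String bottleneck = "";
            for (int i = 0; i < tiers.length; i++) {
                totalMs += serviceMs[i];
                double capacity = 1000.0 / serviceMs[i]; // requests/second
                System.out.printf("%-10s: %.0f requests/second%n", tiers[i], capacity);
                if (capacity < worstCapacity) {
                    worstCapacity = capacity;
                    bottleneck = tiers[i];
                }
            }
            System.out.printf("total service time = %.0f ms%n", totalMs);  // 25 ms
            System.out.printf("system capacity = %.0f requests/second (%s)%n",
                              worstCapacity, bottleneck);                  // ~77, app server
        }
    }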

The following conclusions can be drawn from these examples:

    • The more steps there are between the user making the request and receiving the final page, the slower the overall process and the lower the system capacity.

    • This effect becomes more pronounced as the page request rate increases.

    • Architectural decisions made at the beginning of a project also affect the ability of the site to handle the load.

The remainder of this article will discuss these issues in depth.

N-tier architecture for dynamic sites

The architecture of an application, including a Web application, is often described in terms of tiers. A static site can be thought of as a single tier: the Web server. If a Web server runs a scripting language such as PHP that connects to a database, that can be considered two tiers. The example in the previous section has three tiers: the front-end Web server, the application server, and the database.

A single piece of software may also count as multiple tiers, depending on whom you ask. For example, a PHP script might use a template engine to separate business logic from presentation, which can be counted as two tiers on its own. Java applications may perform presentation through a Java servlet, the servlet may execute business logic by communicating with an Enterprise Java Bean (EJB), and the EJB may obtain further information by connecting to a database. So one person's three-tier architecture may look quite different to someone else, especially when different toolsets are involved.

Common architectures

Although the architecture of the application varies, there are some common architectural trends. In general, an application requires four functional layers:

    • Client Layer

    • Presentation Layer

    • Business Logic Layer

    • Data layer

In a Web application, the client tier is handled by a Web browser. The browser renders HTML and executes JavaScript (as well as Java applets, ActiveX, or Flash) to display information to the user and collect user input. The presentation tier is the server's interface to the client; it controls the format of the output so that it can be displayed on the client machine. The business logic tier enforces business rules, such as calculations and workflows, that drive the application. Finally, the data tier is persistent storage, such as a database or a file store.

Most applications need the functionality of all four tiers, although they may not implement the tiers in a distinct and complete manner.

Another popular architecture is Model-View-Controller (MVC), a pattern for separating application components. In MVC, the model encapsulates the business logic tier and, in some frameworks, the data tier as well. The view handles the presentation of data sent to the client. The controller's role is to direct the application's flow.

Capacity expansion of the tiers

Scaling a Web application means enabling it to handle more traffic. One aspect of scaling is how hardware is deployed as demand grows; another is how the application responds to the new hardware environment. When performance problems arise, the first instinct is often to use a more powerful server, but the application itself is just as likely to be the source of bottlenecks. Dividing the application into tiers helps narrow the scope of the problem and simplifies capacity expansion.

Setting application bottlenecks aside for now, there are generally two ways to scale an application's hardware: horizontally and vertically. Horizontal scaling means adding more servers to a tier. In the earlier example, the application server bottleneck limited the request rate to 77 per second; that could be resolved by adding a second application server and sharing the load between the two. This raises the theoretical capacity to 154 requests per second, and the bottleneck moves to the database.

Vertical scaling, on the other hand, means using a more powerful computer, either to run two instances of the application server or to process each request more quickly.

At first glance you might rule out vertical scaling entirely, since buying several small machines is usually cheaper than buying ever larger servers. In many cases, though, vertical scaling is the better approach. If you have an IBM® POWER® server that supports hardware partitioning through logical partitions (LPARs), you can move idle capacity to the application server tier.

Application requirements may also push you toward vertical scaling. On a single server it is easy to share a user's session state through a shared memory segment. With two servers, the state must be shared some other way, such as through a database, and database access is slower than memory access, so two servers will process less than twice the load of one.

The database is another place where vertical scaling often pays off. Making a dataset span separate servers requires a great deal of work at the application layer, such as joining data across two databases and keeping it consistent. Using a more powerful database server is much easier and does not require rebuilding the application to support scattered data.

Modeling a WEB application as a queue

As the discussion of application architectures showed, a Web request passes through several stages, each of which takes some execution time. A request queues for each step; after completing one step, it queues for the next. Each step is much like people lining up at a checkout in a store.

You can model a Web application as a series of such steps (called "queues"). Each component of the application is a queue. A typical WebSphere application modeled as a series of queues is shown in Figure 3.


Figure 3. WebSphere® applications modeled as queued networks

Figure 3 shows that requests wait for the Web server to process them, then wait for the Web container, and so on. If the rate at which requests enter a queue exceeds the rate at which the queue processes them, requests pile up. When that happens, service times become unpredictable and users perceive the delay in their browser sessions. The queues in Figure 3 represent a worst case, because the Web server can satisfy some requests by itself, that is, without touching the database.
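
A toy simulation makes the backlog visible. Requests arrive every 10 ms (100 per second) at a tier that needs 13 ms each (the 77-per-second bottleneck from the earlier example); every successive request waits a little longer than the one before it:

    /** Toy simulation: arrivals at 100 req/s into a tier that serves ~77 req/s. */
    public class QueueBacklog {
        public static void main(String[] args) {
            double serviceMs = 13.0;  // assumed service time per request
            double arrivalMs = 10.0;  // one arrival every 10 ms = 100 req/s
            double serverFreeAt = 0;  // time at which the server next goes idle

            for (int i = 1; i <= 8; i++) {
                double arrival = i * arrivalMs;
                double start = Math.max(arrival, serverFreeAt); // wait if busy
                serverFreeAt = start + serviceMs;
                System.out.printf("request %d: queued %.0f ms, total %.0f ms%n",
                                  i, start - arrival, serverFreeAt - arrival);
            }
            // Each request queues ~3 ms longer than the last; the backlog never drains.
        }
    }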

Queues are common in UNIX® environments. When applications issue disk requests faster than the disk can return data, the operating system queues the requests and may reorder them to reduce seek time. Another queue is the run queue, an ordered list of processes waiting to run; applications wait their turn for limited resources such as the CPU.

Queue tuning is therefore a balancing act. A queue that is too small turns users away while capacity is still available; a queue that is too large tries to service too many users at once, and performance suffers for everyone.

Another complication is that queue slots are not free. A queued connection costs memory, and on the application server that memory competes with the threads actually processing requests. Queuing on the application server is therefore generally a bad idea. The recommended approach is to queue in front of the application server, for example on the Web server: the Web server holds the connection to the Web client and forwards the request only when the application server is free, so the application server handles only requests it can dispatch promptly.

IBM's documentation recommends specific Web application layouts and queue-tuning methods. Note, though, that IBM advises avoiding queuing inside WebSphere: the request rate sent to a WebSphere application server should stay within what it can process immediately. The Web server (or a proxy server in front of it) should hold back excess connections and make them wait. That way a heavily loaded application server spends its time servicing a bounded number of requests rather than trying to serve them all at once.

Tips for Developers

As a developer, you should improve the scalability of your application by following some general principles. These principles can be applied to most WEB applications.

Measurement facilities

The application should report metrics to a collection system in some way, even if that system is just a log file: how often each function is called, how long a request takes to process, and so on. This does not make the application faster, but it helps you understand why it is slowing down and which parts of the code take the most time. Knowing when certain functions are invoked lets you connect what you observe on the system (CPU busy, heavy disk activity) to activities in the application (such as image uploads).

Understanding what is happening on the site is the key to growing its capacity. The parts of the code you suspect are unoptimized may not be the problem at all; real bottlenecks are found only through proper measurement.
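
A minimal sketch of such instrumentation (the operation names and the use of a plain logger are placeholder assumptions, not a particular monitoring product's API):

    import java.util.function.Supplier;
    import java.util.logging.Logger;

    /** Times a unit of work and logs the duration so hotspots can be found later. */
    public class Metrics {
        private static final Logger LOG = Logger.getLogger("metrics");

        /** Runs the task and records how long the named operation took. */
        public static <T> T timed(String operation, Supplier<T> task) {
            long start = System.nanoTime();
            try {
                return task.get();
            } finally {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                LOG.info(operation + " took " + elapsedMs + " ms");
            }
        }

        public static void main(String[] args) {
            String page = timed("renderHomePage", () -> "<html>...</html>"); // stand-in work
            System.out.println(page.length());
        }
    }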

Session

The Web is inherently stateless: each request a user makes is independent of the previous one. Applications, however, are often stateful. A user logs in to prove identity, may maintain the state of a shopping cart during a visit, and may fill in personal information for later use. Tracking a session is expensive, especially when more than one server is involved.

A Web application running on a single server can keep session information in memory, where any application instance on that server can reach it; the user is given a token that identifies the session in memory. But consider what happens when a second application server is added: if the user's first request goes to one server and the second request goes to the other, two separate sessions exist, and they are not the same.

A common solution is to store sessions in a database rather than in memory. The drawback is that each request now adds a database read, and possibly a database write, and every Web application server needs access to that database.

One remedy is to use sessions only where they are needed. Instead of loading the session for every request, the application loads it only when the request actually requires one, which reduces the number of calls to the back-end database.

Another method is to encrypt the session data and send it back to the client, so that no session needs to be stored locally. The amount of data that can be kept in a user's cookie is limited, but RFC 2109 specifies that clients should be able to store at least 20 cookies per domain, each holding at least 4 KB of data.

If database-backed sessions turn out to be a performance bottleneck you cannot eliminate, consider splitting them out into a separate database, or even several databases. For example, store even-numbered session IDs in one database and odd-numbered session IDs in another, as sketched below.
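
A minimal sketch of that parity-based split (the two DataSource objects are assumed to point at the two session databases):

    import javax.sql.DataSource;

    /** Routes session reads and writes to one of two databases by session-ID parity. */
    public class SessionShards {
        private final DataSource evenShard; // holds even-numbered session IDs
        private final DataSource oddShard;  // holds odd-numbered session IDs

        public SessionShards(DataSource evenShard, DataSource oddShard) {
            this.evenShard = evenShard;
            this.oddShard = oddShard;
        }

        /** Every server computes the same answer, so they all agree on placement. */
        public DataSource shardFor(long sessionId) {
            return (sessionId % 2 == 0) ? evenShard : oddShard;
        }
    }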

Cache

Some parts of an application modify data more often than others. A news site may change its top-level category list only once a month, so querying the database for the latest category list on every request is wasteful. Likewise, a page containing a news article may change only once or twice over its lifetime, so there is no need to regenerate it for every request.

Caching means storing the result of expensive processing for later use. You can cache a category list or an entire page.

When considering caching, ask yourself: does this information have to be up to the minute? If not, caching is a candidate. Being able to change a press release promptly may matter when news first breaks, but checking for changes once a minute and serving the page from the cache is usually enough.

A complementary approach is to invalidate cached items when the underlying data changes. If you modify a news article, delete the cached version as you save it; the next request finds no cached copy and generates a fresh one.

When using a cache, be aware of what happens when an entry expires or is deleted. If an entry is requested heavily, its expiry can trigger regeneration for many users at once. To avoid this, regenerate the entry only for the first request and let other users receive the stale version until the new entry is available.
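
A minimal sketch of that "first request regenerates, others keep the stale copy" policy, as a toy in-process cache rather than any particular product's API:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.function.Supplier;

    /** Toy cache: after expiry only one caller rebuilds; others see the stale value. */
    public class StaleWhileRevalidate<K, V> {
        private static final class Entry<V> {
            volatile V value;
            volatile long expiresAt;
            final AtomicBoolean rebuilding = new AtomicBoolean(false);
        }

        private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
        private final long ttlMs;

        public StaleWhileRevalidate(long ttlMs) { this.ttlMs = ttlMs; }

        public V get(K key, Supplier<V> loader) {
            Entry<V> e = map.computeIfAbsent(key, k -> new Entry<>());
            long now = System.currentTimeMillis();
            // A cold miss loads inline; after expiry, only the first caller rebuilds.
            if (e.value == null
                    || (now > e.expiresAt && e.rebuilding.compareAndSet(false, true))) {
                try {
                    e.value = loader.get();
                    e.expiresAt = System.currentTimeMillis() + ttlMs;
                } finally {
                    e.rebuilding.set(false);
                }
            }
            return e.value; // possibly stale, but no thundering herd on expiry
        }
    }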

memcached is a popular distributed memory cache used by many applications deployed in UNIX environments. Each server runs an instance of the memcached daemon, which sets aside a block of RAM reachable through a simple network protocol. An application that wants to store or fetch data first hashes the key, which tells it which server in the memcache pool to use; it then connects to that server to fetch or store the data, which is far faster than going to disk or the database.
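
The key-to-server step can be sketched as follows; plain modulo hashing is shown for brevity (real memcached clients typically use sturdier schemes such as consistent hashing), and the server addresses are hypothetical:

    import java.util.List;

    /** Picks which cache server owns a key, memcached-client style. */
    public class CachePool {
        private final List<String> servers;

        public CachePool(List<String> servers) { this.servers = servers; }

        /** The same key always maps to the same server, so all app instances agree. */
        public String serverFor(String key) {
            int bucket = Math.floorMod(key.hashCode(), servers.size());
            return servers.get(bucket);
        }

        public static void main(String[] args) {
            CachePool pool = new CachePool(List.of("10.0.0.1:11211", "10.0.0.2:11211"));
            System.out.println(pool.serverFor("session:42")); // deterministic choice
        }
    }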

When looking for data to cache, also ask whether you really need to present that information directly. Must the user's full shopping cart appear on every page? Would the total alone do? Or just a simple "view the contents of your cart" link?

Edge Side Includes (ESI) is a markup language for splitting a Web page into separately cacheable fragments. The application generates an HTML document containing ESI tags and builds the individual fragments. A proxy cache in front of the Web application assembles the final document from those parts, caching some fragments and issuing requests for the others. Listing 1 shows an example ESI document.


Listing 1. ESI Example
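
A minimal ESI document along these lines might look like the sketch below; the fragment URLs are hypothetical, and each fragment's cache lifetime would be governed by the HTTP headers it is served with:

    <html>
      <body>
        <!-- page shell: cacheable for a long time -->
        <esi:include src="/fragments/category-list.html"/> <!-- changes monthly -->
        <esi:include src="/fragments/latest-news.html"/>   <!-- changes every minute -->
      </body>
    </html>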

Although the example is very simple, Listing 1 shows how two documents with their own cache rules can be stitched together into one page.

Asynchronous processing

Related to "does this information have to be up to the minute?" is another question: "must this information be updated by the time I finish handling the request?" In many cases you can accept the data a user submits and defer the processing for a few seconds, rather than making the user wait for the page while the data is processed. This is called asynchronous processing. A common approach is to have the application send the data to a message queue, such as IBM WebSphere MQ, where it waits until resources are available to process it. The page can then be returned to the user immediately, even though the outcome of the processing is not yet known.

Consider an e-commerce application in which a user submits an order. Returning the result of the credit card check immediately may be important, but the order system does not need to confirm on the spot that every item in the order is valid; the order can be placed on a queue and processed within a few seconds. If an error occurs, the user can be notified by e-mail, or, if the user is still on the site, an error notice can even be inserted into the session. Another example is reporting: rather than making the user wait while a report is generated, return a "your report will be ready in a few minutes" message and build the report asynchronously on another server.
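
A minimal sketch of the pattern, using an in-process queue as a stand-in for a real message broker such as WebSphere MQ (the Order type and the validation step are hypothetical):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    /** Accepts orders instantly; a background worker validates them later. */
    public class AsyncOrders {
        record Order(long id, String payload) {}

        private final BlockingQueue<Order> queue = new LinkedBlockingQueue<>();

        /** Called on the request path: enqueue and return immediately. */
        public void submit(Order order) {
            queue.add(order); // the page can be sent back right away
        }

        /** Background worker, started once at application boot. */
        public void startWorker() {
            Thread worker = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        validate(queue.take()); // slow check happens off the request path
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        private void validate(Order order) {
            // hypothetical: check stock, pricing, address; e-mail the user on failure
            System.out.println("validated order " + order.id());
        }

        public static void main(String[] args) throws InterruptedException {
            AsyncOrders app = new AsyncOrders();
            app.startWorker();
            app.submit(new Order(1, "2 x widget")); // returns immediately
            Thread.sleep(100); // demo only: give the worker a moment before exit
        }
    }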

Conclusion

Applications are usually written in tiers: presentation logic is separated from business logic, and business logic from persistent storage. This improves the maintainability of the code, but it also introduces overhead. When scaling your application's capacity, you should understand how data flows through the tiers and find where the bottlenecks occur.

Techniques such as caching and asynchronous processing reduce the application's workload by reusing earlier results or moving work to another machine. Providing metrics in the application keeps you informed about its hotspots.

An application server environment behaves much like a queuing network, so manage queue sizes carefully to ensure that no tier puts excessive pressure on the next. IBM recommends queuing as far in front of the application server as possible, for example on an external Web server or proxy server.

Simply adding hardware rarely scales an application's capacity effectively; it usually takes a combination of these techniques to make new hardware pay off.
