Tuning your Web application for performance


Dynamic Web applications can store large amounts of information and let users access it immediately through a familiar interface. As an application grows more popular, however, you may find that responses are not as fast as they used to be. Developers should understand how Web applications handle Web requests and know what they can and cannot do during development; this knowledge helps reduce hassles later.

A static Web request (such as the one shown in Figure 1) is easy to understand. The client connects to the server (typically via TCP port 80) and uses the HTTP protocol to make a simple request.

Figure 1. Client requests a static file over HTTP

The server parses the request and maps it to a file on the file system. The server then sends a response header to the client that describes the payload (such as a Web page or image) and finally sends the file to the client.

Several bottlenecks can occur in this scenario. If the requests vary so widely that the operating system's disk cache cannot be used effectively, the server's disk stays busy and, at some point, slows the whole process down. If the network channel carrying data to the clients is saturated, all clients are affected. Apart from conditions like these, though, the "receive request, send file" process is quite efficient.

You can roughly estimate a static server's performance by making some assumptions. If the service time for a request is 10 ms (limited mainly by head seek time), then about 100 requests per second will nearly saturate the disk (1 second / 10 ms per request = 100 requests per second). Sending 10 KB documents at that rate generates roughly 8 Mbit/s of Web traffic (100 requests/second * 10 KB/request * 8 bits/byte). If files can be served from an in-memory cache, the average service time drops and the number of connections the server can handle per second rises. If you have real figures for disk service time or average request latency, plug them into the same calculation to get a more accurate estimate.
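
To make the arithmetic concrete, here is a minimal sketch in PHP of this back-of-envelope estimate; the 10 ms service time and 10 KB document size are the assumed figures from the text, not measured values:

    <?php
    // Back-of-envelope capacity estimate for a static file server.
    $serviceTimeSec = 0.010;   // assumed average time to serve one request
    $docSizeKBytes  = 10;      // assumed average document size

    $requestsPerSec = 1 / $serviceTimeSec;                          // 100 req/s
    $trafficMbitSec = $requestsPerSec * $docSizeKBytes * 8 / 1000;  // ~8 Mbit/s

    printf("Capacity: %.0f req/s, traffic: %.1f Mbit/s\n",
           $requestsPerSec, $trafficMbitSec);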

Since the server's processing capacity is the reciprocal of the average request service time, the server's processing capacity (the number of connections processed per second) is halved if the service time is doubled. Keep this in mind and take a look at the dynamic application scenario below.

How a dynamic request is handled depends on the specifics of the application, but in general the flow resembles Figure 2.

Figure 2. Clients request dynamic pages over HTTP

As in the previous example, the client in Figure 2 begins by making a request. There is really no difference between a static request and a dynamic one (extensions such as .php or .cgi sometimes suggest a dynamic request, but they can be misleading); how the request is processed is decided by the Web server.

In Figure 2, the request is passed to an application server, such as a Solaris system running a Java application. The application server performs some processing and then queries the database for more information. Once that information is available, the application server generates an HTML page, which the Web server forwards to the client. The service time for the request is therefore the sum of the parts: if the database access takes 7 ms, the application server 13 ms, and the Web server 5 ms, then the page's service time is 25 ms. By the reciprocal rule described earlier, the capacities of those components are roughly 142, 77, and 200 requests per second, respectively. The bottleneck is therefore the application server, which limits the system to 77 connections per second; beyond that the Web server is forced to wait and connections begin to queue.
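
The reciprocal rule and the bottleneck search are easy to express in code. A minimal sketch (PHP 7.4+), using the assumed service times from the example above:

    <?php
    // Per-tier capacity is the reciprocal of per-tier service time;
    // the tier with the lowest capacity is the bottleneck.
    $serviceTimes = [
        'web server'         => 0.005, // 5 ms  -> 200 req/s
        'application server' => 0.013, // 13 ms -> ~77 req/s
        'database'           => 0.007, // 7 ms  -> ~142 req/s
    ];

    $capacities = array_map(fn ($t) => 1 / $t, $serviceTimes);
    asort($capacities);                           // lowest capacity first
    $bottleneck = array_key_first($capacities);

    printf("Page service time: %.0f ms\n", array_sum($serviceTimes) * 1000);
    printf("Bottleneck: %s at ~%.0f req/s\n",
           $bottleneck, $capacities[$bottleneck]);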

Note, however, that just because the system can accept 77 connections per second and each connection takes 25 ms to process, it does not follow that every user's request is served within 25 ms. Each component can handle only one connection at a time, so under peak load a request must also wait for its turn on the CPU. In the example above, once queuing time is added to the 25 ms of processing, the average request service time can eventually exceed 1.1 seconds. For more information on resolving these queuing issues, see Resources.

The following conclusions can be drawn from these examples:

    • The more steps there are between the user's request and the final page, the slower the overall process and the lower the system's capacity.
    • This effect becomes more pronounced as the page request rate increases.
    • Architectural decisions made at the beginning of a project also affect the ability of the site to handle the load.

The remainder of this article will discuss these issues in depth.

N-tier architecture for dynamic sites

The architecture of an application, including a Web application, is often described in terms of tiers. A static site can be thought of as a single tier: the Web server. If the Web server runs a scripting language such as PHP that connects to a database, that is a two-tier arrangement. The example in the previous section has three tiers: the front-end Web server, the application server, and the database.

A single piece of software may itself span multiple tiers, depending on whom you ask. For example, a PHP script might use a template engine to separate business logic from presentation, which could itself be counted as two tiers. A Java application might handle presentation through a Java servlet, have the servlet communicate with an Enterprise JavaBean (EJB) that executes the business logic, and have the EJB fetch further information from a database. So one person's three-tier architecture may be another's n-tier architecture, especially once different toolsets enter the picture.

Common architectures

Although application architectures vary, some common trends emerge. In general, an application needs four functional layers:

    • Client Layer
    • Presentation Layer
    • Business Logic Layer
    • Data Layer

In a Web application, the client layer is handled by a Web browser. The browser renders HTML and executes JavaScript (as well as Java applets, ActiveX controls, or Flash) to display information to the user and gather input. The presentation layer is the server's interface to the client; it controls the format of the output so it can be displayed on the client machine. The business logic layer enforces the business rules, such as calculations and workflows, that drive the application. Finally, the data layer provides persistent storage, such as a database or a file store.

Most applications need all four of these functional layers, although they may not implement each layer explicitly and completely.

Another popular architecture is Model-View-Controller (MVC), a pattern for separating application components. In the MVC pattern, the model encapsulates the business logic layer and, with the framework, the data layer. The view handles presenting data to the client, and the controller directs the application's flow.
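
As a minimal illustration of the three roles (names such as NewsModel are invented for this sketch, not part of any framework), an MVC page in PHP might look like this:

    <?php
    // Model: business logic and data access. In a real application,
    // latest() would query the data layer instead of returning a literal.
    class NewsModel {
        public function latest(): array {
            return [['title' => 'Sample headline']];
        }
    }

    // View: presentation only; knows nothing about where the data came from.
    function renderHeadlines(array $stories): string {
        $html = "<ul>\n";
        foreach ($stories as $story) {
            $html .= '<li>' . htmlspecialchars($story['title']) . "</li>\n";
        }
        return $html . "</ul>\n";
    }

    // Controller: directs the flow between model and view.
    $model = new NewsModel();
    echo renderHeadlines($model->latest());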

Scaling the tiers

Scaling a Web application means enabling it to handle more traffic. One aspect of scaling is how hardware is deployed as demand grows. The other is how the application responds to the new hardware environment. When performance problems appear, the instinctive response is to reach for a more powerful server, but the application itself is just as likely to be the bottleneck. Dividing the application into tiers helps narrow the scope of the problem and simplifies scaling.

Setting application bottlenecks aside for the moment, there are two ways to scale the hardware behind an application: horizontally and vertically. Horizontal scaling means adding more servers to a tier. In the earlier example, the application server limited the request rate to 77 requests per second; adding a second application server and sharing the load between the two would address that, raising the theoretical capacity to 154 requests per second and moving the bottleneck to the database.

Vertical scaling, by contrast, means using a more powerful computer, whether to run two instances of the application server or simply to process requests faster.

At first glance you might rule out vertical scaling entirely, because buying several small machines is usually cheaper than buying ever-larger servers. In many cases, though, vertical scaling is the better approach. If you have an IBM® POWER® server that supports hardware partitioning through logical partitions (LPARs), you can move spare capacity into the application server tier.

Application requirements may also push you toward vertical scaling. On a single server it is easy to share a user's session state through a shared memory segment. With two servers, the state must be shared some other way, such as through a database. Database access is slower than memory access, so two servers deliver less than twice the throughput of one.

Databases are another case where vertical scaling often pays off. Spreading a dataset across different servers demands a great deal of work at the application layer, such as joining data across two databases and keeping it consistent. It is much easier to use a more powerful database server, and you avoid rebuilding the application to cope with scattered data.

Modeling a Web application as a queue

As the discussion of application architectures shows, a Web request passes through several stages, each taking a certain amount of execution time. The request waits in line at each step; when one step completes, it queues for the next. Each step works much like shoppers lining up at a store checkout.

You can therefore model a Web application as a series of queues, with each component of the application acting as one queue. A typical WebSphere application modeled as a series of queues is shown in Figure 3.

Figure 3. A WebSphere® application modeled as a network of queues

Figure 3 shows requests waiting for the Web server to process them, then waiting for the Web container, and so on. If requests enter a queue faster than the queue can process them, requests pile up. When that happens, service times become unpredictable and users perceive the delay in their browser sessions. The queues in Figure 3 represent the worst case, because the Web server can handle some requests on its own, without touching the database.

Queues are common in UNIX® environments. When an application issues disk requests faster than the disk can return data, the operating system queues the requests and may reorder them to reduce seek time. Another example is the run queue, an ordered list of processes waiting to run; applications wait their turn for limited resources such as the CPU.

Queue tuning is therefore a balancing act. A queue that is too small turns users away while capacity is still available. A queue that is too large tries to serve too many users at once, and performance suffers for everyone.

A further complication is that queue slots are not free. Reserving a slot costs memory, which, on an application server, competes with the threads actually processing requests. So, in general, queuing on the application server itself is a poor approach. The recommended approach is to queue in front of the application server, for example on the Web server: the Web server holds the connection to the Web client and forwards the request only when the application server is free, so the application server handles only requests it can dispatch promptly.

IBM's documentation recommends layouts for Web applications and methods for tuning the various queues. Note, though, that IBM recommends avoiding queuing inside WebSphere: the request rate sent to the WebSphere application server should be kept within what it can process immediately. The Web server (or a proxy server in front of it) should hold back excess connections and make them wait. This ensures that a heavily loaded application server spends its time serving a bounded number of requests rather than trying to serve them all at once.
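
As one illustration of queuing in front of the application server, the sketch below is a hypothetical Apache httpd 2.4 fragment; the host name, path, and limits are placeholders, not recommended values:

    # Cap concurrency at the Web tier so excess clients wait here,
    # holding their connections, instead of flooding the backend.
    <IfModule mpm_worker_module>
        MaxRequestWorkers 150
    </IfModule>

    # Forward dynamic requests to the application server with a bounded
    # connection pool (mod_proxy), so the backend only sees what it can handle.
    ProxyPass "/app" "http://appserver.example.com:9080/app" max=50 timeout=30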

Tips for Developers

As a developer, you should improve the scalability of your application by following some general principles. These principles can be applied to most WEB applications.

Measurement facilities

The application should report metrics to some collection system, even if that system is just a log file. Useful metrics include how often each function in the application is invoked and how long a request takes to process. Metrics do not make the application run faster, but they explain why it is slowing down and which parts of the code take the longest. Knowing when certain functions run lets you connect symptoms observed on the system (a busy CPU, heavy disk activity) to activities in the application (such as image uploads).
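
A minimal sketch of such instrumentation in PHP; the label and the timed operation are illustrative:

    <?php
    // Time an operation and record the measurement, here to the error log;
    // a real metrics collector could be swapped in behind this function.
    function timed(string $label, callable $work)
    {
        $start  = microtime(true);
        $result = $work();
        error_log(sprintf('%s took %.1f ms', $label,
                          (microtime(true) - $start) * 1000));
        return $result;
    }

    // Usage: wrap a potentially expensive step so its duration is logged.
    $rows = timed('front_page_query', function () {
        usleep(5000);              // stand-in for a real database query
        return range(1, 10);
    });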

Understanding what is actually happening on the site is the key to growing its capacity. The code you suspect of being unoptimized may not be the problem at all; only proper measurement uncovers the real bottlenecks.

Session

The Web is inherently stateless: each request a user makes is independent of the previous one. Applications, however, are usually stateful. Users log in to prove their identity, may keep a shopping cart going throughout a visit, and may fill in personal information for later use. Tracking a session is an expensive operation, especially when more than one server is involved.

A Web application running on a single server can keep session information in memory, where any instance of the application on that server can reach it through shared memory; the user is typically handed a token that identifies the session. Now consider what happens when a second application server is added. If the user's first request goes to one server and the second request to the other, two separate, inconsistent sessions exist.

A common solution is to store sessions in a database rather than in memory. The drawback is that every request now adds a database read, and possibly a database write, and every Web application server needs access to that database.

One mitigation is to use the session only where it is actually needed. Rather than loading the session on every request, the application loads it only on pages that require session state, which reduces the number of requests hitting the back-end database, as in the sketch below.
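
A minimal sketch of lazy session loading in PHP; the page logic is illustrative:

    <?php
    // Start the session only on pages that actually need session state,
    // so static pages never trigger the (possibly DB-backed) session read.
    function requireSession(): void
    {
        if (session_status() !== PHP_SESSION_ACTIVE) {
            session_start();
        }
    }

    // A product-detail page that never touches the session skips the cost
    // entirely; the cart page opts in:
    requireSession();
    $_SESSION['cart'][] = 'sku-1234';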

Another method is to encrypt the session data and hand it back to the client, so no session needs to be stored on the server at all. The amount of data that fits in a user's cookies is limited, but RFC 2109 specifies that a client should support at least 20 cookies per domain, each holding at least 4 KB of data.
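
A sketch of the cookie approach in PHP, assuming PHP 7.1+ for AES-GCM support; the key handling and payload are illustrative, and in practice the key would be a fixed secret kept server-side:

    <?php
    // Seal a small session payload into an encrypted, authenticated cookie.
    function sealSession(array $data, string $key): string
    {
        $iv = random_bytes(12);
        $ct = openssl_encrypt(json_encode($data), 'aes-256-gcm', $key,
                              OPENSSL_RAW_DATA, $iv, $tag);
        return base64_encode($iv . $tag . $ct);
    }

    // Recover the payload, or null if the cookie was tampered with.
    function openSession(string $cookie, string $key): ?array
    {
        $raw = base64_decode($cookie);
        [$iv, $tag, $ct] = [substr($raw, 0, 12), substr($raw, 12, 16),
                            substr($raw, 28)];
        $json = openssl_decrypt($ct, 'aes-256-gcm', $key,
                                OPENSSL_RAW_DATA, $iv, $tag);
        return $json === false ? null : json_decode($json, true);
    }

    $key = random_bytes(32); // illustrative only: use a persistent secret
    setcookie('session', sealSession(['user' => 42], $key));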

If database-backed sessions prove to be a bottleneck you cannot eliminate, consider splitting them out into a separate database, or even across several databases. For example, you could store even-numbered session IDs in one database and odd-numbered session IDs in another.
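
A sketch of that even/odd split in PHP; the DSNs and credentials are placeholders:

    <?php
    // Pick the session database by session-ID parity.
    function sessionDb(int $sessionId): PDO
    {
        $dsn = ($sessionId % 2 === 0)
            ? 'mysql:host=sessions-even.example.com;dbname=sessions'
            : 'mysql:host=sessions-odd.example.com;dbname=sessions';
        return new PDO($dsn, 'app_user', 'app_password');
    }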

Cache

Some parts of an application change far less often than others. A news site might change its top-level category list only once a month, so querying the database for the latest category list on every request is wasteful. Likewise, a page containing a news story might change only once or twice over its lifetime, so there is no need to regenerate it for every request.

Caching means storing the result of an expensive operation for later reuse. You might cache the category list, or the whole page.

When weighing caching, ask yourself: "Does this information have to be completely up to date?" If not, caching is a candidate. It may be important for a breaking news story to appear promptly, but checking for changes once a minute and serving the page from cache in between is usually good enough.

A complementary approach is to invalidate cached items when the underlying data changes. If a news story is edited, the cached version can be deleted at save time; the next request finds no cached copy and generates a fresh one.

When using a cache, be aware of what happens when an entry expires or is deleted. If an entry is in heavy demand, many users will trigger its regeneration at once the moment it expires. To avoid this, regenerate the cache for the first request only, and serve the outdated version to other users until the new entry is ready (see the sketch after the next paragraph).

memcached is a popular distributed in-memory caching system used by many applications deployed in UNIX environments. Each server runs an instance of the memcached daemon, which sets aside a block of RAM reachable through a simple network protocol. An application that wants to store or fetch data first hashes the key, which tells it which server in the memcached pool to use; it then connects to that server to read or write the data, which is much faster than going to disk or the database.
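
The sketch below combines both ideas in PHP, using the PECL Memcached extension; the server address, key names, and the 60-second/10-second windows are illustrative:

    <?php
    $mc = new Memcached();
    $mc->addServer('cache1.example.com', 11211);

    // Serve the category list from cache. After the soft expiry passes,
    // only the request that wins the lock regenerates; everyone else
    // keeps getting the stale copy until the fresh one is stored.
    function cachedCategories(Memcached $mc): array
    {
        $entry = $mc->get('categories');
        if ($entry !== false && $entry['expires'] > time()) {
            return $entry['data'];                       // fresh hit
        }
        if ($entry !== false && !$mc->add('categories:lock', 1, 10)) {
            return $entry['data'];                       // stale, but cheap
        }
        $data = ['World', 'Business', 'Sports'];         // stand-in for a DB query
        $mc->set('categories', ['data' => $data, 'expires' => time() + 60]);
        $mc->delete('categories:lock');
        return $data;
    }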

When deciding what to cache, also ask whether you really need to present that information directly. Must the user's shopping cart appear on every page? Would showing just the item count do, or simply a link reading "View the contents of your cart"?

Edge Side Includes (ESI) is a markup language for splitting Web pages into separate, individually cacheable pieces. The application generates an HTML document containing ESI tags and also generates the component fragments. A proxy cache sitting in front of the Web application assembles the final document from the pieces, caching some fragments and requesting others as needed. Listing 1 shows an example of an ESI document.

Listing 1. ESI Example
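
A minimal sketch of such a document; the fragment URL is illustrative:

    <html>
      <body>
        <!-- This part is cached with the page itself. -->
        <h1>Today's News</h1>

        <!-- The proxy fetches and caches this fragment under its own rules,
             then assembles the final page. -->
        <esi:include src="/fragments/headlines.html" />
      </body>
    </html>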

Simple as it is, Listing 1 shows how two documents, each with its own caching rules, can be stitched together.

Asynchronous processing

Related to "Does this information have to be up to date?" is a second question: "Must this information be updated by the time the request finishes?" In many cases you can accept user-submitted data and defer its processing by a few seconds, rather than making the user wait while the page loads. This is called asynchronous processing. A common approach is to have the application hand the data to a message queue, such as IBM WebSphere MQ, where it waits until resources are free to process it. The page can then be returned to the user immediately, even though the outcome of the processing is not yet known.

Consider an e-commerce application in which the user submits an order. It may be important to return the credit card authorization result immediately, but the order system need not confirm instantly that everything in the order is valid. The order can be placed on a queue and processed within seconds; if an error turns up, the user can be notified by e-mail, or, if still on the site, through a notice inserted into the session. Reporting is another example: instead of making the user wait while a report is generated, return a "your report will be ready in a few minutes" message and build the report asynchronously on another server.
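
A sketch of that order flow in PHP; Queue and validateCreditCard are hypothetical stand-ins for a real message-queue client (such as one for IBM WebSphere MQ) and a payment check:

    <?php
    // Synchronous: the user needs the card result before leaving the page.
    $card = validateCreditCard($_POST['card']);   // hypothetical helper
    if (!$card->approved) {
        exit('Card declined.');
    }

    // Asynchronous: full order validation can happen seconds later,
    // so hand the order to a queue and return the page immediately.
    $queue = new Queue('orders');                 // hypothetical queue client
    $queue->send(json_encode([
        'user'  => $_SESSION['user_id'],
        'items' => $_POST['items'],
    ]));

    echo 'Thank you! Your order is being processed; '
       . 'we will e-mail confirmation shortly.';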

Conclusion

Applications are typically written in tiers: presentation logic is separated from business logic, and business logic from persistent storage. This improves the maintainability of the code, but it also introduces overhead. When scaling an application, understand how data flows through the tiers and find where the bottlenecks arise.

Technologies such as caching and asynchronous processing can reduce application workloads by reusing previous results or transferring work to another computer. Provide metrics in your application to help keep you informed about hotspots.

The application server environment works much like a queueing network, so be sure to carefully manage the size of the queue to ensure that one layer does not exert excessive pressure on the other. IBM recommends that you queue as much as possible before the application server, such as on an external WEB server or proxy server.

Simply adding hardware rarely scales an application effectively on its own; these techniques usually need to be applied together before new hardware pays off.

This article is shared from: http://www.ibm.com/developerworks/cn/aix/library/au-perf_tuning/
