10 Tips for Writing High-Performance Web Applications
This article discusses:
· Common ASP.NET performance myths
· Useful performance tips and tricks for ASP.NET
· Suggestions for working with a database from ASP.NET
· Caching and background processing in ASP.NET
Writing a Web application with ASP.NET is unbelievably easy. So easy, in fact, that many developers do not take the time to structure their applications for good performance. In this article, I am going to present 10 tips for writing high-performance Web applications. I will not limit my discussion to ASP.NET applications, because ASP.NET applications are just one subset of Web applications. This article will not be the definitive guide to performance-tuning Web applications; an entire book could easily be devoted to that. Instead, treat it as a good place to start.
Before I became a workaholic, I did a lot of rock climbing. Prior to any big climb, I would review the routes in the guidebook and read the recommendations of people who had made the ascent before. But no matter how good the guidebook, you need actual climbing experience before attempting a challenging goal. Similarly, you can only learn how to write high-performance Web applications when you are faced with fixing performance problems or running a high-throughput site.
My personal experience comes from working as an infrastructure Program Manager on the ASP.NET team at Microsoft, running and managing www.asp.net, and helping to architect Community Server, which is the next version of several well-known ASP.NET applications (ASP.NET Forums, .Text, and nGallery combined into one platform). I am sure that some of the tips that have helped me will be of use to you as well.
You should think about separating your application into logical tiers. You might have heard of the term three-tier (or n-tier) architecture. These are usually prescribed architectural patterns that physically divide functionality across processes and/or hardware. If the system needs greater scale, more hardware can easily be added. There is, however, a performance hit associated with each process and machine hop, so it should be avoided. Whenever possible, run the ASP.NET pages and their associated components together in the same application.
Because of the code separation and the boundaries between tiers, using Web services or remoting can reduce performance by 20 percent or more.
The data tier is a bit different, since it is usually better to have hardware dedicated to the database. However, the cost of process-hopping to the database is still high, so performance in the data tier should be your first consideration when optimizing code.
Before you dive in to fix performance problems in your applications, make sure you profile your applications to find the root cause of the problems. Key performance counters (such as the one that indicates the percentage of time spent performing garbage collection) are also very useful for finding out where applications are spending the majority of their time, although the places where time is spent are often unintuitive.
In this article I discuss two types of performance improvements: large optimizations, such as using the ASP.NET Cache, and small optimizations in code paths that are executed repeatedly. These small optimizations are sometimes the most interesting: one small change to code that is called thousands and thousands of times can have a big effect. With a large optimization, you might see an overall leap in performance. With a small one, you might shave a few microseconds off a given request, but accumulated across all the requests in a day, the improvement can be surprisingly large.
Performance in the Data Tier
When it comes to performance-tuning an application, there is a single litmus test you can use to prioritize work: does the code access the database? If so, how often? Note that the same test could be applied to code that uses Web services or remoting, too, but I am not covering those in this article.
If you have a code path that makes a database request, and you see other areas, such as string manipulation, that you want to optimize first, stop and perform the litmus test. Unless you have an outright performance problem on your hands, your time is better spent optimizing the time spent connected to the database, the amount of data returned, and the number of round-trips you make to and from the database.
Now that I have covered this general background, let's look at ten tips that can help your applications perform better. I'll begin with the changes that can make the biggest difference.
Tip 1: Return Multiple Result Sets
Review your database code to see if you have request paths that go to the database more than once. Each of those round-trips decreases the number of requests per second your application can serve. By returning multiple result sets in a single database request, you can cut the total time spent communicating with the database. You'll also be making your system more scalable, since you reduce the work the database server has to do managing requests.
While you can return multiple result sets using dynamic SQL, I prefer to use stored procedures. It's arguable whether business logic should reside in a stored procedure, but I think that if logic in a stored procedure can constrain the data returned (reducing the size of the dataset, the time spent on the network, and the need to filter the data in the logic tier), it's a good thing.
Using a SqlCommand instance and its ExecuteReader method to populate strongly typed business classes, you can move the result set pointer forward by calling NextResult. Figure 1 shows a sample session that populates several ArrayLists with typed classes. Returning only the data you need from the database will additionally reduce memory allocations on your server.
// read the first resultset
reader = command.ExecuteReader();

// read the data from that resultset
while (reader.Read()) {
    suppliers.Add(PopulateSupplierFromIDataReader(reader));
}

// read the next resultset
reader.NextResult();

// read the data from that second resultset
while (reader.Read()) {
    products.Add(PopulateProductFromIDataReader(reader));
}
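For context, here is a minimal sketch of the plumbing that might surround the Figure 1 snippet. The stored procedure name and the Populate helper methods are illustrative assumptions, not part of the original figure:

// Hypothetical wrapper around the Figure 1 snippet.
// Assumes: using System.Collections; using System.Data; using System.Data.SqlClient;
void LoadSuppliersAndProducts(string connectionString,
                              ArrayList suppliers, ArrayList products)
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlCommand command =
        new SqlCommand("getSuppliersAndProducts", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        connection.Open();

        using (SqlDataReader reader = command.ExecuteReader())
        {
            // first result set: suppliers
            while (reader.Read())
                suppliers.Add(PopulateSupplierFromIDataReader(reader));

            // second result set: products
            reader.NextResult();
            while (reader.Read())
                products.Add(PopulateProductFromIDataReader(reader));
        }
    }
}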
Tip 2: Paged Data Access
The ASP.NET DataGrid exposes a wonderful capability: data paging support. When paging is enabled in the DataGrid, a fixed number of records is shown at a time. Additionally, a paging UI is shown at the bottom of the DataGrid for navigating through the records. The paging UI allows you to navigate backwards and forwards through the displayed data, displaying a fixed number of records at a time.
There's one slight wrinkle, however. Paging with the DataGrid requires all of the data to be bound to the grid. For example, your data layer needs to return all of the data, and then the DataGrid filters out the records to display based on the current page. If 100,000 records are returned when you are paging through the DataGrid, 99,975 records are thrown away on each request (assuming a page size of 25). As the number of records grows, the performance of the application suffers, since more and more data must be returned on each request.
One good approach to writing better paging code is to use stored procedures. Figure 2 shows a sample stored procedure that pages through the Orders table in the Northwind database. In a nutshell, all you are doing here is passing in the page index and the page size. The appropriate result set is calculated and then returned.
CREATE PROCEDURE northwind_OrdersPaged
(
    @PageIndex int,
    @PageSize int
)
AS
BEGIN

DECLARE @PageLowerBound int
DECLARE @PageUpperBound int
DECLARE @RowsToReturn int

-- First set the rowcount
SET @RowsToReturn = @PageSize * (@PageIndex + 1)
SET ROWCOUNT @RowsToReturn

-- Set the page bounds
SET @PageLowerBound = @PageSize * @PageIndex
SET @PageUpperBound = @PageLowerBound + @PageSize + 1

-- Create a temp table to store the select results
CREATE TABLE #PageIndex
(
    IndexId int IDENTITY (1, 1) NOT NULL,
    OrderID int
)

-- Insert into the temp table
INSERT INTO #PageIndex (OrderID)
SELECT
    OrderID
FROM
    Orders
ORDER BY
    OrderID DESC

-- Return total count
SELECT COUNT(OrderID) FROM Orders

-- Return paged results
SELECT
    O.*
FROM
    Orders O,
    #PageIndex PageIndex
WHERE
    O.OrderID = PageIndex.OrderID AND
    PageIndex.IndexID > @PageLowerBound AND
    PageIndex.IndexID < @PageUpperBound
ORDER BY
    PageIndex.IndexID

END
In Community Server, we wrote a paging server control to do all of this data paging. You'll notice that I am using the ideas discussed in Tip 1: two result sets are returned from one stored procedure, the total number of records and the requested data.
The total number of records returned can vary depending on the query being executed. For example, a WHERE clause can be used to constrain the data returned. We must know the total number of records in order to calculate the total number of pages to be displayed in the paging UI. For example, if there are 1,000,000 total records and a WHERE clause filters them down to 1,000 records, the paging logic needs to be aware of that total to render the paging UI properly.
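To tie this back to Tip 1, here is a sketch of how the two result sets from northwind_OrdersPaged might be consumed in C#. The PopulateOrderFromIDataReader helper is an illustrative assumption:

// Sketch: consuming both result sets of northwind_OrdersPaged.
// Assumes: using System.Collections; using System.Data; using System.Data.SqlClient;
void GetOrdersPage(string connectionString, int pageIndex, int pageSize,
                   ArrayList orders, out int totalRecords)
{
    totalRecords = 0;
    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlCommand command =
        new SqlCommand("northwind_OrdersPaged", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.Add("@PageIndex", SqlDbType.Int).Value = pageIndex;
        command.Parameters.Add("@PageSize", SqlDbType.Int).Value = pageSize;
        connection.Open();

        using (SqlDataReader reader = command.ExecuteReader())
        {
            // first result set: the total record count
            if (reader.Read())
                totalRecords = reader.GetInt32(0);

            // second result set: the requested page of orders
            reader.NextResult();
            while (reader.Read())
                orders.Add(PopulateOrderFromIDataReader(reader));
        }
    }
}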
Tip 3: Connection Pooling
Setting up a TCP connection between your Web application and SQL Server can be an expensive operation. Developers at Microsoft have been taking advantage of connection pooling for some time now, which allows them to reuse connections to the database. Rather than setting up a new TCP connection on each request, a new connection is set up only when one is not available in the connection pool. When the connection is closed, it is returned to the pool, where it remains connected to the database, as opposed to completely tearing down that TCP connection.
Of course you need to watch out for leaking connections. Always close your connections when you are finished with them. I repeat: no matter what anyone says about garbage collection within the Microsoft .NET Framework, always call Close or Dispose explicitly on your connection when you are finished with it. Do not trust the common language runtime (CLR) to clean up and close your connection for you at a predetermined time. The CLR will eventually destroy the class and force the connection closed, but you have no guarantee when the garbage collection on the object will actually happen.
To use connection pooling optimally, there are a couple of rules to live by. First, open the connection, do the work, and then close the connection. It is fine to open and close the connection multiple times on each request if you have to (optimally you apply Tip 1); it is much better than keeping a connection open and passing it around through several different methods. Second, use the same connection string (and the same thread identity if you are using integrated authentication). If you do not use the same connection string, for example customizing the connection string based on the logged-in user, you will not get the same optimization that connection pooling provides. And if you use integrated authentication while impersonating a large set of users, your pooling will be much less effective. The .NET CLR Data performance counters can be very useful when you try to track down any performance issues related to connection pooling.
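A minimal sketch of the open-late, close-early pattern follows; the query is illustrative. The C# using statement guarantees that Dispose, and therefore Close, runs even if an exception is thrown, so the connection always returns to the pool:

// Open as late as possible, close as early as possible.
// The using blocks return the connection to the pool even on exceptions.
int CountOrders(string connectionString)
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlCommand command =
        new SqlCommand("SELECT COUNT(*) FROM Orders", connection))
    {
        connection.Open();                    // drawn from the connection pool
        return (int)command.ExecuteScalar();
    }                                         // returned to the pool here
}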
Whenever your application connects to a resource, such as a database, running in another process, you should optimize by focusing on the time spent connecting to the resource, the time spent sending and retrieving data, and the number of round-trips. Optimizing any kind of process hop in your application is the first step toward better performance.
The application tier contains the logic that connects to your data tier and transforms data into meaningful class instances and business processes. For example, in Community Server, this is where you populate a Forums or Threads collection and apply business rules such as permissions. Most importantly, it is where the caching logic is performed.
Tip 4: The ASP.NET Cache API
One of the very first things you should do before writing a line of application code is architect the application tier to maximize and exploit the ASP.NET Cache feature.
If your components run within an ASP.NET application, you simply need to reference System.Web.dll in your application project. When you need access to the Cache, use the HttpRuntime.Cache property (the same object is also accessible through Page.Cache and HttpContext.Cache).
There are several rules for caching data. First, if the data can be used more than once, it's a good candidate for caching. Second, if the data is general rather than specific to a given request or user, it's a good candidate for caching. If the data is user- or request-specific but is long-lived, it can still be cached, but may not be used as frequently. Third, an often overlooked rule is that sometimes you can cache too much. Generally, on an x86 machine, you want to run a process that uses no more than about 800MB of private bytes in order to reduce the chance of an out-of-memory error, so the cache should have a limit. In other words, you might be able to reuse the result of a computation, but if that computation takes 10 parameters, you might attempt to cache on 10 permutations of those parameters, and that may get you into trouble. Out-of-memory errors caused by over-caching are among the most common in ASP.NET, especially with large datasets.
The Cache has a few great features that you need to understand. First, the Cache implements a least-recently-used algorithm, allowing ASP.NET to force a cache purge, automatically removing unused items from the cache, if memory is running low. Second, the Cache supports expiration dependencies that can force invalidation. These dependencies include time, key, and file. Time is often used, but with ASP.NET 2.0 a new and more powerful invalidation type is being introduced: database cache invalidation. This refers to items in the cache being automatically removed when the data in the database changes. For more information on database cache invalidation, see the Dino Esposito Cutting Edge column in the July 2004 issue of MSDN Magazine. For a look at the architecture of the Cache, see Figure 3.
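As a hedged illustration of the typical check-then-populate pattern (the cache key, the loader helper, and the five-minute absolute expiration are all assumptions for this sketch):

// Sketch: check the ASP.NET Cache first, fall back to the database.
// Assumes: using System; using System.Data; using System.Web; using System.Web.Caching;
DataTable GetSuppliers()
{
    DataTable suppliers = HttpRuntime.Cache["Suppliers"] as DataTable;
    if (suppliers == null)
    {
        suppliers = LoadSuppliersFromDatabase();   // hypothetical helper

        // Insert with an absolute expiration; a CacheDependency (file, key,
        // or database in ASP.NET 2.0) could be supplied instead of null.
        HttpRuntime.Cache.Insert("Suppliers", suppliers, null,
            DateTime.Now.AddMinutes(5), Cache.NoSlidingExpiration);
    }
    return suppliers;
}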
Tip 5: Per-Request Caching
Earlier in the article, I mentioned that small improvements to frequently traversed code paths can lead to big overall performance gains. One of these small improvements is definitely my favorite, and I call it per-request caching.
Whereas the Cache API is designed to cache data for a long period of time, or until some condition is met, per-request caching simply means caching the data for the duration of the request. A particular code path is accessed frequently on each request, but the data only needs to be fetched, applied, modified, or updated once. This may sound a little theoretical, so let's consider a concrete example.
In the Forums application of Community Server, every server control used on a page requires personalization data to determine which skin to use, the style sheet to use, and other personalization data. Some of this data can be cached for a long time, but some of it is fetched only once per request and then reused multiple times during the execution of that request, such as for the skin of a control.
To accomplish per-request caching, use the ASP.NET HttpContext. An instance of HttpContext is created with every request and is accessible anywhere during that request through the HttpContext.Current property. The HttpContext class has a special Items collection property; objects and data added to this Items collection are cached only for the duration of the request. Just as you can use the Cache to store frequently accessed data, you can use HttpContext.Items to store data that you'll use only on a per-request basis. The logic behind it is simple: data is added to the HttpContext.Items collection when it doesn't exist, and on subsequent lookups the data found in HttpContext.Items is simply returned.
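A minimal sketch of that logic follows; the key name, the SiteSettings type, and the loader method are illustrative assumptions:

// Sketch: per-request caching via HttpContext.Items.
// SiteSettings and GetSettingsFromDatabase are hypothetical.
SiteSettings GetSettings()
{
    HttpContext context = HttpContext.Current;
    SiteSettings settings = context.Items["SiteSettings"] as SiteSettings;
    if (settings == null)
    {
        settings = GetSettingsFromDatabase();     // fetched once per request
        context.Items["SiteSettings"] = settings; // cached for this request only
    }
    return settings;
}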
Tip 6: Background Processing
The path through your code should be as fast as possible, right? There may be times, though, when you find yourself performing a very resource-intensive task on each request, or once every n requests. Sending out e-mail or parsing and validating incoming data are examples.
When analyzing ASP.NET Forums 1.0 and re-architecting what became Community Server, we found that the code path for adding a new post was painfully slow. Each time a new post was added, the application first had to ensure that there were no duplicate posts, then it had to parse the post using a "bad word" filter, parse the post for emoticons, tokenize and index the post, add the post to the appropriate queue when required, validate any attachments, and finally, once posted, send e-mail notifications immediately to all subscribers. Clearly, that's a lot of work.
We found that most of the time was spent in the indexing logic and in sending e-mail. Indexing a post was a very time-consuming operation, and we discovered that the built-in System.Web.Mail functionality would connect to an SMTP server and then send the e-mails one after another. As the number of subscribers to a particular post or topic area increased, it would take longer and longer to execute the AddPost function.
Indexing and sending e-mail didn't need to be performed on each request. Ideally, we wanted to batch this work, indexing 25 posts at a time or sending all the e-mails every five minutes. We decided to use code that I had previously used to prototype database cache invalidation, which eventually made it into Visual Studio 2005.
The Timer class in the System.Threading namespace is very useful, but not very well known in the .NET Framework, at least among Web developers. Once created, the Timer class invokes the specified callback on a thread from the ThreadPool at a configurable interval. This means you can set up code to execute without an incoming request to your ASP.NET application, an ideal scenario for background processing. You can do work such as indexing or sending e-mail in this background process, too.
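Here is a minimal sketch of the idea (the five-minute interval and the work done in the callback are assumptions; see the caveats that follow and the full sample mentioned at the end of this tip):

// Sketch: periodic background work with System.Threading.Timer.
// Assumes: using System; using System.Threading;
// Keep a reference to the timer so it is not garbage collected.
static Timer backgroundTimer;

static void StartBackgroundWork()
{
    // The callback runs on a ThreadPool thread, independent of any request.
    backgroundTimer = new Timer(new TimerCallback(DoBackgroundWork),
        null, TimeSpan.FromMinutes(5), TimeSpan.FromMinutes(5));
}

static void DoBackgroundWork(object state)
{
    // For example: index queued posts, or send batched e-mail notifications.
}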
This technique has a couple of problems, though. If your application domain unloads, the timer instance will stop firing its events. In addition, since the CLR has a hard gate on the number of threads per process, it is possible on a heavily loaded server that timers may not have threads available to complete their callbacks, which can cause delays. ASP.NET tries to minimize the chances of this happening by reserving a certain number of free threads in the process and only using a portion of the total threads for request processing. However, if you have lots of asynchronous work, this can be an issue.
There is not enough room here to go through the full code, but you can download a digestible sample at www.rob-howard.net. Check out the slides and demos from the Blackbelt TechEd 2004 presentation.
Tip 7: Page Output Caching and Proxy Servers
ASP.NET is your presentation layer (or it should be); it consists of pages, user controls, server controls (HttpHandlers and HttpModules), and the content they generate. If you have an ASP.NET page that generates output, whether HTML, XML, images, or any other data, and this code generates the same output on each request, you have a great candidate for page output caching.
Add the following line to the top of the page:
<%@ OutputCache Duration="60" VaryByParam="none" %>
you can effectively generate the output for this page once and reuse it multiple times for up to 60 seconds, at which point the page will execute again and the output will once more be added to the ASP.NET Cache. This behavior can also be accomplished using some lower-level programmatic APIs. There are several configurable settings for output caching, such as the VaryByParam attribute just described. VaryByParam is required, and it also allows you to specify HTTP GET or HTTP POST parameters by which to vary the cache entries. For example, default.aspx?Report=1 and default.aspx?Report=2 could be output-cached separately simply by setting VaryByParam="Report". Additional parameters can be named by specifying a semicolon-separated list.
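The lower-level programmatic route just mentioned goes through the Response.Cache property, an HttpCachePolicy object. The following sketch approximates the directive above when placed in a page's code (in Page_Load, for example):

// Sketch: programmatic equivalent of a 60-second output cache.
Response.Cache.SetCacheability(HttpCacheability.Public);
Response.Cache.SetExpires(DateTime.Now.AddSeconds(60));
Response.Cache.SetValidUntilExpires(true);
Response.Cache.VaryByParams.IgnoreParams = true;   // vary by no parameters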
Many people don't realize that when page output caching is used, the ASP.NET page also generates a set of HTTP headers that flow downstream to caching servers, such as those used by Microsoft Internet Security and Acceleration Server or by Akamai. When the HTTP cache headers are set, documents can be cached on these network resources, and client requests can be satisfied without having to go back to the origin server.
Using page output caching, then, does not make your application more efficient, but it can reduce the load on your server, as downstream caching technology caches documents. Of course, this can be only anonymous content; once it goes downstream, you won't see those requests anymore and can no longer perform authentication to prevent access to it.
Tip 8: Run IIS 6.0 (If Only for Kernel Caching)
If you're not running IIS 6.0 (Windows Server 2003), you're missing out on some great performance enhancements in the Microsoft Web server. In Tip 7, I talked about output caching. In IIS 5.0, a request comes through IIS and then to ASP.NET. When caching is involved, an HttpModule in ASP.NET receives the request and returns the contents from the Cache.
If you're using IIS 6.0, there is a nice little feature called kernel caching that doesn't require any code changes to ASP.NET. When a request is output-cached by ASP.NET, the IIS kernel cache receives a copy of the cached data. When a request comes from the network driver, the kernel-level driver (with no context switch to user mode) receives the request and, if cached, flushes the cached data to the response and completes execution. This means that when you use kernel-mode caching together with IIS and ASP.NET output caching, you'll see unbelievable performance results. During the Visual Studio 2005 development of ASP.NET, I was the program manager responsible for ASP.NET performance. The developers did the real work; I just got to see all the reports on a daily basis. The kernel-mode caching results were always the most interesting. The most common characteristic was a network saturated with requests/responses while IIS ran at only about 5% CPU utilization. It was amazing! There are certainly other reasons to use IIS 6.0, but kernel-mode caching is the most obvious one.
Tip 9: Use Gzip Compression
While not necessarily a server performance tip (since you might see CPU utilization go up), using gzip compression can decrease the number of bytes sent by your server. This gives the perception of faster pages and also cuts down on bandwidth usage. Depending on the data sent, how well it compresses, and whether the client browsers support it (IIS will only send gzip-compressed content to clients that support gzip compression, such as Internet Explorer 6.0 and Firefox), your server can serve more requests per second. In fact, almost any time you reduce the amount of data returned, you increase requests per second.
Gzip compression is built into IIS 6.0, and its performance is much better than the gzip compression used in IIS 5.0, which is good news. Unfortunately, when trying to turn on gzip compression in IIS 6.0, you may not be able to locate the setting in the Properties dialog of IIS. The IIS team built awesome gzip capabilities into the server, but neglected to include an administrative UI for enabling it. To enable gzip compression, you have to dig into the XML configuration settings of IIS 6.0 (which isn't for the faint of heart). By the way, credit goes to Scott Forsyth of OrcsWeb, who helped me work through this issue for the www.asp.net servers hosted at OrcsWeb.
Rather than walk through the steps in this article, read the article by Brad Wilson at IIS6 Compression. There's also an article about how to enable compression for ASPX at Enable ASPX Compression in IIS. Note, however, that due to implementation details, dynamic compression and kernel caching are mutually exclusive on IIS 6.0.
Tip 10: Server Control View State
View state is a fancy name for ASP.NET storing some state data in a hidden input field inside the generated page. When the page is posted back to the server, the server can parse, validate, and apply this view state data back to the page's tree of controls. View state is a very powerful capability, since it allows state to be persisted with the client and requires no cookies or server memory to save it. Many ASP.NET server controls use view state to persist settings made during interactions with elements on the page, for example, saving the current page being displayed when paging through data.
There are a number of drawbacks to the use of view state, however. First, it increases the total payload of the page, both when served and when requested. There is also additional overhead incurred when serializing or deserializing the view state data that is posted back to the server. Lastly, view state increases the memory allocations on the server.
Several server controls tend to make excessive use of view state even in cases where it is not needed, the most well known of which is the DataGrid. View state is on by default, but if you don't need it, you can turn it off at the control or page level. Within a control, you simply set the EnableViewState property to false, or you can set it globally within the page using this setting:
<%@ Page EnableViewState="false" %>
If you do not post back from a page, or if you always regenerate the controls on the page on each request, you should disable view state at the page level.
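At the control level, turning view state off is a one-attribute change; the grid declaration below is illustrative:

<asp:DataGrid id="OrdersGrid" runat="server" EnableViewState="false" />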
Summary
I've offered you some tips that I've found useful for writing high-performance ASP.NET applications. As I mentioned earlier in this article, this is a preliminary guide, not the last word on ASP.NET performance. (For more information on improving the performance of ASP.NET applications, see Improving ASP.NET Performance.) The only way to find the best solution to your particular performance problems is through your own hands-on experience. However, these tips should give you some good guidance on your journey. In software development, there are very few absolutes; every application is unique.