Over the past decade, many new software technologies and platforms have emerged, and each of them requires specialized knowledge to build applications that perform well. Now that Internet technologies such as blogs make it easy for disappointed users to pan your application, you really do need to put performance first. Early in planning, add response-time requirements and build prototypes to discover possible technical limitations. Throughout development, measure the various performance aspects of the application to catch regressions early, and make sure testers can file and track bugs for scenarios that are too slow. Even with the best planning, performance problems still have to be investigated during product development. In this article, we will show how to use Visual Studio Team System Development Edition or Visual Studio Team Suite to identify performance bottlenecks in an application, and we will introduce the Visual Studio Profiler by walking through a sample performance investigation. Please note that although the code in this article is written in C#, most of the examples are equally valid for native C/C++ and Visual Basic code.
Application Analysis

We will use the profiler that ships with the two Visual Studio editions mentioned above. First, we wrote a small sample project (see Figure 1). This application is not very efficient and takes about 10 seconds to draw a Mandelbrot fractal.

Figure 1 Target Program for Performance Testing (Click the image for a larger view)

To start the investigation, launch the Performance Wizard from the new Analyze menu in Visual Studio 2008. In Visual Studio 2005, the same functionality is available from the Tools | Performance Tools menu. The wizard has three steps. The first step specifies the target project or website. The second step offers two different profiling methods: sampling and instrumentation. (For more information about these methods, see the "Profiling Explained" sidebar.) For now, accept the defaults. When the wizard completes, a Performance Explorer window appears and a new performance session is created. The session contains the target application (Mandel in our example) and, as yet, no reports.

To start profiling, click the Launch with Profiling button on the tool window's toolbar. As soon as the application finishes drawing the fractal, close the form to stop profiling. Visual Studio automatically adds a new report to the performance session and begins analyzing it. When the analysis completes, the Visual Studio Profiler displays the Performance Report Summary, which lists the most expensive functions (see Figure 2). The report presents these functions in two ways. The first list measures the work performed directly or indirectly by each listed function: for every function, the number represents the samples accumulated in the function body and in all of its sub-calls. The second list does not count the samples collected in sub-calls. The summary page shows that the profiler collected 30.71% of the samples during execution of the DrawMandel method. The remaining 69% of the samples are scattered across other functions and are not discussed here. For more information about report options, see the "Report Visualization Options" sidebar.

Figure 2 Performance testing shows function calls with high overhead (Click the image for a larger view)

Looking at the report's Call Tree view (see Figure 3), the Inclusive Samples % column represents the samples collected in a function and its children, while the Exclusive Samples % column represents the samples collected in the function body alone. For example, if DrawMandel calls Bitmap.SetPixel, samples that land in SetPixel count toward SetPixel's exclusive number and toward the inclusive numbers of both functions. You can see that DrawMandel calls Bitmap.SetPixel directly. Although DrawMandel itself accounts for 30.71% of the total samples, the profiler collected 64.54% of the samples in Bitmap.SetPixel and its children. Bitmap.SetPixel's own body accounts for only 0.68% of the samples, which is why it does not appear on the summary page; most of its cost comes from its children. It is the real bottleneck of the application.

Figure 3 Call tree example of the tested application (Click the image for a larger view)

Clearly, Bitmap.SetPixel is not the best choice for the Mandel project. The application needs a faster way to access all the pixels on the form. Fortunately, the Bitmap class provides another useful API: Bitmap.LockBits. This function lets the program write to the bitmap memory directly, removing the overhead of setting one pixel at a time.
In addition, to optimize the drawing, we will create a plain integer array and fill it with the color value of each pixel. Then we will copy the values of the array into the bitmap in a single operation.
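The article never shows the original drawing loop in full, so for context here is a hypothetical sketch of what the slow, SetPixel-based inner loop presumably looks like. The names width, height, colors, and the ComputeColorIndex helper are assumptions, not the article's actual code.

// Hypothetical sketch of the original, SetPixel-based drawing loop; the real
// DrawMandel is not shown in full in the article. ComputeColorIndex stands in
// for the per-pixel Mandelbrot computation.
for (int column = 1; column < width; column++)
{
    for (int row = 1; row < height; row++)
    {
        int color = ComputeColorIndex(column, row);   // assumed helper
        bitmap.SetPixel(column, row, colors[color]);  // one GDI+ call per pixel
    }
}

Every iteration pays the cost of a separate GDI+ call, which is exactly the overhead the LockBits approach described above avoids.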
Profiling Explained

Sampling: During profiling, the profiler attaches to the running process much like a debugger. It periodically interrupts the process and checks which function is at the top of the stack, along with the code path that led to it. In other words, the Visual Studio Profiler collects samples of the current process state. Sampling is a non-intrusive, statistical approach to profiling: the more samples collected in a function, the more processing that function is probably doing. The profiler also collects information about the call paths that led to each sample, so after analyzing the collected data the tool can display full call stacks. By default, the Visual Studio Profiler collects one sample every 10 million CPU cycles. Besides CPU cycles, sampling can also be triggered by other events, such as page faults, system calls, or CPU cache misses. The performance session properties control what the profiler samples and how often.

As a low-overhead approach, sampling is usually the recommended option. It is worth noting, however, that sampling collects information only while the program is actively using the CPU. Consequently, the Visual Studio Profiler gathers no samples while a process is waiting on a disk, the network, or any other resource. That is why instrumentation profiling is recommended when an application does not make heavy use of the CPU.

Instrumentation: In instrumentation mode, the Visual Studio Profiler modifies (instruments) the binaries by injecting special instructions, called probes, at the beginning and end of each function. The probes allow the profiler to measure how long each function takes to run. The profiler also adds a pair of probes around every call to an external function, so it can determine the cost of those external calls. With instrumentation profiling you can measure data precisely, such as how long a function takes to run ("elapsed time"), how many times it was called, and how long the function actually used the CPU without being switched out by the operating system ("application time"). The drawback of instrumentation is that it collects a large amount of data, which takes longer to analyze. This mode also has higher run-time overhead, which can inadvertently change the performance characteristics of the application being profiled.

With both sampling and instrumentation you can also collect memory allocation data for .NET Framework applications. You can enable and tune the collection of .NET memory allocation data from the performance session property pages. This feature is usually called memory profiling, and there is plenty of MSDN documentation on the topic. Note that it is the only profiler feature restricted to .NET-compliant code; for every other feature, the Visual Studio Profiler works identically for native C/C++ and .NET-based applications.
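To make the instrumentation description above concrete, here is a purely conceptual sketch of what probe injection amounts to. The Probe class and its Enter/Exit methods are made up for illustration; they are not the profiler's real probes, which are injected into the compiled binary rather than written in source code.

using System;
using System.Diagnostics;

static class Probe
{
    // Hypothetical stand-ins for the entry/exit probes the profiler injects.
    public static Stopwatch Enter()
    {
        return Stopwatch.StartNew();              // entry probe: start timing
    }

    public static void Exit(string name, Stopwatch sw)
    {
        Console.WriteLine(name + ": " + sw.ElapsedMilliseconds + " ms");  // exit probe: record elapsed time
    }
}

class Example
{
    static double ComputeSomething(int n)
    {
        Stopwatch sw = Probe.Enter();             // conceptually injected at function entry
        try
        {
            double result = 0;
            for (int i = 0; i < n; i++)
                result += Math.Sqrt(i);           // the function's original work
            return result;
        }
        finally
        {
            Probe.Exit("ComputeSomething", sw);   // conceptually injected at function exit
        }
    }

    static void Main()
    {
        Console.WriteLine(ComputeSomething(1000000));
    }
}

This is why instrumentation can report exact call counts and elapsed times, and also why it adds more overhead than sampling: every function call pays for the probes.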
Next, modify the DrawMandel method to use LockBits instead of SetPixel and check how the change affects performance. After creating the bitmap, add the following lines to lock the bitmap bits and obtain a pointer to the bitmap memory:
BitmapData bmpData = bitmap.LockBits(
    new Rectangle(0, 0, width, height),
    ImageLockMode.ReadWrite,
    bitmap.PixelFormat);
IntPtr ptr = bmpData.Scan0;
int pixels = bitmap.Width * bitmap.Height;
Int32[] rgbValues = new Int32[pixels];
Then, in the inner loop that sets the pixels, comment out the call to Bitmap.SetPixel and replace it with a new statement, as shown below:
// bitmap.SetPixel(column, row, colors[color]);
rgbValues[row * width + column] = colors[color].ToArgb();
Finally, after the loops, add the following lines to copy the array into the bitmap memory and unlock the bits:
Marshal.Copy(rgbValues, 0, ptr, pixels);
bitmap.UnlockBits(bmpData);
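Putting the three snippets together, the modified drawing path might look roughly like the sketch below. This is an assumed reconstruction, not the article's full source: the parameters width, height, and colors and the ComputeColorIndex helper are placeholders for whatever the real DrawMandel uses.

// Sketch only; assumes: using System; using System.Drawing; using System.Drawing.Imaging;
// using System.Runtime.InteropServices; and a 32-bits-per-pixel bitmap format so that
// one Int32 per pixel lines up with the locked bitmap memory.
void DrawMandel(Bitmap bitmap, Color[] colors, int width, int height)
{
    BitmapData bmpData = bitmap.LockBits(
        new Rectangle(0, 0, width, height),
        ImageLockMode.ReadWrite,
        bitmap.PixelFormat);
    IntPtr ptr = bmpData.Scan0;
    int pixels = bitmap.Width * bitmap.Height;
    Int32[] rgbValues = new Int32[pixels];

    for (int column = 1; column < width; column++)
    {
        for (int row = 1; row < height; row++)
        {
            int color = ComputeColorIndex(column, row);               // assumed helper
            rgbValues[row * width + column] = colors[color].ToArgb();
        }
    }

    // Copy the whole array into the bitmap memory in one operation, then unlock.
    Marshal.Copy(rgbValues, 0, ptr, pixels);
    bitmap.UnlockBits(bmpData);
}

The key design choice is that all per-pixel work now happens against a plain managed array, and GDI+ is touched only twice: once to lock the bits and once to unlock them after the single Marshal.Copy.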
Now, if you run the application under the profiler again, you can see that the fractal draws almost three times faster (see Figure 4). Note that the summary page of the new performance report shows that the body of DrawMandel now directly accounts for 83.66% of the total samples. Since we optimized the drawing, the bottleneck is now the fractal computation itself.

Figure 4 Performance Analysis of revised Code (Click the image for a larger view)

Now we will optimize the computation further. Unfortunately, this time the bottleneck has to be found inside a single function. DrawMandel is a complex method, so it is hard to know which computations to focus on. Fortunately, the Visual Studio 2008 sampling profiler also collects line-level data by default, which helps determine which lines within a function are the most expensive.

To view line-level data, you need to look at the performance report from a different angle. Switch from the Current View menu to the Modules view. Unlike the Call Tree view, the Modules view does not show how functions call each other or the cost of those calls in the context of the parent function. Instead, the Modules view lists each executable (assembly or DLL) and the total number of samples accumulated in each function of that executable; the Visual Studio Profiler aggregates this data across all call stacks. The Modules view is therefore better suited to seeing the big picture. For example, sorting by the Exclusive Samples % column shows that Mandel.exe itself performs 87.57% of the processing. As a result of our optimization, GDI+ now accounts for less than 3% of the processing. Expanding these modules shows the same information for individual methods, and in Visual Studio 2008 you can expand the tree beyond the function level to see the same data for individual lines and even individual instructions within those lines (see Figure 5).

Figure 5 Jump to the analyzed code line (Click the image for a larger view)

Jumping to the source code brings us to the code in Figure 6. The code computes a square root in the innermost loop. This operation is expensive and accounts for 18% of the application's total processing. In Figure 6, the highlighted lines show the code that can be optimized: the first line uses an unnecessary square root, while the second computes a value that remains unchanged throughout the while loop.
Figure 6 Code-Level Optimizations
Original Code
For (INT column = 1; column <width; column ++) {Y = ystart; For (int row = 1; row
Optimized Code
For (INT column = 1; column <this. width; ++ column) {Y = ystart; int Index = column; For (int row = 1; row
After making these changes, profile the application again to check the performance of the optimized code. Once the application is built and run, the fractal now redraws in one to two seconds, which significantly reduces the application's startup time.

Visual Studio 2008 includes a new feature that compares two performance reports. To try it, we run the application under the profiler once more and capture an up-to-date performance report. To see the difference between the two versions of the application, select the original report and the latest report in Performance Explorer, right-click, and choose Compare Performance Reports from the context menu. This command produces a new report showing, for every function, the difference in its Exclusive Samples % value between the two reports. Because we reduced the overall execution time, the relative percentage for DrawMandel actually increased from 31.76 to 70.46. To see the real effect of the optimization, change the column in the Comparison Options pane to Inclusive Samples (see Figure 7), and raise the threshold to 1,500 samples to ignore small fluctuations. You may also notice that, by default, the report shows negative numbers first, that is, the functions with the least improvement, because the comparison feature is most often used to track down performance regressions. Since we are measuring an optimization, however, we sort the Delta column in the opposite order so that the most improved functions appear at the top. Note that the number of samples for DrawMandel and its sub-functions dropped from 2,064 to 175. More than a tenfold improvement! To show off the performance gains, you can copy and paste any portion of this report.

Figure 7 Compare the optimization results of DrawMandel (Click the image for a larger view)
Report Visualization Options

The Visual Studio Profiler lets you examine performance data in several different ways through its report views: Call Tree, Modules, Functions, and others. The Summary view is shown by default when a report is opened.

For example, to find the call path responsible for most of the processing in Visual Studio 2008, select the Call Tree view from the Current View menu. (In Visual Studio 2005, select the Call Tree tab at the bottom of the report.) The Call Tree view contains an aggregated tree of all call stacks. The Inclusive Samples % column shows the total cost of each branch of those code paths, so following the most expensive branches leads you to the performance bottleneck.

In Visual Studio 2008, the profiler team added two new features to make performance reports easier to use. The first is a set of noise reduction options. By default, the report now trims away insignificant small functions so that the functions with larger impact are easier to see; this option is usually called trimming. In addition, functions that merely call other functions and do no significant work of their own are folded together, reducing the depth of the call tree; the Visual Studio Profiler calls this folding. The noise reduction options of the performance report control the trimming and folding thresholds, and if you have trouble finding a specific function in a report, you can turn noise reduction off.

The second major improvement to the call tree in Visual Studio 2008 is the Hot Path button and the corresponding context menu command. Hot Path highlights the most expensive code path in the program and walks down that path until it reaches a function that performs significant work itself rather than simply delegating it; that function is then highlighted in the Hot Path. If there are two or more separate important code paths, Hot Path stops at the point in the tree where they branch. In that case, you can pick the branch you are most interested in and apply Hot Path to that specific branch.
So far we have demonstrated how the Visual Studio Profiler can be used to improve application performance. Many real-world applications, however, require multiple user interactions before a performance problem shows up. Typically, you would prefer to ignore all the data collected before the scenario of interest begins, and you may want to collect data for several scenarios during a single run.

To demonstrate how to use the profiler in such cases, we will switch gears and profile a sample e-commerce website. (We are actually using a modified version of TheBeerHouse sample, which is available from asp.net/downloads/starter-kits/the-beer-house.) This website takes a long time to load, but since that is a one-time cost we are not very interested in the startup time. We are more interested in why it takes so long to load the product catalog and why adding an item to the shopping cart is so slow. For the purposes of this article we will investigate only the first scenario, but we will collect data for both scenarios and show how to filter the data to focus on the performance problem of a specific scenario.

First, create a new profiling session. As described earlier, use the Analyze menu to start the Performance Wizard and accept the defaults on the wizard pages. Note that for websites, instrumentation profiling is the default option: websites are rarely CPU-bound and usually rely on database server applications to do the heavy lifting, so instrumentation is the better choice.

After creating the performance session, start the website under the profiler, but avoid the startup time and concentrate on one scenario at a time. To do this, in Performance Explorer in Visual Studio 2008, use Launch with Profiling Paused to start the application under the Visual Studio Profiler without collecting any data until profiling is resumed (see Figure 8).

Figure 8 Pause Analyzer at startup

While the website is loading, switch back to Visual Studio. Notice that the Visual Studio Profiler displays a new tool window called the Data Collection Control. This window lets you pause and resume collection any number of times. Another important part of this control is the list of predefined marks, which are bookmarks, or tags, that you can insert into the profiling data to indicate interesting points in time. We will use these marks to separate the beginning and end of each user scenario. First, rename four of the marks using the Rename Mark command on the context menu, and delete the unused marks (see Figure 9). At this point we have paused profiling to avoid collecting the startup data and have prepared our scenarios, so once the website has loaded, resume profiling.

Figure 9 Name analysis tag for test cases

We are now ready to start the first scenario. Select the Product Catalog Request mark and click the Insert Mark button to mark the beginning of the scenario. Switch back to Internet Explorer and display the product catalog. Once the website has rendered the product catalog, insert the Product Catalog Rendered mark to indicate the end of the scenario. To move on to the next scenario, select one of the products and add it to the shopping cart, inserting the corresponding marks before and after the add operation. That completes both scenarios, so exit the application. When the data analysis finishes, the Visual Studio Profiler displays the Performance Report Summary.
This report is a bit different from the one for the sample application: because it comes from an instrumentation run, it shows the most frequently called functions and the functions with the longest duration. Keep in mind, though, that this data is aggregated over the whole life of the application, and so includes both of our scenarios as well as everything that happened before them. Obviously, we want the performance report to show only the data for a given scenario and to filter out the rest.

In Visual Studio 2008, the profiler has a new Marks view that lists all of the inserted marks. (Note that the Visual Studio Profiler automatically adds marks for the start and end of the program.) To create a filter for the first scenario, select the marks that denote the beginning and end of the scenario, and then choose Add Filter on Marks from the context menu. This creates the required filter automatically (see Figure 10). Besides marks, you can also filter the data by thread, by process, or by time interval. After setting up the filter, execute it.

Figure 10 Filtering the report on marks (Click the image for a larger view)

Note that the filter applies to all views of the performance report, which is why the Visual Studio Profiler automatically presents a new summary page for the filtered data. This summary page is now specific to the rendering of the product catalog. For example, we can see that System.IDisposable.Dispose takes 3.4 seconds, or 61% of the execution time for this scenario, compared with 41% before filtering. Thanks to the filter, we can see exactly how important this function is to the specific problem.

To fix this performance problem, we need to find the code that disposes these objects. The easiest way is to use the Call Tree view together with Hot Path (see Figure 11), which immediately shows that the SetInputControlsHighlight function is responsible for most of the calls to IDisposable.Dispose.

Figure 11 Use hot path to find problems (Click the image for a larger view)

It turns out that this function contains a very inefficient logging mechanism:
foreach (Control ctl in container.Controls)
{
    log += "Setting up control: " + ctl.ClientID;
    string tempDir = Environment.GetFolderPath(
        Environment.SpecialFolder.MyDocuments);
    using (StreamWriter sw = new StreamWriter(
        Path.Combine(tempDir, "website.log"), true))
    {
        sw.WriteLine(log);
    }
    ...
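In our case the logging is simply deleted, as explained below. For comparison, if this diagnostic output were still needed, a sketch like the following (our assumption, not the article's fix) avoids the two things that make the loop above expensive: reopening the log file and rewriting the ever-growing log string on every iteration.

// Hypothetical alternative, shown only to illustrate the cost of the original:
// open the writer once, outside the loop, and write one line per control instead
// of appending to a growing string and rewriting all of it each time.
string tempDir = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
using (StreamWriter sw = new StreamWriter(Path.Combine(tempDir, "website.log"), true))
{
    foreach (Control ctl in container.Controls)
    {
        sw.WriteLine("Setting up control: " + ctl.ClientID);
        // ... remaining per-control work ...
    }
}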
This logging is debris left over from debugging, is no longer used for any specific diagnostic purpose, and can safely be deleted. Once again, the Hot Path feature in Visual Studio 2008 allowed us to quickly identify an application's bottleneck.

Whether you write your applications in native C/C++, C#, or Visual Basic, the Visual Studio Profiler can greatly simplify performance investigations and help you write faster, more efficient applications. Visual Studio 2008 brings further improvements to the Visual Studio Profiler, making it easier than ever to identify the performance bottlenecks in your applications.

From: http://msdn.microsoft.com/zh-cn/magazine/cc337887.aspx