Program performance analysis: using the VS2008 profiling tools

Use the Visual Studio profiler to identify application bottlenecks

Hari Pulapaka and Boris Vidolov

This article discusses:

Targeting performance bottlenecks

Profiling application code

Comparing profiling data

Performance reports

This article uses the following technologies:

Visual Studio 2008

Over the past decade, many new software technologies and platforms have emerged. Each new technology requires specialized knowledge to create a well-performing application. Now that a variety of Internet technologies, such as blogs, give frustrated users an easy way to criticize your application, you really need to make performance a priority. Early in planning, you should add responsiveness requirements and create prototypes to identify possible technical limitations. Throughout development, you should also measure the performance of your application to detect possible degradations, and make sure that testers file and track bugs for slow scenarios.

Even with the best plan, you may still have to investigate performance problems during product development. In this article, we'll show you how to use Visual Studio® Team System Development Edition or Visual Studio Team Suite to identify performance bottlenecks in your application. We introduce the Visual Studio profiler by walking through a sample performance investigation. Note that although we use C# for the code examples in this article, most of the examples are equally valid for native C/C++ and Visual Basic® code.

Profiling an Application

We will use the profiler that ships with the two editions of Visual Studio mentioned earlier. We start by writing a small sample project (shown in Figure 1) that draws a Mandelbrot fractal. The application is not very efficient and takes about 10 seconds to draw the fractal.


Figure 1 Target program for performance testing

To start the investigation, launch the Performance Wizard from the new Analyze menu in Visual Studio 2008. In Visual Studio 2005, this feature is available from the Tools | Performance Tools menu. This launches a three-step wizard whose first step is to specify the target project or Web site. The second step offers two different profiling methods: sampling and instrumentation. (For more information about these methods, see the sidebar that explains performance profiling.) For now, we'll accept the default.

When the wizard finishes, the Performance Explorer tool window appears and a new performance session is created. This session contains the target application (Mandel in our example) and no reports yet. To start profiling, click the Launch with Profiling button in the tool window's toolbar.

After the application draws the fractal, close the form immediately to stop profiling. Visual Studio automatically adds the newly created report to the performance session and begins analyzing it. When the analysis is complete, the Visual Studio profiler displays the Performance Report Summary, which lists the most expensive functions (see Figure 2). The report lists these functions in two ways. The first list measures the work performed directly or indirectly by the listed functions: for each function, the number represents the samples collected in the function body and in all of its child calls. The second list does not count the samples collected in child calls. This summary page shows that the profiler collected 30.71% of the samples during the execution of the DrawMandel method. The remaining 69% of the samples are scattered among other functions and are not shown here. To learn more about the reporting options, see the sidebar "Report Visualization Options".

Figure 2 The performance report shows the expensive function calls

In the Call Tree view of the report (shown in Figure 3), the Inclusive Samples % column represents the samples collected in a function and its children. The Exclusive Samples % column represents only the samples collected in the function body. You can see that the DrawMandel method calls Bitmap.SetPixel directly. Although DrawMandel itself accounts for 30.71% of the total samples, the profiler collected 64.54% of the samples in Bitmap.SetPixel and its children. The body of Bitmap.SetPixel accounts for only 0.68% (which is why it does not appear on the summary page). However, Bitmap.SetPixel generates most of the processing through its children. It is the real bottleneck of the application.

Figure 3 Call tree for the application under test
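The difference between inclusive and exclusive samples can be illustrated with a minimal sketch. It assumes each sample is simply a captured call stack listed from the outermost caller to the executing function; the SampleCounter class and its Count method are hypothetical names for this illustration and are not part of the profiler.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the Visual Studio profiler.
public static class SampleCounter
{
    // Each sample is one captured call stack, outermost caller first,
    // so the last element is the function that was executing.
    public static void Count(
        IEnumerable<string[]> samples,
        Dictionary<string, int> inclusive,
        Dictionary<string, int> exclusive)
    {
        foreach (string[] stack in samples)
        {
            if (stack.Length == 0) continue;

            // Inclusive: every distinct function on the stack gets the sample.
            HashSet<string> seen = new HashSet<string>();
            foreach (string function in stack)
            {
                if (seen.Add(function))
                    Increment(inclusive, function);
            }

            // Exclusive: only the function on top of the stack gets it.
            Increment(exclusive, stack[stack.Length - 1]);
        }
    }

    private static void Increment(Dictionary<string, int> counts, string key)
    {
        int current;
        counts.TryGetValue(key, out current);
        counts[key] = current + 1;
    }
}
```

With counts like these, a function such as Bitmap.SetPixel can have a small exclusive count and a large inclusive count, because most samples land in the functions it calls.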

Clearly, Bitmap.SetPixel is not the best choice for the Mandel project. Our application needs a faster way to access all the pixels on the form. Fortunately, the Bitmap class provides another useful API: Bitmap.LockBits. This function allows the program to write directly to the bitmap memory, which removes the overhead of setting individual pixels. In addition, to optimize the drawing, we will create a flat array of integers and populate it with the color value of each pixel. The contents of the array are then copied into the bitmap in a single operation.

Optimizing Applications

Performance Profiling Explained

When profiling with the sampling method, the profiler attaches to a running process in a way that is similar to a debugger. The profiler then periodically interrupts the process and checks which function is on top of the stack and the code path that led to it. In other words, the Visual Studio profiler collects samples of the current process state. Sampling is a non-invasive, statistical profiling method. The more samples that are collected in a function, the more processing the function is likely to have performed.

The Visual Studio profiler also collects information about the call path that led to the sampled function. Therefore, the tool can display the entire call stack after analyzing the collected data. By default, the Visual Studio profiler collects one sample every 10 million CPU cycles. In addition to CPU cycles, sampling can be triggered by other events, such as page faults, system calls, and CPU cache misses. The properties of the profiling session control what the profiler samples and how often.

As a low-overhead solution, sampling is usually the recommended option. However, it is worth noting that sampling collects information only while the program is actively using the CPU. Therefore, when a process waits for a disk, the network, or any other resource, the Visual Studio profiler does not collect samples. This is why instrumentation profiling is recommended if the application does not make active use of the CPU.

In instrumentation mode, the Visual Studio profiler modifies (instruments) the binaries by injecting special instructions (called probes) at the beginning and end of each function. Probes allow the profiler to measure the time it takes to run each function. In addition, the profiler adds a pair of probes around each external function call to determine the cost of these external calls.

With instrumentation profiling, you can accurately measure data such as the time it takes to run a function ("Elapsed Time"), the number of calls to the function, and the time the function spends actually using the CPU without being switched out by the OS ("Application Time"). The drawback of instrumentation is that it collects a large amount of data, which therefore takes longer to analyze. In addition, this profiling mode has a higher runtime overhead, which may inadvertently change the performance characteristics of the application being profiled.
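The idea of entry and exit probes can be approximated by hand. The following is only a sketch: the Probe class is a hypothetical helper written for this illustration, and it records wall-clock elapsed ticks and call counts, not the "Application Time" that the real profiler derives.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical hand-written probe; the real profiler injects its probes
// into the binary and records far more detail.
public sealed class Probe : IDisposable
{
    public sealed class Stats
    {
        public int Calls;
        public long ElapsedTicks;
    }

    private static readonly Dictionary<string, Stats> Results =
        new Dictionary<string, Stats>();

    private readonly string _name;
    private readonly long _start;

    // Entry probe: note the timestamp at which the function started.
    public Probe(string name)
    {
        _name = name;
        _start = Stopwatch.GetTimestamp();
    }

    // Exit probe: add the elapsed ticks and count the call.
    public void Dispose()
    {
        long elapsed = Stopwatch.GetTimestamp() - _start;
        lock (Results)
        {
            Stats stats;
            if (!Results.TryGetValue(_name, out stats))
            {
                stats = new Stats();
                Results[_name] = stats;
            }
            stats.Calls++;
            stats.ElapsedTicks += elapsed;
        }
    }

    // Returns the accumulated statistics, or null if the name was never probed.
    public static Stats Get(string name)
    {
        lock (Results)
        {
            Stats stats;
            return Results.TryGetValue(name, out stats) ? stats : null;
        }
    }
}
```

A function body would be wrapped as using (new Probe("DrawMandel")) { ... }. Even this small probe adds work to every call, which hints at why instrumentation has a higher runtime overhead than sampling.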

With either sampling or instrumentation, you can also collect memory allocation data for applications that are based on the Microsoft® .NET Framework. You can use the performance session property pages to enable and tune the collection of .NET memory allocation data. This is often referred to as memory profiling, and there is a large amount of MSDN® documentation on the subject. Note that it is the only profiler feature that is limited to .NET Framework-compatible code. For all other features, the Visual Studio profiler works in exactly the same way for native C/C++ and .NET-based applications.

Next, modify the DrawMandel method to use LockBits instead of SetPixel and see what effect this change has on performance. After creating the bitmap, add the following lines of code to lock the bitmap bits and get a pointer to the bitmap memory:

BitmapData bmpData =
    bitmap.LockBits(
        new Rectangle(0, 0, Width, Height),
        ImageLockMode.ReadWrite,
        bitmap.PixelFormat);
IntPtr ptr = bmpData.Scan0;
int pixels = bitmap.Width * bitmap.Height;
Int32[] rgbValues = new Int32[pixels];

Then, in the inner loop that sets the pixels, comment out the call to Bitmap.SetPixel and replace it with the new statement, as follows:

//bitmap.SetPixel(column, row, colors[color]);
rgbValues[row * Width + column] =
    colors[color].ToArgb();

Finally, add the following lines of code after the loops to copy the array into the bitmap memory and unlock the bitmap:

Marshal.Copy(rgbValues, 0, ptr, pixels);
bitmap.UnlockBits(bmpData);
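The three fragments above can be gathered into one helper. This is a sketch under stated assumptions, not the article's code: BitmapWriter and WritePixels are hypothetical names, and the helper assumes a 32-bit pixel format so that each pixel is exactly one Int32 and rows carry no padding. The checks and the try/finally block are additions made for safety.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

// Hypothetical helper that wraps the LockBits / Marshal.Copy / UnlockBits steps.
public static class BitmapWriter
{
    // Copies one ARGB value per pixel into the bitmap in a single operation.
    // Assumes a 32-bit pixel format, so each pixel is exactly one Int32 and
    // each row holds exactly bitmap.Width values with no padding.
    public static void WritePixels(Bitmap bitmap, int[] argbValues)
    {
        if (Image.GetPixelFormatSize(bitmap.PixelFormat) != 32)
            throw new ArgumentException("Expected a 32-bit pixel format.", "bitmap");
        if (argbValues.Length != bitmap.Width * bitmap.Height)
            throw new ArgumentException("Expected one value per pixel.", "argbValues");

        BitmapData data = bitmap.LockBits(
            new Rectangle(0, 0, bitmap.Width, bitmap.Height),
            ImageLockMode.ReadWrite,
            bitmap.PixelFormat);
        try
        {
            Marshal.Copy(argbValues, 0, data.Scan0, argbValues.Length);
        }
        finally
        {
            // Always unlock, even if the copy throws.
            bitmap.UnlockBits(data);
        }
    }
}
```

The pixel at (column, row) is stored at index row * bitmap.Width + column, which matches the indexing used in the inner loop.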

Now, if you re-run the application under the profiler, you can see that the fractal is drawn almost three times faster (see Figure 4). Note that the summary page of the new performance report shows that the body of DrawMandel now directly accounts for 83.66% of the total samples. Because we optimized the drawing, the bottleneck is now the calculation of the fractal.

Figure 4 Performance analysis of the revised code

Now we'll optimize the calculation as well. Unfortunately, this time we need to look for the bottleneck inside a single function. DrawMandel is a fairly complex method, so it is difficult to know which calculations to focus on. Fortunately, the Visual Studio 2008 sampling profiler also collects line-level data by default, which helps determine which lines in a function are the most expensive.

To view line-level data, you need to look at the performance report from a different perspective. From the Current View menu, switch to the Modules view. Unlike the Call Tree view, the Modules view does not show how functions call each other in the context of a parent function, or the cost of those calls. Instead, the Modules view lists each executable (assembly or DLL) and the aggregated number of samples for each function in that executable. The Visual Studio profiler accumulates this data across all call stacks.

The Modules view is better suited to seeing the bigger picture. For example, if you sort by the Exclusive Samples % column, you can see that Mandel.exe itself performs 87.57% of the processing. As a result of the optimization, GDI+ accounts for less than 3% of the processing. Expand a module to see the same information for its individual methods. In Visual Studio 2008, you can expand the tree beyond the function level to see the same data for individual lines, or even for individual instructions within those lines (see Figure 5).

Figure 5 Jumping to the profiled line of code

Jumping to the source code shows the code in Figure 6. The code computes a square root in the innermost loop. This operation is expensive and takes up 18% of the total application processing. Figure 6 shows the code that can be optimized: the while condition uses an unnecessary square root, and the color calculation inside the while loop is needed only after the loop ends.

Figure 6 Original and optimized code

Original code

for (int column = 1; column < Width; column++)
{
    y = yStart;
    for (int row = 1; row < Height; row++)
    {
        double x1 = 0;
        double y1 = 0;
        int color = 0;
        int dept = 0;
        // Unnecessary square root in the innermost loop.
        // (The limit of 100 is restored to match the dept / 100.0 scaling below.)
        while (dept < 100 && Math.Sqrt((x1 * x1) + (y1 * y1)) < 2)
        {
            dept++;
            double temp = (x1 * x1) - (y1 * y1) + x;
            y1 = 2 * x1 * y1 + y;
            x1 = temp;
            // Only the final value is used, so this can move out of the loop.
            double percentFactor = dept / (100.0);
            color = ((int)(percentFactor * 255));
        }
        // Comment this line to avoid calling Bitmap.SetPixel:
        //bitmap.SetPixel(column, row, colors[color]);
        // Uncomment the block below to avoid Bitmap.SetPixel:
        rgbValues[row * Width + column] = colors[color].ToArgb();

        y += deltaY;
    }
    x += deltaX;
}

Optimized code

for (int column = 1; column < this.Width; ++column)
{
    y = yStart;
    int index = column;
    for (int row = 1; row < Height; row++)
    {
        double x1 = 0;
        double y1 = 0;
        int dept = 0;
        double x1Sqr, y1Sqr;
        // Compare the squared magnitude with 4 instead of taking a square root,
        // and reuse the squares inside the loop body.
        while (dept < 100 && ((x1Sqr = x1 * x1) + (y1Sqr = y1 * y1)) < 4)
        {
            dept++;
            double temp = x1Sqr - y1Sqr + x;
            y1 = 2 * x1 * y1 + y;
            x1 = temp;
        }
        // The color is computed once, after the loop.
        rgbValues[index] = colors[((int)(dept * 2.55))].ToArgb();
        index += Width;

        y += deltaY;
    }
    x += deltaX;
}
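The square-root removal relies on the fact that, for non-negative values, sqrt(s) < 2 holds exactly when s < 4. The sketch below isolates the two inner loops so they can be compared side by side. EscapeTime, WithSqrt, WithoutSqrt, and MaxDepth are hypothetical names for this illustration, and the limit of 100 is an assumption consistent with the dept / 100.0 scaling. Floating-point rounding could in principle make the two tests disagree for values extremely close to the boundary.

```csharp
using System;

// Hypothetical helper that isolates the inner while loop of DrawMandel.
public static class EscapeTime
{
    // Assumed iteration limit, consistent with the dept / 100.0 color scaling.
    public const int MaxDepth = 100;

    // Original form: takes a square root on every iteration.
    public static int WithSqrt(double x, double y)
    {
        double x1 = 0, y1 = 0;
        int dept = 0;
        while (dept < MaxDepth && Math.Sqrt((x1 * x1) + (y1 * y1)) < 2)
        {
            dept++;
            double temp = (x1 * x1) - (y1 * y1) + x;
            y1 = 2 * x1 * y1 + y;
            x1 = temp;
        }
        return dept;
    }

    // Optimized form: compares the squared magnitude with 4 and reuses the squares.
    public static int WithoutSqrt(double x, double y)
    {
        double x1 = 0, y1 = 0;
        int dept = 0;
        double x1Sqr, y1Sqr;
        while (dept < MaxDepth && ((x1Sqr = x1 * x1) + (y1Sqr = y1 * y1)) < 4)
        {
            dept++;
            double temp = x1Sqr - y1Sqr + x;
            y1 = 2 * x1 * y1 + y;
            x1 = temp;
        }
        return dept;
    }
}
```

Points inside the set, such as (0, 0), run to the iteration limit in both versions, while points far outside it, such as (2, 2), exit after a single iteration.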

After making the change, profile the application again and check the performance of the optimized code. After you build and run the application, it now draws the fractal within 1 to 2 seconds. This significantly reduces the startup time of the application.

Visual Studio 2008 includes a new feature for comparing two performance reports. To see this feature in action, we re-run the application under the profiler and capture the latest performance report. To see the differences between the two versions of the application, select the original report and the latest report in the Performance Explorer. Right-click the reports and click Compare Performance Reports in the context menu. This command generates a new report that shows, for every function, the difference between its Exclusive Samples % values in the two reports. Because we cut the overall execution time, the relative percentage of DrawMandel rose from 31.76 to 70.46.

To better see the actual effect of the optimization, change the column in the Comparison Options pane to Inclusive Samples (see Figure 7). Also increase the threshold to 1,500 samples to ignore minor fluctuations. You may have noticed that, by default, the report shows negative numbers, or the least optimized functions, first (because it is most often used to find performance regressions). For optimization purposes, however, we reverse the sort order of the Delta column so that the most optimized functions appear at the top. Note that the number of samples for DrawMandel and its children changes from 2,064 to 175, an improvement of more than 10 times. To demonstrate the performance improvements achieved, you can copy and paste any part of the report.


Figure 7 Comparison of the optimization results for DrawMandel
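The comparison described above can be sketched as a delta over two sets of sample counts. Everything in this sketch is an assumption made for illustration: the ReportComparer and FunctionDelta types are hypothetical, each report is modeled as a map from function name to inclusive sample count, the delta is defined as the newer count minus the baseline count, and a delta is kept when its magnitude reaches the threshold. The profiler's own sign convention and threshold semantics may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row of a comparison report.
public sealed class FunctionDelta
{
    public string Function;
    public int Baseline;
    public int Comparison;
    public int Delta { get { return Comparison - Baseline; } }
}

public static class ReportComparer
{
    // Hypothetical model: each report maps a function name to its
    // inclusive sample count. A function missing from a report counts as 0.
    public static List<FunctionDelta> Compare(
        IDictionary<string, int> baseline,
        IDictionary<string, int> comparison,
        int threshold)
    {
        return baseline.Keys.Union(comparison.Keys)
            .Select(name => new FunctionDelta
            {
                Function = name,
                Baseline = GetOrZero(baseline, name),
                Comparison = GetOrZero(comparison, name)
            })
            .Where(d => Math.Abs(d.Delta) >= threshold) // ignore small fluctuations
            .OrderBy(d => d.Delta)                      // largest reductions first
            .ToList();
    }

    private static int GetOrZero(IDictionary<string, int> report, string name)
    {
        int value;
        return report.TryGetValue(name, out value) ? value : 0;
    }
}
```

With the figures quoted above (2,064 samples before, 175 after) and a threshold of 1,500, DrawMandel is the row that remains, with a delta of -1,889 under this sketch's sign convention.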

Targeted Profiling

Report Visualization Options

Visual Studio lets you view performance data in several ways through the various performance report views: Call Tree, Modules, Functions, and others. The Summary view is displayed by default when you open a report. For example, to find the call path that produces most of the processing in Visual Studio 2008, select the Call Tree view from the Current View menu. (In Visual Studio 2005, select the Call Tree tab at the bottom of the report.) The Call Tree view contains an aggregated tree of all call stacks. The Inclusive Samples % column shows the total cost of each branch in these code paths. You can find performance bottlenecks by following the most expensive branches.

In Visual Studio 2008, the profiler team added two new features to simplify the use of performance reports. The first is the noise reduction option. By default, the report now cuts out small, unimportant functions, making it easier to see the functions with greater impact. This option is called trimming. In addition, the team reduced the depth of the call tree by merging functions that do no processing of their own and only call other functions to do the work. The Visual Studio profiler calls this folding.

The noise reduction options in the performance report control the threshold values for trimming and folding. If you are having trouble finding a specific function in the performance report, you can turn off the noise reduction options.

The second big improvement to the call tree in Visual Studio 2008 is the Hot Path button and the corresponding context menu item. Hot Path highlights the most expensive code path in your program and follows that path down until it reaches a single function that performs significant processing itself (rather than delegating it). Hot Path then highlights that function. If there are two or more separate important code paths, Hot Path stops where the tree branches. If Hot Path stops at a branch in your application, you can select the branch that interests you most and reapply Hot Path to that particular branch.
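The hot path behavior described above can be approximated with a short tree walk. This is only a sketch and not the profiler's actual algorithm: CallNode and HotPath are hypothetical types, and the share parameter (the fraction of the root's samples that counts as significant) is an assumption of this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical call tree node holding exclusive samples and child calls.
public sealed class CallNode
{
    public string Function;
    public int ExclusiveSamples;
    public List<CallNode> Children = new List<CallNode>();

    // Inclusive samples are the node's own samples plus those of all descendants.
    public int InclusiveSamples
    {
        get
        {
            int total = ExclusiveSamples;
            foreach (CallNode child in Children) total += child.InclusiveSamples;
            return total;
        }
    }
}

public static class HotPath
{
    // Walks down from the root while exactly one child is significant.
    // Stops when the current function itself does significant work, when the
    // cost splits across several significant branches, or when no child is
    // significant. The 'share' cutoff is an assumption of this sketch.
    public static List<string> Find(CallNode root, double share)
    {
        List<string> path = new List<string>();
        double cutoff = root.InclusiveSamples * share;
        CallNode current = root;
        while (current != null)
        {
            path.Add(current.Function);
            if (current.ExclusiveSamples >= cutoff) break;

            CallNode next = null;
            int significant = 0;
            foreach (CallNode child in current.Children)
            {
                if (child.InclusiveSamples >= cutoff)
                {
                    significant++;
                    next = child;
                }
            }
            current = (significant == 1) ? next : null;
        }
        return path;
    }
}
```

When the walk stops at a branch, calling Find again on the chosen child mirrors reapplying the hot path to that particular branch.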
