XPerf – Windows Performance Toolkit

Event Tracing for Windows has been with us since Windows 2000.  It is an infrastructure for raising events from various system components, and for a long time it was used only by a small number of kernel-mode entities.  In Windows XP, MOF files (familiar from WMI provider metadata) were used to describe events.  Finally, in Windows Vista and Windows Server 2008, events are described by XML manifests, an investment was made in popularizing ETW, and hundreds of new event providers were added.

What kind of information is generated by all these providers?  Well, first of all, there's the Windows Event Log, which consumes some (but not all) of the information generated by ETW providers.  So we get all kinds of diagnostic messages about things happening in the system.  Another consumer is the Performance Monitor, which can include ETW trace providers in its data collector sets.  Integrating these various sources of information is not an easy task, especially if you are alternating between analyzing the system as a whole and analyzing a set of specific applications within the same trace.

It is this integration that has led to the birth of the
Windows Performance Toolkit.  It features the data collection and integration tools necessary to interpret and utilize ETW output correctly.  It is specifically constructed so that lots of information can be viewed in a coherent fashion, and so that a system-wide
image of what's going on can be obtained.  Additionally, through the use of the kernel sampling interrupt, a global sampling profiler is available (including call stack analysis).  And best of all?  It's completely free!

So let's take a look at some of the abilities of this new toolkit.  First of all, you need to install it from
WHDC.  Note that the installation is only
supported on Windows Vista SP1 and Windows Server 2008 (I got it to be almost fully functional without Vista SP1, but caveat emptor).  Data collection can be performed on a Windows XP SP2
or Windows Server 2003 system (you only need xperf.exe and perfctrl.dll for that), but the trace decoding can only be performed on NT 6.0 and higher.

Let's start with a sample global data collection.  (By the way, if you're not into my walkthroughs, the toolkit comes with an extensive set of documentation -
online and offline.  The starter document is 65 pages, which gives you an idea of how big this thing really is.)

I went to an administrative command line prompt on my Vista, navigated to the toolkit's installation directory (C:\Program Files\Microsoft Windows Performance Toolkit by default) and typed:

xperf -on Base

This enables a set of ETW providers to publish their events (the exact set of providers can be seen by typing xperf -providers).  These events are then collected and written to temporary buffers.  Note that these temporary buffers might grow very large, so if you plan to collect data over a long period of time, you'd better have some free disk space and lots of memory available on the box.
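
If memory is a concern, you can also tune the trace session's buffers when starting it.  Here is a rough sketch (the exact switch names and defaults may differ between toolkit versions, so consult the built-in help before relying on them):

xperf -on Base -BufferSize 1024 -MinBuffers 64 -MaxBuffers 256

This asks for 1024 KB buffers and caps the session at 256 of them, trading possible event loss for a bounded memory footprint.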

I then proceeded to write a few lines in this post, open a couple of programs, and then went back to the command line and typed:

xperf -d result.etl

This disables the selected ETW data collection and writes the raw results to the result.etl file (note that this command might take quite some time to complete).  These results are, well, raw, so they need further processing before they make sense.  We can do that from the command line by running a set of actions on the output file, or by using the GUI Performance Analyzer tool (xperfview), which is part of the toolkit.
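
If you just want to eyeball the raw events as text before picking a more specific action, the toolkit also ships a generic dumper action (assuming your version includes it) that writes every decoded event out as comma-separated text:

xperf -i result.etl -o result.csv -a dumper

Be warned: even a short trace can produce an enormous dump this way.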

As an example of what you can accomplish on the command line, type the following action:

xperf -i result.etl -o proclist.txt -a process

This generates a list of processes active during the trace into the proclist.txt file.  Here's some of the sample content from my box:

Start Time,   End Time, Process,            DataPtr,     Process Name ( PID),  ParentPID,  SessionID,          UniqueKey
       MIN,        MAX, Process, 0X0000000000168EE0,             Idle (   0),          0,          0, 0x0000f80002150400
       MIN,        MAX, Process, 0X00000000001693C0,           System (   4),          0,          0, 0x0000fa8001897040
       MIN,        MAX, Process, 0X00000000001A17A0,      svchost.exe ( 356),        708,          0, 0x0000fa8004ce4040
       MIN,        MAX, Process, 0X0000000000171080,         smss.exe ( 520),          4,          0, 0x0000fa800455f610
       MIN,        MAX, Process, 0X00000000001712B0,        csrss.exe ( 616),        604,          0, 0x0000fa8004804c10
       MIN,        MAX, Process, 0X00000000001904A0,      svchost.exe ( 620),        708,          0, 0x0000fa8004d2ac10

Another example of generating information on the command line:

xperf -i result.etl -o diskio.txt -a diskio

This gives you some disk I/O statistics for the duration of the trace.  Here's the sample from my box:

Start Time,  End Time,    Read,   Write, Usage %
         0,   1000000,      18,      30,   51.37
   1000000,   2000000,       0,      11,    2.44
   2000000,   3000000,       0,       2,    1.05
   3000000,   4000000,       1,       4,    4.26
   4000000,   5000000,       1,      15,   20.17
   5000000,   6000000,       0,       4,    1.26
   6000000,   7000000,       0,       7,    2.19
   7000000,   8000000,       1,      30,    2.02
   8000000,   9000000,       9,       3,   13.47

Finally, if we're interested in a graphical interpretation of everything, just go ahead and launch the Performance Analyzer (xperfview) and then navigate to the trace file, or just type:

xperf result.etl

...from the command line.  Here's what the initial output looks like on my system:

That sure looks like a lot of useful data!  The first graph is CPU utilization by process, the second one is disk utilization by process.  You can filter some processes out, of course:

...and you can zoom in, select, make the graphs overlay so you can see the information side by side:

And then there's the summary table feature: just right-click on the graph you're interested in and choose "Summary Table":

...and if you look at the Disk I/O summary and right click for a Detail Graph, you get the breakdown of specific I/O requests and can even visually see fragmented file access if present (!):

Note how you can see the volume on the right (C and D in my case), and if you hover over each individual dot you see information about the process making the request and the actual file path.  On the left you see the disk location information (offset in bytes from the start of the disk), so if the accesses keep jumping across the disk, you're looking at fragmented, non-sequential access.

And then there are the page fault details, down to the level of the file being requested, the size of the I/O request, the average time spent servicing the request, the total I/O time, and so on:

Bear in mind: we have seen nothing yet.  There are some preconfigured profiles for analyzing standby/resume, hibernation, boot, application startup, network activity, and many other scenarios.  You can get events generated for virtual memory allocation, power management, registry access, driver activity, system calls, interrupts, DPCs/APCs, context switches - you name it.  (Friendly reminder once more: there's documentation available both online and offline - go and read it right away! :-))
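
To give one concrete example of how such scenarios are enabled (group names vary between toolkit versions, so check xperf -providers on your copy): if your version has the DiagEasy kernel group, it bundles disk I/O, hard fault, DPC, interrupt and context switch events on top of the process and image-load events, which makes it a reasonable starting point for a general "why is my machine busy?" investigation:

xperf -on DiagEasy

Everything else - stopping with xperf -d, feeding the .etl file to actions or to xperfview - works exactly as shown above.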

However, there's another piece of coolness to see if you're still with me and interested in what else the tool is capable of.  Using the kernel sampling profile interrupt, the ETW system can capture the instruction pointer and the stack while a trace is running.  This data is logged to an ETL file, and can then be analyzed to see what kind of call stacks your application went through.  (This also reveals performance bottlenecks - i.e., where your application spends most of its time.)

So what I did to demonstrate this was write a very simple application which displays prime numbers in a given range.  It's written in a very inefficient way so that it takes quite some time to execute.  To profile what's going on inside, I went to the administrative
command prompt again and typed:

xperf -on SysProf

Note for advanced users: "Base" would also have worked, because it also enables profiling.  If I'm not interested in collecting disk I/O and memory statistics, however, then I'm better off with "SysProf" (which resolves to PROC_THREAD+LOADER+PROFILE).  Without
PROC_THREAD and LOADER you won't get reliable results.
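
Another note: depending on the toolkit version, enabling the PROFILE flag by itself may record the samples but not the call stack for each sample; in that case you also have to request stack walking explicitly when starting the trace, along the lines of xperf -on SysProf -stackwalk Profile.  If your summary tables later show weights but no stacks, this is the first thing to check.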

Then all I had to do was launch my application, let it run to completion, generate a results file as before (xperf -d ...), and launch the Performance Analyzer to see what we got.
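
For reference, here is a minimal sketch of what such a test application could look like.  This is illustrative code, not the exact program I profiled, but any deliberately naive IsPrime along these lines will produce a similar profile:

#include <stdio.h>

/* Deliberately naive primality test: tries every divisor below n.
   This is what makes the application slow enough to be worth profiling. */
static int IsPrime(int n)
{
    int i;
    if (n < 2)
        return 0;
    for (i = 2; i < n; ++i)
    {
        if (n % i == 0)
            return 0;
    }
    return 1;
}

int main(void)
{
    int n;
    /* Print every prime in a fixed range; the range just needs to be large
       enough for the profiler to collect plenty of samples. */
    for (n = 2; n <= 200000; ++n)
    {
        if (IsPrime(n))
            printf("%d\n", n);
    }
    return 0;
}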

The first and foremost thing you have to do at this point is get debugging symbols configured properly.  You could spend hours trying to understand what's wrong with what you have done, only to discover it's due to improperly configured symbols.  To get symbols for Windows code lined up properly, all you need to do is set the _NT_SYMBOL_PATH system environment variable (My Computer -> Properties -> Advanced System Settings -> Environment Variables) to the following string:

SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

You can replace C:\Symbols with any other downstream store on your disk that you prefer (or you could omit this section entirely).  What this tells the symbols engine is that whenever symbols for Windows are needed, they are automagically downloaded from
Microsoft and cached at the downstream store (C:\Symbols).  Yes, this requires a connection to the Internet.  Yes, the symbols will be re-downloaded if Windows updates are installed or a service pack is deployed.  Yes, if you're offline, you could also
download the entire cache and install it, but whenever something is updated you risk getting out of sync.

For your own code, you can prepend the path to the PDB files to the above string, for example:

set _NT_SYMBOL_PATH=C:\Code\MyApp\release;%_NT_SYMBOL_PATH%

or:

set _NT_SYMBOL_PATH=C:\Code\MyApp\release;SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

When you finally launch the tool, before you go ahead and open any graphs, go to the Trace menu and click Load Symbols.  This might take some time now or later, depending on what you're trying to do.  Essentially, it might be downloading symbols for the entire system for the first time, which could take several minutes or even more on a slow connection.  On my system, symbols were downloaded for 142 modules, occupying a total of 93 MB of disk space.

After you've got symbols loaded, choose a region on the graph (or the entire graph), right-click and select "Summary Table".  Use the column chooser on the left to group by Process and Stack, and to show the Weight and Weight% columns.  Note that you can do this for any process, not just for your code - all you need is symbols (which you can generate for both debug and release builds nowadays with no problem at all).  For my prime numbers application on my system I get the following output:

So it seems that MyApp.exe!IsPrime took 38.26% of the weight in this profiling session.  printf, on the other hand, took 0.71% of the weight.

Note we can't deduce the execution time from this output; we can only see the
relative weight a function had (i.e. the number of samples where this function was on the stack compared to the total number of samples taken by the profiler).
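
To make that concrete with made-up numbers: if the profiler collected 10,000 samples in total and MyApp.exe!IsPrime was on the stack in 3,826 of them, its weight would be 3,826 / 10,000 = 38.26% - and that figure tells you nothing about whether the whole run took two seconds or two minutes.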

After optimizing the code a little bit (so that prime numbers are calculated more efficiently) and running it again under the profiler, here are the results:

The code was so optimized that there are no samples for the IsPrime function anymore (it was not inlined - we're talking about a debug build), and printf is responsible for 2.92% of the weight.
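
In case you're wondering what kind of change makes a function disappear from the samples like that: the exact optimization isn't important, but the classic improvement for this particular algorithm is to stop testing divisors at the square root of the candidate.  A purely illustrative sketch:

#include <math.h>

/* Same contract as the naive version, but only tests divisors up to sqrt(n),
   turning an O(n) scan per candidate into an O(sqrt(n)) one. */
static int IsPrime(int n)
{
    int i, limit;
    if (n < 2)
        return 0;
    limit = (int)sqrt((double)n);
    for (i = 2; i <= limit; ++i)
    {
        if (n % i == 0)
            return 0;
    }
    return 1;
}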

Yes, there is some room for improvement here, because commercial tools like Microsoft's own Visual Studio Profiler give you so much more, including the ability to compare performance reports, analyze managed code, use a convenient API to determine when profiling should start and stop, profile memory allocations, etc. - but this one is free, and it's part of an integrated suite with lots and lots of additional functionality.  Additionally, if you look closely at the output, you can see that the grouping performed by this profiler can be done by stack trace and not just by function - so you can see how many times your function, or sequence of functions, was called in a particular order and manner.  This is something that is difficult to achieve even with commercial profilers.

Just one final note before you go ahead and try it: stack profiling works only on the 32-bit editions of Windows Vista or Windows Server 2008.  To be more accurate, I couldn't figure out how to make it work on a 64-bit system and gave up (and the following blog post gives me some hope that one day we'll figure it out).  All I could get on a 64-bit system is a list of functions getting called with their sample times and counts (which is great), but not the detailed, elegant call stacks you can get on a 32-bit machine.

So get your hands on the
Windows Performance Toolkit and try it out!  Let me know what you discovered.
