What is GWP (Google-Wide Profiling)? It tries to answer the following questions:
What are the hottest processes, routines, or code regions?
How does performance differ across software versions?
Which locks are most contended?
Which processes are memory hogs?
Does a particular memory allocation scheme benefit a particular class of applications?
What is the cycles per instruction (CPI) for applications across platforms?
Some notes from the paper:
1. Data center monitoring is different from data center profiling: monitoring is coarse-grained, while profiling tools are fine-grained.
2. Sampling happens in two dimensions: select a subset of machines, then sample selected events on those machines.
3. The results show that a sampling rate of 0.01 already gives us what we want, so a higher sampling rate is not necessary.
4. Every machine runs a lightweight daemon that collects machine and event info.
5. It seems that more than one kind of OS is installed in one data center. The reason may be that some apps get higher performance on a particular kind of OS.
6. The profiling lib acts as an HTTP server with handlers for each type of profiler, so all a developer needs to do is link the profiling lib, and a remote client can then get profiling info from that service.
7. It seems that symbols are stripped from the binaries of online app processes, but I think we could strip job processes and keep the symbols of service processes.
8. The web interface can provide:
- Queries over any part of the data
- Call graphs and execution time
- Profile data aligned with source code
9. The scale of the GWP storage cluster limits the sampling rate.
10. Some interesting observations:
- zlib consumes 5% of CPU, so we can concentrate on zip (compression) optimization
- Online coverage measurement
- Job scheduler optimization: some kinds of job instances perform better on a specific CPU architecture
- Collect various dimensions, such as function name, compiler, shared lib version, ... to support grouped aggregation, so that performance can be profiled per unit, such as data center or compiler
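
To make notes 2 and 3 concrete, here is a minimal sketch of two-dimensional sampling. It is only my illustration, not the paper's implementation: the function names, the machine names, and the idea of keeping each event independently with a fixed probability are all assumptions.

```python
import random

def select_machines(machines, fraction, rng):
    """First dimension: pick a random subset of machines to profile.

    `fraction` is a hypothetical knob; at least one machine is always chosen.
    """
    count = max(1, round(len(machines) * fraction))
    return rng.sample(machines, count)

def sample_events(events, rate, rng):
    """Second dimension: keep each event independently with probability `rate`."""
    return [event for event in events if rng.random() < rate]

# Usage with made-up data: profile 1% of 1000 machines,
# and on each chosen machine keep roughly 1% of the events.
rng = random.Random(0)  # fixed seed so the run is repeatable
machines = [f"machine-{i}" for i in range(1000)]
chosen = select_machines(machines, fraction=0.01, rng=rng)
events = list(range(100_000))
kept = sample_events(events, rate=0.01, rng=rng)
print(len(chosen), "machines selected,", len(kept), "events kept")
```

Sampling on both axes multiplies the savings: the collection cost on each machine stays low, and only a small part of the fleet is touched at any time.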
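
Note 6 (a profiling lib that exposes one HTTP handler per profiler type) can be sketched with the Python standard library. This is a toy version under my own assumptions: the URL paths, the JSON payloads, and the helper names `start_profiling_server` and `fetch_profile` are hypothetical and do not come from the paper.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical profilers: each returns its data as a JSON-serializable dict.
def cpu_profile():
    return {"profiler": "cpu", "samples": []}

def heap_profile():
    return {"profiler": "heap", "samples": []}

# One URL path per profiler type (the paths are made up for this sketch).
PROFILERS = {"/profile/cpu": cpu_profile, "/profile/heap": heap_profile}

class ProfileHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        profiler = PROFILERS.get(self.path)
        if profiler is None:
            self.send_error(404, "unknown profiler")
            return
        body = json.dumps(profiler()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this example

def start_profiling_server(port=0):
    """Serve profiles from a background thread; port 0 lets the OS pick a port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), ProfileHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def fetch_profile(server, path):
    """Remote-client side: return (status, parsed JSON or None)."""
    host, port = server.server_address
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        return resp.status, json.loads(body) if resp.status == 200 else None
    finally:
        conn.close()
```

An application would call `start_profiling_server()` once at startup, and a collector could then call something like `fetch_profile(server, "/profile/cpu")` against each selected machine. The point of this design is that the application code itself does not need to know anything about the collector.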
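
The last observation (tag every sample with dimensions, then aggregate by group) is basically a group-by. Below is a small sketch. The field names, the sample records, and the `cycles` counts are invented for illustration, and the real storage and query layer is certainly more involved.

```python
from collections import defaultdict

def aggregate(samples, dimensions, metric="cycles"):
    """Sum `metric` over all samples, grouped by the given dimension names."""
    totals = defaultdict(int)
    for sample in samples:
        key = tuple(sample[d] for d in dimensions)
        totals[key] += sample[metric]
    return dict(totals)

def share(totals):
    """Convert grouped totals into fractions of the overall total."""
    overall = sum(totals.values())
    return {key: value / overall for key, value in totals.items()}

# Made-up samples, each tagged with a few dimensions.
samples = [
    {"function": "inflate", "lib": "zlib", "datacenter": "dc-a", "compiler": "gcc", "cycles": 30},
    {"function": "deflate", "lib": "zlib", "datacenter": "dc-b", "compiler": "gcc", "cycles": 20},
    {"function": "parse", "lib": "app", "datacenter": "dc-a", "compiler": "clang", "cycles": 450},
    {"function": "render", "lib": "app", "datacenter": "dc-b", "compiler": "gcc", "cycles": 500},
]

by_lib = aggregate(samples, ["lib"])
by_dc_and_compiler = aggregate(samples, ["datacenter", "compiler"])
print(share(by_lib))
print(by_dc_and_compiler)
```

The same samples answer different questions depending on which dimensions are grouped: by lib to find hot spots like zlib, by data center or compiler to compare units.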