Repost: GPROF, one of the Linux performance evaluation tools

Source: Internet
Author: User

1 Introduction

Improving an application's performance takes time, and it is often not obvious which functions consume most of the execution time. The GNU toolchain provides a profiler for this purpose, the GNU profiler (gprof). Gprof helps locate performance bottlenecks in programs on Linux: it reports the time spent in each function, the number of times each function was called, and the call relationships between functions.

Gprof user manual: http://sourceware.org/binutils/docs-2.17/gprof/index.html

2 Features

Gprof is part of GNU Binutils and is installed by default on most Linux systems.

1. It can display a "flat profile": the number of calls to each function and the processor time consumed by each function.

2. It can display a "call graph": the calling relationships between functions and how much time each function call took.

3. It can display "annotated source": a copy of the program's source code in which each line is marked with the number of times it was executed.

3 Principle

When a program is compiled and linked with the -pg option, GCC inserts a call to a function named mcount (or "_mcount" or "__mcount", depending on the compiler and operating system) into every function of the application. Each function therefore calls mcount, which maintains a call graph in memory and uses the call stack to find the addresses of the calling (parent) function and the called (child) function. The call graph also holds the call counts, timing data, and other information for each function.

4 Usage flow

1. Add the -pg option when compiling and linking. This is usually done in the makefile.

2. Run the compiled binary. The arguments and the way it is run are the same as before.

3. A gmon.out file is generated in the directory the program runs in. If a gmon.out file already exists, it is overwritten.

4. End the process. gmon.out is written out again at this point.

5. Analyze the gmon.out file with the gprof tool.

5 Parameter description

-b: do not print the detailed description of each field in the report.

-p: print only the flat profile (the time-consumption list of functions).

-q: print only the call graph of the functions (the "Call graph" part of the report).

-e name: do not print the call graph entries for the function name and its child functions (unless they have other parent functions that are not excluded). Multiple -e flags can be given; each -e flag names only one function.

-E name: like -e, do not print the call graph entries for the function name and its child functions, but also exclude the time spent in name and its child functions from the total time and percent time calculations.

-f name: print only the call graph entries for the function name and its child functions. Multiple -f flags can be given; each -f flag names only one function.

-F name: like -f, print only the call graph entries for the function name and its child functions, but use only the time of the printed routines in the total time and percent time calculations. Multiple -F flags can be given; each -F flag names only one function. The -F flag overrides the -E flag.

-z: also show routines with zero usage (as determined by call count and accumulated time).

General usage: gprof -b <binary> gmon.out > report.txt

6 Description of the report

The fields of the flat profile generated by gprof:

%time: the function's time as a percentage of the program's total time.

cumulative seconds: the cumulative execution time of the program (includes only functions that gprof can monitor).

self seconds: the execution time of the function itself (the total over all of its calls).

calls: the number of times the function was called.

self ts/call: the average execution time of a single call, not including time spent in the functions it calls.

total ts/call: the average execution time of a single call, including time spent in the functions it calls.

name: the name of the function.

The fields of the call graph:

index: the index value of the entry.

%time: the function's time as a percentage of the total time.

self: the execution time of the function itself.

children: the time taken to execute its child functions.

called: the number of calls.

name: the name of the function.

Note:

The cumulative execution time of a program includes only functions that gprof can monitor. Functions that run in kernel mode and third-party library functions that were not compiled with -pg cannot be monitored by gprof (for example sleep()).

The full list of gprof options is available via man gprof.

7 Support for shared libraries

Profiling support is added by the compiler, so to get profiling information from a shared library, the library itself must be compiled with -pg. A version of the C library compiled with profiling support (libc_p.a) is provided for this purpose.

To profile system functions such as those in libc, replace -lc with -lc_p so that the program links against libc_p.so or libc_p.a. This matters because it is the only way to measure the execution time of low-level C library functions (such as memcpy(), memset(), sprintf(), etc.).

gcc example1.c -pg -lc_p -o example1

Use ldd ./example1 | grep libc to check whether the program is linked against libc.so or libc_p.so.

8 User Time vs. kernel time

Gprof's biggest limitation is that it can analyze only the user time consumed by the application while it runs; it cannot measure the time the program spends in kernel space. An application typically spends most of its run time executing user code, but it also spends some time executing "system code", for example in the kernel system call behind sleep().

The time command shows how an application's run time is made up: the actual (wall-clock) run time, the user-space run time, and the kernel-space run time.

For example: time ./program

Output:

real 2m30.295s

user 0m0.000s

sys 0m0.004s

9 Precautions

1. With g++, use the -pg option in both the compile step and the link step.

2. Only static linking against the libc library is safe; otherwise the profiling code is called before the *.so has been initialized, which causes a "segmentation fault". The workaround is to add -static-libgcc or -static at compile time.

3. If you link directly with ld instead of g++, add the startup file /lib/gcrt0.o, for example: ld -o myprog /lib/gcrt0.o myprog.o utils.o -lc_p. The file may also be named gcrt1.o.

4. To measure the execution time of third-party library functions, the third-party libraries must also be compiled with the -pg option.

5. Gprof can only analyze user time consumed by the application.

6. The program must not run in daemon mode; otherwise timing data cannot be collected (call counts can still be collected).

7. It is good practice to run the program under time first to determine whether gprof can produce useful information.

8. If gprof does not suit your profiling needs, other tools overcome some of its limitations, including OProfile and Sysprof.

9. Gprof is most useful for CPU-intensive programs that spend most of their time in user space. Programs that spend most of their time in kernel space, or that run slowly because of external factors such as an overloaded I/O subsystem, are difficult to optimize with it.

10. Gprof does not support multi-threaded applications: in a multi-threaded program only the main thread's performance data is collected. The reason is that gprof relies on the ITIMER_PROF signal, and only the main thread responds to it. There is a simple way around this: http://sam.zoy.org/writings/programming/gprof.html

11. The gprof report data (gmon.out) is generated only when the program exits normally.

a) Cause: gprof writes its results from a function registered with atexit(). An abnormal exit does not run the atexit() handlers, so no gmon.out file is generated.

b) The program can exit normally by returning from the main function or by calling the exit() function.

10 Multi-threaded applications

Gprof collects performance data only for the main thread of a multi-threaded program, because it relies on the ITIMER_PROF signal and only the main thread responds to that signal.

How can all threads be profiled? The key is to make every thread respond to the ITIMER_PROF signal. This can be done with a wrapper function that overrides pthread_create.

gprof-helper.c ////////////////////////////

#define _GNU_SOURCE
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <pthread.h>

static void * wrapper_routine (void *);

/* Original pthread function */
static int (*pthread_create_orig) (pthread_t *__restrict,
                                   const pthread_attr_t *__restrict,
                                   void * (*) (void *),
                                   void *__restrict) = NULL;

/* Library initialization function */
void wooinit (void) __attribute__ ((constructor));

void wooinit (void)
{
    /* Look up the real pthread_create that this library shadows */
    pthread_create_orig = dlsym (RTLD_NEXT, "pthread_create");
    fprintf (stderr, "pthreads: using profiling hooks for gprof\n");
    if (pthread_create_orig == NULL)
    {
        char *error = dlerror ();
        if (error == NULL)
        {
            error = "pthread_create is NULL";
        }
        fprintf (stderr, "%s\n", error);
        exit (EXIT_FAILURE);
    }
}

/* Our data structure passed to the wrapper */
typedef struct wrapper_s
{
    void * (*start_routine) (void *);
    void * arg;
    pthread_mutex_t lock;
    pthread_cond_t wait;
    struct itimerval itimer;
} wrapper_t;

/* The wrapper function in charge of setting the itimer value */
static void * wrapper_routine (void * data)
{
    /* Put user data in thread-local variables */
    void * (*start_routine) (void *) = ((wrapper_t*) data)->start_routine;
    void * arg = ((wrapper_t*) data)->arg;

    /* Set the profile timer value */
    setitimer (ITIMER_PROF, &((wrapper_t*) data)->itimer, NULL);

    /* Tell the calling thread that we don't need its data anymore */
    pthread_mutex_lock (&((wrapper_t*) data)->lock);
    pthread_cond_signal (&((wrapper_t*) data)->wait);
    pthread_mutex_unlock (&((wrapper_t*) data)->lock);

    /* Call the real function */
    return start_routine (arg);
}

/* Our wrapper function for the real pthread_create () */
int pthread_create (pthread_t *__restrict thread,
                    const pthread_attr_t *__restrict attr,
                    void * (*start_routine) (void *),
                    void *__restrict arg)
{
    wrapper_t wrapper_data;
    int i_return;

    /* Initialize the wrapper structure */
    wrapper_data.start_routine = start_routine;
    wrapper_data.arg = arg;
    getitimer (ITIMER_PROF, &wrapper_data.itimer);
    pthread_cond_init (&wrapper_data.wait, NULL);
    pthread_mutex_init (&wrapper_data.lock, NULL);
    pthread_mutex_lock (&wrapper_data.lock);

    /* The real pthread_create call */
    i_return = pthread_create_orig (thread,
                                    attr,
                                    &wrapper_routine,
                                    &wrapper_data);

    /* If the thread was successfully spawned, wait for the data
     * to be released */
    if (i_return == 0)
    {
        pthread_cond_wait (&wrapper_data.wait, &wrapper_data.lock);
    }

    pthread_mutex_unlock (&wrapper_data.lock);
    pthread_mutex_destroy (&wrapper_data.lock);
    pthread_cond_destroy (&wrapper_data.wait);

    return i_return;
}

///////////////////

Then compile it into a shared library: gcc -shared -fPIC gprof-helper.c -o gprof-helper.so -lpthread -ldl

Example of use:

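a.c /////////////////////////////

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <string.h>

void fun1 (void);
void fun2 (void);
void* fun (void * argv);

int main ()
{
    int i = 0;
    int id;
    pthread_t thread[100];

    /* The loop bound is missing in the source; 100 is assumed from the array size */
    for (i = 0; i < 100; i++)
    {
        id = pthread_create (&thread[i], NULL, fun, NULL);
        if (id != 0)
        {
            fprintf (stderr, "pthread_create failed: %d\n", id);
            exit (EXIT_FAILURE);
        }
        printf ("Thread =%d\n", i);
    }

    /* Wait for the workers so their samples are recorded before exit */
    for (i = 0; i < 100; i++)
    {
        pthread_join (thread[i], NULL);
    }

    printf ("dsfsd\n");
    return 0;
}

void* fun (void * argv)
{
    (void) argv;
    fun1 ();
    fun2 ();
    return NULL;
}

void fun1 (void)
{
    int i = 0;
    while (i < 100)
    {
        i++;
        printf ("fun1\n");
    }
}

void fun2 (void)
{
    int i = 0;
    int b = 0;   /* initialized; the original read it uninitialized */
    while (i < 50)
    {
        i++;
        printf ("fun2\n");
        b += i;
    }
}

///////////////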

gcc -pg a.c gprof-helper.so

Run the program:

./a.out

Analyze gmon.out:

gprof -b a.out gmon.out
