Valgrind Study Notes (2)

Source: Internet
Author: User
Tags valgrind

Cachegrind:

Cachegrind collects CPU statistics while the application runs by simulating the CPU's first-level caches (I1 and D1) and its last-level cache (LL), and prints both detailed data and a summary.

1. The CPU statistics use the following abbreviations:
I cache reads (Ir, which equals the number of instructions executed), I1 cache read misses (I1mr), and LL cache instruction read misses (ILmr).
D cache reads (Dr, which equals the number of memory reads), D1 cache read misses (D1mr), and LL cache data read misses (DLmr).
D cache writes (Dw, which equals the number of memory writes), D1 cache write misses (D1mw), and LL cache data write misses (DLmw).
Conditional branches executed (Bc) and conditional branches mispredicted (Bcm).
Indirect branches executed (Bi) and indirect branches mispredicted (Bim).

Note that the total number of D1 misses is D1mr + D1mw, and the total number of LL misses is ILmr + DLmr + DLmw.

2. Execution method:
valgrind --tool=cachegrind your_application
The summary statistics printed when the program exits look like this:
==31751== I   refs:      27,742,716
==31751== I1  misses:           276
==31751== LLi misses:           275
==31751== I1  miss rate:        0.0%
==31751== LLi miss rate:        0.0%
==31751==
==31751== D   refs:      15,430,290  (10,955,517 rd + 4,474,773 wr)
==31751== D1  misses:        41,185  (    21,905 rd +    19,280 wr)
==31751== LLd misses:        23,085  (     3,987 rd +    19,098 wr)
==31751== D1  miss rate:        0.2% (       0.1%   +       0.4%  )
==31751== LLd miss rate:        0.1% (       0.0%   +       0.4%  )
==31751==
==31751== LL misses:         23,360  (     4,262 rd +    19,098 wr)
==31751== LL miss rate:         0.0% (       0.0%   +       0.4%  )
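
The totals in this summary can be reproduced from the individual event counts. The following is a minimal sketch in C, assuming the counts have been copied by hand from the output above. The struct and function names are illustrative only and are not part of Valgrind.

```c
#include <assert.h>

/* Per-event counters copied by hand from a Cachegrind summary.
   The struct and function names are hypothetical, not a Valgrind API. */
struct cg_counts {
    unsigned long long ir, i1mr, ilmr;  /* instruction reads and misses */
    unsigned long long dr, d1mr, dlmr;  /* data reads and misses */
    unsigned long long dw, d1mw, dlmw;  /* data writes and misses */
};

/* Total D1 misses = D1 read misses + D1 write misses. */
unsigned long long d1_misses(const struct cg_counts *c) {
    return c->d1mr + c->d1mw;
}

/* Total LL misses = instruction misses + data read misses + data write misses. */
unsigned long long ll_misses(const struct cg_counts *c) {
    return c->ilmr + c->dlmr + c->dlmw;
}

/* Miss rate in percent; the caller must pass a non-zero access count. */
double miss_rate_percent(unsigned long long misses, unsigned long long accesses) {
    assert(accesses > 0);
    return 100.0 * (double)misses / (double)accesses;
}

/* Counts taken from the sample run above (pid 31751). */
struct cg_counts sample_counts(void) {
    struct cg_counts c = {
        27742716ULL, 276ULL, 275ULL,
        10955517ULL, 21905ULL, 3987ULL,
        4474773ULL, 19280ULL, 19098ULL
    };
    return c;
}
```

With the sample counts, d1_misses() gives 41,185 and ll_misses() gives 23,360, which matches the summary. The D1 miss rate works out to roughly 0.27% of the 15,430,290 data references, which the summary reports as 0.2%.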

Cachegrind also writes more detailed results to an output file. The default file name is cachegrind.out.<pid>, where <pid> is the process ID of the profiled program. Use --cachegrind-out-file=<file> to choose a more meaningful name. This file is the input to cg_annotate.

3. cg_annotate:
cg_annotate <filename>
The cg_annotate output starts with the following summary information:
I1 cache: 65536 B, 64 B, 2-way associative
D1 cache: 65536 B, 64 B, 2-way associative
LL cache: 262144 B, 64 B, 8-way associative
Command: concord vg_to_ucode.c
Events recorded: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
Events shown: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
Event sort order: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
Threshold: 99%
Chosen for annotation:
Auto-annotation: off

The function-by-function details in the cg_annotate output look like this:
--------------------------------------------------------------------------------
Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw file:function
--------------------------------------------------------------------------------
8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc
5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word
2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp
2,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash
2,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower
1,496,937 4 4 630,874 9,000 1,400 0 0 concord.c:insert
897,991 51 51 897,831 95 30 62 1 1 ???:???
598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile
598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile
598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc
446,587 1 1 215,973 2,167 430 129,948 14,057 concord.c:add_existing
341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER
320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table
298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create
149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0
149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0
95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node
85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue

Note: In the preceding data, a dot (.) in a column means that the event does not apply to that function. An entry of ???:??? means that the file and function names could not be determined from the debug info. If the program was compiled without the -g option, there will be many such unknown entries.

4. Line-by-line counts:
cg_annotate <filename> concord.c prints the line-by-line statistics for concord.c, as follows:
--------------------------------------------------------------------------------
-- User-annotated source: concord.c
--------------------------------------------------------------------------------
     Ir I1mr ILmr     Dr D1mr DLmr     Dw D1mw DLmw

      .    .    .      .    .    .      .    .    .  void init_hash_table(char *file_name, Word_Node *table[])
      3    1    1      .    .    .      1    0    0  {
      .    .    .      .    .    .      .    .    .      FILE *file_ptr;
      .    .    .      .    .    .      .    .    .      Word_Info *data;
      1    0    0      .    .    .      1    1    1      int line = 1, i;
      .    .    .      .    .    .      .    .    .
      5    0    0      .    .    .      3    0    0      data = (Word_Info *) create(sizeof(Word_Info));
      .    .    .      .    .    .      .    .    .
  4,991    0    0  1,995    0    0    998    0    0      for (i = 0; i < TABLE_SIZE; i++)
  3,988    1    1  1,994    0    0    997   53   52          table[i] = NULL;
      .    .    .      .    .    .      .    .    .
      .    .    .      .    .    .      .    .    .      /* Open file, check it. */
      6    0    0      1    0    0      4    0    0      file_ptr = fopen(file_name, "r");
      2    0    0      1    0    0      .    .    .      if (!(file_ptr)) {
      .    .    .      .    .    .      .    .    .          fprintf(stderr, "Couldn't open '%s'.\n", file_name);
      1    1    1      .    .    .      .    .    .          exit(EXIT_FAILURE);
      .    .    .      .    .    .      .    .    .      }
      .    .    .      .    .    .      .    .    .
165,062    1    1 73,360    0    0 91,700    0    0      while ((line = get_word(data, line, file_ptr)) != EOF)
146,712    0    0 73,356    0    0 73,356    0    0          insert(data->word, data->line, table);
      .    .    .      .    .    .      .    .    .
      4    0    0      1    0    0      2    0    0      free(data);
      4    0    0      1    0    0      2    0    0      fclose(file_ptr);
      3    0    0      2    0    0      .    .    .  }

5. cg_diff file1 file2
Compares two Cachegrind output files. A typical use is to profile a program, modify a function, profile the program again, and then compare the two profiles to see the effect of the change.

6. Cachegrind command line options:
--cache-sim=no|yes [yes]
Specifies whether to collect cache access and miss counts.

--branch-sim=no|yes [no]
Specifies whether to collect branch instruction and misprediction counts.

7. cg_annotate command line options:
--show=A,B,C [default: all, using order in cachegrind.out.<pid>]
Specifies which event columns to show, for example --show=D1mr,DLmr or --show=DLmr,DLmw.

--sort=A,B,C [default: order in cachegrind.out.<pid>]
Specifies the events by which the function-by-function details are sorted.

--threshold=X [default: 0.1%]
Sets the threshold for the function-by-function summary. A function is shown if it accounts for more than X% of the counts for the primary sort event. If auto-annotating, this also affects which files are annotated.
Note: thresholds can be set for more than one event by appending a colon and a number (with no spaces) to any event given to the --sort option. For example, to see each function that covers more than 1% of LL read misses or 1% of LL write misses, use:
--sort=DLmr:1,DLmw:1

--auto=<no|yes> [default: no]
When enabled, automatically annotates every file that is mentioned in the function-by-function summary that can be found. Also gives a list of those that couldn't be found.

--context=N [default: 8]
Print N lines of context before and after each annotated line. Avoids printing large sections of source files that were not executed. Use a large number (e.g. 100000) to show all source lines.

-I<dir> --include=<dir> [default: none]
Adds a directory to the search path for source files. Use multiple -I/--include options to add more directories.

Callgrind:

1. Profiling specific code fragments:
--instr-atstart=no: starts the program with instrumentation switched off, so that no profile data is collected at first. Just before the code you want to measure runs, execute callgrind_control -i on in another terminal window. For more precise control, insert the macro CALLGRIND_START_INSTRUMENTATION (from valgrind/callgrind.h) immediately before the code to be measured and CALLGRIND_STOP_INSTRUMENTATION immediately after it.
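
As an illustration of the macro approach, here is a minimal sketch in C. The function sum_of_squares() is a hypothetical stand-in for the code being measured, and the no-op fallback for machines without the Valgrind headers is an assumption of this sketch, not something Valgrind provides.

```c
#include <assert.h>

/* Use the real client-request macros when the Valgrind headers are installed;
   otherwise fall back to no-ops so the program still builds.
   The fallback is an assumption of this sketch. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_START_INSTRUMENTATION
#  define CALLGRIND_START_INSTRUMENTATION ((void)0)
#  define CALLGRIND_STOP_INSTRUMENTATION  ((void)0)
#endif

/* Hypothetical code under measurement: sums the squares 1*1 + ... + n*n. */
unsigned long sum_of_squares(unsigned long n) {
    unsigned long total = 0;
    assert(n <= 2000UL);  /* keeps the sum within 32 bits */
    for (unsigned long i = 1; i <= n; i++)
        total += i * i;
    return total;
}

/* Switches instrumentation on only around the call of interest. */
unsigned long measured_sum_of_squares(unsigned long n) {
    unsigned long result;
    CALLGRIND_START_INSTRUMENTATION;
    result = sum_of_squares(n);
    CALLGRIND_STOP_INSTRUMENTATION;
    return result;
}
```

When the program is run as valgrind --tool=callgrind --instr-atstart=no your_application, the profile should then cover only the work done between the two macros. Outside Valgrind the macros have no visible effect.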

2. Callgrind options for dumping the statistics of specific functions:
--dump-before=function: dumps the statistics to a file before entering the function;
--dump-after=function: dumps the statistics to a file after leaving the function;
--zero-before=function: resets all counters to 0 before entering the function. To reset the counters at an exact point in the code, insert the macro CALLGRIND_ZERO_STATS there.
The preceding options can be used multiple times to specify multiple functions.
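
The macro form of zeroing can be sketched in C as follows. CALLGRIND_DUMP_STATS is not mentioned in these notes; it is another macro from valgrind/callgrind.h that requests a dump from inside the program. The function count_char() is hypothetical, and the no-op fallback is an assumption of this sketch for machines without the Valgrind headers.

```c
#include <assert.h>
#include <stddef.h>

/* Real macros if the Valgrind headers are available, no-ops otherwise
   (the fallback is an assumption of this sketch). */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_ZERO_STATS
#  define CALLGRIND_ZERO_STATS ((void)0)
#  define CALLGRIND_DUMP_STATS ((void)0)
#endif

/* Hypothetical function under test: counts occurrences of ch in s. */
size_t count_char(const char *s, char ch) {
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s == ch)
            n++;
    }
    return n;
}

/* Zeroes the counters, runs the function, then requests a profile dump,
   so the dump covers only the call in between. */
size_t profiled_count_char(const char *s, char ch) {
    size_t n;
    CALLGRIND_ZERO_STATS;
    n = count_char(s, ch);
    CALLGRIND_DUMP_STATS;
    return n;
}
```

This is roughly the in-code counterpart of combining --zero-before and --dump-after for a single function, with the difference that the exact reset and dump points are chosen by the programmer.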

3. --cache-sim=yes: simulates cache behavior, so that Callgrind also collects cache statistics.
--branch-sim=yes: simulates branch prediction, which can help reveal performance problems such as those caused by inefficient switch statements.

4. Callgrind command line options:
1) --callgrind-out-file=<file>
Specifies the output file for the profile data, instead of the file name generated by the default naming rules.

2) --dump-line=<no|yes> [default: yes]
Event counts are recorded at source-line granularity. This requires the program to be compiled with the -g option.

3) --collect-systime=<no|yes> [default: no]
Specifies whether information about system call times should be collected.

5. callgrind_annotate command line options (most of the options are the same as for cg_annotate; the following two are unique to callgrind_annotate):
1) --inclusive=<yes|no> [default: no]
When computing costs, adds the cost of each callee to the cost of its caller.

2) --tree=<none|caller|calling|both> [default: none]
For each function, prints its callers, the functions it calls, or both.

Helgrind:
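1. --track-lockorders=no|yes [default: yes]
Controls whether Helgrind checks for inconsistent lock acquisition order while the program is running. If you do not care about this class of problem for the time being, you can temporarily disable it.

2. --read-var-info=yes
Makes Helgrind read debug information about variables, so that its reports can describe an address in terms of the variable it belongs to and where that variable is declared.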
