Use valgrind to discover memory problems in Linux programs



Valgrind Overview

Architecture

Valgrind is a suite of simulation-based debugging tools for Linux, released as open source under the GPL v2. It consists of a core and a set of debugging tools built on that core. The core acts as a framework: it simulates a CPU environment and provides services to the tools. The tools work like plug-ins, using the services the core provides to carry out specific memory debugging tasks. The architecture of Valgrind is shown in Figure 1:

Figure 1 valgrind Architecture

Valgrind includes the following tools:

  1. Memcheck. This is the most widely used Valgrind tool. It is a heavyweight memory checker that finds the vast majority of memory usage errors made during development, such as use of uninitialized memory, use of freed memory, and out-of-bounds memory access. It is the focus of this article.
  2. Callgrind. It records the call relationships between the functions of a program and is mainly used to profile function calls.
  3. Cachegrind. It simulates the CPU cache and is mainly used to find cache usage problems in a program.
  4. Helgrind. It is mainly used to detect race conditions in multi-threaded programs.
  5. Massif. It is mainly used to profile heap usage in a program, and it can optionally measure stack usage as well.
  6. Extension. You can use the services provided by the core to write your own debugging tools.

Linux program memory space layout

To find memory problems on Linux, you first need to know how memory is laid out for a Linux program. Figure 2 shows a typical memory space layout for a Linux C program:

Figure 2: Typical memory space layout

A typical Linux C program memory space consists of the following parts:

  • Code segment (.text). The instructions executed by the CPU are stored here. The code segment can be shared, so only one copy of the same code exists in memory, and it is read-only so that a program cannot modify its own instructions by mistake.
  • Initialized data segment (.data). It stores variables that are explicitly given an initial value in the program, for example a global variable defined outside all functions: int val = 100;. Note that the two segments above are stored in the program's executable file, and the kernel reads them from that file when exec is called to start the program.
  • Uninitialized data segment (.bss). The kernel initializes the data in this segment to 0 or NULL before the program starts executing. For example, a global variable defined outside any function: int sum;
  • Heap. This segment serves dynamic memory requests in the program. Commonly used allocators such as malloc and new obtain memory from here.
  • Stack. Local variables of functions and the temporary values generated during function calls are stored here.
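
As a rough illustration of the layout described above, the following C sketch touches one object in each writable region. The names val and sum follow the examples in the list; the function name layout_demo and its local variables are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

int val = 100;   /* initialized data segment (.data) */
int sum;         /* uninitialized data segment (.bss), zeroed before main runs */

/* Returns the updated value of sum, or -1 if the heap allocation fails. */
int layout_demo(void)
{
    int local = 1;                    /* stack: exists only during this call */
    int *dyn = malloc(sizeof *dyn);   /* heap: lives until free() is called */
    int result;

    if (dyn == NULL)
        return -1;

    *dyn = val + local;
    assert(*dyn > val);               /* sanity check on the heap value just written */

    sum += *dyn;                      /* sum started at 0 because .bss is zeroed */
    result = sum;

    free(dyn);                        /* heap memory must be released explicitly */
    return result;
}
```

Calling layout_demo() once from main adds val + 1 to sum. The stack variable disappears when the function returns, while the heap block would stay allocated if free were omitted.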

Memory check Principle

Figure 3 shows how Memcheck detects memory problems:

Figure 3 memory check Principle

Memcheck is able to detect memory problems because it maintains two global tables.

  1. Valid-Value table:

For each byte in the process's entire address space there are 8 corresponding bits (the V bits), and for each CPU register there is a corresponding bit vector. These bits record whether the byte or register holds a valid, initialized value.

  2. Valid-Address table:

For each byte in the process's entire address space there is also 1 corresponding bit (the A bit), which records whether that address may be read or written.

Detection principle:

  • When a byte in memory is about to be read or written, Memcheck first checks the A bit for that byte. If the A bit shows that the location is invalid, Memcheck reports a read/write error.
  • The core acts as a virtual CPU environment, so when a byte of memory is loaded into the real CPU, the V bits for that byte are loaded into the virtual CPU environment as well. Once the value in a register is used to generate a memory address, or can affect the program's output, Memcheck checks the corresponding V bits. If the value has not been initialized, Memcheck reports a use-of-uninitialized-memory error.
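
The two tables and the two checks can be pictured with a toy model. The sketch below is not how Memcheck is implemented; it only mimics the idea on a 64-byte arena. All names (arena, a_bit, v_bits, shadow_alloc, and so on) are hypothetical, one whole byte is used to hold each A bit for simplicity, and the convention that a set V bit means "defined" is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 64

/* Toy model only: real Memcheck shadows the whole address space and the registers. */
static unsigned char arena[ARENA_SIZE];    /* the "program" memory */
static unsigned char a_bit[ARENA_SIZE];    /* 1 = address may be read or written */
static unsigned char v_bits[ARENA_SIZE];   /* 8 bits per byte, 1 = bit is defined */

enum shadow_status { SHADOW_OK, SHADOW_INVALID_ADDRESS, SHADOW_UNDEFINED_VALUE };

/* Mark [start, start + len) addressable but undefined, like a fresh malloc block. */
void shadow_alloc(size_t start, size_t len)
{
    assert(start <= ARENA_SIZE && len <= ARENA_SIZE - start);
    memset(a_bit + start, 1, len);
    memset(v_bits + start, 0x00, len);
}

/* Mark the range inaccessible again, like free. */
void shadow_free(size_t start, size_t len)
{
    assert(start <= ARENA_SIZE && len <= ARENA_SIZE - start);
    memset(a_bit + start, 0, len);
}

/* A store checks the A bit, then makes every bit of the byte defined. */
enum shadow_status shadow_store(size_t addr, unsigned char value)
{
    if (addr >= ARENA_SIZE || !a_bit[addr])
        return SHADOW_INVALID_ADDRESS;
    arena[addr] = value;
    v_bits[addr] = 0xFF;
    return SHADOW_OK;
}

/* Using a byte in a way that matters (branch, address, output) checks A, then V. */
enum shadow_status shadow_use(size_t addr)
{
    if (addr >= ARENA_SIZE || !a_bit[addr])
        return SHADOW_INVALID_ADDRESS;
    if (v_bits[addr] != 0xFF)
        return SHADOW_UNDEFINED_VALUE;
    return SHADOW_OK;
}
```

In this model, using a byte right after shadow_alloc yields SHADOW_UNDEFINED_VALUE, and any access after shadow_free yields SHADOW_INVALID_ADDRESS. These correspond to the two kinds of report described above.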


Use valgrind

Step 1: Prepare the program

To make the errors Valgrind reports more precise, for example down to the source line number, compile with the -g option. Use -O0 as the optimization level, even though this reduces the program's execution efficiency.

The sample program used here is sample.c (shown below), and the compiler is GCC.

Generate the executable: gcc -g -O0 sample.c -o sample

Listing 1

Step 2: run the executable program under valgrind.

Valgrind debugs memory problems without requiring the source program to be recompiled; its input is the binary executable. The general form of a Valgrind invocation is: valgrind [valgrind-options] your-prog [your-prog-options]

Valgrind options fall into two groups: core options, which apply to all tools, and options specific to a tool such as Memcheck. Valgrind's default tool is Memcheck; you can select another tool with --tool=<tool name>. Valgrind provides a large number of options for specific debugging needs. For details, see the user manual.

This example uses Memcheck, so you can enter the command: valgrind <path>/sample

Step 3: analyze the output information of valgrind.

The following is the output after running the preceding command.

Listing 2

  • The number on the left of each line (32372), which looks like a line number, is the process ID.
  • The red box at the top shows the Valgrind version information.
  • The red box in the middle shows the memory problems that Valgrind found while running the program. From this information we can see:
    1. An invalid write to memory occurred, and the size of the invalid write is 4 bytes.
    2. The call stack at the point of the error, with the source line numbers.
    3. The address involved in the invalid write.
  • The red box at the bottom summarizes the memory problems and memory leaks found. The size of the leak (40 bytes) is also reported.

The sample program clearly has two problems: the heap memory allocated in the function fun is never released, and an access to that heap memory goes out of bounds. Valgrind found both.
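
Going only by the description above (a 40-byte heap block allocated in fun, one 4-byte write past its end, and no release), a corrected fun might look like the following sketch. This is not the article's sample.c. The element count, the value written, and the return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define COUNT 10   /* assumption: 10 ints, i.e. 40 bytes where int is 4 bytes */

/* Returns 0 on success, -1 if the allocation fails. */
int fun(void)
{
    int *p = malloc(COUNT * sizeof *p);

    if (p == NULL)
        return -1;

    /* Valid indexes are 0 .. COUNT - 1; writing p[COUNT] would be an invalid write. */
    p[COUNT - 1] = 0;
    assert(p[COUNT - 1] == 0);

    free(p);       /* release the block so it is not reported as leaked */
    return 0;
}
```

Run under Memcheck, a version like this should report neither an invalid write nor a leak, because every access stays inside the block and the block is released before fun returns.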


Use memcheck to discover common memory problems

When developing applications on Linux, memory usage errors are among the most common problems. This section summarizes the common kinds of memory usage errors and explains how to detect them with Valgrind.

Use uninitialized memory

Problem Analysis:

Variables that live in different segments of a program have different initial values. Global and static variables are initialized to 0, whereas local variables and dynamically allocated memory (for example from malloc) have indeterminate initial values. If the program uses a variable whose value is indeterminate, its behavior becomes unpredictable.

The following program shows a common case of using an uninitialized variable. The array a is a local variable, so its initial contents are indeterminate, and the initialization code does not set every element. Using the array afterwards can therefore cause memory problems.

Listing 3

Result Analysis:

Assume the file is named badloop.c and the generated executable is badloop. Testing it with Memcheck produces the following output.

Listing 4

The output shows that at line 3 of the program, a conditional jump depends on an uninitialized variable. Memcheck pinpoints the problem in the program above.
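
A sketch of the usual fix for the pattern described above (a local array that is only partly initialized and then drives a branch) is to give every element a defined value before use. This is not the article's badloop.c. The function name sum_array, the length N, and the values stored are hypothetical.

```c
#include <assert.h>

#define N 5   /* hypothetical array length */

/* Sums a local array that is fully initialized before it is read. */
int sum_array(void)
{
    int a[N] = {0};   /* every element starts at 0, not just some of them */
    int s = 0;
    int i;

    a[0] = 1;
    a[N - 1] = 2;     /* the elements in between are still well-defined */

    for (i = 0; i < N; i++)
        s += a[i];

    /* Branching on s is safe because no element of a is uninitialized. */
    assert(s == 3);   /* 1 + 0 + 0 + 0 + 2 */
    return s;
}
```

With the initializer removed and only some elements assigned, the same loop would read indeterminate values, and a branch on s is the kind of use Memcheck reports.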

Memory read/write out of bounds

Problem Analysis:

This means accessing memory that the program should not access or has no permission to access, for example indexing past the end of an array, or accessing dynamic memory beyond the size that was allocated. The following program is a typical out-of-bounds case. pt is a local array variable of size 4, and p initially points to the start of pt. After p is incremented repeatedly in the loop, it points past the end of pt, and writing through p at that point has unpredictable consequences.

Listing 5

Result Analysis:

Assume the file is named badacc.cpp and the generated executable is badacc. Testing it with Memcheck produces the following output.

Listing 6

The output shows an invalid write at line 2 of the program and an invalid read at line 2. Memcheck finds both problems accurately.
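
One way to keep a walking pointer inside its array is to compare it against a one-past-the-end pointer and never dereference it once the two are equal. The sketch below reuses the names pt and p from the description above. The function name fill_in_bounds and the value 5 are illustrative assumptions, and this is not the article's badacc.cpp.

```c
#include <assert.h>

#define LEN 4

/* Fills pt[0 .. LEN - 1] and returns the last element written. */
int fill_in_bounds(void)
{
    int pt[LEN];
    int *p = pt;
    int *end = pt + LEN;   /* one past the last element: may be compared, not dereferenced */

    for (; p < end; p++)
        *p = 5;

    /* After the loop p == end, so *p would be out of bounds; step back first. */
    assert(p == end);
    return *(p - 1);
}
```

The out-of-bounds case arises when the code reads or writes *p after the loop without stepping back.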

Memory overlap

Problem Analysis:

The power and the danger of C is that it can manipulate memory directly, and the C standard library provides many functions for doing so, such as strcpy, strncpy, memcpy, and strcat. These functions share one requirement: the memory regions designated by the source address (src) and the destination address (dst) must not overlap. Otherwise the result is unpredictable.

The following is an example in which src and dst overlap. In lines 15 and 17, src and dst point to addresses 20 bytes apart, but the specified copy length is 21, so the copy overwrites values that were copied earlier. Line 24 is similar: src (x + 20) and dst (x) are 20 bytes apart, but the length copied into dst is 21, which also causes an overlap.

Listing 7

Result Analysis:

Assume the file is named badlap.cpp and the generated executable is badlap. Testing it with Memcheck produces the following output.

Listing 8

The output shows that at lines 15, 17, and 24 of the program above, the source and destination regions overlap. Memcheck finds these problems accurately.
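
When the two regions really do need to overlap, the standard library function memmove is defined for that case, unlike memcpy and strcpy. The following minimal sketch shifts a string to the right inside its own buffer. The helper shift_right and its parameters are hypothetical and do not come from the article's badlap.cpp.

```c
#include <assert.h>
#include <string.h>

/* Shifts the string in buf right by `shift` bytes and pads the front with spaces.
 * cap is the total size of buf in bytes. */
void shift_right(char *buf, size_t cap, size_t shift)
{
    size_t len = strlen(buf) + 1;      /* include the terminating '\0' */

    assert(shift <= cap && len <= cap - shift);   /* the result must fit in the buffer */

    /* Source and destination overlap here, so memcpy or strcpy would be undefined. */
    memmove(buf + shift, buf, len);
    memset(buf, ' ', shift);           /* fill the gap that opened at the front */
}
```

For example, a 16-byte buffer holding "abc" contains "  abc" after shift_right(buf, 16, 2).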

Dynamic Memory Management Error

Problem Analysis:

There are three common ways to allocate memory: static storage, stack allocation, and heap allocation. Global variables use static storage, which is reserved at compile time, and local variables in functions are allocated on the stack. The most flexible form is allocation on the heap, also called dynamic memory allocation. Common dynamic allocation functions include malloc, calloc, realloc, and new. The corresponding release functions are free and delete.

Once dynamic memory has been allocated successfully, the program has to manage it itself, and this is where mistakes are most easily made. The following program contains common dynamic memory management errors.

Listing 9

Common dynamic memory management errors include:

  • Mismatched allocation and release functions

    C++ is compatible with C, but C and C++ use different functions to allocate and release memory, so a C++ program has two sets of dynamic memory management functions available. The invariant rule is that memory allocated the C way is released the C way, and memory allocated the C++ way is released the C++ way. That is, memory obtained with malloc/calloc/realloc is released with free, and memory obtained with new is released with delete. In the program above, memory is allocated with malloc but released with delete. This often appears to work, but it is a latent defect.

  • Mismatched number of allocations and releases

    Every block that is allocated should be released exactly once after use. Releasing nothing, or too little, leaks memory, and releasing too much also causes problems. In the program above, the pointers p and pt point to the same block, but it is released twice.

  • Reading or writing after release

    In essence, the system maintains a linked list of dynamic memory blocks on the heap. Once a block is released, it can be handed out again to another part of the program. Accessing memory after releasing it can therefore overwrite data that belongs to some other part of the program, which is a serious error. The program above still writes to the memory after releasing it at line 1.

Result Analysis:

Assume the file is named badmac.cpp and the generated executable is badmac. Testing it with Memcheck produces the following output.

Listing 10

The output shows that the allocation and release functions at line 3 do not match, that the write at line 3 is invalid because it writes to a memory address that has already been released, and that the release at line 3 is invalid. Memcheck pinpoints all three problems.

Memory leaks

Problem description:

A memory leak refers to memory that a program allocates dynamically but does not release after use, and that the rest of the program can no longer reach. Memory leaks are among the most troublesome problems in developing large programs, to the point that some people say they are unavoidable. In fact, preventing memory leaks starts with good programming habits. Strengthening unit testing is just as important, and Memcheck is an excellent tool for that.

The following is a typical memory leak. The main function calls mk to build the tree nodes, but after it is done with them it never calls the corresponding function nodefr to release the memory. The tree structure left in memory can no longer be reached by the rest of the program, which results in a memory leak.

Within a single function, most programmers are well aware of memory leaks. In many cases, however, malloc/free or new/delete are wrapped to suit specific needs, so allocation and release no longer happen inside one function. This example illustrates the most common cause of memory leaks, which lies at the interface between two parts of a program: one function allocates the memory and another function releases it. When these functions are developed and used by different people, leaks become even more likely. Good unit testing habits are needed to eliminate memory leaks at an early stage.

Listing 11.1

Listing 11.2

Listing 11.3

Result Analysis:

Assume the source files are tree.h, tree.cpp, and badleak.cpp, and the generated executable is badleak. Testing it with Memcheck produces the following output.

Listing 12

The sample program builds a tree. Each tree node is 12 bytes (taking memory alignment into account), and there are 8 nodes in total. The output above shows that all of the leaked memory was found. Memcheck classifies leaks into two kinds: possibly lost and definitely lost. Possibly lost means that a pointer to the block still exists, but it no longer points to the start of the block. Definitely lost means that the block can no longer be reached. Definitely lost is further divided into directly lost and indirectly lost. A block is directly lost when no pointer to it exists at all, and indirectly lost when the only pointers to it are located in blocks that are themselves leaked. In the example above, the root node is directly lost and the other nodes are indirectly lost.
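
The pairing of an allocating function with a releasing function can be sketched as follows. Only the names mk and nodefr come from the text above. Their signatures, the node layout, and the helper build_and_release are assumptions made for illustration, not the contents of tree.h or tree.cpp.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    struct node *l;   /* left subtree, or NULL */
    struct node *r;   /* right subtree, or NULL */
    char val;
} node;

/* Builds a node that takes ownership of both subtrees; returns NULL on failure. */
node *mk(node *l, node *r, char val)
{
    node *n = malloc(sizeof *n);

    if (n == NULL)
        return NULL;
    n->l = l;
    n->r = r;
    n->val = val;
    return n;
}

/* Releases a whole tree and returns how many nodes were freed. */
int nodefr(node *n)
{
    int freed;

    if (n == NULL)
        return 0;
    freed = nodefr(n->l) + nodefr(n->r);   /* children first, then the node itself */
    free(n);
    return freed + 1;
}

/* Example use: build a three-node tree, then release all of it. */
int build_and_release(void)
{
    node *root = mk(mk(NULL, NULL, 'a'), mk(NULL, NULL, 'b'), 'c');

    assert(root != NULL);   /* sketch only: real code should handle allocation failure */
    return nodefr(root);
}
```

If the call to nodefr is dropped, the last pointer to the root is lost when build_and_release returns. That matches the situation described above: the root becomes directly lost, and the children, reachable only through the root, become indirectly lost.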


Summary

This article introduced the architecture of Valgrind and focused on its most widely used tool, Memcheck. It described the basic principle behind Memcheck's detection of memory problems, its basic usage, and how to use it to find five of the most common categories of memory problems in development. Finding memory problems as early as possible in a project greatly improves development efficiency, and Valgrind is an outstanding tool for that purpose.

