Applying Valgrind to discover memory problems with Linux programs


Locating memory problems has long been a bottleneck in Linux application development. Valgrind, an excellent open-source memory debugging tool for Linux, can go a long way toward solving it. Understanding how to use Valgrind and how it works helps you locate and avoid memory problems effectively.


Yang Jing ([email protected]), software engineer, IBM

November 27, 2008


Valgrind Overview

Architecture

Valgrind is a suite of simulation-based debugging tools for Linux, released as open source under GPL v2. It consists of a core and a set of debugging tools built on that core. The core acts as a framework: it simulates a CPU environment and provides services to the tools. The tools act as plug-ins that use those services to carry out specific memory debugging tasks. The architecture of Valgrind is as follows:

Figure 1 Valgrind Architecture

Valgrind includes the following tools:

    1. Memcheck. This is the most widely used tool in Valgrind, a heavyweight memory checker that finds most memory errors made during development, such as use of uninitialized memory, use of freed memory, and out-of-bounds memory access. It is also the focus of this article.
    2. Callgrind. It profiles function calls in the program, recording the call graph and the cost of each call.
    3. Cachegrind. It profiles cache usage in the program, such as cache misses.
    4. Helgrind. It detects data races in multithreaded programs.
    5. Massif. It profiles heap usage in the program.
    6. Extension. You can use the functionality provided by the core to write your own memory debugging tools.
Linux Program Memory Space layout

To find memory problems on Linux, you first need to know how memory is laid out for a Linux program. Figure 2 shows the memory space layout of a typical Linux C program:

Figure 2: Typical memory space layout

A typical Linux C program memory space consists of the following parts:

    • Text segment (.text). This holds the instructions that the CPU executes. The text segment is shareable, so only one copy of the same code is kept in memory, and it is read-only, which prevents the program from accidentally modifying its own instructions.
    • Initialized data segment (.data). This holds variables that are explicitly given an initial value in the program, such as a global variable defined outside all functions: int val = 100;. Note that both of the segments above are stored in the program's executable file, and the kernel reads them from that file when the exec function is called to start the program.
    • Uninitialized data segment (.bss). The kernel initializes the data in this segment to 0 or null before the program starts executing. An example is a global variable defined outside any function: int sum;
    • Heap. This segment serves dynamic memory requests in the program, such as those made with the malloc family of functions or with new.
    • Stack. Local variables in functions and the temporary values produced during function calls are stored in this segment.
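As an illustration of the list above, here is a minimal sketch in C that places one object in each writable region and prints its address. The names val and sum come from the examples in the list; show_layout, local, and dynamic are hypothetical names chosen for this sketch, and the addresses printed will differ from system to system.

```c
#include <stdio.h>
#include <stdlib.h>

int val = 100;  /* initialized global: stored in .data */
int sum;        /* uninitialized global: stored in .bss, zeroed before main runs */

/* Prints one address per region. Returns 1 on success, 0 if malloc fails. */
int show_layout(void)
{
    int local = 1;                          /* automatic variable: lives on the stack */
    int *dynamic = malloc(sizeof *dynamic); /* dynamic request: served from the heap */

    if (dynamic == NULL)
        return 0;
    *dynamic = 2;

    printf(".data  &val    : %p\n", (void *)&val);
    printf(".bss   &sum    : %p\n", (void *)&sum);
    printf("heap   dynamic : %p\n", (void *)dynamic);
    printf("stack  &local  : %p (value %d)\n", (void *)&local, local);

    free(dynamic);
    return 1;
}
```

Calling show_layout() from main typically shows the .data and .bss addresses close together, with the heap and stack addresses far from both.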
Memory Check principle

Memcheck detects memory problems according to the following principle:

Figure 3 Principle of memory checking

Memcheck can detect memory problems because it maintains two global tables.

    1. Valid-value table

For each byte in the entire address space of the process there are 8 corresponding bits (the V bits), and for each CPU register there is a corresponding bit vector. These bits record whether the byte or register holds a valid, initialized value.

    2. Valid-address table

For each byte in the entire address space of the process there is 1 corresponding bit (the A bit), which records whether the address can be read or written.

Detection principle:

    • When a byte in memory is about to be read or written, Memcheck first checks the A bit for that byte. If the A bit shows that the location is invalid, Memcheck reports a read or write error.
    • The core behaves like a virtual CPU environment, so when a byte in memory is loaded into the real CPU, the V bits for that byte are loaded into the virtual CPU environment as well. Once the value in a register is used to generate a memory address, or can affect the program's output, Memcheck checks the corresponding V bits. If the value has not been initialized, Memcheck reports a use of uninitialized memory.
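The two tables and the two checks can be modeled with a toy sketch in C. This is not Valgrind's implementation: every name here (toy_shadow, toy_allocate, toy_write, toy_use, and so on) is hypothetical, the "address space" is only 16 bytes, and the V bits are tracked per whole byte rather than per bit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SIZE 16  /* size of the hypothetical address space, in bytes */

enum toy_status { TOY_OK, TOY_INVALID_ADDRESS, TOY_UNINITIALIZED };

struct toy_shadow {
    uint8_t a_bit[TOY_SIZE];  /* valid-address table: 1 = byte may be accessed */
    uint8_t v_bits[TOY_SIZE]; /* valid-value table: 0xFF = all 8 bits defined */
};

/* Start with nothing addressable and nothing defined. */
void toy_init(struct toy_shadow *s)
{
    memset(s, 0, sizeof *s);
}

/* Models an allocation: bytes become addressable but stay undefined. */
void toy_allocate(struct toy_shadow *s, size_t start, size_t len)
{
    for (size_t i = start; i < start + len && i < TOY_SIZE; i++) {
        s->a_bit[i] = 1;
        s->v_bits[i] = 0x00;
    }
}

/* Models a release: bytes are no longer addressable. */
void toy_release(struct toy_shadow *s, size_t start, size_t len)
{
    for (size_t i = start; i < start + len && i < TOY_SIZE; i++)
        s->a_bit[i] = 0;
}

/* Models a store: check the A bit first, then mark the byte as defined. */
enum toy_status toy_write(struct toy_shadow *s, size_t addr)
{
    if (addr >= TOY_SIZE || !s->a_bit[addr])
        return TOY_INVALID_ADDRESS;
    s->v_bits[addr] = 0xFF;
    return TOY_OK;
}

/* Models using a loaded value in a branch or address: check A, then V. */
enum toy_status toy_use(const struct toy_shadow *s, size_t addr)
{
    if (addr >= TOY_SIZE || !s->a_bit[addr])
        return TOY_INVALID_ADDRESS;
    if (s->v_bits[addr] != 0xFF)
        return TOY_UNINITIALIZED;
    return TOY_OK;
}
```

In this model, using a byte right after toy_allocate yields TOY_UNINITIALIZED, using it after toy_write yields TOY_OK, and any access after toy_release yields TOY_INVALID_ADDRESS, which mirrors the two kinds of report described above.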


Using Valgrind

Step one: Prepare the program.

So that the errors Valgrind reports are more precise, for example so that they can be traced to a source line, it is recommended to compile with the -g option and to select optimization level -O0, although this reduces the execution efficiency of the program.

The sample program file used here is named sample.c (shown below), and the compiler is GCC.

Generate the executable program: gcc -g -O0 sample.c -o sample

Listing 1
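The original listing is not reproduced here. As a stand-in, the following is a minimal sketch of a function with the two problems that the analysis in step three attributes to fun: a heap block that is never released and a write past the end of that block. The parameters count and index are assumptions added so that the function can also be called safely; the original program may look different.

```c
#include <stdlib.h>

/*
 * Allocates `count` ints and writes to element `index` without a bounds check.
 * The block is deliberately never freed, so every call leaks it.
 * Returns 0 on success, -1 if the allocation fails.
 */
int fun(size_t count, size_t index)
{
    int *p = malloc(count * sizeof *p);

    if (p == NULL)
        return -1;
    p[index] = 0;  /* out of bounds whenever index >= count */
    return 0;      /* p goes out of scope here: the block is leaked */
}
```

Calling fun(10, 10) from main would write one int just past the end of the block (4 bytes on platforms where int is 4 bytes) and leak the whole block; fun(10, 9) stays in bounds and only leaks.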

Step two: Run the executable program under Valgrind.

Debugging memory problems with Valgrind does not require recompiling the source program; its input is the binary executable. The general format for invoking Valgrind is: valgrind [valgrind-options] your-prog [your-prog-options]

Valgrind's options fall into two categories: core options, which apply to all tools, and options for a particular tool such as Memcheck. The default tool is Memcheck; other tools can be selected with --tool=<toolname>. Valgrind provides a wide range of options to meet specific debugging needs; see the user manual for details.

This example uses Memcheck, so the command is: valgrind <path>/sample

Step three: Analyze Valgrind's output.

The following is the output after running the above command.

Listing 2

    • The number on the left of each line (32372), which looks like a line number, is the process ID.
    • The top red box shows the Valgrind version information.
    • The red box in the middle shows the memory problem that Valgrind found by running the program under test. From this information you can see:
      1. This is an invalid memory write, and the size of the write is 4 bytes.
      2. The call stack at the time of the error, with the specific source line numbers.
      3. The specific address of the invalid write.
    • The bottom red box is a summary of the memory problems and memory leaks found. The size of each leak (in bytes) is also reported.

The sample program clearly has two problems: the heap memory requested dynamically in the function fun is never released, and heap memory is accessed out of bounds. Valgrind found both.


Using Memcheck to find common memory problems

The most common problem when developing applications on Linux is incorrect use of memory. This section summarizes the usual kinds of memory misuse and explains how to detect each one with Valgrind.

Use of uninitialized memory

Problem Analysis:

Variables in different segments of a program have different initial values: global and static variables start at 0, while local variables and dynamically requested memory start with indeterminate values. If a program uses a variable whose value is indeterminate, its behavior becomes unpredictable.

The following program is a common scenario in which an uninitialized variable is used. Array a is a local variable whose initial contents are indeterminate, and the initialization code does not set every element, so using the array afterward is a latent memory problem.

Listing 3
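The original listing is not reproduced here. The following minimal sketch shows the same pattern under stated assumptions: the function names, the filled parameter, and the array length are all hypothetical, and the parameter exists only so that the fully initialized case can be exercised safely.

```c
#include <stdio.h>

#define LEN 5

/*
 * Initializes only the first `filled` elements of a local array, then sums
 * all LEN elements. With filled < LEN the result depends on whatever
 * happened to be on the stack.
 */
int sum_array(int filled)
{
    int a[LEN];
    int s = 0;

    for (int i = 0; i < filled && i < LEN; i++)
        a[i] = i;
    for (int i = 0; i < LEN; i++)
        s += a[i];  /* reads uninitialized elements when filled < LEN */
    return s;
}

/* Branches on the sum; returns 1 if the branch was taken, 0 otherwise. */
int report_sum(int filled)
{
    int s = sum_array(filled);

    if (s == 10) {  /* the decision depends on s, so it is checked here */
        printf("sum is %d\n", s);
        return 1;
    }
    return 0;
}
```

With report_sum(4), element a[4] is never written, and Memcheck would be expected to flag the comparison in report_sum with its "Conditional jump or move depends on uninitialised value(s)" message, because that is the point where the undefined value first affects control flow.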

Results Analysis:

Suppose this file is named badloop.c and the generated executable is badloop. Testing it with Memcheck gives the following output.

Listing 4

The output shows that on line 11 of the program a conditional jump depends on an uninitialized variable. The problem in the program above was found accurately.

Memory read/write out of bounds

Problem Analysis:

This case covers access to memory addresses that you should not or may not access, such as indexing past the end of an array, or accessing dynamically allocated memory beyond the size that was requested. The following program is a typical out-of-bounds problem. pt is a local array variable of size 4, and p initially points to the start of pt. After p is incremented repeatedly in the loop, it points past the end of pt, and writing through p at that point has unpredictable consequences.

Listing 5
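The original listing is not reproduced here. The sketch below shows the pointer-walking pattern described above, with two assumptions stated plainly: the block is allocated on the heap, because Memcheck tracks the boundaries of heap blocks but does not bounds-check stack or global arrays, and the len and steps parameters are hypothetical additions that make a safe call possible.

```c
#include <stdlib.h>

/*
 * Allocates `len` ints, advances a pointer `steps` times from the start of
 * the block, then writes and reads through it with no bounds check.
 * Returns the value read back, or -1 if the allocation fails.
 */
int write_after_steps(size_t len, size_t steps)
{
    int *pt = malloc(len * sizeof *pt);
    int *p = pt;
    int value;

    if (pt == NULL)
        return -1;
    for (size_t i = 0; i < steps; i++)
        p++;        /* after `len` steps, p is one past the last element */
    *p = 5;         /* invalid write when steps >= len */
    value = *p;     /* invalid read when steps >= len */
    free(pt);
    return value;
}
```

A call such as write_after_steps(4, 4) reproduces the situation in the text: the write and the read both land just past the 4-element block, which is the kind of access Memcheck reports as an invalid write and an invalid read.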

Results Analysis:

Assuming this file is named badacc.cpp and the generated executable is badacc, testing it with Memcheck gives the following output.

Listing 6

The output shows an invalid write on line 15 of the program and an invalid read on line 16. The problems above were found accurately.

Memory overwrite

Problem Analysis:

What makes C both powerful and dangerous is that it can manipulate memory directly. The C standard library provides many functions for this, such as strcpy, strncpy, memcpy, and strcat. What these functions have in common is that they take a source address (src) and a destination address (dst), and the regions that src and dst point to must not overlap; otherwise the result is unpredictable.

Here is an example in which src and dst overlap. On lines 15 and 17, src and dst point to addresses that are 20 bytes apart, but the specified copy length is 21, so the copy overwrites a value it copied earlier. Line 24 is similar: src (x+20) and dst (x) point to addresses 20 bytes apart, but the length of dst is 21, so memory is overwritten there as well.

Listing 7
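The overlap condition itself is simple arithmetic, and a small sketch can make it concrete. The helper names ranges_overlap and copy_bytes are hypothetical, and this is not how Memcheck performs its check; it only illustrates why a distance of 20 with a length of 21 is a problem while a length of 20 is not.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if [dst, dst + n) and [src, src + n) share at least one byte.
 * Both pointers must point into the same array for the comparison to be
 * well defined.
 */
int ranges_overlap(const char *dst, const char *src, size_t n)
{
    if (n == 0)
        return 0;
    if (dst <= src)
        return (size_t)(src - dst) < n;
    return (size_t)(dst - src) < n;
}

/* Copies n bytes, using memmove when the ranges overlap and memcpy otherwise. */
void *copy_bytes(char *dst, const char *src, size_t n)
{
    if (ranges_overlap(dst, src, n))
        return memmove(dst, src, n);  /* memmove is defined for overlapping ranges */
    return memcpy(dst, src, n);
}
```

For a buffer x, ranges_overlap(x + 20, x, 20) is 0 and ranges_overlap(x + 20, x, 21) is 1, which matches the distinction drawn in the text. When overlap cannot be ruled out, memmove is the standard library function whose behavior is defined for it.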

Results Analysis:

Assuming this file is named badlap.cpp and the generated executable is badlap, testing it with Memcheck gives the following output.

Listing 8

The output shows that on lines 15, 17, and 24 of the program above the source and destination addresses overlap. The problems above were found accurately.

Dynamic memory Management Errors

Problem Analysis:

There are three common ways to allocate memory: static storage, stack allocation, and heap allocation. Global variables use static storage and are assigned space at compile time. Local variables within functions are allocated on the stack. The most flexible approach is allocation on the heap, also called dynamic memory allocation. Commonly used dynamic allocation functions include malloc, calloc, realloc, and new; the release functions include free and delete.

Once dynamic memory has been requested successfully, we have to manage it ourselves, and this is where errors are most likely. The following program contains errors that are common in dynamic memory management.

Listing 9

Common dynamic memory management errors include the following:

    • Inconsistent request and release

Because C++ is compatible with C, and C and C++ use different functions to request and release memory, a C++ program has two sets of dynamic memory management functions. One rule that must not be broken is that memory requested the C way is released the C way, and memory requested the C++ way is released the C++ way. That is, memory requested with malloc/calloc/realloc is released with free, and memory requested with new is released with delete. The program above requests memory with malloc and releases it with delete. Although this often causes no visible problem, it is definitely a latent bug.

    • Mismatched request and release

However much memory is requested, that much must be released after use. Releasing none of it, or too little, is a memory leak; releasing too much also causes problems. In the program above, pointers p and pt point to the same block of memory, but it is released twice in succession.

    • Reading or writing after release

In essence, the system maintains a linked list of dynamic memory on the heap. Once a block is released, it can be allocated again to other parts of the program. Accessing a block after it has been freed may overwrite data that belongs to another part of the program, which is a serious error. In the program above, line 16 still writes to memory after it has been released.
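One common defensive habit against the last two errors, sketched here in C, is to route every release through a helper that also clears the pointer. The names acquire and release are hypothetical, and this is only a sketch of the habit, not a fix for the program above: it protects the one variable passed in, not other pointers that alias the same block, which is exactly the p and pt situation described.

```c
#include <stdlib.h>
#include <string.h>

/* Allocates n zero-filled bytes with malloc; pair every call with release(). */
char *acquire(size_t n)
{
    char *p = malloc(n);

    if (p != NULL)
        memset(p, 0, n);
    return p;
}

/*
 * Frees *pp and sets it to NULL. A second release through the same variable
 * then calls free(NULL), which is a no-op, instead of freeing the block twice.
 */
void release(char **pp)
{
    if (pp != NULL) {
        free(*pp);
        *pp = NULL;
    }
}
```

After release(&p), a stray write through p dereferences a null pointer and typically fails immediately, rather than silently corrupting a block that may already belong to another part of the program.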

Results Analysis:

Assuming this file is named badmac.cpp and the generated executable is badmac, testing it with Memcheck gives the following output.

Listing 10

The output shows that the allocation and deallocation functions on line 14 do not match, that line 16 performs an invalid write to memory that has already been freed, and that the release on line 17 is invalid. All three problems were found accurately.

Memory leaks

Problem Description:

A memory leak is memory that is dynamically requested by the program, not released after use, and no longer accessible to any other part of the program. Memory leaks are the most vexing problem in developing large programs, to the point that some people say they are unavoidable. In fact, preventing memory leaks starts with good programming habits; the other important measure is stronger unit testing, and Memcheck is an excellent tool for that.

The following is a typical memory leak. The main function calls the mk function to create tree nodes, but after it finishes it never calls the corresponding function, nodefr, to free the memory. The tree structure in memory can then no longer be reached by any other part of the program, which is a memory leak.

Within a single function, most developers are fairly alert to memory leaks. In many cases, however, we wrap malloc/free or new/delete to meet specific needs, so the request and the release no longer happen in the same function. This example shows where memory leaks are most likely to occur: at the interface between two parts of a program, where one function requests memory and another releases it. These functions are often written and used by different people, which makes a leak even more likely. Good unit testing habits are needed to eliminate such leaks at an early stage.

Listing 11

Listing 11.2

Listing 11.3
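The original listings are not reproduced here. The following is a minimal sketch in C of the request and release pair described above. The function names mk and nodefr come from the text; the node layout, the parameter lists, and the helper count_nodes are assumptions made for this sketch.

```c
#include <stdlib.h>

typedef struct node {
    struct node *l;  /* left child, or NULL */
    struct node *r;  /* right child, or NULL */
    char val;
} node;

/* Requests one node from the heap; the caller owns the returned subtree. */
node *mk(node *l, node *r, char val)
{
    node *f = malloc(sizeof *f);

    if (f == NULL)
        return NULL;
    f->l = l;
    f->r = r;
    f->val = val;
    return f;
}

/* Counts the nodes reachable from n. */
size_t count_nodes(const node *n)
{
    if (n == NULL)
        return 0;
    return 1 + count_nodes(n->l) + count_nodes(n->r);
}

/* Releases a whole subtree, children first, then the node itself. */
void nodefr(node *n)
{
    if (n != NULL) {
        nodefr(n->l);
        nodefr(n->r);
        free(n);
    }
}
```

A caller that builds a tree with mk and returns without calling nodefr on the root leaks every node. Because only the root was ever referenced from outside the tree, the root is the block no pointer reaches, and the children are reachable only through leaked memory.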

Results Analysis:

Assuming that the files above are named tree.h, tree.cpp, and badleak.cpp, and the generated executable is badleak, testing it with Memcheck gives the following output.

Listing 12

The sample program builds a tree in which each node has a size of 12 bytes (taking memory alignment into account), with 8 nodes in total. As the output above shows, every leak is found. Memcheck divides memory leaks into two types: possible leaks (possibly lost) and definite leaks (definitely lost). Possibly lost means that a pointer can still reach the block, but that pointer no longer points to the start of the block. Definitely lost means that the block can no longer be accessed at all. Definitely lost is further divided into direct and indirect. A block is directly lost when no pointer points to it, and indirectly lost when the pointers to it are themselves located in leaked memory. In the example above, the root node is directly lost, while the other nodes are indirectly lost.

Summary

This article introduced the architecture of Valgrind and focused on its most widely used tool, Memcheck. It explained the basic principle by which Memcheck finds memory problems, its basic usage, and how to use it to find the memory problems most common in development today. Finding memory problems early in a project greatly improves development efficiency, and Valgrind is an excellent tool for achieving that goal.

http://blog.csdn.net/kl222/article/details/40890823
