[Repost] C/C++ Memory Leaks and Detection Tools

Last Update:2018-12-04 Source: Internet

Author: User


Introduction to C/C++ Memory Leaks and Their Detection Tools

2006.05.25

Reprinted from: http://www.uml.org.cn/c%2B%2B/200605254.htm

For a C/C++ programmer, memory leaks are a common and troublesome problem. Many techniques have been developed to deal with them, such as smart pointers and garbage collection. Smart pointers are relatively mature, and the STL already contains classes that support them, but they do not seem to be widely used and cannot solve every problem. Garbage collection is mature in Java, but its progress in the C/C++ world has not been smooth, although people have long considered adding GC support to C++. That is how the real world is: as a C/C++ programmer, memory leaks are always on your mind. Fortunately, there are many tools that can help us confirm whether a leak exists and find the code responsible.

Definition of a memory leak

A memory leak usually means a heap memory leak. Heap memory is memory that the program allocates from the heap; its size is arbitrary (the size of a block can be decided at run time), and it must be released explicitly after use. Applications generally allocate a block from the heap with functions such as malloc, realloc, and new. After use, the program must call free or delete to release the block. Otherwise the block cannot be used again, and we say that it has leaked. The following small program demonstrates a heap memory leak:

void MyFunction(int nSize)
{
    char* p = new char[nSize];
    if (!GetStringFrom(p, nSize)) {
        MessageBox("Error");
        return;                 // leak: p is never released on this path
    }
    ... // using the string pointed to by p;
    delete[] p;                 // memory from new[] must be released with delete[]
}

Example 1

When GetStringFrom() returns zero, the memory pointed to by p is never released. This is a common kind of memory leak. The program allocates memory at the entry of the function and releases it at the exit, but a C function can return from anywhere, so a leak occurs as soon as one exit path fails to release memory that it should release.
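One way to cover every exit path at once is to let an object own the buffer, so that its destructor releases the memory however the function returns. The following is a minimal sketch under stated assumptions, not the original author's code: GetStringFrom here is a hypothetical stand-in for the function in Example 1, and std::vector<char> replaces the raw new[].

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for GetStringFrom() in Example 1: fills the buffer
// and returns false when the buffer is too small to hold the text.
bool GetStringFrom(char* buf, std::size_t size)
{
    static const char text[] = "hello";
    if (buf == nullptr || size < sizeof(text)) {
        return false;
    }
    std::memcpy(buf, text, sizeof(text));
    return true;
}

// Same shape as MyFunction() in Example 1, but the buffer is owned by a
// std::vector, so it is released on the early return as well.
std::string MyFunction(std::size_t nSize)
{
    std::vector<char> buf(nSize);
    if (!GetStringFrom(buf.data(), buf.size())) {
        return std::string();         // early exit: buf is still freed here
    }
    return std::string(buf.data());   // normal exit: buf is freed here too
}
```

Calling MyFunction(16) returns the text, while MyFunction(2) takes the early return; in both cases the vector's destructor frees the buffer, so no exit path needs its own cleanup code.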

In a broader sense, memory leaks include not only heap memory leaks but also leaks of system resources such as kernel handles, GDI objects, sockets, and interfaces. These objects are allocated by the operating system and also consume memory, so leaking them eventually leaks memory as well. Moreover, some of these objects consume kernel-mode memory, and a serious leak of such objects can make the whole operating system unstable. In that respect, a system resource leak is more serious than a heap memory leak.

Leaking a GDI object is a common kind of resource leak:

void CMyView::OnPaint(CDC* pDC)
{
    CBitmap bmp;
    CBitmap* pOldBmp;
    bmp.LoadBitmap(IDB_MYBMP);
    pOldBmp = pDC->SelectObject(&bmp);
    ...
    if (Something()) {
        return;                 // leak: pOldBmp is not selected back
    }
    pDC->SelectObject(pOldBmp);
    return;
}

Example 2

When Something() returns a nonzero value, the function exits without selecting pOldBmp back into pDC, which causes the HBITMAP object pointed to by pOldBmp to leak. If the program runs for a long time, the whole system may start to display garbled graphics. The problem shows up easily on Win9x, because its GDI heap is much smaller than that of Win2k or NT.
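The same idea of tying cleanup to an object's lifetime applies to selected GDI objects. The sketch below is only an illustration and does not use the real MFC classes: FakeDC is a hypothetical stand-in for a device context that merely remembers which object is selected, and SelectionGuard is an assumed helper name. It shows how a guard object can restore the previous selection on every return path.

```cpp
#include <utility>

// Hypothetical stand-in for a device context: it only remembers which
// object is currently selected into it.
struct FakeDC {
    int selected = 0;
    // Mirrors the shape of SelectObject: selects obj, returns the old one.
    int SelectObject(int obj) { return std::exchange(selected, obj); }
};

// Selects an object in the constructor and restores the previous one in the
// destructor, so every return path undoes the selection.
class SelectionGuard {
public:
    SelectionGuard(FakeDC& dc, int obj) : dc_(dc), old_(dc.SelectObject(obj)) {}
    ~SelectionGuard() { dc_.SelectObject(old_); }
    SelectionGuard(const SelectionGuard&) = delete;
    SelectionGuard& operator=(const SelectionGuard&) = delete;
private:
    FakeDC& dc_;
    int old_;
};

// Same control flow as OnPaint() in Example 2, including the early return.
void Paint(FakeDC& dc, bool something)
{
    SelectionGuard guard(dc, 42);   // 42 plays the role of the bitmap
    if (something) {
        return;                     // guard restores the old selection
    }
    // ... drawing would happen here ...
}
```

Whether Paint() leaves through the early return or the normal exit, the device context ends up with its original selection.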

How memory leaks occur

Classified by how they occur, memory leaks fall into four categories:

1. Frequent memory leaks. The leaking code is executed many times, and a block of memory leaks each time it runs. For example, if Something() in Example 2 always returns true, the HBITMAP object pointed to by pOldBmp leaks every time.

2. Occasional memory leaks. The leak occurs only in certain environments or under certain operations. For example, if Something() in Example 2 returns true only in a particular environment, the HBITMAP object pointed to by pOldBmp does not always leak. Frequent and occasional are relative terms: in a particular environment, an occasional leak may become a frequent one. The test environment and the test method are therefore crucial for detecting memory leaks.

3. One-time memory leaks. The leaking code is executed only once, or, because of a flaw in the algorithm, exactly one block of memory ever leaks. For example, a class constructor allocates memory that the destructor does not release, but because the class is a singleton, the leak happens only once. Another example:

char* g_lpszFileName = NULL;

void SetFileName(const char* lpcszFileName)
{
    if (g_lpszFileName) {
        free(g_lpszFileName);
    }
    g_lpszFileName = strdup(lpcszFileName);
}

Example 3

If the program does not release the string pointed to by g_lpszFileName before it ends, then no matter how many times SetFileName() is called, one block of memory, and only one, leaks.

4. Implicit memory leaks. The program keeps allocating memory while it runs and does not release it until it ends. Strictly speaking, nothing leaks, because the program eventually releases everything it requested. But a server program may need to run for days, weeks, or even months, and failing to release memory promptly may eventually exhaust all the memory in the system. We therefore call this kind of leak an implicit memory leak. For example:

class Connection
{
public:
    Connection(SOCKET s);
    ~Connection();
    ...
private:
    SOCKET _socket;
    ...
};

class ConnectionManager
{
public:
    ConnectionManager() {}
    ~ConnectionManager() {
        // Every remaining Connection is released only when the manager dies.
        std::list<Connection*>::iterator it;
        for (it = _connlist.begin(); it != _connlist.end(); ++it) {
            delete (*it);
        }
        _connlist.clear();
    }
    void OnClientConnected(SOCKET s) {
        Connection* p = new Connection(s);
        _connlist.push_back(p);
    }
    void OnClientDisconnected(Connection* pconn) {
        _connlist.remove(pconn);
        delete pconn;
    }
private:
    std::list<Connection*> _connlist;
};

Example 4

If the server does not call OnClientDisconnected() after a client disconnects, the Connection object for that connection is not deleted in time (all Connection objects are deleted in the ConnectionManager destructor when the server program exits). As connections are continually established and dropped, an implicit memory leak occurs.
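Because an implicit leak is invisible to tools that only report blocks still allocated at exit, one simple aid is to expose the number of live objects so that growth can be observed while the program runs. The following sketch is a variation on Example 4 under stated assumptions: the socket is a plain int rather than a real SOCKET, std::unique_ptr replaces the manual delete, and LiveConnections() is a hypothetical addition.

```cpp
#include <cstddef>
#include <list>
#include <memory>

struct Connection {
    explicit Connection(int s) : socket(s) {}
    int socket;   // plain int used here instead of a real SOCKET
};

class ConnectionManager {
public:
    Connection* OnClientConnected(int s) {
        connections_.push_back(std::make_unique<Connection>(s));
        return connections_.back().get();
    }
    void OnClientDisconnected(Connection* conn) {
        // Erasing the owning pointer destroys the Connection immediately.
        connections_.remove_if(
            [conn](const std::unique_ptr<Connection>& p) { return p.get() == conn; });
    }
    // Number of Connection objects still alive. Steady growth under a
    // steady client load is the signature of an implicit leak.
    std::size_t LiveConnections() const { return connections_.size(); }
private:
    std::list<std::unique_ptr<Connection>> connections_;
};
```

If the server forgets to call OnClientDisconnected(), nothing is technically lost, since the list still frees everything at shutdown, but LiveConnections() keeps climbing, which makes the problem visible long before memory runs out.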

From the point of view of someone using the program, a memory leak in itself does no harm; an ordinary user never notices that it exists. The real danger is the accumulation of leaks, which eventually consumes all the memory in the system. Seen this way, a one-time leak is harmless because it does not accumulate, whereas an implicit leak is very harmful because it is harder to detect than frequent and occasional leaks.

Detecting memory leaks

The key to detecting memory leaks is being able to intercept calls to the functions that allocate and release memory. By intercepting both kinds of call, we can track the lifetime of every block. For example, each time a block is allocated successfully, its pointer is added to a global list; each time a block is released, its pointer is removed from the list. When the program ends, the pointers remaining in the list point to memory that was never released. This is only a brief description of the basic principle; for detailed algorithms, see Steve Maguire's "Writing Solid Code".
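The bookkeeping described above can be sketched in a few lines. This is a simplified illustration, not the algorithm from "Writing Solid Code" or from any real tool: TrackedMalloc, TrackedFree, and LiveBlocks are hypothetical names, and the sketch wraps malloc/free explicitly instead of intercepting them.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>

// Bookkeeping for one live allocation.
struct AllocInfo {
    std::size_t size;
    const char* file;
    int line;
};

// The "global list" from the text: every live block, keyed by its address.
inline std::map<void*, AllocInfo>& LiveBlocks()
{
    static std::map<void*, AllocInfo> blocks;
    return blocks;
}

// Hypothetical wrappers; a real tool would intercept malloc/free themselves.
inline void* TrackedMalloc(std::size_t size, const char* file, int line)
{
    void* p = std::malloc(size);
    if (p != nullptr) {
        LiveBlocks()[p] = AllocInfo{size, file, line};
    }
    return p;
}

inline void TrackedFree(void* p)
{
    if (p != nullptr) {
        LiveBlocks().erase(p);
        std::free(p);
    }
}

// Whatever is still in the map when the program ends was never released.
inline std::size_t LeakedBlockCount() { return LiveBlocks().size(); }
```

A caller would allocate with TrackedMalloc(n, __FILE__, __LINE__) and, just before exit, walk LiveBlocks() to print the size and origin of each block that is still registered.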

To detect heap memory leaks, you need to intercept malloc/realloc/free and new/delete (in practice new/delete end up calling malloc/free, so intercepting the first group is enough). Other kinds of leak can be detected in a similar way by intercepting the corresponding allocation and release functions. For example, to detect BSTR leaks you intercept SysAllocString/SysFreeString; to detect HMENU leaks you intercept CreateMenu/DestroyMenu. (Some resources have several allocation functions but only one release function. SysAllocStringLen, for example, can also allocate a BSTR, so in such cases every allocation function has to be intercepted.)

On the Windows platform there are three commonly used ways to detect memory leaks: the detection built into the MS C-Runtime Library; add-on detection tools such as Purify and BoundsChecker; and the Performance Monitor that ships with Windows NT. Each has its strengths and weaknesses. The MS C-Runtime Library is weaker than the add-on tools, but it is free. Performance Monitor cannot point to the offending code, but it can reveal the existence of implicit memory leaks, which is beyond the reach of the other two.

The sections below discuss these three approaches in detail.

Detecting memory leaks under VC

An application developed with MFC automatically has leak detection code added when it is compiled in debug mode. When the program ends, any leaked memory blocks are listed in the Debug window. The following two lines show the information for one leaked block:

E:/testmemleak/testdlg.cpp(70) : {59} normal block at 0x00881710, 200 bytes long.

 Data: <abcdefghijklmnop> 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70

The first line shows that the block was allocated at line 70 of testdlg.cpp, that its address is 0x00881710, and that its size is 200 bytes. {59} is the request order number of the call to the memory allocation function; for details, see the help for _CrtSetBreakAlloc() in MSDN. The second line shows the first 16 bytes of the block: the content between the angle brackets is displayed as ASCII, followed by the same bytes in hexadecimal.

We tend to assume that this leak detection is provided by MFC, but it is not. MFC only wraps and uses the debug functions of the MS C-Runtime Library. Non-MFC programs can also use those debug functions to add leak detection. The MS C-Runtime Library builds leak detection into its implementation of functions such as malloc/free and strdup.

Note that a project generated by the MFC Application Wizard has the following macro definitions at the top of every .cpp file:

#ifdef _DEBUG
#define new DEBUG_NEW
#undef THIS_FILE
static char THIS_FILE[] = __FILE__;
#endif

With this definition, every new in the .cpp file is replaced with DEBUG_NEW when the debug version is compiled. So what is DEBUG_NEW? DEBUG_NEW is also a macro. The following definition is from afx.h, line 1632:

#define DEBUG_NEW new(THIS_FILE, __LINE__)

So if there is a line of code like this:

char* p = new char[200];

after macro replacement it becomes:

char* p = new(THIS_FILE, __LINE__) char[200];

According to the C++ standard, for this use of new the compiler looks for an operator new declared as follows:

void* operator new(size_t, LPCSTR, int)

We find an implementation of this operator new at line 63 of afxmem.cpp:

void* AFX_CDECL operator new(size_t nSize, LPCSTR lpszFileName, int nLine)
{
    return ::operator new(nSize, _NORMAL_BLOCK, lpszFileName, nLine);
}

void* __cdecl operator new(size_t nSize, int nType, LPCSTR lpszFileName, int nLine)
{
    ...
    pResult = _malloc_dbg(nSize, nType, lpszFileName, nLine);
    if (pResult != NULL)
        return pResult;
    ...
}

The second operator new function is fairly long, so only part of it is quoted here. Clearly, the memory is ultimately allocated by _malloc_dbg, one of the debug functions of the MS C-Runtime Library. Besides the size of the block, this function takes two more parameters, a file name and a line number, which record the code that made the allocation. If the block has not been released by the time the program ends, this information is written to the Debug window.
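The mechanism can be tried without MFC. The sketch below is an assumption-laden illustration rather than the MFC or CRT implementation: it defines a placement form of operator new that takes a file name and a line number, remembers them in a hypothetical LastAlloc() record (the CRT stores such data per block), and forwards to the ordinary operator new. TRACKED_NEW is a hypothetical macro playing the role of DEBUG_NEW.

```cpp
#include <cstddef>
#include <new>

// Where the most recent tagged allocation came from (illustration only).
struct LastAllocation {
    const char* file = nullptr;
    int line = 0;
    std::size_t size = 0;
};

inline LastAllocation& LastAlloc()
{
    static LastAllocation last;
    return last;
}

// Placement form selected by "new (file, line) T".
void* operator new(std::size_t size, const char* file, int line)
{
    void* p = ::operator new(size);   // ordinary allocation; may throw
    LastAlloc().file = file;
    LastAlloc().line = line;
    LastAlloc().size = size;
    return p;
}

// Matching placement delete, called only if the constructor throws.
void operator delete(void* p, const char*, int) noexcept
{
    ::operator delete(p);
}

// Hypothetical macro playing the role of DEBUG_NEW.
#define TRACKED_NEW new (__FILE__, __LINE__)
```

With these definitions, int* p = TRACKED_NEW int(5); records the current file and line, and because the memory comes from the ordinary operator new, a plain delete p; releases it correctly. Array allocations would need a corresponding operator new[] overload.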

A word about THIS_FILE, __FILE__, and __LINE__. Both __FILE__ and __LINE__ are macros defined by the compiler. When the compiler encounters __FILE__, it replaces it with a string containing the path name of the file being compiled. When it encounters __LINE__, it replaces it with a number, the line number of the current line of code. DEBUG_NEW uses THIS_FILE rather than __FILE__ directly in order to reduce the size of the object file. Suppose new is used in 100 places in a .cpp file. If __FILE__ were used directly, the compiler would generate 100 constant strings, all of them the path name of the same .cpp file, which is clearly redundant. With THIS_FILE, the compiler generates only one constant string, and all 100 uses of new refer to it through a pointer.

Looking again at a project generated by the MFC Application Wizard, we find that only new is mapped in the .cpp files. If the program allocates memory directly with malloc, the file name and line number of the call to malloc are not recorded. If such a block leaks, the MS C-Runtime Library can still detect it, but the information it prints for the block does not include the file name and line number of the allocation.

Enabling leak detection in a non-MFC program is easy. Just add the following lines at the entry point of the program:

int tmpFlag = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG);

tmpFlag |= _CRTDBG_LEAK_CHECK_DF;

_CrtSetDbgFlag(tmpFlag);

With this in place, after WinMain, main, or DllMain returns, information about any memory blocks that have not been released is printed to the Debug window.

If you create a non-MFC application, add the code above at its entry point, and deliberately leave some memory blocks unreleased, you will see output like the following in the Debug window:

{47} normal block at 0x00C91C90, 200 bytes long.

 Data: <                > 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

The leak is indeed detected, but compared with the MFC example above, the file name and line number are missing. In a large program, it is very hard to track down the problem without this information.

To find out where a leaked block was allocated, you need to implement a mapping similar to MFC's, one that maps functions such as new and malloc to _malloc_dbg. I will not go into details here; you can refer to the MFC source code.
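For reference, the debug CRT already offers one such mapping for malloc: defining _CRTDBG_MAP_ALLOC before including the CRT headers maps malloc and free to their _dbg counterparts. The sketch below combines that with the flag-setting code shown earlier. It is guarded so that it also compiles where the debug CRT is unavailable; EnableCrtLeakCheck is a hypothetical helper name and DBG_NEW is an illustrative macro name, so treat this as a starting point rather than a complete solution.

```cpp
// Must come before the CRT headers so that, in Microsoft debug builds,
// malloc/free are mapped to versions that record __FILE__ and __LINE__.
#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#if defined(_MSC_VER)
#include <crtdbg.h>
#endif

// Turns on the leak report at exit. Returns false where the debug CRT is
// not in use (release builds or non-Microsoft compilers).
bool EnableCrtLeakCheck()
{
#if defined(_MSC_VER) && defined(_DEBUG)
    int flags = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG);
    _CrtSetDbgFlag(flags | _CRTDBG_LEAK_CHECK_DF);
    return true;
#else
    return false;
#endif
}

#if defined(_MSC_VER) && defined(_DEBUG)
// Illustrative macro name: routes new through the debug operator new that
// takes a block type, a file name and a line number.
#define DBG_NEW new (_NORMAL_BLOCK, __FILE__, __LINE__)
#else
#define DBG_NEW new
#endif
```

A program would call EnableCrtLeakCheck() first thing in main() and allocate with DBG_NEW instead of new; in a Microsoft debug build, blocks that leak should then be reported together with the file name and line number.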

Because the debug functions are implemented in the MS C-Runtime Library, they can detect only heap memory leaks, and only for memory allocated through malloc, realloc, or strdup. Leaks of system resources such as handles and GDI objects, and of memory not allocated through the C-Runtime Library, such as VARIANT and BSTR, cannot be detected. This is a major limitation of the approach. In addition, recording where a block was allocated requires the cooperation of the source code, which is troublesome when debugging older programs; modifying source code is never a carefree task. This is another limitation of the approach.

For the development of a large program, the detection provided by the MS C-Runtime Library is far from enough, so next we look at the add-on detection tools. The one I use most is BoundsChecker: first because it is fairly comprehensive, and more importantly because it is stable. An unstable tool of this kind only adds to your troubles. After all, it comes from the well-known NuMega, and I have had essentially no major problems using it.

Using BoundsChecker to detect memory leaks

BoundsChecker uses a technique called code injection to intercept calls to the functions that allocate and release memory. Put simply, when your program starts running, the BoundsChecker DLL is automatically loaded into the address space of the process (this can be done with a system-level hook). It then modifies the calls to the allocation and release functions in the process so that they first go to its own code and only then execute the original code. BoundsChecker does all this without any change to the source code or project files of the program being debugged, which makes it very simple and direct to use.

We take the malloc function as an example here; other functions are intercepted in a similar way.

The functions to be intercepted may be in a DLL or in the program's own code. For example, if the C-Runtime Library is linked statically, the code of malloc is linked into the program. To intercept calls to such functions, BoundsChecker dynamically modifies their instructions.

Here are two pieces of assembly code, the first without BoundsChecker involved and the second with it:

126:  _CRTIMP void * __cdecl malloc (
127:          size_t nSize
128:          )
129:  {
00403C10   push        ebp
00403C11   mov         ebp,esp
130:          return _nh_malloc_dbg(nSize, _newmode, _NORMAL_BLOCK, NULL, 0);
00403C13   push        0
00403C15   push        0
00403C17   push        1
00403C19   mov         eax,[__newmode (0042376c)]
00403C1E   push        eax
00403C1F   mov         ecx,dword ptr [nSize]
00403C22   push        ecx
00403C23   call        _nh_malloc_dbg (00403c80)
00403C28   add         esp,14h
131:  }

The following is the code with BoundsChecker involved:

126:  _CRTIMP void * __cdecl malloc (
127:          size_t nSize
128:          )
129:  {
00403C10   jmp         01F41EC8
00403C15   push        0
00403C17   push        1
00403C19   mov         eax,[__newmode (0042376c)]
00403C1E   push        eax
00403C1F   mov         ecx,dword ptr [nSize]
00403C22   push        ecx
00403C23   call        _nh_malloc_dbg (00403c80)
00403C28   add         esp,14h
131:  }

With BoundsChecker involved, the first three assembly instructions of malloc have been replaced by a single jmp instruction, and the original three instructions have been moved to address 01F41EC8. When the program enters malloc, it first jumps to 01F41EC8 and executes the original three instructions; from there on it is in BoundsChecker's territory. Roughly speaking, BoundsChecker first records the return address of the function (the return address is on the stack, so it is easy to modify), then points the return address at its own code, and then jumps back to the rest of malloc's original instructions, that is, to address 00403C15. When malloc finishes, because the return address has been modified, it returns into BoundsChecker's code. At that point BoundsChecker records the pointer to the memory that malloc allocated and then jumps to the original return address.

If the allocation and release functions are in a DLL, BoundsChecker intercepts calls to them in a different way: it modifies the program's DLL import table so that the function addresses in the table point to its own code.
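The import-table approach can be imitated in portable code with a table of function pointers. The sketch below is a simplified model and says nothing about how BoundsChecker is actually implemented: ImportTable is a hypothetical stand-in for a module's real import table, and the hook namespace holds the replacement functions that record each block before forwarding to the original entries.

```cpp
#include <cstddef>
#include <cstdlib>
#include <set>

using MallocFn = void* (*)(std::size_t);
using FreeFn = void (*)(void*);

// The "real" functions that the table points to before any hook is installed.
void* RealMalloc(std::size_t n) { return std::malloc(n); }
void RealFree(void* p) { std::free(p); }

// Simplified stand-in for a module's import table: callers reach the
// allocator only through these two slots.
struct ImportTable {
    MallocFn malloc_slot = &RealMalloc;
    FreeFn free_slot = &RealFree;
};

namespace hook {

MallocFn original_malloc = nullptr;
FreeFn original_free = nullptr;
std::set<void*> live_blocks;   // blocks allocated but not yet released

// Replacement for the malloc slot: forward the call, then record the block.
void* Malloc(std::size_t n)
{
    void* p = original_malloc(n);
    if (p != nullptr) {
        live_blocks.insert(p);
    }
    return p;
}

// Replacement for the free slot: forget the block, then forward the call.
void Free(void* p)
{
    live_blocks.erase(p);
    original_free(p);
}

// Patches the table so that both slots point to the hook functions.
void Install(ImportTable& table)
{
    original_malloc = table.malloc_slot;
    original_free = table.free_slot;
    table.malloc_slot = &Malloc;
    table.free_slot = &Free;
}

}  // namespace hook
```

After hook::Install(table), code that calls table.malloc_slot and table.free_slot runs unchanged, yet every block passes through the hook, and whatever remains in hook::live_blocks at the end has not been released.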

By intercepting these allocation and release functions, BoundsChecker can record the lifetime of every allocated block of memory or resource. The next question is how this is related back to the source code. In other words, when BoundsChecker detects a leak, how does it report which code allocated the block? The answer is debug information. When we compile a debug build, the compiler records the correspondence between source code and binary code and puts it either in a separate file (.pdb) or directly into the target program. By reading this debug information, a tool can find out which source file, and which line of it, allocated a given block. With code injection and debug information, BoundsChecker can record not only the source location of the call to the allocation function but also the call stack at the time of the allocation, together with the source location of each function on that stack. This is very useful when working with a class library such as MFC. Here is an example:

void ShowXItemMenu()
{
    ...
    CMenu menu;

    menu.CreatePopupMenu();
    // add menu items.
    menu.TrackPopupMenu();      // arguments omitted
    ...
}

void ShowYItemMenu()
{
    ...
    CMenu menu;
    menu.CreatePopupMenu();
    // add menu items.
    menu.TrackPopupMenu();      // arguments omitted
    menu.Detach();              // this will cause an HMENU leak
    ...
}

BOOL CMenu::CreatePopupMenu()
{
    ...
    hMenu = ::CreatePopupMenu();    // the Win32 API, not the member function
    ...
}

When ShowYItemMenu() is called, we deliberately leak an HMENU. As far as BoundsChecker is concerned, however, the leaked HMENU was allocated in CMenu::CreatePopupMenu(). Suppose many places in your program call CMenu::CreatePopupMenu(). Knowing only that the leak was caused by CMenu::CreatePopupMenu(), you still cannot identify the root of the problem: is it in ShowXItemMenu(), in ShowYItemMenu(), or somewhere else that also calls CreatePopupMenu()? With call stack information, the question is easy to answer. BoundsChecker reports the leaked HMENU as follows:

Function                  File                                     Line
CMenu::CreatePopupMenu    E:/8168/vc98/mfc/include/afxwin1.inl     1009
ShowYItemMenu             E:/testmemleak/mytest.cpp                100

The other function calls are omitted here.

From this we can easily see that the function causing the problem is ShowYItemMenu(). When programming with a class library such as MFC, most API calls are wrapped inside the library's classes, and call stack information makes it easy to trace back to the code that really caused the leak.

Recording call stack information slows the program down, so BoundsChecker does not record it by default. To turn the option on, follow these steps:

1. Open the menu BoundsChecker | Setting...

2. On the Error Detection page, select Custom in the Error Detection Scheme list.

3. In the Category combo box, select Pointer and leak error check.

4. Check the Report Call Stack check box.

5. Click OK.

Building on code injection, BoundsChecker also provides API parameter validation, memory overrun detection, and other features that are very helpful during development. Since they are outside the scope of this article, they are not covered here.

Powerful as BoundsChecker is, it is still helpless against implicit memory leaks. So let us look at how to detect memory leaks with Performance Monitor.

Using Performance Monitor to detect memory leaks

System monitoring was designed into the NT kernel from the start, with counters for things such as CPU usage, memory usage, and the frequency of I/O operations. Applications can read these counters to learn about the state of the whole system or of a single process. Performance Monitor is one such application.

To detect memory leaks, we usually monitor three counters of the Process object: Handle Count, Virtual Bytes, and Working Set. Handle Count records the number of handles the process currently has open; monitoring it helps reveal whether the program leaks handles. Virtual Bytes records the amount of virtual address space the process is currently using. Memory allocation on NT is a two-step process. First a range of virtual address space is reserved; at this point the operating system allocates no physical memory and only sets the addresses aside. Then the range is committed, and only then does the operating system allocate physical memory. Virtual Bytes is therefore generally larger than the program's Working Set, and monitoring it can help reveal some lower-level problems. Working Set records the amount of memory the operating system has actually assigned to the process. This value is closely related to the total amount of memory the program has requested: if the program leaks memory, it keeps growing steadily, whereas Virtual Bytes grows in jumps.

Monitoring these counters gives us a picture of how a process uses memory. If there is a leak, even an implicit one, the counters keep increasing. However, this only tells us that there is a problem, not where it is, so we generally use Performance Monitor to verify whether a leak exists and BoundsChecker to locate and fix it.
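Judging whether a counter "keeps increasing" can be made a little more objective by fitting a straight line to the recorded samples. The sketch below is only an illustration under stated assumptions: the samples are taken at equal intervals, the function names are hypothetical, and the threshold is something you would have to choose for your own program. It computes the least-squares slope of a series of counter values.

```cpp
#include <cstddef>
#include <vector>

// Least-squares slope of samples taken at equal intervals, in counter
// units per sample. Returns 0 when there are fewer than two samples.
double GrowthPerSample(const std::vector<double>& samples)
{
    const std::size_t n = samples.size();
    if (n < 2) {
        return 0.0;
    }
    const double mean_x = (n - 1) / 2.0;
    double mean_y = 0.0;
    for (double y : samples) {
        mean_y += y;
    }
    mean_y /= static_cast<double>(n);

    double num = 0.0;
    double den = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = static_cast<double>(i) - mean_x;
        num += dx * (samples[i] - mean_y);
        den += dx * dx;
    }
    return num / den;
}

// Flags a series whose average growth exceeds a caller-chosen threshold.
bool LooksLikeLeak(const std::vector<double>& samples, double threshold)
{
    return GrowthPerSample(samples) > threshold;
}
```

A flat series such as 5, 5, 5 has a slope of zero, while a series that rises by about the same amount in every sample has a clearly positive slope. A counter that rises and then falls back, as happens during normal bursts of activity, averages out to a small value.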

When Performance Monitor shows a leak that BoundsChecker cannot detect, there are two possibilities. The first is an occasional memory leak; in that case, make sure that the running environment and the operations performed are the same under Performance Monitor as under BoundsChecker. The second is an implicit memory leak. Here you need to review the design of the program, study the graphs of counter values recorded by Performance Monitor, analyze how the changes relate to the program's logic, and look for possible causes. This is a painful process, full of hypotheses, guesses, verification, and failure, but it is also an excellent opportunity to build experience.

Summary

Memory leaks are a large and complex problem. Even Java and .NET, which have garbage collection, can still leak, for example through implicit memory leaks. Because of limits on space and on my own ability, this article can only scratch the surface of the topic. Other questions, such as detecting leaks across multiple modules and analyzing a program's memory usage at run time, deserve deeper study. If you have ideas or suggestions, or find errors, please contact me.

