This article shows you some good memory-related coding practices that keep memory errors under control. Memory errors are the bane of C and C++ programming: they are common, their seriousness has been known for more than 20 years, and they have never been fully solved. They can severely affect applications, yet few development teams have a clearly defined plan for managing them. The good news is that they are not mysterious.
Introduction
Memory errors in C and C++ programs are extremely harmful: they are common, and they can have serious consequences. Many of the most serious security advisories from the Computer Emergency Response Team (see References) and from vendors trace back to simple memory errors. C programmers have been discussing this class of error since the late 1970s, yet its impact was still very high in 2007. Worse, many C and C++ programmers believe that memory errors are uncontrollable and mysterious, something that can only be fixed after the fact, never prevented.
That is not the case. In a short time, this article helps you understand the essentials of good memory coding:
- The importance of correct memory management
- Categories of memory errors
- Memory programming strategies
- Conclusion
The importance of correct memory management
C and C++ programs with memory errors can cause a variety of problems. If they leak memory, they run progressively slower and eventually stop. If they overwrite memory, they become fragile and vulnerable to attack by malicious users. Many of the most notorious security breaches, from the 1988 Morris worm to the latest security alerts about Flash Player and other critical retail-level programs, involve buffer overflows: "Most computer security vulnerabilities are buffer overflows," Rodney Bates wrote in 2004.
Many other general-purpose languages (such as Java, Ruby, Haskell, C#, Perl, and Smalltalk) are widely used in areas where C or C++ could also serve, and each has many enthusiasts and advantages. From a computing perspective, however, the main advantage each of these languages holds over C or C++ is closely tied to ease of memory management. Memory-related programming is so important, and so hard to get right in practice, that it outweighs every other variable or theory that distinguishes object-oriented, functional, high-level, declarative, and other programming languages.
Memory errors are also insidious in a way that only a few other common errors are: they are hard to reproduce, and their symptoms usually do not show up in the source code. For example, a memory leak may be entirely unacceptable to an application wherever and whenever it occurs, yet the leak itself is not obvious.
For all these reasons, the memory aspects of C and C++ programming deserve special attention. Let's look at how to solve these problems without debating languages.
Categories of memory errors
First, don't lose heart: there are many ways to deal with memory problems. We begin by listing every possible problem:
- Memory leaks
- Misallocation, including excessive free() calls and uninitialized references
- Dangling pointers
- Array bounds violations
That is the complete list. Moving to C++'s object-oriented idiom does not change these categories significantly: whether the data is a simple type, a C struct, or a C++ class, the memory-management and reference models of C and C++ are in principle the same. Most of what follows is in "pure C"; extending it to C++ is mostly left as an exercise.
Memory leaks
A memory leak occurs when a resource is allocated but never reclaimed. Here is an error-prone pattern (see Listing 1):
Listing 1. Simple potential heap memory loss and buffer overrun
```c
#include <stdio.h>
#include <stdlib.h>

void f1(char *explanation)
{
    char *p1;

    p1 = malloc(100);
    (void) sprintf(p1, "The f1 error occurred because of '%s'.",
                   explanation);
    local_log(p1);
}
```
Do you see the problem? Unless local_log() takes on the unusual responsibility of free()ing the memory passed to it, every call to f1 leaks 100 bytes. When memory sticks come in increments of several megabytes, a single leak is insignificant, but over hours of continuous operation even small leaks cripple an application.
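One way to remove the leak is sketched below. It assumes that local_log() copies the message and keeps no pointer to it; the local_log() here is a hypothetical stub written only so the example compiles, and f1_no_leak is likewise an illustrative name. Under that assumption, the function that calls malloc() is also the one that calls free(). Sizing the buffer with snprintf(NULL, 0, ...) has the side benefit that the message can never overrun it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define F1_FORMAT "The f1 error occurred because of '%s'."

/* Hypothetical stand-in for local_log(): it copies the message and
   does NOT keep (or free) the caller's pointer. */
static char last_logged[256];

static void local_log(const char *message)
{
    strncpy(last_logged, message, sizeof last_logged - 1);
    last_logged[sizeof last_logged - 1] = '\0';
}

/* f1 reworked so the function that calls malloc() also calls free().
   Returns 0 on success, -1 if formatting or allocation fails. */
int f1_no_leak(const char *explanation)
{
    int needed = snprintf(NULL, 0, F1_FORMAT, explanation); /* excludes NUL */
    char *p1;

    if (needed < 0)
        return -1;
    p1 = malloc((size_t) needed + 1);   /* exact fit: no overflow possible */
    if (p1 == NULL)
        return -1;
    (void) snprintf(p1, (size_t) needed + 1, F1_FORMAT, explanation);
    local_log(p1);
    free(p1);                           /* balances the malloc() above */
    return 0;
}
```

If local_log() were instead documented to take ownership of its argument, the free() would belong inside local_log(), and that transfer of responsibility should be stated in its header comment.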
In real-world C and C++ programming, it is not enough to watch your uses of malloc() or new. That is why the first sentence of this section speaks of "resources" rather than just "memory": consider Listing 2. FILE handles may look different from memory blocks, but they deserve the same attention:
Listing 2. Potential heap memory loss from resource mismanagement
```c
#include <stdio.h>

int getkey(char *filename)
{
    FILE *fp;
    int key;

    fp = fopen(filename, "r");
    fscanf(fp, "%d", &key);
    return key;
}
```
The semantics of fopen() require a matching fclose(). The C standard does not specify what happens when fclose() is never called, but a memory leak is likely. Other resources, such as semaphores, network handles, and database connections, deserve the same consideration.
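A sketch of getkey() that releases its FILE handle on every path follows. The name getkey_checked and the choice to report failure through a return code, rather than returning the key directly, are illustrative assumptions and not part of the original listing.

```c
#include <stdio.h>

/* getkey() reworked so the FILE handle is always released.
   Stores the key in *key and returns 0 on success; returns -1 on failure. */
int getkey_checked(const char *filename, int *key)
{
    FILE *fp = fopen(filename, "r");
    int status = -1;

    if (fp == NULL)
        return -1;                 /* nothing was opened, so nothing to close */
    if (fscanf(fp, "%d", key) == 1)
        status = 0;
    if (fclose(fp) != 0)           /* every successful fopen() gets one fclose() */
        status = -1;
    return status;
}
```

Checking fopen() for NULL first matters as well: passing a null FILE pointer to fscanf() or fclose() is undefined behavior.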
Misallocation
Misallocations are not very hard to manage. Here is an example (see Listing 3):
Listing 3. Uninitialized pointer
```c
void f2(int datum)
{
    int *p2;

    /* Uh-oh!  No one has initialized p2. */
    *p2 = datum;
    ...
}
```
The good news about errors like this is that they usually have conspicuous results. On AIX, writing through an uninitialized pointer typically produces a segmentation fault. The advantage is that such errors are detected quickly; compared with errors that take months to diagnose and are hard to reproduce, they are far cheaper to find.
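The simplest defense is to give every pointer a valid value where it is declared. In the sketch below, the hypothetical f2_initialized() points p2 at freshly allocated storage before writing through it and hands the block back so that its ownership is explicit; returning the pointer is an assumption made for the example, since the original f2 elides what happens next.

```c
#include <stdlib.h>

/* f2 with p2 pointing at real storage before it is dereferenced.
   OWNERSHIP: the caller must free() the returned block.
   Returns NULL if the allocation fails. */
int *f2_initialized(int datum)
{
    int *p2 = malloc(sizeof *p2);   /* initialize at the point of declaration */

    if (p2 == NULL)
        return NULL;
    *p2 = datum;                    /* safe: p2 now refers to an int */
    return p2;
}
```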
This error class has several variants. Memory is often released with free() more times than it was allocated with malloc() (see Listing 4):
Listing 4. Two incorrect memory releases
```c
/* Allocate once, free twice. */
void f3()
{
    char *p;

    p = malloc(10);
    ...
    free(p);
    ...
    free(p);
}

/* Allocate zero times, free once. */
void f4()
{
    char *p;

    /* Note that p remains uninitialized here. */
    free(p);
}
```
These errors are usually less serious. The C standard leaves the behavior undefined in these cases, but typical implementations either ignore the error or flag it quickly and plainly, so in practice these cases tend to surface early.
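A common defensive idiom, offered here as a sketch rather than a complete cure, is to clear a pointer as soon as its block is freed. The C standard defines free(NULL) as doing nothing, so a repeated release through the same variable becomes harmless, and initializing pointers to NULL likewise turns f4's free of an uninitialized pointer into a no-op. The macro name FREE_AND_NULL and the function f3_safe are illustrative.

```c
#include <stdlib.h>

/* Free a heap block and clear the variable that referred to it.
   Because free(NULL) does nothing, a second FREE_AND_NULL on the
   same variable is harmless. */
#define FREE_AND_NULL(ptr) do { free(ptr); (ptr) = NULL; } while (0)

/* Listing 4's f3 with the idiom applied. Returns 0 if p ends up NULL. */
int f3_safe(void)
{
    char *p = malloc(10);

    if (p == NULL)
        return -1;
    /* ... use p ... */
    FREE_AND_NULL(p);
    /* ... */
    FREE_AND_NULL(p);   /* now a no-op rather than a double free */
    return p == NULL ? 0 : -1;
}
```

The idiom protects only the variable it is applied to; any other copy of the pointer still dangles, which is the subject of the next category.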
Dangling pointers
Dangling pointers are trickier. A dangling pointer arises when a programmer uses a memory resource after it has been released (see Listing 5):
Listing 5. Dangling pointer
```c
#include <stdlib.h>

struct x { int q; };   /* minimal definition, assumed for illustration */

int f8()
{
    struct x *xp;

    xp = (struct x *) malloc(sizeof (struct x));
    xp->q = 13;
    ...
    free(xp);
    ...
    /* Problem!  There's no guarantee that the memory block
       to which xp points hasn't been overwritten. */
    return xp->q;
}
```
Traditional "debugging" has a hard time isolating dangling pointers. They are hard to reproduce for two clear reasons:
- Even if the code that frees the memory prematurely is localized, how that memory is used afterward may depend on execution elsewhere in the application or, in extreme cases, in a different process.
- Dangling pointers may occur in code that uses memory in subtle ways. As a result, even if the memory is overwritten immediately on release and the new value differs from what was expected, the new value can be hard to recognize as erroneous.
Dangling pointers are a constant threat to the correct operation of C and C++ programs.
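One way to rewrite Listing 5 so that it no longer reads freed memory is to copy the needed value out before calling free(). The sketch assumes a minimal struct x with an int member q, which the original listing never defines; f8_fixed is a hypothetical name.

```c
#include <stdlib.h>

struct x { int q; };   /* assumed minimal definition; Listing 5 omits it */

/* Listing 5 reworked: copy what is needed out of the block *before*
   free(), and never dereference xp afterward.
   Returns -1 if the allocation fails. */
int f8_fixed(void)
{
    struct x *xp = malloc(sizeof *xp);
    int result;

    if (xp == NULL)
        return -1;
    xp->q = 13;
    /* ... */
    result = xp->q;   /* read while the block is still valid */
    free(xp);         /* from here on, xp must not be used */
    return result;
}
```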
Array bounds violations
Array bounds violations are very dangerous; they are the last major category of memory error. Look back at Listing 1. What happens if explanation is, say, 80 characters long? Answer: the result is unpredictable, but almost certainly not good. The format string contributes 36 characters plus a terminating NUL, so any explanation longer than 63 characters makes sprintf() write past the 100 bytes allocated for p1. In any typical implementation, those "excess" characters overwrite other data in memory. The layout of allocated data in memory is complex and hard to reproduce, so the symptoms rarely trace back to the specific error at the source-code level. Errors like these commonly cost millions of dollars.
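Bounded formatting functions remove this particular overflow. The sketch below, built around a hypothetical helper named format_explanation(), relies on the standard behavior of snprintf(): it writes at most size bytes, including the terminating NUL, and returns the length the complete string would have had, so a return value of size or more means the output was truncated.

```c
#include <stdio.h>

/* Format Listing 1's message into a caller-supplied buffer without ever
   writing past its end. Returns 0 on success, -1 on truncation or error. */
int format_explanation(char *buf, size_t size, const char *explanation)
{
    int n = snprintf(buf, size, "The f1 error occurred because of '%s'.",
                     explanation);

    if (n < 0 || (size_t) n >= size)
        return -1;   /* output was cut short (or snprintf failed) */
    return 0;
}
```

With a 100-byte buffer, a 63-character explanation fits exactly (36 fixed characters, 63 more, and the NUL), while a 64-character explanation is reported as truncated instead of overwriting neighboring memory.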
Memory programming strategies
Diligence and self-discipline can minimize the impact of these errors. The following sections describe several specific steps you can take. In my experience applying them in a variety of organizations, they consistently reduce memory errors by at least an order of magnitude.
Coding style
Coding style matters most, and I have never seen another author emphasize it. Functions and methods that affect resources, especially memory, must document that effect explicitly. Listing 6 shows examples in headers, comments, and names.
Listing 6. Source code examples that identify resource responsibilities
```cpp
/********
 * ...
 *
 * Note that any function invoking protected_file_read()
 * assumes responsibility eventually to fclose() its
 * return value, UNLESS that value is NULL.
 *
 ********/
FILE *protected_file_read(char *filename)
{
    FILE *fp;

    fp = fopen(filename, "r");
    if (fp) {
        ...
    } else {
        ...
    }
    return fp;
}

/*******
 * ...
 *
 * Note that the return value of get_message points to a
 * fixed memory location.  Do NOT free() it; remember to
 * make a copy if it must be retained ...
 *
 ********/
char *get_message()
{
    static char this_buffer[400];

    ...
    (void) sprintf(this_buffer, ...);
    return this_buffer;
}

/********
 * ...
 * While this function uses heap memory, and so
 * temporarily might expand the over-all memory
 * footprint, it properly cleans up after itself.
 *
 ********/
int f6(char *item1)
{
    my_class *c1;
    int result;
    ...
    c1 = new my_class(item1);
    ...
    result = c1->x;
    delete c1;
    return result;
}

/********
 * ...
 * Note that f8() is documented to return a value
 * which needs to be returned to heap; as f7 thinly
 * wraps f8, any code which invokes f7() must be
 * careful to free() the return value.
 *
 ********/
int *f7()
{
    int *p;

    p = f8(...);
    ...
    return p;
}
```
Make these formatting elements part of your daily practice. There are several approaches to solving memory problems:
- Dedicated libraries
- Languages
- Software tools
- Hardware checkers
Throughout this area, I have consistently found that the most useful approach, with the highest return on investment, is improving source-code style. It requires nothing expensive or rigidly formal: you can always leave out comments on sections unrelated to memory, but definitions that affect memory need explicit annotation. A few simple words make the memory consequences clear and improve memory programming.
I have not run a controlled experiment to verify the effect of this style. If your experience matches mine, though, you will come to find code that does not document its effect on resources simply intolerable. The practice is simple, and its benefits are large.
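As one illustration of this style, the sketch below pairs an allocating function with its matching release function and states the ownership rule in each header comment. The names message_new() and message_free() are hypothetical; the point is an explicit, searchable pairing, not any particular naming scheme.

```c
#include <stdlib.h>
#include <string.h>

/* message_new(): returns a heap copy of text.
   OWNERSHIP: the caller must release the result with message_free().
   Returns NULL if the allocation fails. */
char *message_new(const char *text)
{
    size_t len = strlen(text) + 1;   /* include the terminating NUL */
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, text, len);
    return copy;
}

/* message_free(): releases a string obtained from message_new().
   Accepts NULL, as free() does. */
void message_free(char *message)
{
    free(message);
}
```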
Inspection
Inspection complements coding standards. Each has its own strengths, but they are especially effective in combination. A skilled C or C++ professional can browse even unfamiliar source code and spot memory problems at very low cost. With a little practice and well-chosen text searches, you can quickly verify that calls to *alloc() and free(), or to new and delete, are balanced. Manual review of this kind often turns up problems like the one in Listing 7.
Listing 7. Tricky memory leak
```c
static char *important_pointer = NULL;

void f9()
{
    if (!important_pointer)
        important_pointer = malloc(IMPORTANT_SIZE);
    ...
    if (condition)
        /* Ooops!  We just lost the reference
           important_pointer already held. */
        important_pointer = malloc(DIFFERENT_SIZE);
    ...
}
```
If condition is true, the block that important_pointer already referenced is lost. Casual use of automated runtime tools may not catch this leak, because such tools observe only the paths a test actually executes; careful source analysis, however, can reason from such conditions to the correct conclusion. To repeat what I wrote about style: although many published treatments of memory problems emphasize tools and languages, for me the biggest gains come from "soft", developer-centric process changes. Any improvement you make in style and inspection also helps you understand the diagnostics that automated tools produce.
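One possible repair of Listing 7 is sketched below: release the block important_pointer already holds before overwriting it. The values of IMPORTANT_SIZE and DIFFERENT_SIZE are placeholders, and condition becomes a parameter only so that the example is self-contained; f9_fixed is a hypothetical name.

```c
#include <stdlib.h>

/* Placeholder sizes standing in for Listing 7's constants. */
#define IMPORTANT_SIZE 64
#define DIFFERENT_SIZE 128

static char *important_pointer = NULL;

/* f9 reworked so the old block is released before the pointer is
   reassigned. Returns 0 on success, -1 on allocation failure. */
int f9_fixed(int condition)
{
    if (!important_pointer) {
        important_pointer = malloc(IMPORTANT_SIZE);
        if (!important_pointer)
            return -1;
    }
    /* ... */
    if (condition) {
        char *replacement = malloc(DIFFERENT_SIZE);

        if (!replacement)
            return -1;            /* old block is still referenced: no leak */
        free(important_pointer);  /* release the block we already held */
        important_pointer = replacement;
    }
    /* ... */
    return 0;
}
```

Allocating the replacement before freeing the old block means an allocation failure leaves important_pointer valid. If the old contents had to survive the resize, realloc() would be the natural alternative.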
Static automated analysis
Of course, humans are not the only ones who can read source code. You should also make static analysis part of your development process. Static analysis is what lint, strict compilation, and several commercial products perform: they scan source code for constructs that the compiler accepts but that may be symptoms of errors.
Aim to make your code lint-clean. Although lint is old and has limitations, many programmers who do not use it (or its more advanced descendants) are making a big mistake. In general, it is possible to write excellent, professional-quality code that lint complains about, but the effort to do so usually uncovers significant errors, some of which affect memory correctness. Even the most expensive licenses for products in this category cost little compared with memory errors discovered late. Clean up your source. Even if code that lint flags gives you the functionality you need, there is probably a simpler way that satisfies lint and is more portable.
Memory libraries
The last two categories of remedies differ significantly from the first three. The first three are lightweight: one person can easily understand and apply them. Memory libraries and tools, on the other hand, often carry high license fees, and some developers need to refine and tune them further. The programmers who use libraries and tools most effectively are those who already understand the lightweight, static methods. The available libraries and tools are impressive, and as a group their quality is high. Even the best of them, however, can be defeated by a sufficiently careless programmer who ignores the basic principles of memory management. In my observation, average programmers who try to use memory libraries and tools in isolation end up only frustrated.
For these reasons, I urge C and C++ programmers to start by examining their own source code to solve memory problems. Only after doing that should they consider libraries.
Several libraries let you write ordinary C or C++ code with better-managed memory. Jonathan Bartlett surveyed the main candidates in a 2004 developerWorks column (see References). The libraries address a variety of memory problems, which makes direct comparison difficult; common themes include garbage collection, smart pointers, and smart containers. In general, the libraries automate more of memory management, leaving programmers fewer opportunities to make mistakes.
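To make the smart-pointer idea concrete in C, here is a minimal reference-counting sketch. It is a hand-rolled illustration, not the API of any library in Bartlett's survey: rc_new(), rc_retain(), and rc_release() are hypothetical names, and the counter is not thread-safe.

```c
#include <stdlib.h>

/* A reference-counted buffer: a small stand-in for the "smart pointer"
   idea, not any particular library's API. */
typedef struct {
    size_t refcount;
    size_t size;
    unsigned char data[];        /* C99 flexible array member */
} rc_buf;

rc_buf *rc_new(size_t size)
{
    rc_buf *b = malloc(sizeof *b + size);

    if (b != NULL) {
        b->refcount = 1;         /* the creator holds the first reference */
        b->size = size;
    }
    return b;
}

rc_buf *rc_retain(rc_buf *b)
{
    if (b != NULL)
        b->refcount++;
    return b;
}

/* Drop one reference; the block is freed when the last one goes.
   Returns the number of references that remain. */
size_t rc_release(rc_buf *b)
{
    if (b == NULL)
        return 0;
    if (--b->refcount == 0) {
        free(b);
        return 0;
    }
    return b->refcount;
}
```

Each holder calls rc_release() exactly once, and no holder needs to know which of them is last; that bookkeeping is what the library-level smart pointers automate.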
I have mixed feelings about memory libraries. They do real work, but in the projects I have seen they have achieved less than expected, especially in C. I have not analyzed these disappointing results carefully. Performance, for example, should be as good as that of the corresponding manual memory management, but this is a gray area, especially where garbage-collection libraries run slowly. The clearest conclusion I draw from this experience is that C++ code bases seem to accept smart pointers more readily than the C code bases I have worked with.
Memory tools
Development teams building serious C-based applications need runtime memory tools as part of their development strategy. The techniques these tools provide are valuable, even indispensable. You may not appreciate the quality and capabilities of memory tools until you try them yourself.
This article focuses on software-based memory tools. Hardware memory debuggers also exist; consider them in special cases, mainly when you work on dedicated hosts that do not support other tools.
The market for software memory tools includes proprietary tools such as IBM Rational Purify, as well as open source tools such as Electric Fence. Many of them work well with AIX and other operating systems.
Memory tools generally work the same way: you build a special version of the executable (much as the -g flag produces a debugging build), exercise the application, and study the reports the tool generates automatically. Consider the program shown in Listing 8.
Listing 8. Sample error
```c
int main()
{
    char p[5];
    strcpy(p, "Hello, world.");
    puts(p);
}
```
This program "runs" in many environments: it compiles, executes, and prints "Hello, world." followed by a newline. Run the same application under a memory tool, though, and it reports an array bounds violation at line 4, the strcpy() call. As a way to understand the software error (copying fourteen bytes, thirteen characters plus the terminating NUL, into space for only five), this is far cheaper than chasing symptoms at a customer site. That is the value of memory tools.
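Once a tool has pointed at the strcpy() call, the fix is to make the destination large enough or to refuse the copy. The sketch below shows a hypothetical copy_checked() helper that checks the length, including the terminating NUL, before copying anything.

```c
#include <stdio.h>
#include <string.h>

/* Copy src into dst only if it fits, terminating NUL included.
   Returns 0 on success; returns -1 and leaves dst untouched otherwise. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;

    if (needed > dst_size)
        return -1;
    memcpy(dst, src, needed);
    return 0;
}
```

For example, with char p[5], copy_checked(p, sizeof p, "Hello, world.") returns -1 and leaves p untouched, whereas a buffer declared as char p[sizeof "Hello, world."] holds all 14 bytes.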
Conclusion
As a mature C or C++ programmer, you recognize that memory problems deserve special attention. With a few plans and practices, you can bring memory errors under control. Learn the correct patterns of memory use, learn to spot likely errors quickly, and make the techniques described in this article part of your daily work. That way you can eliminate problems in your applications at the outset, rather than spending days or weeks debugging their symptoms later.