The greatest difficulty in the C language

Source: Internet
Author: User
Tags: coding standards
This article shows you some good memory-related coding practices that keep memory errors under control. Memory errors are the bane of C and C++ programming: they are common, they have been recognized as serious for more than 20 years, and they have never been completely resolved. They can seriously affect applications, yet few development teams have a clearly defined plan for managing them. But they are not mysterious.

Memory errors in C and C++ programs are pernicious: they are common and can have serious consequences. Many of the most serious security advisories from the Computer Emergency Response Team (see references) and from vendors are caused by simple memory errors. C programmers have been discussing such errors since the late 1970s, yet their impact was still large in 2007. Worse, many C and C++ programmers seem to regard memory errors as uncontrollable and mysterious, something that can only be corrected and never prevented.

But this is not the case. This article shows you, in a short time, the essentials of good memory-related coding:

The importance of correct Memory Management
Memory Error category
Memory programming policy

The importance of correct Memory Management

C and C++ programs with memory errors cause all kinds of problems. If they leak memory, they gradually slow down and eventually stop running. If they overwrite memory, they become fragile and vulnerable to attack by malicious users. From the famous Morris worm of 1988 to the latest security alerts about Flash Player and other key retail programs, the attacks involve buffer overflows: "Most computer security vulnerabilities are buffer overflows," Rodney Bates wrote in 2004.

Many other general-purpose languages can be used where C or C++ might be (Java, Ruby, Haskell, C#, Perl, Smalltalk, and so on), and each has its enthusiasts and its own advantages. From a computing perspective, however, the main advantage each of them has over C or C++ is closely tied to easier memory management. Memory-related programming is so important, and so hard to get right in practice, that it dominates every other variable or theory of object-oriented, functional, high-level, declarative, and other kinds of programming languages.

Like a few other common kinds of error, memory errors are insidious: they are hard to reproduce, and their symptoms usually cannot be found in the corresponding source code. A memory leak, for example, can make an application totally unacceptable, while where and when the leak occurs is anything but obvious.

For all these reasons, the memory problems of C and C++ programming deserve special attention. Let's set aside the choice of language and look at how to solve them.

Memory Error category

First, do not lose heart. There are many ways to deal with memory problems. We begin by listing all the problems that can occur:

Memory leaks
Misallocation, including freeing memory too many times and referencing uninitialized pointers
Dangling pointers
Array bounds violations

That is all of them. Even moving to object-oriented C++ does not change these categories significantly; whether the data is a simple type, a C struct, or a C++ class, the memory management and reference models in C and C++ are the same in principle. Most of what follows is in "pure C"; extending it to C++ is mainly left as an exercise.

Memory leaks

A memory leak occurs when a resource is allocated but never reclaimed. The following is an error-prone pattern (see Listing 1):

Listing 1. Simple potential heap memory loss and buffer overrun

        void f1(char *explanation)
        {
            char *p1;

            p1 = malloc(100);
            (void) sprintf(p1,
                           "The f1 error occurred because of '%s'.",
                           explanation);
            local_log(p1);
        }
  

Do you see the problem? Unless local_log() takes on the unusual responsibility of calling free() on the memory passed to it, each call to f1 leaks 100 bytes. When memory is measured in megabytes, a single leak of this size is insignificant, but even a small leak will cripple an application after hours of continuous operation.
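
One way to close this leak is sketched below. It is only an illustration, not the author's code: f1_fixed is a hypothetical name, and local_log() is replaced by a stand-in that is assumed not to take ownership of its argument. The sketch pairs the malloc() with a free() in the same function and uses snprintf() so that the write cannot run past the 100 bytes allocated.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the article's local_log(); it only counts and
   prints, and does not take ownership of its argument. */
static int log_calls = 0;

static void local_log(const char *message)
{
    log_calls++;
    fprintf(stderr, "%s\n", message);
}

/* Returns 0 on success, -1 if the buffer could not be allocated. */
int f1_fixed(const char *explanation)
{
    char *p1 = malloc(100);

    if (p1 == NULL)
        return -1;
    /* snprintf never writes more than 100 bytes here, so a long
       explanation is truncated instead of overrunning the buffer. */
    (void) snprintf(p1, 100,
                    "The f1 error occurred because of '%s'.",
                    explanation);
    local_log(p1);
    free(p1);    /* balances the malloc() above */
    return 0;
}
```

Keeping the allocation and the release in the same function makes the balance easy to verify by reading the source.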

In real C and C++ programming, it is not enough to watch your use of malloc() or new. The sentence at the start of this section says "resource" rather than just "memory" because of examples like the following (see Listing 2). FILE handles may be different from memory blocks, but they deserve the same attention:

Listing 2. Potential heap memory loss from resource mismanagement

        int getkey(char *filename)
        {
            FILE *fp;
            int key;

            fp = fopen(filename, "r");
            fscanf(fp, "%d", &key);
            return key;
        }

The semantics of fopen() demand a complementary fclose(). Although the C standard does not specify what happens when the fclose() is missing, it is likely that memory is leaked. Other resources, such as semaphores, network handles, and database connections, deserve the same consideration.
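
A balanced version of the function might look like the sketch below. The name getkey_checked and the changed signature (the key is returned through a pointer so that the return value can report failure) are assumptions made for illustration; they are not part of the original listing.

```c
#include <stdio.h>

/* Reads an integer key from filename into *key.
   Returns 0 on success, -1 if the file cannot be opened or parsed. */
int getkey_checked(const char *filename, int *key)
{
    FILE *fp = fopen(filename, "r");
    int status = -1;

    if (fp == NULL)
        return -1;           /* nothing was opened, so nothing to close */
    if (fscanf(fp, "%d", key) == 1)
        status = 0;
    fclose(fp);              /* every successful fopen() gets one fclose() */
    return status;
}
```

Routing both the success path and the parse-failure path through the single fclose() call keeps the handle from escaping on either branch.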

Memory misallocation

Misallocation is not very hard to manage. Here is an example (see Listing 3):

Listing 3. Uninitialized pointer

        void f2(int datum)
        {
            int *p2;

                /* Uh-oh!  No one has initialized p2. */
            *p2 = datum;
               ...
        }

The good news about such errors is that they generally have dramatic results. Under AIX, an assignment through an uninitialized pointer usually causes an immediate segmentation fault. The advantage is that any such error is detected quickly; compared with errors that take months to pin down and are hard to reproduce, these are cheap to find.
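
For contrast, a minimal sketch of the same assignment with the pointer initialized first is shown here. The name f2_fixed and the decision to hand the allocated block back to the caller are assumptions for illustration only.

```c
#include <stdlib.h>

/* Stores datum in freshly allocated memory; the caller must free() the
   result. Returns NULL if allocation fails. */
int *f2_fixed(int datum)
{
    int *p2 = malloc(sizeof *p2);   /* p2 now points at real storage */

    if (p2 != NULL)
        *p2 = datum;
    return p2;
}
```

Initializing the pointer on the line that declares it leaves no window in which it holds an indeterminate value.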

This category of error has several variants. One of them is calling free() on memory more often than malloc() allocated it (see Listing 4):

Listing 4. Two erroneous memory releases

        /* Allocate once, free twice. */
        void f3()
        {
            char *p;

            p = malloc(10);
             ...
            free(p);
             ...
            free(p);
        }

        /* Allocate zero times, free once. */
        void f4()
        {
            char *p;

                /* Note that p remains uninitialized here. */
            free(p);
        }

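These errors are usually less insidious than the others. The C standard leaves the behavior undefined in these cases, but typical implementations either ignore the error or flag it quickly and clearly, so such mistakes tend to be caught early. They should still be fixed rather than tolerated: undefined behavior gives no guarantees, and double frees have been exploited as security vulnerabilities.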
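
A common defensive habit against the first variant is to reset a pointer to NULL as soon as it is freed. The sketch below assumes a hypothetical helper, free_and_clear(), which is not part of the article; it relies only on the standard guarantee that free(NULL) does nothing.

```c
#include <stdlib.h>

/* Hypothetical helper: frees *pp and resets it to NULL. free(NULL) is
   defined by the C standard to do nothing, so a repeated call is harmless. */
void free_and_clear(char **pp)
{
    free(*pp);
    *pp = NULL;
}

/* Returns 1 if the pointer is NULL after both releases, 0 otherwise. */
int f3_fixed(void)
{
    char *p = malloc(10);

    free_and_clear(&p);
    free_and_clear(&p);  /* second release is now a no-op */
    return p == NULL;
}
```

This protects only the pointer variable that is passed in; other copies of the same address are not reset, so it reduces the problem without eliminating it.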

Dangling pointers

Dangling pointers are tricky. A dangling pointer arises when a programmer uses a memory resource after it has been freed (see Listing 5):

Listing 5. Dangling pointer

       int f8()
       {
           struct x *xp;

           xp = (struct x *) malloc(sizeof (struct x));
           xp->q = 13;
           ...
           free(xp);
           ...
               /* Problem!  There's no guarantee that
                  the memory block to which xp points
                  hasn't been overwritten. */
           return xp->q;
       }

Traditional debugging has a hard time isolating dangling pointers. They are hard to reproduce for two obvious reasons:

Even if the code that frees the memory too early is localized, the use of that memory may depend on execution elsewhere in the application or even, in extreme cases, in a different process.
Dangling pointers may occur in code that uses memory in subtle ways. As a result, even if the memory is overwritten immediately after it is freed and the new value differs from the expected one, it can be difficult to recognize the new value as wrong.

Dangling pointers are a constant threat to the health of a C or C++ program.
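
The usual remedy is to finish reading from a block before releasing it. The following sketch assumes a one-field definition of struct x (the article does not define it), a hypothetical function name, and an assumed error convention of returning -1 when allocation fails.

```c
#include <stdlib.h>

struct x {
    int q;   /* assumed layout; the article does not define struct x */
};

int f8_fixed(void)
{
    struct x *xp = malloc(sizeof *xp);
    int result;

    if (xp == NULL)
        return -1;       /* assumed error convention */
    xp->q = 13;
    result = xp->q;      /* read while the block is still valid */
    free(xp);
    xp = NULL;           /* a later dereference would typically fault at once */
    return result;
}
```

Copying the value into a local variable before the free() means that nothing after the release depends on the contents of the freed block.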

Array bounds violations

Array bounds violations are very dangerous. They are the last major category of memory mismanagement. Look back at Listing 1: what happens if the explanation is longer than 80 characters? Answer: it is unpredictable, but probably far from good. Specifically, C copies a string that does not fit in the 100 characters allocated for it. In any typical implementation, the excess characters overwrite other data in memory. The layout of allocated data in memory is complex and hard to reproduce, so no symptom can be traced back to the specific error at the source level. These errors routinely cause losses in the millions of dollars.
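
To be precise about Listing 1: the fixed text of its format string occupies 36 characters and the terminating null one more, so any explanation longer than 63 characters already overruns the 100-byte buffer. One way to avoid guessing a size at all is to ask snprintf() how much room is needed and allocate exactly that. The sketch below is an illustration under assumptions: format_f1_message and F1_FORMAT are hypothetical names, and a C99-conforming snprintf() is assumed.

```c
#include <stdio.h>
#include <stdlib.h>

#define F1_FORMAT "The f1 error occurred because of '%s'."

/* Returns a newly allocated message sized to fit explanation exactly,
   or NULL on failure. The caller must free() the result. */
char *format_f1_message(const char *explanation)
{
    /* With a NULL buffer and size 0, C99 snprintf only reports the
       length the output would need (excluding the terminator). */
    int needed = snprintf(NULL, 0, F1_FORMAT, explanation);
    char *msg;

    if (needed < 0)
        return NULL;
    msg = malloc((size_t) needed + 1);
    if (msg != NULL)
        (void) snprintf(msg, (size_t) needed + 1, F1_FORMAT, explanation);
    return msg;
}
```

Because the buffer is sized from the actual input, no explanation is too long; the cost is that the caller now owns the returned block and has to free() it.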

Memory programming policy

Diligence and discipline can minimize the impact of these errors. Below are several specific steps you can take. My experience in a variety of organizations is that they consistently reduce memory errors by at least an order of magnitude.

Coding style

Coding style is the most important, and I have not seen any other author emphasize it. Functions and methods that affect resources, especially memory, need to be explicitly commented. Here are examples of headers, comments, or names (see Listing 6).

Listing 6. Source code examples that identify resources

        /********
         * ...
         *
         * Note that any function invoking protected_file_read()
         * assumes responsibility eventually to fclose() its
         * return value, UNLESS that value is NULL.
         *
         ********/
        FILE *protected_file_read(char *filename)
        {
            FILE *fp;

            fp = fopen(filename, "r");
            if (fp) {
                ...
            } else {
                ...
            }
            return fp;
        }

        /*******
         * ...
         *
         * Note that the return value of get_message points to a
         * fixed memory location.  Do NOT free() it; remember to
         * make a copy if it must be retained ...
         *
         ********/
        char *get_message()
        {
            static char this_buffer[400];

            ...
            (void) sprintf(this_buffer, ...);
            return this_buffer;
        }

        /********
         * ...
         * While this function uses heap memory, and so
         * temporarily might expand the over-all memory
         * footprint, it properly cleans up after itself.
         *
         ********/
        int f6(char *item1)
        {
            my_class *c1;
            int result;
            ...
            c1 = new my_class(item1);
            ...
            result = c1->x;
            delete c1;
            return result;
        }
        /********
         * ...
         * Note that f8() is documented to return a value
         * which needs to be returned to heap; as f7 thinly
         * wraps f8, any code which invokes f7() must be
         * careful to free() the return value.
         *
         ********/
        int *f7()
        {
            int *p;

            p = f8(...);
            ...
            return p;
        }
     

Make these format elements part of your daily routine. Memory problems can be tackled in several ways:

Dedicated libraries
Languages
Software tools
Hardware checkers

Across this whole field, I have always found that the most useful step, with the greatest return on investment, is to improve source code style. It does not require expensive tooling or rigid formality; you can always leave out comments on segments unrelated to memory, but definitions that affect memory need explicit comments. Adding a few simple words makes the memory consequences much clearer and improves memory programming.

I have not run a controlled experiment to verify the effect of this style. If your experience is like mine, though, you will find it simply intolerable to work without a policy of documenting resource impact. It is simple to do, and the benefits are great.

Inspection

Inspection complements coding standards. Each has its own advantages, but the two are particularly effective in combination. Skilled C or C++ professionals can skim even unfamiliar source code and detect memory problems at very low cost. With a little practice and appropriate text searches, you can quickly verify that a body of source balances *alloc() with free(), or new with delete. Manual inspection of this kind often turns up problems like the one in Listing 7.

Listing 7. Tricky memory leak

        static char *important_pointer = NULL;
        void f9()
        {
            if (!important_pointer)
                important_pointer = malloc(IMPORTANT_SIZE);
            ...
            if (condition)
                    /* Ooops!  We just lost the reference
                       important_pointer already held. */
                important_pointer = malloc(DIFFERENT_SIZE);
            ...
        }
   

Automated run-time tools may fail to detect the memory leak that occurs when condition is true, for instance if no test run exercises that branch. Careful source analysis can reason from such conditions to the correct conclusion. To repeat what I wrote about style: although much of the published discussion of memory problems emphasizes tools and languages, for me the biggest gains come from "soft," developer-centric process changes. Any improvement you make in style and inspection also helps you understand the diagnostics that automated tools produce.
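
One possible repair for Listing 7 is sketched below. It uses realloc() so that the old block is never orphaned; calling free() on the old block before the second malloc() would work as well if the old contents are not needed. The function name, the int parameter, and the two size values are assumptions, since the article leaves them unspecified.

```c
#include <stdlib.h>

#define IMPORTANT_SIZE 64    /* assumed values; the article leaves */
#define DIFFERENT_SIZE 128   /* both sizes unspecified             */

static char *important_pointer = NULL;

/* Returns 0 on success, -1 if an allocation fails. */
int f9_fixed(int condition)
{
    if (!important_pointer)
        important_pointer = malloc(IMPORTANT_SIZE);
    if (!important_pointer)
        return -1;
    if (condition) {
        /* realloc either resizes the existing block or moves it and
           frees the old one; on failure the old block stays valid. */
        char *resized = realloc(important_pointer, DIFFERENT_SIZE);

        if (!resized)
            return -1;
        important_pointer = resized;
    }
    return 0;
}
```

Assigning the result of realloc() to a temporary first matters: writing it straight into important_pointer would lose the only reference to the old block if the call failed.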

Static automatic syntax analysis

Of course, humans are not the only ones who can read source code. You should also make static syntax analysis part of your development process. Static syntax analysis is what lint, strict compilation, and several commercial products do: they scan the source and flag items that the compiler accepts but that may be symptoms of errors.

Aim to make your code lint-free. Although lint is old and has its limitations, many programmers who do not use it (or its more advanced descendants) are making a big mistake. In general, it is possible to write excellent, professional-quality code that ignores lint, but attempts to do so often lead to major errors, some of which affect memory correctness. Even the most expensive license fee for a product in this category is trivial compared with the cost of a memory error that is first discovered by a customer. Clean up your source code. Even when code that lint flags does give you the functionality you need, there is probably a simpler way to write it that satisfies lint and is more robust and portable.

Memory libraries

The last two categories of remedy are significantly different from the first three. The first three are lightweight; one person can easily understand and implement them. Memory libraries and tools, on the other hand, often carry high license fees and, for some developers, require further refinement and tuning. The programmers who use libraries and tools effectively are those who already understand the lightweight static methods. The available libraries and tools are impressive: as a group, their quality is high. Yet even the best of them can be defeated by willful programmers who ignore the basic principles of memory management. I have observed that average programmers only become discouraged when they try to use memory libraries and tools in isolation.

For these reasons, I urge C and C++ programmers to look first to their own source code to solve memory problems. Only after that is done should libraries be considered.

Several libraries let you write conventional C or C++ code with assurances of improved memory management. Jonathan Bartlett surveyed the main candidates in a 2004 review column on developerWorks, available through the references. Libraries address so many different memory problems that it is hard to compare them directly; common themes include garbage collection, smart pointers, and smart containers. In general, libraries automate more of memory management so that programmers make fewer mistakes.

I have mixed feelings about memory libraries. They work hard, but I have seen them succeed less often than expected in projects, especially in C. I have not analyzed these disappointing results carefully. Performance, for example, should be as good as that of the corresponding manual memory management, but this is a gray area, especially where a garbage-collection library slows processing. The clearest conclusion from this experience is that C++ code seems to take to smart pointers better than C-focused code bases do.

Memory tools

Development teams that build serious C-based applications need run-time memory tools as part of their development strategy. The techniques already introduced are valuable and indispensable, but you may not appreciate the quality and capability of memory tools until you try them yourself.

This article mainly discusses software-based memory tools. Hardware memory debuggers also exist; they are considered only in special cases, mainly when working with specialized hosts that do not support other tools.

Software memory tools on the market include proprietary tools, such as IBM Rational Purify, and open source tools, such as Electric Fence. Many of them work well with AIX and other operating systems.

All memory tools work in roughly the same way: you build a special version of the executable (much like the debug version produced by compiling with the -g flag), exercise the application, and study the reports that the tool generates automatically. Consider the program in Listing 8.

Listing 8. Sample error

        int main()
        {
            char p[5];
            strcpy(p, "Hello, world.");
            puts(p);
        }

This program "runs" in many environments. It compiles, executes, and prints "Hello, world.\n" to the screen. Run the same application under a memory tool, and the tool reports an array bounds violation at line 4. As a way of learning about a software error (fourteen characters copied into a space that holds only five), this is far cheaper than hunting for symptoms of the error at a customer site. That is the contribution of memory tools.
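
The error itself can also be prevented in source by checking the length before copying. The helper below is a sketch; copy_checked is a hypothetical name introduced for illustration, not a standard library function.

```c
#include <string.h>

/* Copies src into dst (capacity dst_size) only if it fits, including the
   terminating '\0'. Returns 0 on success, -1 if src is too long. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (len >= dst_size)
        return -1;           /* refuse rather than overrun dst */
    memcpy(dst, src, len + 1);
    return 0;
}
```

Called with the five-byte array of Listing 8 and sizeof p as the capacity, this helper rejects "Hello, world." instead of writing past the end of the array; with an array of fourteen bytes or more, the copy succeeds.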

As a mature C or C++ programmer, you recognize that memory problems deserve special attention. With a few plans and practices, you can find ways to keep memory errors under control. Learn the correct patterns of memory use, spot likely errors quickly, and make the techniques described in this article part of your daily routine. You can eliminate symptoms in your applications at the outset, or you can spend days or weeks debugging them later.
