C++ Engineering Practice (2): Do Not Overload the Global ::operator new()

Last Update: 2018-12-06 Source: Internet

Author: User


Chen Shuo (giantchen_at_gmail)

Blog.csdn.net/solstice

This article considers only the Linux x86 platform and server-side development. It does not consider the problem of allocating memory in one DLL and releasing it in another on Windows. It assumes the reader knows what ::operator new() and ::operator delete() are, and how they differ from and relate to new/delete expressions. For more information, see Mr. Hou Jie's article [1].
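As a quick refresher on that distinction, here is a minimal sketch of what a new expression and a delete expression do. A new expression first calls an allocation function (here ::operator new()) and then runs the constructor. A delete expression runs the destructor and then calls ::operator delete(). The Tracked type and the two helper functions are hypothetical. The sketch also ignores two things a real new expression does: it looks up class-specific operator new first, and it frees the memory if the constructor throws.

```cpp
#include <new>

// Hypothetical type that counts live instances, so the construction and
// destruction phases can be observed separately from allocation.
struct Tracked {
  static int alive;
  Tracked() { ++alive; }
  ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Roughly what `Tracked* p = new Tracked;` does. Unlike the real new
// expression, this sketch would leak `raw` if the constructor threw.
Tracked* newByHand() {
  void* raw = ::operator new(sizeof(Tracked));  // 1. allocate raw memory
  return new (raw) Tracked;                     // 2. construct in place
}

// Roughly what `delete p;` does for a non-null p.
void deleteByHand(Tracked* p) {
  p->~Tracked();         // 1. destroy the object
  ::operator delete(p);  // 2. release the raw memory
}
```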

Memory management is a perennial topic in C++. In section 7 of my article "When a destructor meets multithreading", I included an aside, "Systematically avoiding various pointer errors", which briefly reviewed common problems and their solutions in modern C++. If you manage memory the modern C++ way (RAII), you will rarely run into memory errors. But "no errors" is a baseline requirement, not "good enough". We often try to optimize performance. If profiling shows that the hot spot is memory allocation and deallocation, overloading the global ::operator new() and ::operator delete() may look like a once-and-for-all fix; below I call this "overloading ::operator new()". This article tries to explain why this approach often doesn't work.

Basic Requirements for Memory Management

If we consider only allocation and deallocation, the basic requirement of memory management is "nothing released twice, nothing missed": no double delete and no missing delete. This is what we mean when we say new and delete must be paired. "Paired" means more than equal counts. Each allocation must also be released by its matching counterpart; don't borrow from the east and repay to the west. For example:

Memory allocated with the system's default malloc() must be released with the system's default free();
Objects created with the default new expression must be destroyed and released with the default delete expression;
Objects created with the default new[] expression must be destroyed and released with the default delete[] expression;
Memory allocated with the system's default ::operator new() must be released with the system's default ::operator delete();
Objects created with placement new must be destroyed with "placement delete" (loosely speaking; in practice, by calling the destructor directly);
Memory allocated from memory pool A must be returned to memory pool A;
If you customize new/delete, follow the corresponding rules; see Effective C++ for details.
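The pairs above can be sketched in one function. Each allocation is released by its matching counterpart. The Item type and matchedPairs() are hypothetical, used only to make the pairing observable.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical type that counts live instances, to check that every
// construction below is paired with a destruction.
struct Item {
  static int alive;
  Item() { ++alive; }
  ~Item() { --alive; }
};
int Item::alive = 0;

void matchedPairs() {
  void* bytes = std::malloc(64);   // malloc() ...
  std::free(bytes);                // ... is released with free()

  Item* one = new Item;            // new expression ...
  delete one;                      // ... pairs with delete

  Item* many = new Item[3];        // new[] expression ...
  delete[] many;                   // ... pairs with delete[]

  void* raw = ::operator new(64);  // ::operator new() ...
  ::operator delete(raw);          // ... pairs with ::operator delete()

  // Placement new constructs an object in memory we already own. Its
  // "placement delete" counterpart is an explicit destructor call, after
  // which the memory goes back to whoever provided it.
  void* buf = ::operator new(sizeof(Item));
  Item* placed = new (buf) Item;
  placed->~Item();
  ::operator delete(buf);
}
```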

None of this is hard; it is a basic skill for every C++ developer. But if you want to overload the global ::operator new(), things get troublesome.

Reasons to Overload ::operator new()

Item 50 of Effective C++, 3rd edition, lists the reasons for customizing new/delete:

Detecting memory errors in code
Improving performance
Collecting memory usage statistics

These are all legitimate needs. At the end of this article, we will see that they can be met without overloading ::operator new().

Two Ways to Overload ::operator new()

1. Keep the signature unchanged and seamlessly replace the system's original version, for example:

#include <new>

void* operator new(size_t size);

void operator delete(void* p);

With this method, users don't need to include any special header file; they don't even need to see these two declarations. This method is usually used for "performance optimization".
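Below is a minimal sketch of method 1, assuming the goal is to count live allocations. The replacement functions keep the standard signatures, so every new expression in the program goes through them, library code included. This is exactly what makes the technique so intrusive. The counter g_liveAllocations is hypothetical. For brevity the sketch throws std::bad_alloc directly, whereas a conforming operator new retries through the installed new_handler first.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter of live allocations made through the global operator new.
std::atomic<long> g_liveAllocations{0};

// Replacement global operator new: same signature as the standard one.
// The default operator new[] forwards here, so arrays are counted too.
void* operator new(std::size_t size) {
  if (size == 0) size = 1;  // must still return a unique, non-null pointer
  void* p = std::malloc(size);
  if (p == nullptr) throw std::bad_alloc();  // simplified: no new_handler loop
  ++g_liveAllocations;
  return p;
}

void operator delete(void* p) noexcept {
  if (p == nullptr) return;
  --g_liveAllocations;
  std::free(p);
}

// With C++14 sized deallocation the compiler may call this overload instead;
// forward it so both release paths stay consistent.
void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }
```

The test calls ::operator new() directly rather than using a new expression. Compilers are allowed to elide allocations made by new expressions, which would hide the counter change.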

2. Add extra parameters, which the caller supplies at each call site, for example:

void* operator new(size_t size, const char* file, int line); // the returned pointer must be releasable by the ordinary ::operator delete(void*)

void operator delete(void* p, const char* file, int line); // called only when the constructor throws an exception

and then call it like this:

Foo* p = new (__FILE__, __LINE__) Foo; // records which file and which line allocated the memory

We can also define a macro to stand in for new and save typing. With this second method, users do need to see the two declarations; that is, they must explicitly include the header file you provide. This method is usually used for "detecting memory errors" and "collecting memory usage statistics", though not exclusively.
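Here is a minimal sketch of method 2, assuming the simplest compatible design. The extra-parameter operator new records the call site and then forwards to the ordinary ::operator new(size_t), so a plain delete expression can still release the memory. It also shows the point made in the comment above: the matching extra-parameter operator delete runs only when the constructor throws. DEBUG_NEW, the g_* variables, Foo, and Thrower are hypothetical names.

```cpp
#include <cstddef>
#include <new>

// Hypothetical record of the most recent tracked allocation site.
const char* g_lastFile = nullptr;
int g_lastLine = 0;
int g_placementDeletes = 0;

// Extra-parameter operator new. It forwards to ::operator new(size_t), so
// the memory can later be released by an ordinary delete expression.
void* operator new(std::size_t size, const char* file, int line) {
  g_lastFile = file;
  g_lastLine = line;
  return ::operator new(size);
}

// Matching "placement delete": the compiler calls it only if the
// constructor in `new (file, line) T` throws.
void operator delete(void* p, const char*, int) noexcept {
  ++g_placementDeletes;
  ::operator delete(p);
}

// Hypothetical macro to save typing at call sites.
#define DEBUG_NEW new (__FILE__, __LINE__)

struct Foo { int x = 1; };
struct Thrower { Thrower() { throw 42; } };
```

A caller writes DEBUG_NEW Foo and later releases the object with a plain delete.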

While learning C++, everyone writes toy programs of 100 or 200 lines, and overloading ::operator new() in such programs causes no trouble.

In real product development, however, overloading ::operator new() is the worst option. There are simpler and safer ways to achieve the goals above.

Realistic Development Environment

As C++ application developers, we usually rely on a number of libraries when writing programs of any real size. Their providers fall roughly into these categories:

The C standard library, including the POSIX functions provided by the Linux programming environment.
Third-party C libraries, such as OpenSSL.
The C++ standard library, mainly the STL. (I assume nobody uses iostream in production code, right?)
Third-party general-purpose C++ libraries, such as Boost.Regex or an XML library.
Internal infrastructure C++ libraries developed by other teams in the company, such as network communication and logging.
Basic libraries that colleagues on this project team developed for this application, such as an affine transformation module for 3D models.

Using these libraries inevitably means passing data between them. For example, library A's output becomes library B's input, and library A's output often lives in dynamically allocated memory (such as std::vector<double>).

If all C++ libraries use the same allocator (the default new/delete), releasing memory is easy: just delete it. If not, you must always keep track of which allocator a piece of memory belongs to. Is it the system default or a custom one? Don't get it wrong when releasing it.

(C libraries usually allocate and release memory with malloc/free by default. C offers much less customization than C++, so the problems above don't arise. Some more thorough C libraries let you register two functions that the library then uses internally to allocate and release memory, which gives you full control over the library's memory usage. In C++, this kind of dependency injection becomes fancy but useless; see Chen Shuo's article on the allocator in the C++ standard library.)
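The "register two functions" pattern used by such C libraries can be sketched as follows. Every name prefixed with mylib_ is hypothetical and does not belong to any real library. The sketch is written in C++ to match the other examples, but the shape is the same in C. Because the library also exports a matching release function, callers never have to guess which allocator owns a block.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical library API: function-pointer types for allocation hooks.
using mylib_alloc_fn = void* (*)(std::size_t);
using mylib_free_fn = void (*)(void*);

namespace {
void* defaultAlloc(std::size_t n) { return std::malloc(n); }
void defaultFree(void* p) { std::free(p); }
mylib_alloc_fn g_alloc = defaultAlloc;  // library-internal hooks
mylib_free_fn g_free = defaultFree;
}  // namespace

// Host program registers its own functions; null restores the defaults.
void mylib_set_allocator(mylib_alloc_fn alloc, mylib_free_fn dealloc) {
  g_alloc = alloc ? alloc : defaultAlloc;
  g_free = dealloc ? dealloc : defaultFree;
}

// Every library-internal allocation goes through the registered hook ...
char* mylib_strdup(const char* s) {
  std::size_t n = std::strlen(s) + 1;
  char* copy = static_cast<char*>(g_alloc(n));
  if (copy != nullptr) std::memcpy(copy, s, n);
  return copy;
}

// ... and so does the matching release function the library exports.
void mylib_free(void* p) { g_free(p); }

// Example host-side hooks that just count allocations.
int g_hostAllocs = 0;
void* countingAlloc(std::size_t n) { ++g_hostAllocs; return std::malloc(n); }
void countingFree(void* p) { std::free(p); }
```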

If you overload ::operator new(), however, things are not that simple.

The Dilemma of Overloading ::operator new()

First, overloading ::operator new() causes no trouble for C libraries. Of course, C libraries also get none of the three benefits you hoped to gain by overloading it.

The discussion below considers only C++ libraries and the C++ main program.

Rule 1: Never Overload ::operator new() in a Library

If you are a library author and your library is meant for others to use, you have no right to overload the global ::operator new(size_t) (note that this is the first overloading method above). It is highly intrusive: every program that uses your library is forced to use your overloaded ::operator new(), and others may not want that. Moreover, if two libraries both try to overload ::operator new(size_t), they will clash; I'd guess you get a duplicate-symbol link error. So as a library author, do not overload ::operator new(size_t).

What about the second method? First, suppose you overload ::operator new(size_t size, const char* file, int line). The void* it returns must then be releasable by both ::operator delete(void*) and ::operator delete(void* p, const char* file, int line). So you must decide whether the pointer returned by your ::operator new(size_t size, const char* file, int line) is compatible with the system's default ::operator delete(void*).

If it is not compatible (that is, the memory cannot be released with the default ::operator delete(void*)), you must overload ::operator delete(void*) so that its behavior matches your operator new(size_t size, const char* file, int line). Once you overload ::operator delete(void*), you must also overload ::operator new(size_t). That brings us back to case 1: you have no right to overload the global ::operator new(size_t).
If you choose to stay compatible with the system's default ::operator delete(void*), there is very little you can do in operator new(size_t size, const char* file, int line). For example, you cannot dynamically allocate extra memory, explicitly or implicitly, for housekeeping or for storing statistics, because the default ::operator delete(void*) will not release that extra memory. (Implicit allocation here means, for example, inserting elements into a container such as std::map<>.)
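A small sketch shows why the "compatible" choice is so limited. Assume the extra-parameter operator new records each live pointer in a side table, which is a hypothetical design used only for illustration. An ordinary delete expression then calls the default ::operator delete(void*), which knows nothing about the table, so the entry is never removed. The bookkeeping leaks and goes stale.

```cpp
#include <cstddef>
#include <map>
#include <new>
#include <utility>

// Hypothetical bookkeeping table: live pointer -> (file, line).
std::map<void*, std::pair<const char*, int>>& liveTable() {
  static std::map<void*, std::pair<const char*, int>> table;
  return table;
}

// "Compatible" extra-parameter operator new: the memory itself comes from
// ::operator new(size_t), but the std::map insert allocates extra memory.
void* operator new(std::size_t size, const char* file, int line) {
  void* p = ::operator new(size);
  try {
    liveTable()[p] = {file, line};
  } catch (...) {
    ::operator delete(p);
    throw;
  }
  return p;
}

// Only used if a constructor throws; ordinary deletes never come here.
void operator delete(void* p, const char*, int) noexcept {
  liveTable().erase(p);
  ::operator delete(p);
}
```

In the test, the table still holds one entry after delete p. This is the stale record described above.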

Many readers are probably dizzy by now, but we are not done yet.

Second, overloading operator new(size_t size, const char* file, int line) in a library also raises the question of whether to expose the overload to the library's users (other libraries or the main program). "Exposing" has two meanings here: 1) whether code that includes your header will use your overloaded ::operator new(); 2) whether memory allocated by your overloaded ::operator new() can be safely released outside your library. If it cannot, should you expose an interface function that lets users release the memory safely? Or return a shared_ptr and rely on its ability to "capture" the deleter? Sounds complicated? I won't go through every case here. In short, as a library author, never even consider overloading operator new().
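The shared_ptr option mentioned above might look like the sketch below. The library allocates with its own scheme and returns a shared_ptr whose deleter knows how to undo that scheme. The caller just drops the pointer and never needs to know which allocator was used. The namespace mylib, Widget, makeWidget(), and releasedCount() are hypothetical, and std::malloc stands in for whatever private allocator a real library might use.

```cpp
#include <cstdlib>
#include <memory>
#include <new>

namespace mylib {  // hypothetical library namespace

struct Widget {
  explicit Widget(int v) : value(v) {}
  int value;
};

// Hypothetical counter so callers can observe that the captured deleter ran.
inline int& releasedCount() {
  static int n = 0;
  return n;
}

// The library allocates with its own scheme (std::malloc stands in here) and
// returns a shared_ptr whose deleter performs the matching release.
std::shared_ptr<Widget> makeWidget(int v) {
  void* mem = std::malloc(sizeof(Widget));
  if (mem == nullptr) throw std::bad_alloc();
  Widget* w = new (mem) Widget(v);  // placement new into library-owned memory
  return std::shared_ptr<Widget>(w, [](Widget* p) {
    p->~Widget();
    std::free(p);
    ++releasedCount();
  });
}

}  // namespace mylib
```

The deleter is stored in the shared_ptr's control block. The correct release path therefore travels with the pointer, even into code that never includes the library's private headers.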

Fact 2: Overloading ::operator new() in the Main Program Is of Little Use

This is not a rule, but an attempt to demonstrate that it does not make much sense.

If you use the first method to overload the global ::operator new(size_t), it affects every C++ library used in the program. This may cause no problems, but I suggest the simpler alternative described later in this article instead.

If you use the second method to overload ::operator new(size_t size, const char* file, int line), will the other C++ libraries used in the program benefit? For example, do you want statistics on the memory those C++ libraries use? And if a library returns memory or objects it allocated with new for you to release after use, do you want to check whether that memory is released correctly?

C++ libraries are organized in one of two ways: 1) header files only (for example, template libraries such as the STL and Boost); 2) header files plus binary library files (most non-template libraries are distributed this way).

For header-only libraries, you can include the header that overloads ::operator new on the first line of every .cpp file. The other C++ libraries used in the program will then also allocate memory through your ::operator new. Of course, this is very intrusive. If you are lucky, everything compiles and runs fine. If you are less lucky, you get compile errors, which is actually not a bad thing. If you are even less lucky, there are no compile errors, but at run time you get occasional illegal memory accesses that end in a segmentation fault. In some cases your custom allocation policy conflicts with the library's own, corrupting memory and causing inexplicable behavior.

None of this helps libraries distributed as binary files. Their source has already been compiled into binary code, which will not call your newly overloaded ::operator new. (Think about it: how could already-compiled binary code pass the extra (__FILE__, __LINE__) arguments?) Worse, if some of the library's header files contain inline functions, you get a strange kind of "crosstalk": part of the library uses your allocator and part uses the system's default allocator. Memory then gets released to the wrong allocator, which corrupts the allocator's data structures.

In short, the second method seems more powerful, but it is hard to make it work seamlessly with the other C++ libraries used in the program.

To sum up, overloading ::operator new() is nearly useless in real-world C++ projects, because it is hard to reconcile with the C++ libraries the program uses. After all, most libraries were designed without anticipating that you would overload ::operator new() and force it on them.

So what should you do if you really need custom memory allocation?

Alternative

It's simple: replace malloc. If needed, work at the malloc level. Use LD_PRELOAD to load a .so file that provides a drop-in replacement implementation of malloc/free. This serves both C and C++ code and avoids the dark corners of overloading ::operator new() in C++.

For "detecting memory errors", Valgrind, dmalloc, or Electric Fence (efence) achieve the same goal, and professional debugging tools are more reliable than a homemade memory checker.

For "collecting memory usage statistics", replacing malloc also yields enough information. We can call backtrace() to get the full call stack, which tells us more than new(__FILE__, __LINE__). For example, analyzing (__FILE__, __LINE__) records may show that std::string allocates and frees a great deal of memory, with more overhead than expected. But it cannot tell you which part of the code keeps creating and destroying std::string objects, because (__FILE__, __LINE__) only identifies the innermost call site. With backtrace() you can find the real initiator.
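Below is a minimal sketch of capturing a call stack with glibc's backtrace(), which a replacement malloc could call to attribute allocations to their real initiators. captureCallStack(), dumpCallStack(), and kMaxFrames are hypothetical helpers. backtrace() and backtrace_symbols_fd() are the glibc functions declared in <execinfo.h>. One caution from the glibc documentation: the first call may load libgcc dynamically, which itself calls malloc. A malloc hook should therefore make one warm-up call before it starts relying on these functions.

```cpp
#include <execinfo.h>  // glibc: backtrace(), backtrace_symbols_fd()
#include <unistd.h>    // STDERR_FILENO

// Hypothetical depth limit for recorded stacks.
constexpr int kMaxFrames = 32;

// Capture up to kMaxFrames return addresses of the current call stack into
// a caller-provided buffer; returns the number of frames stored.
int captureCallStack(void* frames[kMaxFrames]) {
  return backtrace(frames, kMaxFrames);
}

// Print the frames as symbols. backtrace_symbols_fd() writes straight to a
// file descriptor and, unlike backtrace_symbols(), does not call malloc.
// Linking with -rdynamic makes function names show up in the output.
void dumpCallStack(void* const frames[], int depth) {
  backtrace_symbols_fd(frames, depth, STDERR_FILENO);
}
```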

As for "performance optimization", I think that in today's multithreaded development, it is unrealistic to write a memory allocator that beats the system's default malloc. A general-purpose allocator is inherently hard to write, and a safe and efficient general-purpose (global) allocator for multithreaded programs is beyond the ability of most developers. It is better to use an existing malloc optimized for multi-core and multithreaded workloads, such as Google's tcmalloc or the memory allocator in Intel TBB 2.2. Fortunately, these allocators are non-intrusive and do not require overloading ::operator new().

Is It a Problem to Overload operator new() for an Individual Class?

Compared with the global ::operator new(), a per-class operator new() and operator delete() have much narrower impact: they affect only that class and its derived classes. So overloading a member operator new() seems feasible. I am still against it.

If a class Node needs to overload its member operator new(), it is using a special memory allocation policy, most commonly a memory pool or an object pool. I would rather state this fact explicitly than change the default behavior of new Node. Concretely, use a factory to create the objects, for example static Node* Node::createNode() or static shared_ptr<Node> Node::createNode().
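Here is a minimal sketch of the factory approach, assuming a simple single-threaded object pool. NodePool and its intrusive free list are hypothetical illustrations, not a production allocator. The point is that Node's constructor is private, so the only way to get a Node is Node::createNode(), and that name signals at the call site that something other than a plain heap allocation happens.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical fixed-size object pool: freed slots go onto an intrusive free
// list and are reused instead of being returned to the system allocator.
// Not thread-safe; a sketch only.
class NodePool {
 private:
  struct FreeSlot { FreeSlot* next; };

 public:
  explicit NodePool(std::size_t slotSize)
      : slotSize_(slotSize < sizeof(FreeSlot) ? sizeof(FreeSlot) : slotSize) {}
  NodePool(const NodePool&) = delete;
  NodePool& operator=(const NodePool&) = delete;
  ~NodePool() {
    while (head_ != nullptr) {
      FreeSlot* next = head_->next;
      ::operator delete(head_);
      head_ = next;
    }
  }

  void* allocate() {
    if (head_ != nullptr) {  // reuse a previously freed slot
      FreeSlot* slot = head_;
      head_ = slot->next;
      --freeCount_;
      return slot;
    }
    return ::operator new(slotSize_);
  }

  // Reuse the dead object's storage as a free-list node; never throws.
  void deallocate(void* p) noexcept {
    head_ = new (p) FreeSlot{head_};
    ++freeCount_;
  }

  std::size_t freeSlots() const { return freeCount_; }

 private:
  std::size_t slotSize_;
  FreeSlot* head_ = nullptr;
  std::size_t freeCount_ = 0;
};

class Node {
 public:
  // Explicit factory: callers see Node::createNode(), not `new Node`.
  static std::shared_ptr<Node> createNode(int value) {
    void* mem = pool().allocate();
    Node* n = new (mem) Node(value);  // this constructor cannot throw
    return std::shared_ptr<Node>(n, &Node::destroy);
  }

  int value() const { return value_; }

  static NodePool& pool() {
    static NodePool p(sizeof(Node));
    return p;
  }

 private:
  explicit Node(int value) : value_(value) {}
  static void destroy(Node* n) noexcept {
    n->~Node();
    pool().deallocate(n);
  }
  int value_;
};
```

Because the pool is reached only through the factory and the shared_ptr deleter, new Node elsewhere in the code does not compile, and no reader is left guessing which allocator a Node came from.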

This follows from the principle of least surprise. If I read Node* p = new Node in the code, I assume it allocates memory on the heap. If Node overloads its member operator new(), I would have to read node.h carefully to discover that this line actually uses a private memory pool. Why not make it explicit? If the line reads Node* p = Node::createNode(), I can guess that Node::createNode() does something different from new Node, and there will be no surprises later.

The Zen of Python says "Explicit is better than implicit", and I agree.

Summary: Overloading ::operator new() may serve as a stopgap in some temporary situations, but it should not be adopted as a strategy. If necessary, work at the malloc level and replace the memory allocator entirely.

References:

[1] Hou Jie, "The Spring and Autumn of Pools: Design Philosophy and Painless Use of Memory Pools", Programmer, 9th, Issue 1.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
