C++ Polymorphism Technology: Implementation and Reflections


Authors: Yang Ximin, Meng Yan. Source: Programmer magazine


Object-oriented technology first appeared in the Simula 67 system in the 1960s and matured in the Smalltalk system developed at the Xerox Palo Alto Research Center in the 1970s. For most programmers, however, C++ was the first practical object-oriented programming language, so many of our object-oriented concepts and ideas come directly from C++. Yet when implementing polymorphism, the key feature of object-oriented programming, C++ chose a solution completely different from Smalltalk's. As a result, although both languages achieve similar polymorphic behavior, they differ enormously in practice. Specifically, the C++ implementation of polymorphism is more efficient, but it is not suitable for every situation. Many inexperienced C++ developers do not understand this and force the C++ polymorphism mechanism onto problems it does not fit, cutting the foot to fit the shoe and ending up in a trap they cannot escape. This article discusses in detail the limitations of C++ polymorphism and ways to work around them.

Two different implementation techniques for virtual method calls
Polymorphism is the foundation of object-oriented programming in C++. Specifically, when a virtual member function is called through a pointer to a base class, the runtime system calls the appropriate implementation based on the actual object the pointer points to. For example:

class Base {
public:
    virtual ~Base() = default;
    virtual void vmf() { /* ... */ }
};

class Derived : public Base {
public:
    void vmf() override { /* ... */ }
};

int main() {
    Base* p = new Base();
    p->vmf();   // calls Base::vmf
    delete p;
    p = new Derived();
    p->vmf();   // calls Derived::vmf
    delete p;
    // ...
}

 

Note that the two vmf calls in the code invoke different function implementations even though their syntax is identical. This is what "polymorphism" means, and every C++ developer knows it well.
Now suppose that we are the designers of the language. How should we implement this polymorphism? The basic idea is not hard to find: we add a layer of indirection that intercepts the method call and then invokes the right implementation based on the actual object the pointer points to. This layer of indirection is the crux of the design. It must do the following:
1. Obtain all information about the method call, including the method to be called and the actual arguments to be passed.
2. Determine the actual object that the pointer (or reference) refers to at the time of the call.
3. Using the information from steps 1 and 2, find the appropriate method implementation and execute the call.
The key is step 3: how to find the appropriate method implementation. Because polymorphism is a property of objects, the design must bind the appropriate implementation code to objects. In other words, a lookup table must exist at the object level. Using the object and method information obtained in steps 1 and 2, the runtime finds the address of the actual method code in the lookup table and calls it. The problem therefore becomes how to search the table with that information. There are two different approaches: search by name and search by position. The two ideas may not seem very different, but in practice they lead to enormous differences. We examine each in detail below.
In dynamic object-oriented languages such as Smalltalk, Python, and Ruby, method lookup is based on the method name. Each entry in the lookup table pairs a method name with the address of the method's code.
Because such a table is searched by method name, each lookup involves string comparison, which is relatively slow. However, this kind of table has one outstanding advantage: it uses space very effectively. To illustrate, suppose a base class Base contains 100 methods that derived classes may override (so the lookup table shared by all Base objects has 100 entries), and one of its derived classes, Derived, overrides only five of them. The method lookup table for Derived objects then needs only five entries. When a method is called, the runtime searches this five-entry table for the called method's name. If the name is found, the call is executed; otherwise the call is forwarded to the base class. This is the standard procedure for a virtual method call.
When the derived class overrides only a few methods, the table can be arranged as a linear list and searched sequentially, in which case the effective space utilization reaches 100%. When the derived class overrides many methods, a hash table can be used instead; with a reasonable hash function, lookups are fast and space utilization remains high (generally close to 75%). Note that the compiler can easily obtain the names of all overridden methods, so it can run a standard perfect-hashing tool such as gperf to obtain an optimal hash function.
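The name-based lookup described above can be sketched in C++ as follows. This is only an illustration under assumptions: ClassInfo, Object, callByName, and the sample methods are hypothetical names, and a real dynamic language runtime is considerably more elaborate.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical names throughout: ClassInfo, Object, callByName.
struct Object;
using Method = std::string (*)(Object&);

// Per-class lookup table: holds only the methods this class defines itself.
struct ClassInfo {
    const ClassInfo* parent;                          // nullptr for the root class
    std::unordered_map<std::string, Method> methods;  // method name -> code address
};

struct Object {
    const ClassInfo* cls;  // the actual class of this object
};

// Search by name, walking up the inheritance chain until a match is found.
std::string callByName(Object& obj, const std::string& name) {
    for (const ClassInfo* c = obj.cls; c != nullptr; c = c->parent) {
        auto it = c->methods.find(name);
        if (it != c->methods.end()) return it->second(obj);
    }
    return "doesNotUnderstand:" + name;  // no class in the chain defines it
}

std::string baseDraw(Object&) { return "Base::draw"; }
std::string baseSize(Object&) { return "Base::size"; }
std::string derivedDraw(Object&) { return "Derived::draw"; }

const ClassInfo baseClass{nullptr, {{"draw", &baseDraw}, {"size", &baseSize}}};
// Derived overrides one method, so its table has exactly one entry.
const ClassInfo derivedClass{&baseClass, {{"draw", &derivedDraw}}};

void example() {
    Object d{&derivedClass};
    assert(callByName(d, "draw") == "Derived::draw");  // found in Derived's table
    assert(callByName(d, "size") == "Base::size");     // forwarded to Base
}
```

The derived table stores nothing it does not override, which is the space advantage the text describes; the price is a hash or string comparison on every call.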

This solution has another advantage. The "method name" field of each table entry can be seen as a description of the "method address" field, so the lookup table in this solution carries self-describing information (metadata). On top of such a self-describing data structure, many extended features can be implemented, such as inserting new methods at run time or intercepting method calls at the user level. This solution is therefore flexible and widely applicable, but it is not optimal in execution efficiency.
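As a rough illustration of those extended features, the sketch below adds a method to a name-keyed table at run time and then intercepts calls to it. MethodTable and intercept are hypothetical names, and std::function is used purely for brevity.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: a name-keyed method table that can be edited at run time.
using Method = std::function<int(int)>;
using MethodTable = std::unordered_map<std::string, Method>;

// Wrap an existing entry so every call is recorded before being forwarded.
void intercept(MethodTable& table, const std::string& name,
               std::vector<std::string>& log) {
    Method original = table.at(name);
    table[name] = [original, name, &log](int arg) {
        log.push_back(name);
        return original(arg);
    };
}

void example() {
    MethodTable table;
    std::vector<std::string> log;

    table["twice"] = [](int x) { return 2 * x; };  // method added at run time
    assert(table.at("twice")(21) == 42);

    intercept(table, "twice", log);                // user-level interception
    assert(table.at("twice")(5) == 10);
    assert(log.size() == 1 && log[0] == "twice");
}
```

Both operations work only because each entry is identified by its name; a table of bare addresses offers no comparable hook.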
The other lookup technique is very familiar to C++ developers: it is based on absolute position. The table structure is very simple, just an array of pointers holding method addresses. The entries carry no self-description. Only the compiler knows, at compile time, which method each entry corresponds to, and each method call is compiled into a compact, hard-coded "pointer plus offset" call. The outstanding feature of this kind of table is efficiency: a method call costs only one random access into the array. Of all the "add a layer of indirection" solutions we can think of, this is the most efficient. However, it requires that all polymorphic objects in the same family share the same table layout. That is, for every object that implements a given interface, the k-th entry of its virtual method table must have the same meaning. Suppose a base class has 100 overridable virtual methods; its virtual method table then has 100 entries (100 pointers to method entry addresses). Every derived class must have a virtual table with the same layout and at least 100 entries. Now suppose a derived class overrides only five of the base class's methods. The virtual method table shared by objects of the derived class still has 100 entries, but 95 of them are identical to the entries in the base class's table. Only 5 entries carry real information, and it is these 5 entries that give the derived class its reason to exist.

In this case, the effective utilization of the method table is only 5%. In short, this solution is the most efficient in time, but it is not suitable for every scenario.
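The 5% figure can be reproduced with a hand-built model of a position-indexed table. The sketch below is an assumption-laden illustration (VTable, callSlot, and overriddenSlots are hypothetical names); real compilers generate the equivalent structure automatically and may lay it out differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical hand-built model of a position-indexed method table.
constexpr std::size_t kSlots = 100;
using Slot = int (*)();
using VTable = std::array<Slot, kSlots>;

int baseImpl() { return 0; }
int derivedImpl() { return 1; }

VTable makeBaseTable() {
    VTable t{};
    t.fill(&baseImpl);
    return t;
}

// The derived table must keep all 100 slots, even though only 5 differ.
VTable makeDerivedTable(const VTable& base) {
    VTable t = base;
    for (std::size_t i = 0; i < 5; ++i) t[i] = &derivedImpl;
    return t;
}

struct Object { const VTable* vptr; };

// A call is a single indexed load followed by an indirect call.
int callSlot(const Object& obj, std::size_t k) { return (*obj.vptr)[k](); }

// Number of entries in `derived` that differ from `base`.
std::size_t overriddenSlots(const VTable& base, const VTable& derived) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < kSlots; ++i)
        if (base[i] != derived[i]) ++n;
    return n;
}

void example() {
    const VTable base = makeBaseTable();
    const VTable derived = makeDerivedTable(base);
    const Object obj{&derived};
    assert(callSlot(obj, 0) == 1);                  // overridden slot
    assert(callSlot(obj, 99) == 0);                 // inherited slot
    assert(overriddenSlots(base, derived) == 5);    // 5 of 100 slots: 5%
}
```

The call path has no search at all, which is where the speed comes from, while 95 of the 100 derived entries only repeat what the base table already holds.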
Of course, the two techniques appear to produce exactly the same result, everything is hidden inside the compiler, and none of it seems to concern ordinary developers. But is that really so? As we will see below, the structure of the C++ lookup table is one of the most insidious technical traps in C++ application development.

Two different application scenarios for polymorphism
Readers who have studied numerical analysis know that in matrix computation, solving low-order dense matrices and solving high-order sparse matrices are entirely different problems: both the storage scheme and the solution algorithm differ. Interestingly, the practical use of polymorphism also falls into two very different scenarios that resemble these two classes of matrix problems.
In the first scenario, the objects are relatively simple. A family contains few sibling classes, but the classes differ greatly from one another. The objects therefore have few virtual methods, and the override rate is high. The object-oriented examples found in textbooks, and the objects used in most ordinary application domains, belong to this category.
For example, a modem class, however many features it has, will not have more than about 20 virtual methods, and most or even all of them may be overridden. Another example is the COM interface. Because COM is built on interfaces, a well-factored interface must be small and lean. IMalloc has only six methods (excluding the three inherited from IUnknown), IPersistFile has five, and user-written COM interfaces generally have no more than 20. Implementing a COM interface means overriding almost all of its methods. This closely resembles a low-order dense matrix, so the simplest and most direct table structure is the right choice: fast and simple. Because the override rate is high, the table is also used efficiently. This is the scenario in which C++ polymorphism is most widely applied; one could say that the C++ virtual method mechanism was designed for exactly this kind of application.
The second scenario is very different. Here the objects are complex, rich in features, and varied in behavior. A family contains many sibling classes, and the differences among them are small. Such objects have a large number of virtual methods, but the override rate is low. GUI systems and video games are typical examples. Since most of us work with Windows every day, the Windows GUI is a convenient illustration. Almost every object in the Windows GUI is conceptually a window, so together they form one object family. This family has three notable characteristics. First, it has many behaviors, and they vary widely (that is, there are many virtual methods). Microsoft Windows directly defines hundreds of window messages and allows users to define new ones in the form WM_USER + n and WM_APP + n. In object-oriented terms, this amounts to defining hundreds of overridable virtual methods for every window object and letting users add new virtual methods freely.
The second characteristic is a low override rate: objects in the family are largely alike. The vast majority of window messages are either handled uniformly by DefWindowProc or forwarded (delegated) with SendMessage to a standard window object provided by the system. This is equivalent to letting the base-class window object handle them; a window intercepts (overrides) only a dozen or a few dozen messages (methods). Compared with the large number of virtual methods in the window family, the override rate generally does not exceed 20%. The third characteristic is the large number of sibling classes. From standard windows to special-purpose windows, from dialog boxes to buttons, from toolbars to text boxes, everything is a window. Even two buttons that look exactly alike and differ only in caption must be built from different classes if pressing them triggers different operations. It is therefore not surprising that the GUI of an ordinary-sized application contains hundreds of window classes, large and small. Any developer with some knowledge of the Win32 API will find this easy to understand.
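The "intercept a few messages, forward the rest" pattern can be pictured with a portable imitation of a window procedure. Everything here is a stand-in: the Message values, defaultProc, and buttonProc are hypothetical and are not the real Win32 declarations, although defaultProc plays the role the text assigns to DefWindowProc.

```cpp
#include <cassert>
#include <string>

// Portable imitation of the Win32 pattern; these names are stand-ins, not the
// real <windows.h> declarations.
enum Message : unsigned { MSG_PAINT = 1, MSG_CLICK = 2, MSG_USER = 0x0400 };

// Plays the role of DefWindowProc: default handling for every message.
std::string defaultProc(unsigned /*msg*/) { return "default"; }

// A button intercepts only the messages it cares about and forwards the rest.
std::string buttonProc(unsigned msg) {
    switch (msg) {
        case MSG_CLICK:    return "button clicked";
        case MSG_USER + 1: return "custom message";  // user-defined extension
        default:           return defaultProc(msg);
    }
}

void example() {
    assert(buttonProc(MSG_CLICK) == "button clicked");     // intercepted
    assert(buttonProc(MSG_USER + 1) == "custom message");  // user extension
    assert(buttonProc(MSG_PAINT) == "default");            // forwarded
}
```

The procedure names only the messages it overrides; everything else costs it no table space, which is the property a position-indexed table cannot offer.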
From the description of the C++ virtual method call mechanism in the first section, it is easy to see that a lookup table based on absolute positions, with no self-describing information, is ill suited to the second scenario. If we used the native C++ object model to implement a GUI system like Windows, the base class (call it KWindow) would have to define, say, 1000 virtual methods (how many slots should be reserved for user extensions?). Its method table would then have 1000 entries, and every directly or indirectly derived class would need a table of at least 1000 entries to stay compatible with KWindow's table layout.

Let us take an extreme example to appreciate how absurd this is. Suppose a class KPushButton is derived from KWindow and implements a standard button control by overriding 20 virtual methods. How many entries does its virtual method table have? Not 20, unfortunately, but at least 1000 (assuming it adds no new methods). The vast majority are verbatim copies of entries in KWindow's table. Only 20 entries belong to KPushButton and carry real meaning; the other 980 are wasted. Their only purpose is to hold a position so that "pointer plus offset" addressing keeps working. Does that sound bad? It gets worse.
Suppose you need a standard button whose appearance, color, text, and other behavior are exactly the same as KPushButton's, but which responds differently to the click event. What do you do? Obviously, you derive a class KMyPushButtonOK from KPushButton and override one method (probably called OnClick). How long is the virtual method table of this new class? One entry? No. Twenty? No. It is 1000! Only one entry (OnClick) carries meaning of its own; the other 999 (3996 bytes on a 32-bit machine) are almost entirely wasted. A medium-sized application has dozens of screens and hundreds of custom controls, so the storage wasted on virtual method tables amounts to hundreds of kilobytes or even more than 1 MB. That may be trivial in an era when storage is measured in gigabytes, but the ugliness of the idea behind it is something no developer with a conscience (especially a C++ developer) can tolerate.
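The arithmetic behind these figures is easy to check. In the sketch below, wastedBytes is a hypothetical helper, and the count of 300 classes is an assumption chosen only to stand for "hundreds of custom controls".

```cpp
#include <cassert>
#include <cstddef>

// Bytes spent on table entries that merely repeat the base class's entries.
constexpr std::size_t wastedBytes(std::size_t totalSlots,
                                  std::size_t overriddenSlots,
                                  std::size_t pointerSize) {
    return (totalSlots - overriddenSlots) * pointerSize;
}

// One class that overrides 1 of 1000 methods, with 4-byte pointers.
static_assert(wastedBytes(1000, 1, 4) == 3996, "matches the figure in the text");

// Hypothetical application: 300 such classes (the count is an assumption).
constexpr std::size_t kAssumedClasses = 300;
static_assert(kAssumedClasses * wastedBytes(1000, 1, 4) == 1198800,
              "roughly 1.14 MiB of repeated entries");

void example() {
    // With 8-byte pointers the same 999 repeated entries take twice the space.
    assert(wastedBytes(1000, 1, 8) == 7992);
    // KPushButton from the text: 20 overrides, 980 repeated entries.
    assert(wastedBytes(1000, 20, 4) == 3920);
}
```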
That is why, from OWL to VCL and from MFC to Qt, none of the C++ GUI and game development frameworks of recent years that handle a large number of events uses standard C++ polymorphism to build its window class hierarchy. Instead, each goes its own way and invents its own technique for steering around this reef. Three solutions are classic: VCL's dynamic methods, MFC's global event lookup tables (message maps), and Qt's signals and slots. The idea behind them is the same. In the words of Grady Booch: "When you find that the system requires a large number of similar small classes, a large number of similar small objects should be used." That is, problems that would otherwise be solved by deriving new classes are solved by creating new instances. This idea almost inevitably makes a mechanism like the C# delegate a necessity. Unfortunately, standard C++ does not support delegates. Many people in the C++ community have tried advanced techniques such as templates and functors, but the results still fall short of a real delegate. So, to keep their solutions simple, Borland C++Builder added the __closure keyword, MFC introduced a pile of odd macros, and Qt built the moc preprocessor. Each found its own way across.
To summarize: object-oriented polymorphism has two different application scenarios. Standard C++ polymorphism suits only one of them; the other must be implemented with other mechanisms.
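The "objects instead of classes" idea can be sketched with a per-instance callback. The PushButton class here is hypothetical, and std::function (standardized in C++11) is used as a delegate-like stand-in; it is not the mechanism used by VCL, MFC, or Qt.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical PushButton: click behavior is stored per instance, so two
// buttons with different actions are two objects of one class.
class PushButton {
public:
    explicit PushButton(std::string caption) : caption_(std::move(caption)) {}

    void setOnClick(std::function<void()> handler) { onClick_ = std::move(handler); }
    void click() const { if (onClick_) onClick_(); }
    const std::string& caption() const { return caption_; }

private:
    std::string caption_;
    std::function<void()> onClick_;  // empty until a handler is attached
};

void example() {
    int saved = 0, cancelled = 0;
    PushButton ok("OK"), cancel("Cancel");
    ok.setOnClick([&saved] { ++saved; });
    cancel.setOnClick([&cancelled] { ++cancelled; });

    ok.click();
    ok.click();
    cancel.click();
    assert(saved == 2 && cancelled == 1);
}
```

No class is derived just to change what a click does, so no method table is duplicated per button.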

Solutions and suggestions
Some readers may now have serious doubts about C++. It should be noted, however, that the polymorphism implementation C++ chose is fully consistent with the C++ philosophy. Moreover, C++ lets you solve this problem in many different ways. Today, by relying on mature GUI frameworks, we can steer around this reef automatically in most cases.
The real danger is that many developers do not fully understand the limitations of native C++ polymorphism in the second scenario, so when they face such problems they walk into the trap without realizing it. I would like to remind C++ developers: when the system you are building has the typical characteristics of event processing and the number of events is large, think carefully about your class hierarchy design. You can imitate the solutions of MFC or Qt, but in my opinion a more direct and simple method is to imitate the string-comparison lookup table described in the first section and use a single message delivery object to distribute messages to each object. Because the message delivery object often needs to be adjusted and changed, it can be placed in a separate DLL or even a COM component and loaded into the process at run time. This solution is not the most sophisticated, but it is effective in most cases and easy to implement. It is not described in detail here.
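Since the article leaves the details out, the following is only one possible reading of the "single message delivery object" idea. MessageDispatcher and its methods are hypothetical names, targets and messages are identified by strings, and the DLL or COM packaging is not shown.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical central dispatcher: one table keyed by (target, message) names.
class MessageDispatcher {
public:
    using Handler = std::function<void(const std::string& payload)>;

    void subscribe(const std::string& target, const std::string& message, Handler h) {
        handlers_[{target, message}] = std::move(h);
    }

    // Returns true if a handler ran, false if the caller should apply the
    // default processing for this message.
    bool dispatch(const std::string& target, const std::string& message,
                  const std::string& payload) const {
        auto it = handlers_.find({target, message});
        if (it == handlers_.end()) return false;
        it->second(payload);
        return true;
    }

private:
    std::map<std::pair<std::string, std::string>, Handler> handlers_;
};

void example() {
    MessageDispatcher dispatcher;
    std::string last;
    dispatcher.subscribe("okButton", "click",
                         [&last](const std::string& p) { last = "ok:" + p; });

    assert(dispatcher.dispatch("okButton", "click", "left"));   // handled
    assert(last == "ok:left");
    assert(!dispatcher.dispatch("okButton", "paint", ""));      // falls to default
}
```

Each object registers only the messages it actually handles, so the table grows with the number of overrides, not with the number of possible messages.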
In fact, I personally think that the C++ language should solve this problem in the compiler. The basic idea is this: when the base class has many virtual methods and the derived class overrides only a few (information that is available during compilation), the compiler changes the virtual method lookup mechanism for objects of the derived class from search by position to search by information about the function actually called. The virtual method table of the derived class then no longer has to share the base class's layout, which avoids the wasted space. This idea is similar to the dynamic keyword in Delphi/Object Pascal. This article will not go into details.
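One way to picture such a mechanism is a sparse per-class table keyed by a compiler-assigned method number and searched up the inheritance chain. The sketch below is an assumption about how this could look (SparseClass, lookup, and the method ids are hypothetical); it does not claim to reproduce how any particular compiler implements it.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Method = int (*)();

// Hypothetical sparse table: only the methods a class defines itself,
// sorted by a compiler-assigned integer id.
struct SparseClass {
    const SparseClass* parent;                    // nullptr for the root class
    std::vector<std::pair<int, Method>> entries;  // (method id, code address)
};

// Binary-search each class's small table, falling back to the parent.
Method lookup(const SparseClass* cls, int id) {
    for (; cls != nullptr; cls = cls->parent) {
        auto it = std::lower_bound(
            cls->entries.begin(), cls->entries.end(), id,
            [](const std::pair<int, Method>& e, int key) { return e.first < key; });
        if (it != cls->entries.end() && it->first == id) return it->second;
    }
    return nullptr;  // no class in the chain defines this method
}

int windowPaint() { return 0; }
int windowClick() { return 1; }
int okButtonClick() { return 2; }

const SparseClass windowClass{nullptr, {{1, &windowPaint}, {2, &windowClick}}};
// The derived class stores a single entry for the one method it overrides.
const SparseClass okButtonClass{&windowClass, {{2, &okButtonClick}}};

void example() {
    assert(lookup(&okButtonClass, 2)() == 2);      // overridden in the button
    assert(lookup(&okButtonClass, 1)() == 0);      // inherited from the window
    assert(lookup(&okButtonClass, 3) == nullptr);  // unknown method id
}
```

Integer ids keep the comparison cheap compared with strings, while the table size still tracks the number of overrides.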

 
