Implementing OOP in Assembly
Object-oriented and procedural programming are both ways of thinking about programs; "paradigm" is the academic term for them. Someone once pointed out that since cfront generates C code, you can implement OOP in C itself, or even in assembly; there is just much more that has to be done by hand. Procedural programming has indeed been used in assembly for a long time, and OOP reached assembly early as well (TASM introduced OOP support before 1995). OOP implemented in assembly is informal, though. It cannot provide strong typing or the other safety guarantees of C++ (such as access control). Encapsulation is purely conceptual and depends on the programmer's discipline.
OOP has three key features: encapsulation, inheritance, and polymorphism. Encapsulation means putting data together with the functions that operate on it: the data lives in the object, which provides an interface for accessing it. Inheritance can be semantic inheritance or implementation inheritance, and accordingly shows up in two ways: as a conceptual hierarchy and as code reuse. Polymorphism means that a call to the same function through a pointer or reference behaves differently depending on which class in the inheritance hierarchy the object belongs to.
An OOP object model can be implemented in several ways, which are described in detail in "Inside the C++ Object Model":
1. Store only data in the object, and associate member functions with the class through name mangling.
2. Single-table model: put the member function pointers into a separate table and store the table's address in the object (one table per class). In C++ this appears as the vtbl and vptr. This model gives run-time flexibility, at the cost of two extra dereferences per call.
3. Double-table model: data and functions are split into two tables, and the addresses of both tables are stored in the object, so that every object has the same fixed size.
4. Simple model: both the data and the function addresses are stored in the object itself. This is the model actually used in assembly; both TASM and MASM do it this way.
In terms of efficiency, the C++ approach is the best. Assembly is forced into the fourth approach for the sake of simplicity, which to some extent goes against assembly's spirit of efficiency.
TASM is no longer commonly used, and its approach to OOP is similar to MASM's, so this article mainly discusses OOP in MASM. The object model discussed here was written by NaN and Thomas Bleeker. It uses macros to do the background work that a compiler would otherwise do. The macros themselves use many tricks, but the end result is quite simple to use. The macro definitions live in a file named Objects.inc; an .asm file that includes it can use the object model.
Although the macros are well developed, MASM lacks syntax that supports OOP, so in many respects the model is awkward to use or costs space or time. For example, overriding a base-class virtual function must be done by hand every time: in an inheritance hierarchy, every virtual function overridden above the parent class must be overridden again by hand in the subclass. Despite these shortcomings, OOP still brings assembly many benefits. For example:
1. Assembly can interact better with things from the object-oriented world, such as COM and C++. The authors provide an example of calling COM from assembly with OOP. Writing COM components in assembly with OOP can produce components that are fast and small.
2. It widens the range of problems assembly can tackle, and makes assembly programs easier to manage and to write collaboratively. The author of this object model wrote a neural-network handwritten letter recognition program in assembly with OOP; it is under 200 KB, most of which is taken up by image files.
--------------------------------------------------------------------------------
Usage
Define a base class.
; Prepare the function prototypes.
Shape_Init          PROTO :DWORD
Shap_destructorPto  TYPEDEF PROTO :DWORD
Shap_getAreaPto     TYPEDEF PROTO :DWORD
Shap_setColorPto    TYPEDEF PROTO :DWORD, :DWORD
--------------------------------------------------------------------------------
; CLASS is actually a STRUC definition
CLASS Shape, Shap
    CMETHOD destructor
    CMETHOD getArea
    CMETHOD setColor
    Color   dd ?
Shape ENDS
--------------------------------------------------------------------------------
.data
; Initialization data
BEGIN_INIT
    dd offset Shap_destructor_Funct
    dd offset Shap_getArea_Funct
    dd offset Shap_setColor_Funct
    dd NULL
END_INIT
--------------------------------------------------------------------------------
.code
Shape_Init PROC uses edi esi lpTHIS:DWORD
    ; Perform the actual initialization
    SET_CLASS Shape
    ; Make edi ASSUME a pointer to Shape
    SetObject edi, Shape
    ; The DPrint macro is defined separately.
    DPrint "Shape Created (Code in Shape.asm)"
    ; Cancel the ASSUME
    ReleaseObject edi
    ret
Shape_Init ENDP

Shap_destructor_Funct PROC uses edi lpTHIS:DWORD
    SetObject edi, Shape
    DPrint "Shape Destroyed (Code in Shape.asm)"
    ReleaseObject edi
    ret
Shap_destructor_Funct ENDP

Shap_setColor_Funct PROC uses edi lpTHIS:DWORD, DATA:DWORD
    SetObject edi, Shape
    mov eax, DATA
    mov [edi].Color, eax
    DPrint "Shape Color Set!! (Code in Shape.asm)"
    ReleaseObject edi
    ret
Shap_setColor_Funct ENDP

Shap_getArea_Funct PROC uses edi lpTHIS:DWORD
    SetObject edi, Shape
    DPrint ""
    DPrint "SuperClassing!!!!! This allows code re-use if you use this method!!"
    DPrint "Shape's getArea Method! (Code in Shape.asm)"
    mov eax, [edi].Color
    DPrint "Called from Shape.getArea, (Code in Shape.asm)"
    DPrintValH eax, "This objects color val is"
    DPrint ""
    ReleaseObject edi
    ret
Shap_getArea_Funct ENDP
Inherit this class
include Shape.asm               ; Inherited class info file
Circle_Init         PROTO :DWORD
Circ_destructorPto  TYPEDEF PROTO :DWORD
Circ_setRadiusPto   TYPEDEF PROTO :DWORD, :DWORD
Circ_getAreaPto     TYPEDEF PROTO :DWORD
--------------------------------------------------------------------------------
CLASS Circle, Circ
    ; Inherit the base class's data and functions
    Shape <>                    ; Inherited Class
    CMETHOD setRadius
    Radius  dd ?
Circle ENDS
--------------------------------------------------------------------------------
.data
BEGIN_INIT
    dd offset Circ_destructor_Funct
    dd offset Circ_setRadius_Funct
    dd NULL
END_INIT
--------------------------------------------------------------------------------
.code
Circle_Init PROC uses edi esi lpTHIS:DWORD
    ; Initialize and implement inheritance
    SET_CLASS Circle INHERITS Shape
    SetObject edi, Circle
    ; Equivalent to a C++ constructor resetting the vptr
    OVERRIDE getArea, CirleAreaProc
    DPrint "Circle Created (Code in Circle.asm)"
    ReleaseObject edi
    ret
Circle_Init ENDP

Circ_destructor_Funct PROC uses edi lpTHIS:DWORD
    SetObject edi, Circle
    DPrint "Circle Destroyed (Code in Circle.asm)"
    ; Call the base-class function
    SUPER destructor
    ReleaseObject edi
    ret
Circ_destructor_Funct ENDP

Circ_setRadius_Funct PROC uses edi lpTHIS:DWORD, DATA:DWORD
    SetObject edi, Circle
    mov eax, DATA
    mov [edi].Radius, eax
    DPrint "Circle Radius Set (Code in Circle.asm)"
    ReleaseObject edi
    ret
Circ_setRadius_Funct ENDP

CirleAreaProc PROC uses edi lpTHIS:DWORD
    LOCAL TEMP
    SetObject edi, Circle
    SUPER getArea
    mov eax, [edi].Radius
    mov TEMP, eax
    finit
    fild TEMP                   ; st(0) = radius
    fimul TEMP                  ; st(0) = radius * radius
    fldpi
    fmul                        ; st(0) = pi * radius^2
    fistp TEMP                  ; store as a rounded integer
    mov eax, TEMP
    DPrint "Circle Area (integer Rounded) (Code in Circle.asm)"
    ReleaseObject edi
    ret
CirleAreaProc ENDP
Create an object from the class and use it
DEBUGC equ 1
.586
.model flat, stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\masm32.inc
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib
includelib \masm32\lib\masm32.lib
include Dmacros.inc
include Objects.inc
include Circle.asm
.data
.data?
hCircle dd ?
.code
start:
    ; Recurse through all inherited constructors and do all inits
    DPrint ""
    DPrint ">>> main.asm <[mov hCircle, $NEW(Circle)]"
    mov hCircle, $NEW(Circle)
    DPrint ""
    DPrint ">>> main.asm <[METHOD hCircle, Circle, setColor, 7]"
    METHOD hCircle, Circle, setColor, 7
    DPrint ""
    DPrint ">>> main.asm <[METHOD hCircle, Circle, setRadius, 2]"
    METHOD hCircle, Circle, setRadius, 2
    DPrint ""
    DPrint "------------ test polymorphic method hCircle.getArea -------------"
    DPrint ""
    DPrint ">>> main.asm <[DPrintValD $EAX(hCircle, Circle, getArea), 'area of hcircle']"
    DPrintValD $EAX(hCircle, Circle, getArea), "Area of hCircle"
    DPrint ""
    DPrint "------------ test polymorphic method hCircle.getArea -------------"
    DPrint ""
    DPrint ">>> main.asm <[DPrintValD $EAX(hCircle, Shape, getArea), 'area of hcircle']"
    DPrint "Typing calling this Ojbect Instance as a SHAPE type only! This is the true value"
    DPrint "of Polymorphism. We dont need to know its a Circle object in order to get"
    DPrint "proper area of this instance object, that is inherited from Shape."
    DPrint ""
    DPrintValD $EAX(hCircle, Shape, getArea), "Area of hCircle"
    DPrint ""
    DPrint ""
    DPrint ">>> main.asm <[DESTROY hCircle]"
    DESTROY hCircle
    DPrint ""
    DPrint ""
    DPrint "NOTE: superclassing here, as each destructor call's the SUPER destructor"
    DPrint "To properly clean up after each class. To see SUPER classing in"
    DPrint "in the Polymorphic getArea Function. Uncomment the SUPER code in"
    DPrint "CircleAreaProc, and re-compile"
    ; ExitProcess takes an exit code under stdcall
    invoke ExitProcess, 0
end start
It looks messy at first, but it is actually well organized. A class file consists of four parts.
The first part declares the member functions. In particular, there must be a function named "ClassName_Init", which is the class constructor; this name is fixed and cannot be changed.
The second part is the declaration introduced by CLASS, which is really a STRUC, that is, a struct. Including the base class definition inside it gives structural inheritance. (Data inheritance is done by calling SET_CLASS in the constructor.)
The third part is the initialization sequence (between BEGIN_INIT and END_INIT) placed in .data. It is the equivalent of the C++ vtbl, but it also holds the initial values of the object's data members.
The fourth part is the implementation of the member functions. In particular, the SET_CLASS call in the constructor, and any OVERRIDE calls, perform the data inheritance and the virtual function overriding.
In practice you can simply start from an existing example and adapt it. The model really does bring a lot of convenience.
--------------------------------------------------------------------------------
Principle
All the magic is in Objects.inc, which defines the following macros.
; --=========================================================--
;  Macro list index:
; --=========================================================--
;  NEWOBJECT      Creates an object
;  METHOD         Calls a function in an object
;  DESTROY        Destroys an object (MUST)
;  SetObject      Treats the pointer in a register as a pointer to a given structure
;  ReleaseObject  Cancels that assumption
;  OVERRIDE       Rewrites a function address in the table to achieve polymorphism
;  SET_CLASS      Performs initialization and, if needed, inheritance (MUST)
;  SUPER          Calls the base-class version of a function
;
;  $EAX()         Accelerated METHOD, returns in eax
;  $EBX()         Accelerated METHOD, returns in ebx
;  $ESI()         Accelerated METHOD, returns in esi
;  $EDI()         Accelerated METHOD, returns in edi
;  $NEW()         Accelerated NEWOBJECT, returns in eax
;  $SUPER()       Accelerated SUPER, returns in eax
;  $DESTROY()     Accelerated DESTROY, returns in eax
;  $invoke()      Accelerated invoke, returns in eax
;
;  BEGIN_INIT     Marks the start of the initialization data in the data segment (MUST)
;  END_INIT       Marks the end of the initialization data (MUST)
;
;  CLASS          Really a STRUCT (MUST)
;  SET_INTERFACE  Declares an abbreviated interface and its abbreviated name (MUST)
;  CMETHOD        Declares a function in a class or interface
;
; --=========================================================--
There are not many macros, but they do the background work that a compiler would normally do for us. I will only show and explain the code before and after macro expansion. Explaining the macro implementations in detail is impractical because they involve a lot of syntax and tricks (to be honest, I only checked the manual and skimmed them).
Let's look at CLASS first, since it is the natural entry point.
It is very simple: CLASS is just replaced with STRUC.
CLASS Shape, Shap
    CMETHOD destructor
    CMETHOD getArea
    CMETHOD setColor
    Color   dd ?
Shape ENDS
After replacement:
Shape STRUC
    CMETHOD destructor
    CMETHOD getArea
    CMETHOD setColor
    Color   dd ?
Shape ENDS
Naturally, the next thing to look at is how CMETHOD works.
CMETHOD destructor
becomes
destructor  PTR Shap_destructorPto ?
Fully expanded, the class is:
Shape STRUC
    destructor  PTR Shap_destructorPto ?
    getArea     PTR Shap_getAreaPto ?
    setColor    PTR Shap_setColorPto ?
    Color       dd ?
Shape ENDS
^_^ So the struct contains both function pointers and data. But here the trail goes cold: just defining a structure like this cannot be the whole story. So let's start from object creation and see how new is implemented.
NEWOBJECT Circle
-->
invoke GetProcessHeap
invoke HeapAlloc, eax, NULL, SIZEOF Circle
push eax
invoke Circle_Init, eax
pop eax
This reveals an obvious limitation: it only works on Win32, because it uses the Win32 API. You could replace the API calls with an external function, and then swap in a different function to do dynamic memory allocation on each platform.
The generated code is very simple: allocate memory, then call the object's constructor. This is why the class constructor is required to be named "ClassName_Init". It is not a big restriction, but it is not very elegant either. It does make sense, though: by agreeing on a name in advance, you avoid the overhead that the flexibility of a pointer would bring. The destructor, as the DESTROY expansion later shows, is called through a pointer, because it is virtual by default.
Now on to the constructor. Here is how one is written:
Shape_Init PROC uses edi esi lpTHIS:DWORD
    SET_CLASS Shape
    SetObject edi, Shape
    DPrint "Shape Created (Code in Shape.asm)"
    ReleaseObject edi
    ret
Shape_Init ENDP
lpTHIS is a pointer to the object, and an object here is a struct. The first line is the key: SET_CLASS is the most involved and trickiest macro. Here is what it expands to.
SET_CLASS Shape
-->
push esi
push edi
cld
mov esi, offset @InitValLabel
mov edi, lpTHIS
mov ecx, @InitValSizeLabel
shr ecx, 2
rep movsd
mov ecx, @InitValSizeLabel
and ecx, 3
rep movsb
pop edi
pop esi
The push and pop are the usual way of saving registers. mov esi, offset @InitValLabel ties in with BEGIN_INIT: @InitValLabel is the address marked by BEGIN_INIT. The code does nothing special; it simply copies the initialization data between BEGIN_INIT and END_INIT into the new object, whose address is lpTHIS. Because SET_CLASS assumes it is always called inside the constructor, lpTHIS is guaranteed to exist (as the constructor's parameter). cld, rep movsd and rep movsb are the standard assembly idiom for copying a block of data quickly; see the instruction manual for details. The idea is to copy as much as possible one dword at a time, and then copy the remaining bytes one at a time.
With inheritance, things get a bit more involved.
SET_CLASS Circle INHERITS Shape
-->
push esi
push edi
cld
mov edi, lpTHIS
mov esi, offset @InitValLabel
mov eax, [esi]
mov [edi], eax
add esi, 4
add edi, Inher
mov ecx, (@InitValSizeLabel - 4)
shr ecx, 2
rep movsd
mov ecx, (@InitValSizeLabel - 4)
and ecx, 3
rep movsb
pop edi
pop esi
Because the class inherits, the destructor slot has to be replaced: mov eax, [esi] and mov [edi], eax do this. With the destructor's address updated, all that remains is to copy the class's own members that follow, including its virtual function pointers; add edi, Inher moves the destination past the inherited part first.
Next, destroying an object:
DESTROY hCircle
-->
mov eax, hCircle
push eax
call dword ptr [eax]            ; the first dword of the object is the destructor
push eax
invoke GetProcessHeap
invoke HeapFree, eax, NULL, hCircle
pop eax
Because the destructor's address is the first member of the object (the struct), the call invokes the destructor. The Win32 API is then used to free the allocated memory.
Next is the destructor itself.
Circ_destructor_Funct PROC uses edi lpTHIS:DWORD
    SetObject edi, Circle
    DPrint "Circle Destroyed (Code in Circle.asm)"
    SUPER destructor
    ReleaseObject edi
    ret
Circ_destructor_Funct ENDP
This is the destructor of the Circle class, which inherits from Shape. It uses SUPER, which calls a function in the base class. Let's look at how that is implemented.
SUPER destructor
-->
invoke Circ_destructorPto PTR [(INHER_initdata + INHER.MethodName)], lpTHIS
Circ_destructorPto specifies the function type of the address. INHER is a global symbol inside the macros that holds the name of the class's base class. INHER_initdata + INHER.MethodName is the address, within the base class's initialization data, of the slot that holds the base-class version of the method.
All that remains is actually using the functions in an object. (You are "not allowed" to touch the data in an object directly, but only conceptually. In fact you can break the OOP idea embodied here, because assembly provides no such protection.)
METHOD hCircle, Shape, getArea
-->
mov edx, hCircle
invoke (Shape PTR [edx]).getArea, edx
This is where it all pays off. This one line embodies polymorphism: hCircle points to an object of class Circle, but at the call site it is interpreted as a Shape. Work out for yourself where the polymorphism is.
--------------------------------------------------------------------------------
My opinion
Taken as a whole, this object model has the following characteristics:
Object data and virtual function pointers are stored in the same table.
All functions are virtual.
A derived class's overrides of base-class virtual functions must be done by hand after the data has been initialized (in the constructor).
SUPER can only reach the overridden function in the immediate base class.
Memory is allocated and freed with the Win32 API.
Support for the three features of OOP:
Encapsulation: the data gets no special protection (there is no private). Data and function pointers are placed in the same struct to form an object. Accessing data only through the provided interface is entirely a matter of discipline.
Inheritance: structural inheritance is achieved by nesting struct definitions (a definition contains an already defined struct). Data inheritance is done with SET_CLASS. All inheritance is public.
Polymorphism: in the narrow sense, polymorphism means that calls to the same function behave differently. The following comparison shows why this object model supports polymorphism (because it lets a derived class override a base-class function).
class Shape
{
public:
    virtual float getArea();
    // ......
};
class Circle : public Shape
{
public:
    float getArea();
    // ......
};
When you call a virtual function through an object pointer, the pointer's type is already fixed at compile time. For example:
float getArea(Shape* shp)
{
    return shp->getArea();
}
The compiler can therefore use compile-time information to determine the index of the called function in the vtbl, and the call is replaced by a vtbl lookup followed by an indirect call. At run time, the pointer shp passed in does not necessarily point to a Shape; it may point to a type derived from it. The vtbl contents of the two classes may differ (the derived class overwrites the addresses in some slots), so the derived class's function gets called without the caller knowing what the derived class is. The secret is that the derived class and the base class put their own versions at the same position in the vtbl. The position is fixed at compile time; the content at that position is determined at run time.
What about the assembly version of the object model? It is much the same. mov edx, hCircle followed by invoke (Shape PTR [edx]).getArea, edx is a polymorphic call. hCircle actually points to a Circle object (here the object both holds the data and does the job of the vtbl). The pointer is deliberately interpreted as a Shape at the call site, so the call goes through the index that getArea has in the Shape type. Polymorphism arises when the same index leads to different functions.
--------------------------------------------------------------------------------
Possible improvements
Virtual Functions
To be honest, this object model is really good and takes macros about as far as they can go. However, it requires every class to have a virtual destructor and makes every function virtual. Within the limits of what macros can do, definitions and calls have been made as simple as possible and are pleasant to read. Still, I think that putting every function into the object (forcing it to be virtual) adds unnecessary cost.
C++ treats non-virtual member functions as ordinary functions and keeps them outside the object. I think this object model could do the same. The current hard requirement that member function prototypes be named "TypeName_FunctionNamePto" would also be better replaced by a macro.
My suggestion is to put only some functions into the object (that is, declare them in the class with CMETHOD). The others stay out of the object but are written in the same file. Then, when METHOD is used to call a member function of an object, it checks at assembly time whether that member function is in the virtual function table (that is, the series of function pointers stored in the object). If not, it is called like an ordinary function; if so, it is called the way it is now.
Such a METHOD macro can be written. The SUPER macro can already check whether a method first appears in the base class (and whether that is the immediate base class rather than one several levels up). METHOD could likewise check whether the called method appears in the class, and then choose the appropriate way of calling it.
About SUPER
In this model, SUPER can only call the overridden function one level up (that is, in the parent class); overridden functions further up cannot be reached. The obstacle is that the macro cannot know which class a function first appeared in. If you supply the class name by hand, SUPER could reach an overridden function at any level. Like this:
SUPER getArea
SUPER getArea, Shape
If no class name is given, SUPER goes to the immediate parent class. Otherwise it uses the named class. The secret of SUPER is that it looks up the data in "TypeName_initdata" in the .data segment to recover the function that was overridden.
TASM's TABLE can make up for the shortcomings of OVERRIDE; it is an improved STRUC. But that is the result of Borland's syntax extensions. In any case, Thomas has done an excellent job.