Strange, isn't it? Many people use C++, but few really care how a C++ compiler implements it.
Jan Gray wrote an article in 1994 called "C++ Under the Hood", which introduced the implementation details of Visual C++. This guide is based on Jan's article, and along the way I explain in detail the parts of it that are hard to understand. I hope this guide helps more people understand the underlying implementation mechanisms of C++.
The layout of a class
struct B {
public:
    int bm1;
protected:
    int bm2;
private:
    int bm3;
};
What does struct B look like in memory? Visual C++ guarantees that B's member variables are laid out in the order in which they are declared, so a B object holds bm1, then bm2, then bm3.
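You can check the declaration-order claim on your own compiler by measuring each member's byte offset from the start of an object. The sketch below mirrors struct B. The helper names `BLayout`, `byte_offset`, and `layout_of_b` are hypothetical, and the friend declaration is added only so the demo can reach the protected and private members. Before C++23, the standard left the relative order of members with different access control unspecified, so under earlier standards this checks the compiler's behavior rather than a language guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Byte offsets of B's three members, measured from the start of a B object.
struct BLayout {
    std::ptrdiff_t bm1, bm2, bm3;
};

BLayout layout_of_b();

struct B {
public:
    int bm1;
protected:
    int bm2;
private:
    int bm3;
    // Added for this demo only, so a free function can reach bm2 and bm3.
    friend BLayout layout_of_b();
};

// Distance in bytes from the start of an object to one of its members.
inline std::ptrdiff_t byte_offset(const void* object, const void* member) {
    return static_cast<const char*>(member) - static_cast<const char*>(object);
}

BLayout layout_of_b() {
    B b{};
    return {byte_offset(&b, &b.bm1), byte_offset(&b, &b.bm2), byte_offset(&b, &b.bm3)};
}
```

With a 4-byte int, `layout_of_b()` typically reports offsets 0, 4, and 8.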
Single inheritance
struct C {
    int c1;
    void cf();
};
struct D : C {
    int d1;
    void df();
};
Visual C++ guarantees that the members inherited from C always sit at the start of a D object: c1 comes first, followed by d1.
The advantage is that for C* pc = new D();, Visual C++ does not need to apply any displacement when converting the new D* to pc. The resulting pc holds the same address as the D* returned by new D().
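You can observe directly that this upcast costs nothing. In this minimal sketch, `cf` and `df` get empty bodies so the code links. `upcast_keeps_address` is a hypothetical helper that compares the address before and after the implicit D* to C* conversion.

```cpp
#include <cassert>

struct C { int c1; void cf() {} };
struct D : C { int d1; void df() {} };

// Performs the implicit D* -> C* conversion used by `C* pc = new D();` and
// reports whether the address changed. Because C is laid out at the start
// of D, no displacement is applied and the addresses should be equal.
bool upcast_keeps_address(D* pd) {
    C* pc = pd;
    return static_cast<void*>(pc) == static_cast<void*>(pd);
}
```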
Multiple inheritance
Things get more complicated here:
struct E {
    int e1;
    void ef();
};
struct F : C, E {
    int f1;
    void ff();
};
With multiple inheritance, a pointer to a base class and a pointer to the derived object no longer necessarily hold the same address:
F f;
// (void*)&f == (void*)(C*)&f;
// (void*)&f <  (void*)(E*)&f;
F's layout makes this clearer: the C part (c1) comes first, then the E part (e1), and finally f1. Why does C come before E? This is a Visual C++ convention: base classes are laid out in memory in their declaration order. Because C is declared before E, the C part comes first in F's layout.
As a result, pe in E* pe = new F(); and pc in C* pc = new F(); point to different memory locations. For pc, the compiler does not need to do anything. For pe, it must add a displacement so that pe points to the E part of the object.
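The same kind of measurement shows the displacement that multiple inheritance introduces. This sketch leaves out the member functions because they do not affect layout, and `base_offset` is a hypothetical helper. On mainstream compilers, the C part should sit at offset 0 and the E part at a positive offset (4 with a 4-byte int).

```cpp
#include <cassert>
#include <cstddef>

struct C { int c1; };
struct E { int e1; };
struct F : C, E { int f1; };

// Byte distance from the start of an F object to its Base subobject.
// The implicit conversion F* -> Base* is where the compiler adds dFC or dFE.
template <typename Base>
std::ptrdiff_t base_offset(F& f) {
    Base* base = &f;
    return reinterpret_cast<char*>(base) - reinterpret_cast<char*>(&f);
}
```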
Virtual inheritance
Consider this situation:
struct Employee { ... };
struct Manager : Employee { ... };
struct Worker : Employee { ... };
struct MiddleManager : Manager, Worker { ... };
By the rules described so far, a MiddleManager is laid out as a Manager part (containing an Employee), then a Worker part (containing another Employee), then MiddleManager's own members. That puts two Employee instances in memory. Can we really accept this duplication?
Is there a way for Manager and Worker to share a single Employee instance? That is exactly the problem virtual inheritance solves.
To get this sharing, declare your class hierarchy as follows:
struct Employee { ... };
struct Manager : virtual Employee { ... };
struct Worker : virtual Employee { ... };
struct MiddleManager : Manager, Worker { ... };
That is, simply put the virtual keyword before each base class that should be shared. The following examples use this smaller hierarchy:
struct G : virtual C {
    int g1;
    void gf();
};
struct H : virtual C {
    int h1;
    void hf();
};
struct I : G, H {
    int i1;
    void if_();
};
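In memory, an I object then holds a G part (vbptr, g1), an H part (vbptr, h1), i1, and finally a single shared C part (c1).
Each vbptr points to a virtual base table (vbtable). The vbtable stores the displacement from that vbptr to the shared virtual base: C here, or Employee in the previous example.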
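You can check that the base is really shared by reaching C once through G and once through H. This sketch omits the member functions. The non-virtual `G2`, `H2`, and `I2` classes are hypothetical additions for comparison: with virtual inheritance both paths should land on the same C, and without it they land on two different copies.

```cpp
#include <cassert>

struct C { int c1; };

// Virtual inheritance, as in the text: G and H share one C.
struct G : virtual C { int g1; };
struct H : virtual C { int h1; };
struct I : G, H { int i1; };

// Hypothetical non-virtual version of the same diamond, for comparison.
struct G2 : C { int g1; };
struct H2 : C { int h1; };
struct I2 : G2, H2 { int i1; };

// Reaches C through G and through H; true if both paths hit the same C.
bool shares_one_c(I& i) {
    C* via_g = static_cast<G*>(&i);
    C* via_h = static_cast<H*>(&i);
    return via_g == via_h;
}

bool shares_one_c(I2& i) {
    C* via_g = static_cast<G2*>(&i);
    C* via_h = static_cast<H2*>(&i);
    return via_g == via_h;
}
```

The duplication also shows up at compile time: `C* pc = &i2;` is ambiguous for an `I2` and does not compile, whereas `C* pc = &i;` is fine for an `I`.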
Data member access
In the absence of inheritance:
C* pc;
pc->c1; // *(pc + dCc1);
Accessing c1 amounts to *(pc + displacement of c1 within C). From the definition of C and its layout, that displacement is 0.
In the case of single inheritance:
D* pd;
pd->c1; // *(pd + dDC + dCc1); // *(pd + dDc1);
pd->d1; // *(pd + dDd1);
From the layout shown earlier, pd->c1 is *(pd + displacement of C within D + displacement of c1 within C). Here both displacements are 0.
pd->d1 is *(pd + displacement of d1 within D). Here that displacement is 4 (assuming a 4-byte int).
With multiple inheritance the situation is a little more complex, but every displacement is still a constant.
F* pf;
pf->c1; // *(pf + dFC + dCc1); // *(pf + dFc1);
pf->e1; // *(pf + dFE + dEe1); // *(pf + dFe1);
pf->f1; // *(pf + dFf1);
Using the layout of F shown earlier, each of these displacements is easy to work out.
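Because these displacements are constants, a displacement measured on one F object works for every F object. The sketch below imitates the compiler's `*(pf + dFe1)`-style access with byte arithmetic. `displacement` and `read_at` are hypothetical helpers, and the raw arithmetic is for illustration only, not something to use in real code.

```cpp
#include <cassert>
#include <cstddef>

struct C { int c1; };
struct E { int e1; };
struct F : C, E { int f1; };

// Byte displacement of an int member from the start of its F object.
// Without virtual bases this is the same for every F, so one measurement suffices.
std::ptrdiff_t displacement(const F& f, const int& member) {
    return reinterpret_cast<const char*>(&member) - reinterpret_cast<const char*>(&f);
}

// Reads an int member the way the compiler does: base address + constant displacement.
int read_at(const F* pf, std::ptrdiff_t disp) {
    return *reinterpret_cast<const int*>(reinterpret_cast<const char*>(pf) + disp);
}
```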
What about virtual inheritance?
I* pi;
pi->c1; // *(pi + dIGvbptr + (*(pi + dIGvbptr))[1] + dCc1);
pi->g1; // *(pi + dIG + dGg1); // *(pi + dIg1);
pi->h1; // *(pi + dIH + dHh1); // *(pi + dIh1);
pi->i1; // *(pi + dIi1);
I i;
i.c1; // *(&i + dIC + dCc1); // *(&i + dIc1);
Access to g1, h1, and i1 is easy to understand. Access to c1 deserves a closer look.
pi->c1 is a dynamic access. At compile time, the compiler does not know the true (most-derived) type of the object pi points to, so the position of C relative to pi is not fixed and must be looked up through the vbptr. (*(pi + dIGvbptr))[1] is the entry, in the vbtable reached through that vbptr (whether the vbptr belongs to G or H), that holds the displacement to the virtual base C. Why is it (*(pi + dIGvbptr))[1] rather than (*(pi + dIGvbptr))[0]? I guess this is simply a Visual C++ design decision. If you know what (*(pi + dIGvbptr))[0] holds, please let me know.
The access i.c1, in contrast, is static. The complete type of i is known to be I, so the displacement of C within i is fixed. The compiler uses that constant directly and avoids the overhead of the vbtable lookup.
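pi->c1 needs a run-time lookup because the position of a virtual base depends on the most-derived type. This sketch measures the distance from a G to its C, once for a standalone G and once for the G part of an I. `g_to_c` is a hypothetical helper. The exact numbers depend on the ABI (GCC and Clang keep virtual base offsets in the vtable rather than using Visual C++'s vbptr), but the two distances should differ.

```cpp
#include <cassert>
#include <cstddef>

struct C { int c1; };
struct G : virtual C { int g1; };
struct H : virtual C { int h1; };
struct I : G, H { int i1; };

// Distance from a G (possibly the G part of a larger object) to its virtual
// base C. For an arbitrary G* the compiler cannot fold this into a constant,
// so the conversion looks the offset up at run time.
std::ptrdiff_t g_to_c(G* pg) {
    C* pc = pg;
    return reinterpret_cast<char*>(pc) - reinterpret_cast<char*>(pg);
}
```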
Casts
With the concepts above in hand, casts between these types should pose no problem. Here is how Visual C++ implements some common casts.
For multiple inheritance:
F* pf;
(C*)pf; // (C*)(pf ? pf + dFC : 0); // (C*)pf;
(E*)pf; // (E*)(pf ? pf + dFE : 0);
For virtual inheritance:
I* pi;
(G*)pi; // (G*)pi;
(H*)pi; // (H*)(pi ? pi + dIH : 0);
(C*)pi; // (C*)(pi ? (pi + dIGvbptr + (*(pi + dIGvbptr))[1]) : 0);
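You can see the effect of the null check in these casts through an ordinary implicit conversion. In this sketch, `to_e` is a hypothetical helper. For a non-null argument, the result points to the E part of the object. For a null argument, it stays null instead of becoming `0 + dFE`.

```cpp
#include <cassert>

struct C { int c1; };
struct E { int e1; };
struct F : C, E { int f1; };

// Implicit F* -> E* conversion. E is not at offset 0, so the compiler emits
// the equivalent of `pf ? pf + dFE : 0`, which keeps a null pointer null.
E* to_e(F* pf) { return pf; }
```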
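The pf ? ... : 0 test keeps a null pointer null instead of turning it into a small nonzero address. If any of these casts look puzzling, revisit the section on data member access.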
Member Functions
struct P {
    int p1;
    void pf();          // new
    virtual void pvf(); // new
};
Non-static member variables are accessed through a this pointer, which is passed in whenever a member function is called (most programmers know this much). Its type is effectively X* const. Has anyone ever wondered why it is not const X* const or const X*?
If it were const X*, we could not modify member variables through this pointer. In fact, that is exactly what happens when you declare pf as void pf() const;: the incoming this is then const X* const.
And because the type is X* const rather than plain X*, we cannot reassign the this pointer itself. If you don't believe it, try it.
So P::pf is compiled roughly as:
void P::pf() { // void P::pf([P* const this])
    ++p1;      // ++(this->p1);
}
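You can make the hidden this parameter explicit by hand. The sketch below is not what a compiler literally emits. `P_pf` is a hypothetical free function that plays the role of the compiled `P::pf`, with the implicit this turned into an explicit `P* const` parameter. `p1` gets a default initializer and `value` is added only so the result can be checked.

```cpp
#include <cassert>

struct P {
    int p1 = 0;
    void pf() { ++p1; }               // `this` behaves like P* const
    int value() const { return p1; }  // const member: `this` behaves like const P* const
};

// Hypothetical hand-written equivalent of the compiled P::pf: the hidden
// `this` becomes an explicit parameter of type P* const.
void P_pf(P* const self) {
    ++self->p1;          // same as ++(this->p1)
    // self = nullptr;   // would not compile: the pointer itself is const
}
```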
Overriding member functions
Consider the following declaration:
struct Q : P {
    int q1;
    void pf();          // overrides P::pf
    void qf();          // new
    void pvf();         // overrides P::pvf
    virtual void qvf(); // new
};
Calls to overridden member functions are resolved either statically or dynamically. In C++, the virtual keyword decides which.
Scenario 1: static resolution:
When an overriding member function is not virtual, calls to it are resolved at compile time.
P p; P* pp = &p; Q q; P* ppq = &q; Q* pq = &q;
pp->pf();  // pp->P::pf();  // P::pf(pp);
ppq->pf(); // ppq->P::pf(); // P::pf(ppq);
pq->pf();  // pq->Q::pf();  // Q::pf(pq);
pq->qf();  // pq->Q::qf();  // Q::qf(pq);
For pp->pf() and ppq->pf(), the function to call is fixed at compile time from the static type of the pointer. Because pf is not virtual, Visual C++ stays faithful to the type on the left of the -> operator and passes that pointer as this. So ppq->pf() calls P::pf even though ppq points to a Q.
Scenario 2: dynamic resolution:
pp->pvf();  // pp->P::pvf();  // P::pvf(pp);
ppq->pvf(); // ppq->Q::pvf(); // Q::pvf((Q*)ppq);
pq->pvf();  // pq->Q::pvf();  // Q::pvf(pq);
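Here are both scenarios side by side in a compilable version of P and Q. In this sketch the functions return 1 or 2 instead of void, purely so the caller can tell which version ran. A virtual destructor is added, as is usual for a polymorphic base.

```cpp
#include <cassert>

struct P {
    int p1 = 0;
    int pf() { return 1; }           // non-virtual: chosen from the static type
    virtual int pvf() { return 1; }  // virtual: chosen from the dynamic type
    virtual ~P() = default;
};

struct Q : P {
    int q1 = 0;
    int pf() { return 2; }           // hides P::pf
    int pvf() override { return 2; } // overrides P::pvf
};
```

Through `ppq`, `pf()` selects `P::pf` because the static type is P*, while `pvf()` selects `Q::pvf` because the dynamic type is Q.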
How is the poor C++ compiler supposed to know, from a P* alone, which overriding function to call? To solve this problem, a vfptr is introduced.
It is usually placed at the very start of the object's layout. It points to the vftable of the object's class, which stores the addresses of all of that class's virtual functions. A Q object, for example, holds vfptr, p1, and q1, and Q's vftable holds the addresses of Q::pvf and Q::qvf.
When a derived class overrides a base-class virtual function, the corresponding entry in the derived class's vftable holds the address of the override.
This is how C++ implements dynamic resolution of overridden member functions.
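To make the vfptr/vftable mechanism concrete, here is a hypothetical hand-built imitation using plain structs and function pointers. A real compiler generates the equivalent automatically, and names such as `PObj`, `PVTable`, and `call_pvf` are invented for this sketch. The vfptr sits first in each object, Q's table stores Q's override in slot 0, and a virtual call is just an indirect call through that slot.

```cpp
#include <cassert>

struct PObj;

// Hypothetical vftable with a single slot for pvf.
struct PVTable {
    int (*pvf)(PObj* self);   // slot 0
};

struct PObj {
    const PVTable* vfptr;     // placed first, as described above
    int p1;
};

struct QObj {
    PObj base;                // the P part sits at offset 0
    int q1;
};

int P_pvf(PObj* self) { return self->p1; }

int Q_pvf(PObj* self) {
    // Valid because QObj is standard-layout and `base` is its first member.
    return reinterpret_cast<QObj*>(self)->q1;
}

const PVTable p_vftable = {&P_pvf};
const PVTable q_vftable = {&Q_pvf};   // Q's table: slot 0 holds the override

// A "virtual call": follow vfptr to the table, then call through slot 0.
int call_pvf(PObj* p) { return p->vfptr->pvf(p); }
```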
Virtual functions: multiple inheritance
This is the most exciting part of this guide: here I introduce the well-known thunk technique.
Consider the following situation:
struct R {
    int r1;
    virtual void pvf(); // new
    virtual void rvf(); // new
};
struct S : P, R {
    int s1;
    void pvf(); // overrides P::pvf and R::pvf
    void rvf(); // overrides R::rvf
    void svf(); // new
};
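What does S look like in memory? Following the rules above, the P part (vfptr, p1) comes first, then the R part (vfptr, r1), and finally s1. Now consider these calls: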
S s; S* ps = &s;
((P*)ps)->pvf(); // (*((P*)ps)->P::vfptr[0])((S*)(P*)ps)
((R*)ps)->pvf(); // (*((R*)ps)->R::vfptr[0])((S*)(R*)ps)
For both calls, I expect the semantics shown in the comment after each one. After all, the object on the left of the -> operator is really an S, so S::pvf must receive an S* as this.
With P*, the problem is trivial: P* and S* point to the same address, so the C++ compiler does not need to do anything. With R*, however, R* and S* point to different addresses, so some trick is needed to convert the R* into an S*.
The usual solution is a technique called a thunk: the pvf entry in the vftable reached through R's vfptr is overwritten with the address of a small adjusting stub.
There are several ways to implement this. Visual C++ generates code like this:
S::pvf-adjust: // MSC++
    this -= SdPR;
    goto S::pvf()
Simple, isn't it? The thunk subtracts SdPR, the displacement of the R part within S, from this, which turns the R* back into an S*. It then jumps to the real S::pvf.
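You can observe the thunk's effect, and imitate it, in standard C++. In this sketch the virtual functions return `this` so the caller can see which address each one received. `s_pvf_adjust` is a hypothetical hand-written analogue of the compiler's `S::pvf-adjust`: the `static_cast` from R* down to S* performs the `this -= SdPR` step. The thunks real compilers emit are generated code, not functions like this.

```cpp
#include <cassert>

struct P {
    int p1 = 0;
    virtual void* pvf() { return this; }
    virtual ~P() = default;
};

struct R {
    int r1 = 0;
    virtual void* pvf() { return this; }
    virtual void* rvf() { return this; }
    virtual ~R() = default;
};

struct S : P, R {
    int s1 = 0;
    void* pvf() override { return this; }  // overrides P::pvf and R::pvf
    void* rvf() override { return this; }  // overrides R::rvf
};

// Hypothetical analogue of S::pvf-adjust: the downcast subtracts the
// displacement of R within S, and the qualified call then goes straight
// to S::pvf without another virtual dispatch.
void* s_pvf_adjust(R* self) {
    S* s = static_cast<S*>(self);
    return s->S::pvf();
}
```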
Constructors and Destructors
Constructors and destructors look like ordinary functions, but the compiler expands each of them into several steps.
A constructor expands into these steps:
1) For the most-derived class, initialize the vbptr(s) and call the constructors of virtual bases.
2) Call the constructors of non-virtual base classes.
3) Call the constructors of data members.
4) Initialize the vfptr(s).
5) Execute the user code in the constructor body.
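You can trace the construction order by having each constructor append a letter to a string. In this sketch, `ctor_trace` and the classes are hypothetical. `Derived` lists `Base` before the virtual base `VBase`, yet VBase is still constructed first. The vbptr and vfptr steps are invisible from standard C++, so only the order of the visible steps is traced.

```cpp
#include <cassert>
#include <string>

std::string ctor_trace;   // records the order in which construction steps run

struct VBase  { VBase()  { ctor_trace += "V"; } };
struct Base   { Base()   { ctor_trace += "B"; } };
struct Member { Member() { ctor_trace += "M"; } };

// Base is listed before the virtual base, yet VBase is still built first.
struct Derived : Base, virtual VBase {
    Member m;
    Derived() { ctor_trace += "D"; }   // user code runs last
};
```

Constructing a `Derived` leaves `ctor_trace` equal to `"VBMD"`: virtual base, non-virtual base, member, then the constructor body.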
A destructor expands into these steps:
1) Initialize the vfptr(s).
2) Execute the user code in the destructor body.
3) Call the destructors of data members, in the reverse of their declaration order in the class.
4) Call the destructors of non-virtual bases, in the reverse of their declaration order.
5) For the most-derived class, call the destructors of virtual bases.
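A similar trace shows the destruction order. A virtual call from a base destructor shows the effect of step 1: by the time `~Base` runs, the vfptr points at Base's vftable again, so `name()` resolves to `Base::name` even though a Derived is being destroyed. All names here are hypothetical.

```cpp
#include <cassert>
#include <string>

std::string dtor_trace;   // records the order in which destruction steps run

struct VBase {
    virtual ~VBase() { dtor_trace += "V"; }
};

struct Base {
    // During ~Base the object is treated as a Base, so this virtual call
    // resolves to Base::name even when a Derived is being destroyed.
    virtual ~Base() { dtor_trace += std::string("B(") + name() + ")"; }
    virtual const char* name() const { return "Base"; }
};

struct Member {
    ~Member() { dtor_trace += "M"; }
};

struct Derived : Base, virtual VBase {
    Member m;
    ~Derived() override { dtor_trace += "D"; }   // user code runs first
    const char* name() const override { return "Derived"; }
};
```

Destroying a `Derived` leaves `dtor_trace` equal to `"DMB(Base)V"`: destructor body, member, non-virtual base (which sees its own `name`), then the virtual base.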