11.4 Global Construction and Destruction in C++
In the world of C++, the entry function shoulders one more demanding task: constructing global objects before main runs and destroying them after main returns. This section describes how glibc and MSVCRT each accomplish this.
11.4.1 Global Construction and Destruction in glibc
The earlier discussion of glibc's startup files introduced the ".init" and ".fini" sections. We know that the code in these two sections is eventually assembled into two functions, _init() and _fini(), which run before and after main respectively. But when exactly are they executed? Who calls them? How do they construct and destroy global objects? To answer these questions, this section picks up the trail from the _start entry function described in Section 1 of this chapter.
For ease of exposition, the following code is compiled into an executable for analysis:
    class HelloWorld
    {
    public:
        HelloWorld();
        ~HelloWorld();
    };

    HelloWorld Hw;

    HelloWorld::HelloWorld()
    {
        ......
    }

    HelloWorld::~HelloWorld()
    {
        ......
    }

    int main()
    {
        return 0;
    }
To understand how global objects are constructed, we need to look more closely at the program's startup process. In Section 1 of this chapter, _start passed a function pointer named init to __libc_start_main. What does it point to? Tracing the address shows that init actually points to the __libc_csu_init function, which is located in csu/elf-init.c in the glibc source tree. Let's look at its definition:
_start -> __libc_start_main -> __libc_csu_init:

    void __libc_csu_init (int argc, char **argv, char **envp)
    {
        …
        _init ();

        const size_t size = __init_array_end - __init_array_start;
        for (size_t i = 0; i < size; i++)
            (*__init_array_start [i]) (argc, argv, envp);
    }
This code calls the _init function. So what is _init? Does it remind you of the _init() function defined in crti.o, described earlier? Yes: by calling _init, __libc_csu_init executes the ".init" section. In other words, all the code placed in the ".init" section is executed here.
At this point the trail seems to go cold, because the body of _init is not defined in the glibc source; it is pieced together from the ".init" sections of the object files being linked. However, besides reading source code there is one last resort: disassembling the object code. We can disassemble an executable to find the contents of _init():
_start -> __libc_start_main -> __libc_csu_init -> _init:

    Disassembly of section .init:

    80480f4 <_init>:
     80480f4: 55                   push   %ebp
     80480f5: 89 e5                mov    %esp,%ebp
     80480f7: 53                   push   %ebx
     80480f8: 83 ec 04             sub    $0x4,%esp
     80480fb: e8 00 00 00 00       call   8048100 <_init+0xc>
     8048100: 5b                   pop    %ebx
     8048101: 81 c3 9c 39 07 00    add    $0x7399c,%ebx
     8048107: 8b 93 fc ff ff ff    mov    -0x4(%ebx),%edx
     804810d: 85 d2                test   %edx,%edx
     804810f: 74 05                je     8048116 <_init+0x22>
     8048111: e8 ea 7e fb f7       call   0 <_nl_current_LC_CTYPE>
     8048116: e8 95 00 00 00       call   80481b0 <frame_dummy>
     804811b: e8 b0 6e 05 00       call   809efd0 <__do_global_ctors_aux>
     8048120: 58                   pop    %eax
     8048121: 5b                   pop    %ebx
     8048122: c9                   leave
     8048123: c3                   ret
We can see that _init calls a function named __do_global_ctors_aux. Searching the glibc source for this function turns up nothing, because it does not belong to glibc: it comes from crtbegin.o, an object file provided by GCC. As noted in the previous section, some of the object files in the final link come from GCC and contain language-specific support functions. The construction of C++ global objects is obviously closely tied to the language, so it makes sense that the function responsible for it comes from GCC.
Since it lives in the GCC source, we can dig it out there. It is located in gcc/crtstuff.c; simplified, the code is as follows:
_start -> __libc_start_main -> __libc_csu_init -> _init -> __do_global_ctors_aux:

    void __do_global_ctors_aux(void)
    {
        /* Call constructor functions. */
        unsigned long nptrs = (unsigned long) __CTOR_LIST__[0];
        unsigned i;

        for (i = nptrs; i >= 1; i--)
            __CTOR_LIST__[i] ();
    }
The code above first treats the first element of the __CTOR_LIST__ array as the number of elements, then treats the elements after it as function pointers and calls them one by one, from the last entry back to the first. The intent is obvious, and we can guess what __CTOR_LIST__ holds: pointers to the constructors of all global objects. The next question is then __CTOR_LIST__ itself. Where does this array come from? Who is responsible for building it?
Here we set the origin of __CTOR_LIST__ aside for a moment, since chasing it purely from the GCC side is tedious. Instead, let's approach from the other end of the problem: how the compiler generates global constructors in the first place.
For each compilation unit (.cpp), GCC traverses all of its global objects and generates a special function whose job is to initialize every global object in that unit. Disassembling the program from the beginning of this section gives us a rough picture: GCC generates a function named _GLOBAL__I_Hw in the object code, which is responsible for constructing all global/static objects in the compilation unit and arranging for their destruction. Its code can be expressed roughly as:
    static void _GLOBAL__I_Hw(void)
    {
        Hw::Hw();          // construct the object
        atexit(__tcf_1);   // a mysterious function named __tcf_1 is registered with exit
    }
The mysterious function __tcf_1 is discussed at the end of this section. As a special function, _GLOBAL__I_Hw also receives special treatment: once an object file contains such a function, the compiler places a pointer to _GLOBAL__I_Hw in the ".ctors" section of the object file (.o) it generates.
So what is the benefit of putting the address of each object file's global/static object construction function in a special section? Simply this: the linker can gather these special sections together, collecting all the global constructors so that they can be run during initialization.
After the compiler has generated this special function for each compilation unit, the linker merges same-named sections when it links the object files. The .ctors sections of all input files are merged into a single .ctors section whose content is the concatenation of each file's .ctors section. Because each object file's .ctors section stores just one pointer (to that file's global construction function), the merged .ctors section becomes an array of function pointers, each element pointing to the global construction function of one object file. Isn't this pointer array exactly the list of global constructor addresses we are looking for? If we can get the address of this array, the construction problem is solved.
Indeed, and obtaining the array's address is not hard. We can borrow the technique used to piece together the ".init" and ".fini" sections. Recall that at link time a crtbegin.o and a crtend.o are linked before and after the user's object files. These two GCC-provided object files also contain .ctors sections, whose contents are merged into the final executable along with everything else. So what is in the .ctors sections of these two files?
crtbegin.o: as the beginning of all .ctors sections, the .ctors section of crtbegin.o stores a 4-byte -1 (0xFFFFFFFF), which the linker is responsible for changing to the number of global constructors. The section also defines its start address as the symbol __CTOR_LIST__, so __CTOR_LIST__ actually denotes the start address of the final merged .ctors section.
crtend.o: the .ctors content in this file is even simpler. It is a single 0, and the file defines the symbol __CTOR_END__, which points to the end of the .ctors section.
As described in the previous section, when the linker links the user's object files, crtbegin.o always comes before them and crtend.o always comes after them. For example, when linking two user object files a.o and b.o, the files actually linked are, in order: ld crti.o crtbegin.o a.o b.o crtend.o crtn.o. We ignore crti.o and crtn.o here because they are irrelevant to global construction. When crtbegin.o, the user object files, and crtend.o are merged, the linker concatenates their .ctors sections in order, so the .ctors section is formed as shown in Figure 11-10.
Figure 11-10 Formation of the .ctors section
With the layout of the executable's .ctors section in mind, the code of __do_global_ctors_aux is easy to understand: it starts from the slot after __CTOR_LIST__ and calls each function pointer in turn until it reaches NULL (__CTOR_END__). In this way the global construction function of every object file gets called.
[Small Experiment]
Calling a function before main:
glibc's global constructors are placed in the .ctors section, so if we manually add function pointers to the .ctors section, those functions will be called during global construction (before main):
    #include <stdio.h>

    void my_init(void)
    {
        printf("Hello ");
    }

    typedef void (*ctor_t)(void);

    // Add a function pointer to the .ctors section
    ctor_t __attribute__((section(".ctors"))) my_init_p = &my_init;

    int main()
    {
        printf("World!\n");
        return 0;
    }
Running this program prints: Hello World!
In fact, GCC offers a more direct way to achieve the same goal: __attribute__((constructor)).
Example:
    #include <stdio.h>

    void my_init(void) __attribute__ ((constructor));

    void my_init(void)
    {
        printf("Hello ");
    }

    int main()
    {
        printf("World!\n");
        return 0;
    }
Destruction
In early versions of glibc and GCC, after constructing the objects, the CRT also had to destroy them before the program ended. The destruction of ordinary global objects mirrored the construction process described above, with all functions and symbol names corresponding one to one: ".init" becomes ".fini", "__do_global_ctors_aux" becomes "__do_global_dtors_aux", and "__CTOR_LIST__" becomes "__DTOR_LIST__". As we saw in the entry function earlier, __libc_start_main registers "__libc_csu_fini" in the exit list via __cxa_atexit(), so that exit() calls "__libc_csu_fini" before the process terminates. "_fini" works in essentially the same way as "_init", so we will not go into detail here.
However, to guarantee the order of construction and destruction (objects constructed first are destroyed last), the linker had to merge all ".dtors" sections in exactly the reverse order of the ".ctors" sections. This added to the linker's workload, so the approach was later abandoned in favor of a new one: registering callbacks with __cxa_atexit() so that exit() performs the destruction when the process exits.
This brings us back to the mysterious function we saw in each compilation unit's global construction function, _GLOBAL__I_Hw(). For the global objects of each compilation unit, the compiler generates a special function that calls the destructors of all global objects in that unit, in the reverse order of the constructor calls made in _GLOBAL__I_Hw(). For the code in the earlier example, the mysterious function generated by the compiler looks roughly like this:
    static void __tcf_1(void) // This name is generated by the compiler.
    {
        Hw.~HelloWorld();
    }
This function destroys the Hw object. _GLOBAL__I_Hw registered __tcf_1 for process exit through __cxa_atexit(), and functions registered this way are called at exit in last-registered, first-called order, which matches exactly the required relationship between construction and destruction order. It is therefore a natural fit for implementing destruction.
Of course, this account of global object construction and destruction in glibc/GCC omits many details that we consider beyond the scope of this book. The real process is more complex than described above, and it differs slightly between dynamic and static linking. In either case, however, the basic principles are the same. Following the steps and paths described above, readers should be able to retrace the call chain for their own situation.
Because global construction and destruction are carried out by the runtime library, you cannot use the "-nostartfiles" or "-nostdlib" option when the program or shared library contains global objects; otherwise the constructors and destructors will not be executed properly (unless you know exactly what you are doing and construct and destroy the global objects manually).
collect2
We met the collect2 program in Chapter 2, where it took the place of ld as the final linker. In general we can simply treat it as ld. In fact, collect2 is a wrapper around ld that ultimately calls ld to do all the linking. So what is collect2 for?
On some systems, the assembler and linker do not support the ".init" and ".ctors" mechanisms described in this section, so running code before main requires special handling at link time. collect2 exists for this purpose: it "collects" all the special symbols in the input object files, that is, the symbols that mark global constructors or functions to run before main. collect2 then generates a temporary .c file that gathers the addresses of these symbols into an array, compiles it, and links it into the final output together with the other object files.
On these platforms, GCC also inserts a call to __main at the beginning of the main function; __main is what actually calls the functions collected by collect2. The __main function is part of the object files provided by GCC, so if we build a program with "-nostdlib", __main may be reported as undefined. In that case, adding "-lgcc" to the link resolves it.