Linux Compiling and Linking Basics


On Linux, compiling hello.c takes nothing more than gcc hello.c, after which ./a.out runs the program. This simple command hides several fairly complex steps, described below.

Preprocessing

In this stage macros are expanded: every #define is handled by the preprocessor, conditional directives such as #if and #ifdef are evaluated, and #include files are expanded in place (for a Hello World program, the whole of stdio.h is merged into hello.c). Comments are also removed.

GCC's preprocessor is cpp, and the -E option shows its output, for example gcc -E hello.c -o hello.i; the generated hello.i is the preprocessed result. The preprocessor checks little beyond its own directives (#ifdef and similar directives must be well formed, #include file paths must be found), so ordinary syntax errors such as a missing ; are not detected at this stage. As anyone who has written a makefile knows, we pass a series of -Ipath options to tell GCC where to look for header files.
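A minimal sketch of what the preprocessor decides on its own: the switch USE_FAST_PATH and the function path_name are hypothetical names for illustration. Running -E with and without -DUSE_FAST_PATH shows that only one branch ever reaches the compiler.

```cpp
// Hypothetical switch USE_FAST_PATH: which branch survives is decided
// entirely by the preprocessor. Compare the output of
//   g++ -E demo.cpp                  (default branch)
//   g++ -E -DUSE_FAST_PATH demo.cpp  (fast branch)
#ifdef USE_FAST_PATH
#define PATH_NAME "fast"
#else
#define PATH_NAME "default"
#endif

// After preprocessing, this body contains only the chosen string literal.
const char *path_name() { return PATH_NAME; }
```

Without -DUSE_FAST_PATH, path_name() returns "default".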

Small tip:

1. When a compile error is caused by a macro, expanding the macros with -E helps locate it. This is particularly useful when writing PHP or Python extensions, which rely heavily on macros.

2. Writing the path in the #include directive can sometimes save a lot of trouble, for example #include <public/connectpool/connectpool.h>. GCC then needs only one -I path, and you avoid conflicts caused by two headers that happen to have exactly the same file name. The drawback is that the path must be written out in the code, whereas a path specified externally with -I is more flexible.
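To illustrate tip 1, here is a sketch of a classic macro bug that is hard to see in the source but obvious in the -E output. The macro and function names are hypothetical.

```cpp
// Hypothetical macros. BAD_SQUARE lacks parentheses, so BAD_SQUARE(a + b)
// expands to  a + b * a + b ; running g++ -E on this file shows that text.
#define BAD_SQUARE(x) x * x
#define GOOD_SQUARE(x) ((x) * (x))

int bad_square_of_sum(int a, int b) { return BAD_SQUARE(a + b); }
int good_square_of_sum(int a, int b) { return GOOD_SQUARE(a + b); }
```

With a = 1 and b = 2, the bad version computes 1 + 2 * 1 + 2 = 5 instead of 9.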

Compiling

This stage performs lexical and syntax analysis and translates our C/C++ code into assembly code; it is the most complex part of the compiler.

The command gcc -S hello.i -o hello.s shows the assembly code generated by GCC. Modern GCC typically performs preprocessing and compilation together in a single program, cc1. When compiling a large file, top shows a cc1 process that keeps running for a while; that is the compilation step in progress. In the rest of this article, "compilation" refers to the cc1 processing, which includes both preprocessing and compilation.

Assembly

At this point the C/C++ code has become assembly code, and the assembler turns it directly into machine code (note that the result is not yet executable): gcc -c hello.c -o hello.o. The resulting hello.o is the final machine code; if you are building a static library, you can stop here and skip the later steps.

A static library such as ullib is provided as libullib.a. An .a file is simply several .o files packaged with the ar command for convenience; unpacking the .a and using the .o files directly works the same way.

Small tip:

1. GCC uses as for the assembly step. Because as expects the well-formed assembly that GCC generates, its syntax checking has many gaps, so hand-written assembly fed to it often produces baffling errors.

Linking

Linking essentially combines all the machine code files into one executable. Compilation produces a .o file, but that .o file cannot run on its own: it also needs a collection of auxiliary machine code that handles its interaction with the underlying system. gcc -o hello hello.o

This links the .o file into a binary executable.

Linking is also the focus of this article and is described in more detail below.

Small tip:

Some builds print the message "linker input file unused because linking not done" (GCC does not treat it as an error, but the message still appears). It comes from mixing compile options with link options, for example g++ -c test.cpp -I../../ullib/include -L../../ullib/lib/ -lullib. With -c no linking is performed, so the library options are unused; compiling and linking are in fact two independent steps.

Static linking

Here is a brief introduction to the work done by the linker.

The linker's work divides into two parts: symbol resolution and relocation.

Symbol resolution

Symbols include the functions and variables that our program defines and references.

Running nm ./test on the command line, where test is a user's binary program, prints the symbol table of a binary object file:

    00000000005009b8 a __bss_start
    00000000004004cc t call_gmon_start
    00000000005009b8 b completed.1
    0000000000500788 D __CTOR_END__
    0000000000500780 d __CTOR_LIST__
    00000000005009a0 D __data_start
    00000000005009a0 W data_start
    0000000000400630 t __do_global_ctors_aux
    00000000004004f0 t __do_global_dtors_aux
    00000000005009a8 d __dso_handle
    0000000000500798 d __DTOR_END__
    0000000000500790 d __DTOR_LIST__
    00000000005007a8 D _DYNAMIC
    00000000005009b8 a _edata
    00000000005009c0 a _end
    0000000000400668 T _fini
    0000000000500780 a __fini_array_end
    0000000000500780 A __fini_array_start
    0000000000400530 t frame_dummy
    0000000000400778 r __FRAME_END__
    0000000000500970 D _GLOBAL_OFFSET_TABLE_
                     w __gmon_start__
                     U __gxx_personality_v0@@CXXABI_1.3
    0000000000400448 T _init
    0000000000500780 A __init_array_end
    ...

The symbol table printed by nm can of course be stripped (for example with strip), so that it cannot be inspected directly.

The linker resolves each symbol reference by associating it with exactly one symbol definition in the symbol tables of its input object (.o) files. For references to local symbols defined in the same module (note: those declared static), the compiler can detect problems at compile time. Global symbol references are more troublesome.

Let's look at one of the simplest programs:

    #include <stdio.h>
    int foo();
    int main() { foo(); return 0; }

We name the file test.cpp and compile it as follows:

    g++ -c test.cpp
    g++ -o test test.o

The first step ends normally and produces test.o, but the second step reports the following error:

    test.o(.text+0x5): In function `main':
    : undefined reference to `foo()'
    collect2: ld returned 1 exit status

Because foo is a global symbol, compiling raises no error; only at link time is the missing definition discovered, producing the error above. But if we change the code as follows:

    #include <stdio.h>
    // note the static here
    static int foo();
    int main() { foo(); return 0; }

then running g++ -c test.cpp immediately reports:

    test.cpp:19: error: 'int foo()' used but never defined

Since a static function can only be defined in the same file, the compiler knows at once that foo can never get an entry in the object file's symbol table and reports the error right away. Declaring functions that are used only locally as static avoids polluting the global symbol namespace, and it also lets the compiler find errors as early as possible.
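A small sketch of the local/global distinction as nm reports it. The module and function names are hypothetical; the letters t and T are nm's markers for local and global text symbols.

```cpp
// Hypothetical module. Compile with g++ -c (no optimization) and run nm -C:
// helper(int) is listed with a lowercase 't' (local symbol, because static)
// and api_entry(int) with an uppercase 'T' (global, visible to the linker).
// With -O the compiler may inline helper and omit it from the table.
static int helper(int x) { return x * 10; }

int api_entry(int x) { return helper(x) + 1; }
```

Other object files can call api_entry, but a reference to helper from another file would end in an undefined reference at link time.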

The base libraries are provided as a series of .a files, which are simply bundles of object files (.o). The aim is to make it easy to reuse the output of existing code. Typically each .c/.cpp file produces one .o file, and compiling against a pile of .o files is inconvenient, like this:

    g++ -o main main.cpp a.o b.o c.o

Using so many .o files directly is also error-prone, so on Linux they are archived and packaged with ar.

So we can write the compile command as:

    g++ -o main main.cpp ./libullib.a

Passing ./libullib.a uses the library directly, but GCC offers another way:

    g++ -o main main.cpp -L./ -lullib

-L specifies a directory in which to search for library files, and -l selects the library to use; the file must be named lib<name> for GCC to find it with -l<name>. The problem with this approach is that it does not distinguish between dynamic and static libraries, an issue discussed later when dynamic libraries are introduced.

Things get complicated when there are multiple .a files with dependencies between them.

For example, to use lib2-64/dict, which itself depends on ullib, you need to write something like:

    g++ -o main main.cpp -L../lib2-64/dict/lib -L../lib2-64/ullib/lib -ldict -lullib

-lullib must come after -ldict. By default the linker processes its inputs from left to right, and when it reaches an archive it extracts only the members that define symbols still undefined at that point; an archive that appeared earlier is not revisited for symbols that only become undefined later. So when the libraries in use depend on each other, the more basic library must be placed later. If -ldict and -lullib above were swapped, undefined reference to xxx errors could appear.

GCC also offers another way to solve this problem:

    g++ -o main main.cpp -L../lib2-64/dict/lib -L../lib2-64/ullib/lib -Xlinker "-(" -ldict -lullib -Xlinker "-)"

Here the libraries are enclosed between -Xlinker "-(" and -Xlinker "-)" (short forms of the linker's --start-group and --end-group). For such a group the linker scans the libraries repeatedly until no new dependencies are resolved. The cost is longer link time, which becomes significant if groups are used heavily.

-Xlinker is sometimes written as -Wl,; both pass the following arguments to the linker. The difference is that -Xlinker takes its argument as the next, space-separated word, while -Wl, attaches its arguments with commas.

Looking at an object file with nm gives output like the following:

    0000000000009740 T _Z11ds_syn_loadpcs_
    0000000000009c62 T _Z11ds_syn_seekp16sdict_search_synpcs1_i
    0000000000007928 T _Z11dsur_searchpcs_s_
                     U _Z11UL_READFILEPCS_PVI
                     U _Z11ul_writelogipkcz
    00000000000000a2 T _Z12creat_sign32pc

The symbol _Z11UL_READFILEPCS_PVI (actually ul_readfile from ullib) is marked U, meaning that ul_readfile is not defined in the dict object file.

At link time, the linker looks for the _Z11UL_READFILEPCS_PVI symbol in the other object files.

Small tip:

When libraries are used in the form -lxxx -lyyy, the -L and -l options do not need to be paired. Some makefiles write them in pairs for ease of maintenance, which leads to misunderstandings. In fact, one can write -Lpath1 -Lpath2 -Lpath3 -llib1 and so on.

During linking, GCC treats .o files and .a files differently. If you write g++ -o main main.cpp libx.o, everything in libx.o is linked into main, whether or not main.cpp uses it. For an .a file, written as g++ -o main main.cpp -L./ -lx, GCC links in only those .o members of libx.a that are actually needed; if none of the .o files in libx.a is used by main, none of them is linked into main.

Relocation

After symbol resolution, every symbol reference has been matched to an actual definition (the symbols marked U have been found in some other file).

When as generates an object module, it does not know where its code and data will finally be placed, nor the exact location of any externally defined symbol. So for every reference whose location is unknown, as emits a relocation entry that tells the linker how to patch the address when the object files are merged into the executable.

g++ and gcc

gcc and g++ produce different symbol names at compile time.

To support function overloading, namespaces, and similar features, g++ combines the function name with its parameter types (and possibly its namespace) to produce a special, unique symbol name. For example:

    int foo(int a);

After compiling with gcc, the name in the symbol table is simply foo, but after g++ it becomes something like _Z3fooi. The c++filt command restores a symbol to its original form, for example:

    c++filt _Z3fooi

which prints foo(int).
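The same demangling that c++filt performs is available from code through abi::__cxa_demangle in <cxxabi.h>, which GCC and Clang provide (it is part of the Itanium C++ ABI, not standard C++). A minimal sketch, where demangle is a hypothetical helper name:

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Demangle a GCC/Clang symbol name the way c++filt does.
// Returns the input unchanged if it is not a valid mangled name.
std::string demangle(const char *mangled) {
    int status = 0;
    // nullptr buffer: __cxa_demangle allocates the result with malloc.
    char *raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) return mangled;
    std::string result(raw);
    std::free(raw);
    return result;
}
```

For example, demangle("_Z3fooi") yields "foo(int)" and demangle("_Z3foov") yields "foo()".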

Because the symbol names of C++ and pure C are incompatible, a C program cannot directly call a library compiled as C++, and a C++ program cannot directly call a library compiled as C without further declarations. To solve this, C++ introduced extern "C":

    extern "C" int foo(int a);

When compiling with g++, this tells the C++ compiler to give int foo(int a) a C interface (an unmangled symbol name), so pure C code can recognize the symbol.

However, extern "C" is C++ syntax that gcc does not understand when compiling C, so in practice the following form is generally used:

    #ifdef __cplusplus
    extern "C" {
    #endif
    int foo(int a);
    #ifdef __cplusplus
    }
    #endif

The interfaces in such a header can then be used from both gcc and g++. Of course, interfaces inside extern "C" {} give up C++ features such as overloading and (for C callers) default arguments.
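A sketch of the header above together with its definition, written as one file for brevity; the body of foo is hypothetical. It also shows that only one function named foo can have C linkage, while C++-linkage overloads may still sit beside it because they get distinct mangled names.

```cpp
// foo.h (hypothetical): usable from both C and C++.
#ifdef __cplusplus
extern "C" {
#endif
int foo(int a);  // exported as the plain symbol "foo"
#ifdef __cplusplus
}
#endif

// foo.cpp (hypothetical body): the definition inherits C linkage from the
// extern "C" declaration above, so nm shows "foo", not "_Z3fooi".
int foo(int a) { return a + 1; }

// A C++-linkage overload is still allowed: it is a different function
// whose mangled symbol is _Z3food, so it does not clash with "foo".
double foo(double a) { return a * 2.0; }
```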

In our 64-bit build environment, when a program compiled with gcc uses a g++-compiled library in the way above, we need to add -lstdc++. Libraries compiled by g++ in our 64-bit environment reference the symbol __gxx_personality_v0, which lives in /usr/lib64/libstdc++.so.6 (the C++ standard library, which contains iostream and which C++ programs need). The 32-bit g++ 2.96 compiler, however, does not require __gxx_personality_v0, so builds with it can do without -lstdc++.

Small tip:

With gcc on Linux, only files with the .c suffix compiled by gcc are compiled as pure C. In all other cases, such as g++ compiling a .c file, or gcc compiling .cc or .cpp files, the code is compiled as C++ into C++ object files; the main remaining difference between gcc and g++ is that gcc does not automatically link -lstdc++. If an interface inside extern "C" {} has a default argument, compiling with g++ causes no problem, but gcc reports an error when the header is used from C, which has no default arguments. A default argument does not change the interface's symbol; it is the same as without the default.

Symbol table conflicts

When compiling a program, you will often encounter errors like:

    multiple definition of `foo()'

These errors occur when the same symbol is defined in more than one of the .o files being linked.

For example:

libx.cpp:

    int foo() { return 30; }

liby.cpp:

    int foo() { return 20; }

Compile libx.cpp and liby.cpp into libx.o and liby.o. Then g++ -o main main.cpp libx.o liby.o reports a multiple definition of `foo()' error (some options, such as -Wl,--allow-multiple-definition, can turn this error off).

But if libx.o and liby.o are packaged separately into libx.a and liby.a and compiled with g++ -o main main.cpp -L./ -lx -ly, no error is reported: the linker takes the definition from the first library that supplies it. In this example foo comes from libx, and since nothing else requires the member of liby.a, it is never linked in.

g++ -o main main.cpp -L./ -lx -ly -Wl,--trace-symbol=_Z3foov shows which library a symbol is actually linked from.

g++ -o main main.cpp -L./ -lx -ly -Wl,--cref prints a cross-reference table of all symbols (whether or not they end up being used).

Small tip:

Global constants defined in header files behave differently under gcc and g++: in C++ a namespace-scope const is implicitly static (internal linkage), but in C it is not.

For example, foo.h contains const int intvalue = 2000;

Suppose two libraries A and B both use intvalue, and a program main uses both A and B. If A and B were compiled with gcc, linking fails; if they were compiled with g++, everything links normally.

The reason is that g++ treats the const constant intvalue as static, i.e. local to each translation unit, so no conflict arises. Under gcc, intvalue is an external global constant rather than static, which causes the link error.
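A sketch contrasting the two ways a header constant can be declared in C++. The names follow the example above where possible; shared_value and read_values are hypothetical, and the two "files" are shown in one block for brevity.

```cpp
// (1) As in foo.h above: in C++ this has internal linkage (implicitly
// static), so every translation unit that includes the header gets a
// private copy and the linker sees no duplicate symbol. In C the same
// line is an external definition, hence the multiple-definition error.
const int intvalue = 2000;

// (2) One shared definition in C++: the header only declares it...
extern const int shared_value;
// ...and exactly one .cpp file defines it, also with 'extern'.
extern const int shared_value = 3000;

int read_values() { return intvalue + shared_value; }
```

Form (2) is what to use when all libraries must really see the same object rather than equal copies.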

Dynamic linking

Static libraries have the following problems:

1. Updates: when a library is updated, every executable that uses it has to be rebuilt.

2. Memory: code is loaded into memory when a program runs. With a static library, each program carries its own copy of the library in memory, which uses extra memory and is also less friendly to the CPU cache.

3. Link control: as shown above, the linking behavior of static libraries is hard to control, and the library cannot be replaced at run time.

4. Portability: a compiled program is binary code, and some of that code depends on the machine and environment. Suppose program X is compiled on machine A and run directly on machine B; because of differences between A and B, X may misbehave. If the machine-dependent part is placed in a dynamic library C with a stable interface, X depends only on C's external interface when compiled, so an ordinary user-space program X can easily be moved from A to B. If X is statically linked, this may not be possible, and X must be recompiled on B.

Dynamic link libraries are called shared libraries on Linux (below, "shared library" and "dynamic library" are used interchangeably). They are mainly designed to address the drawbacks of static libraries listed above.

Using shared libraries

There are two main ways to use a shared library. One works like a static .a library: the library is given to the compiler at link time, and at run time the system's loader (ld-linux.so) loads it together with the binary. The other is to load it explicitly in code, under the control of our own program.

Continuing the earlier example: g++ -shared -fPIC -o libx.so libx.cpp. Compared with building a static library, we only add -shared and -fPIC and change the output name to .so.

Then, just as when linking an executable against a .a file, run g++ -o main main.cpp -L./ -lx, and main now uses libx.so. At run time, however, an error saying libx.so cannot be found may appear. This is a matter of the dynamic library search path: the default search directories are configured in /etc/ld.so.conf, and when the executable runs, the loader searches these directories for the shared libraries it needs. The environment variable LD_LIBRARY_PATH adds search paths for shared libraries (note: LD_LIBRARY_PATH takes precedence over ld.so.conf).
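A minimal sketch of the libx example as a shared library, with the build and run commands from the text in comments. Exporting foo with extern "C" is an assumption added here (the original libx.cpp has no extern "C"); it keeps the exported name plain, which helps when loading it with dlsym later.

```cpp
// libx.cpp: hypothetical shared-library source based on the earlier libx example.
// Build the library:          g++ -shared -fPIC -o libx.so libx.cpp
// Link the program:           g++ -o main main.cpp -L./ -lx
// Run with . on the path:     LD_LIBRARY_PATH=. ./main
// extern "C" keeps the exported symbol name plain ("foo").
extern "C" int foo() { return 30; }
```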

Running ldd ./main lists the dynamic libraries this binary needs at run time, for example:

    libx.so => /home/bnh/tmp/test/libx.so (0x003cb000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6
    libm.so.6 => /lib/tls/libm.so.6 (0x00bde000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00c3e000)
    libc.so.6 => /lib/tls/libc.so.6 (0x00aab000)

This lists the dynamic libraries required by main. If an entry looks like libx.so => not found, the path is wrong and you need to set LD_LIBRARY_PATH to point to the right directory.

Loading a shared library manually

Besides using dynamic libraries the same way as static libraries, we can also control their use from code.

This lets an application load and link shared libraries at run time, mainly through the following four interfaces:

    // load a dynamic link library
    void *dlopen(const char *filename, int flag);
    // look up a symbol in the dynamic library
    void *dlsym(void *handle, const char *symbol);
    // close the dynamic link library
    int dlclose(void *handle);
    // get the last error message
    char *dlerror(void);

Look at the following example:

    typedef int foo_t();
    foo_t *foo = (foo_t *)dlsym(handle, "foo");

This loads the address of the symbol "foo" and casts it to a function pointer. The function pointer type must of course match the symbol's actual prototype, which is usually provided by the shared library's header file.
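Putting the four calls together, here is a minimal sketch. Since the article's libx.so is not available here, it assumes a glibc-based Linux system and uses the C math library libm.so.6 and its cos function as a stand-in; call_double_fn is a hypothetical helper name. On glibc older than 2.34, link with -ldl.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Load `lib`, look up a C-linkage function double(double) named `symbol`,
// call it with `arg`, store the result in *out, then unload the library.
// Returns false (after printing the dlerror() message) on failure.
bool call_double_fn(const char *lib, const char *symbol, double arg, double *out) {
    void *handle = dlopen(lib, RTLD_NOW);  // resolve all symbols at load time
    if (!handle) {
        std::fprintf(stderr, "dlopen: %s\n", dlerror());
        return false;
    }
    dlerror();  // clear any stale error before calling dlsym
    using fn_t = double (*)(double);
    // dlsym returns void*; casting it to a function pointer is the usual POSIX idiom.
    fn_t fn = reinterpret_cast<fn_t>(dlsym(handle, symbol));
    const char *err = dlerror();
    if (err != nullptr || fn == nullptr) {
        std::fprintf(stderr, "dlsym: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return false;
    }
    *out = fn(arg);
    dlclose(handle);
    return true;
}
```

Calling call_double_fn("libm.so.6", "cos", 0.0, &r) stores 1.0 in r, while an unknown symbol name makes it return false.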

One thing to note: the name passed to dlsym must match the symbol name as nm shows it for the library file, so the gcc/g++ difference described earlier applies. For int foo() compiled by g++ without an extern "C" export, the function must be loaded with dlsym(handle, "_Z3foov"). We therefore recommend exporting a shared library's external interface as a pure C interface with extern "C", which makes it more convenient to use.
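A sketch showing that dlsym matches only the exact stored name. It assumes a Linux system with libstdc++.so.6 installed and uses std::terminate(), whose mangled name is _ZSt9terminatev, as a real stand-in for the article's g++-compiled foo; has_symbol is a hypothetical helper.

```cpp
#include <dlfcn.h>

// True if `lib` exports a symbol stored under exactly `name`.
// For C++ functions without extern "C", that is the mangled name.
bool has_symbol(const char *lib, const char *name) {
    void *handle = dlopen(lib, RTLD_LAZY);
    if (!handle) return false;
    bool found = dlsym(handle, name) != nullptr;
    dlclose(handle);
    return found;
}
```

has_symbol("libstdc++.so.6", "_ZSt9terminatev") succeeds, whereas the readable name "std::terminate" is not found.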

The flag argument of dlopen can be RTLD_GLOBAL, RTLD_NOW, or RTLD_LAZY. One of RTLD_NOW and RTLD_LAZY must be given; they only determine whether undefined symbols are resolved when the library is loaded or when they are first used, which makes little difference for most applications. Either of them can be combined with RTLD_GLOBAL using |.

The main point here is what RTLD_GLOBAL does. Consider this situation:

main.cpp uses two dynamic libraries, liba and libb. Suppose liba has an external interface testa; main.cpp can obtain a pointer to testa with dlsym. Code in libb, however, cannot see liba's interfaces, so it cannot call testa in liba. If liba.so is opened by dlopen with the RTLD_GLOBAL option, the interfaces in liba.so are promoted to global visibility, and libb can call testa in liba directly. If the same symbol exists in several shared libraries opened with RTLD_GLOBAL, the first one loaded is preferred.

Another issue: RTLD_GLOBAL makes the external interfaces of dynamic libraries visible to one another, but dynamic libraries still cannot call global symbols in the main program. To solve this, GCC provides the -rdynamic option: add -rdynamic when linking the executable that loads the shared libraries. All symbols in the executable then become globally visible, and libraries it loads at run time can call the main program's global symbols directly. If a shared library (the library itself, or another shared library opened with RTLD_GLOBAL) has a symbol with the same name, the one in the executable is chosen, which in some cases can lead to baffling run-time errors.
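A minimal sketch of the two mechanisms, assuming glibc: open_global, globally_visible, and app_version are hypothetical names, and libm.so.6 stands in for liba.so. RTLD_DEFAULT is a GNU extension that searches the process's global scope; g++ defines _GNU_SOURCE by default, so <dlfcn.h> provides it.

```cpp
#include <dlfcn.h>

// Hypothetical function in the main program. A plugin can resolve it
// only if the executable was linked with -rdynamic (which passes
// --export-dynamic to ld); otherwise it is not in the dynamic symbol table.
extern "C" int app_version() { return 3; }

// Open a library so that its symbols join the global scope and can be
// used by libraries loaded later (RTLD_GLOBAL), resolving them now.
void *open_global(const char *lib) {
    return dlopen(lib, RTLD_NOW | RTLD_GLOBAL);
}

// True if `name` can be found in the process's global symbol scope.
bool globally_visible(const char *name) {
    return dlsym(RTLD_DEFAULT, name) != nullptr;
}
```

After open_global("libm.so.6"), globally_visible("cos") is true; globally_visible("app_version") is expected to be true only when the program was linked with -rdynamic.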

Small tip:

/usr/sbin/lsof -p PID shows all shared libraries loaded by process PID at run time, whether they were loaded through dlopen or by the loader; in both cases the shared library is mapped into memory with mmap. The mmap flag MAP_DENYWRITE was meant to prevent modification of a file that is mapped into a process, but the kernel ignores this flag, so if a mapped file is modified, the change is reflected directly in the mapped memory. Because the kernel does not support it, a shared library cannot be hot-replaced at run time. To update a shared library, the program that loaded it must detect the change by some external means, then actively dlclose the library and dlopen it again; if the library was loaded by the loader, the program must be restarted. "Hot replacement" here means copying a new file directly over the original shared library; replacing it with mv or through a symbolic link is still safe, because moving a shared library with mv does not affect programs that have already loaded it. Adding -rdynamic to g++ corresponds to passing -E or --export-dynamic to ld, so it has the same effect as g++ -Wl,-E or g++ -Wl,--export-dynamic.
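On Linux, the same mappings that lsof reports can also be read from /proc/<pid>/maps, where each file-backed line ends with the mapped path. A sketch that lists the shared objects mapped into the current process (mapped_shared_objects is a hypothetical name; the ".so" filter is a simple heuristic):

```cpp
#include <fstream>
#include <set>
#include <string>

// Collect the paths of shared objects mapped into this process by
// scanning /proc/self/maps (Linux-specific).
std::set<std::string> mapped_shared_objects() {
    std::set<std::string> libs;
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        auto slash = line.find('/');
        if (slash == std::string::npos) continue;  // anonymous mapping, [heap], etc.
        std::string path = line.substr(slash);
        if (path.find(".so") != std::string::npos) libs.insert(path);
    }
    return libs;
}
```

Libraries loaded by the loader and those opened later with dlopen both appear in the result, matching what lsof -p shows.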

Mixing static and dynamic libraries

Mixing static libraries and dynamic libraries often produces strange errors, so care is needed.

In general, as long as there are no dependencies between the static libraries and the shared libraries and no global variables (including static variables) are involved, there are few problems. The following examples illustrate the points to watch out for.

Reprinted from: http://hi.baidu.com/herejing/blog/item/ca31c8d6d0d9f12206088b0b.html
