Research on the compile-and-link mechanism of TCC and TLINK

Source: Internet
Author: User

1. Learning process

Create a folder named c under C:\ and copy the compiler tcc.exe, the linker tlink.exe, and the support files c0s.obj, cs.lib, emu.lib, and maths.lib into it.

Building a simple C environment with TC2.0 requires TC2.0 itself plus c0s.obj, emu.lib, maths.lib, graphics.lib, and cs.lib. Here the compiler tcc.exe and the linker tlink.exe take the place of TC2.0, and graphics.lib is left out. Why does this still work? Let's try compiling and linking a file in the newly created environment:

The command "tcc hello.c" compiles and links in one step, generating hello.obj and hello.exe:

The result of the operation is:

The first time I ran tcc hello.c, only hello.obj was generated. I then linked it with tlink hello.obj to produce hello.exe, but running that EXE file failed. On inspection I found that I had mistakenly copied mathl.lib instead of maths.lib, which probably caused the link step to fail and produced a broken EXE file.

Compared with the previous environment, this one lacks TC2.0 and graphics.lib and adds tcc.exe and tlink.exe. According to information found online, graphics.lib is the C graphics library; TC2.0 needs it when linking and tcc.exe does not. We can think of TC2.0 as tcc.exe plus extensions: each library file is a small module that supports one extension, and when we need an extension we just add the corresponding library file.

Functionally, tcc.exe can only compile files that already exist in the current directory, from the command line (CMD), while TC2.0 supports creating, editing, saving, compiling, and linking files. TC2.0 is a small C development platform that integrates the functions of tcc.exe and tlink.exe.

The new environment reduces the required support files to four. Can the number be reduced further? Experiment showed that no matter which support file was removed, the build failed, an incorrect hello.obj file was generated, and CMD displayed a message like this:

So we can conclude that in this environment all four support files must be present for the build to succeed.

Supplementary research: according to information found online, TC2.0 is an integrated development environment that ships with the following files:

INSTALL.EXE    Installation program
TC.EXE         Integrated compiler
TCINST.EXE     Configuration program for the integrated development environment
TCHELP.TCH     Help file
THELP.COM      Resident program that reads TCHELP.TCH
README         Information file about Turbo C
TCCONFIG.EXE   Configuration file conversion program
MAKE.EXE       Project management tool
TCC.EXE        Command-line compiler
TLINK.EXE      Turbo C series linker
TLIB.EXE       Turbo C series library management tool
C0?.OBJ        Startup code for the different modes
C?.LIB         Runtime library for the different modes
GRAPHICS.LIB   Graphics library
EMU.LIB        8087 emulation library
FP87.LIB       8087 library
*.H            Turbo C header files
*.BGI          Graphics drivers for different displays
*.C            Turbo C sample programs (source files)

Here the ? in the names above stands for one of the following letters:

T    Tiny (tiny mode)
S    Small (small mode)
C    Compact (compact mode)
M    Medium (medium mode)
L    Large (large mode)
H    Huge (huge mode)

These files can in fact be seen in the TC installation folder:

However, our earlier research found that TC2.0 could compile a file successfully even when tcc.exe was not in its directory, so TC2.0 has the compiler built into its own executable rather than calling the external one.

To see what tcc.exe can do:

The output shows that the TCC command format is: TCC [options] [filename]

How does a missing support file affect the build? I first built with the complete set of support files, and the generated EXE file was 8KB. After deleting cs.lib and building again, the generated EXE file was only 536 bytes, far smaller than the correct one. So I believe that when a support file is missing, TLINK leaves a large part of the required code unlinked.

Information found online says the startup code comes in six variants: T Tiny (tiny mode), S Small (small mode), C Compact (compact mode), M Medium (medium mode), L Large (large mode), and H Huge (huge mode). The corresponding files should therefore be c0t.obj and ct.lib, c0s.obj and cs.lib, and so on. Sure enough, we found such files in the LIB directory of TC2.0.

I found that the other modes each have both a c0*.obj and a c*.lib file, while tiny mode has only c0t.obj. Why? Does tiny mode need only that one file? Let's try it: copy c0t.obj to the C:\c folder and compile with tcc -mt hello.c. The compile and link succeed, and the generated EXE file runs correctly. So tiny mode does not need a library file of its own. Note that cs.lib was still in the folder during this test, so the result does not show that tiny mode links without any library; it is more likely that tiny mode shares cs.lib with small mode.
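The naming pattern C0?.OBJ / C?.LIB can be written down as a small helper. This is only a sketch: the function name model_files is hypothetical, and the fallback from tiny mode to cs.lib is an assumption that is consistent with the experiment above (only c0t.obj had to be added) rather than something the experiment proves.

```c
#include <stdio.h>
#include <string.h>

/* Builds the startup object and runtime library file names for a
 * memory-model letter (t, s, c, m, l, h), following the C0?.OBJ and
 * C?.LIB naming pattern.
 * Assumption: tiny mode has no ct.lib and falls back to cs.lib.
 * Returns 0 on success, -1 for an unknown model letter. */
int model_files(char model, char obj[8], char lib[7])
{
    if (model == '\0' || strchr("tscmlh", model) == NULL)
        return -1;
    sprintf(obj, "c0%c.obj", model);                      /* e.g. c0s.obj */
    sprintf(lib, "c%c.lib", model == 't' ? 's' : model);  /* tiny -> cs.lib */
    return 0;
}
```

Called with 's' it yields c0s.obj and cs.lib; called with 't' it yields c0t.obj and, under the stated assumption, cs.lib again.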

To obtain the segment address and offset address of a child function, we first need to know where they are stored. In the earlier study (20140426_Comprehensive study 2) we found that: (1) the name of a function represents its offset address; (2) a function call is implemented with the call/ret mechanism of assembly language. In addition, we know that in assembly the register holding the current code segment address is CS.

First fill in the main function so that it prints the offset address of each function and the value of the CS register while the program runs:

The output is:

It shows main at offset 21B, f1 at 1FA, and f2 at 205, with f3 printed alongside them; the CS register is always 1A2. So is the value of CS really the segment address of the child functions? We load the program with Debug to check:

The offset addresses of main(), f1(), f2(), and f3() turn out to be correct. But the segment address shown by Debug is 076A, not the printed CS value 1A2.

If we cast to long and print the values of main and f1:

The segment address is still 1A2, which again differs from what Debug shows. So where is the problem?
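A value printed this way packs the segment address into the high 16 bits and the offset address into the low 16 bits. The portable sketch below shows how such a value splits apart and joins back together; the names seg_off, split_far, and join_far are hypothetical and are not Turbo C functions.

```c
#include <stdint.h>

/* A 32-bit segment:offset value as printed through a cast to long:
 * segment in the high 16 bits, offset in the low 16 bits. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} seg_off;

/* Splits a packed 32-bit value into its segment and offset halves. */
seg_off split_far(uint32_t value)
{
    seg_off r;
    r.segment = (uint16_t)(value >> 16);
    r.offset  = (uint16_t)(value & 0xFFFFu);
    return r;
}

/* Packs a segment and an offset back into one 32-bit value. */
uint32_t join_far(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 16) | offset;
}
```

For example, the packed value 0x01A2021B splits into segment 01A2 and offset 021B, the CS value and the offset of main reported above.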

Searching the web turned up an explanation of a similar problem: "Debugging relies on a debugger that handles single-step and breakpoint exceptions, so more code is loaded. Of course the addresses will be different."

I think the memory allocated to the program differs between running it directly and running it under Debug. As the quotation says, Debug loads more code, so the segment address of main is larger.
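Why a different segment address moves the whole program can be seen from the real-mode 8086 address rule: physical address = segment * 16 + offset. The sketch below applies that rule to the two segment values seen above; the function names are hypothetical.

```c
#include <stdint.h>

/* Real-mode 8086 address translation: physical = segment * 16 + offset. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Distance in bytes between the places where the same offset lands
 * under two different segment addresses. */
uint32_t load_delta(uint16_t seg_a, uint16_t seg_b)
{
    uint16_t hi = seg_a > seg_b ? seg_a : seg_b;
    uint16_t lo = seg_a > seg_b ? seg_b : seg_a;
    return (uint32_t)(hi - lo) << 4;
}
```

With offset 021B, segment 01A2 gives physical address 1C3B and segment 076A gives 78BB: the offsets agree, but the two addresses lie 5C80 bytes apart.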

Here, however, the child functions and main are in the same segment, so inside main the value of _CS can be used as the segment address of the child functions.

What if a child function and main are not in the same segment? We know that the file generated by tcc hello.c has two segments, one for code and one shared by the stack and data, so the child functions and main are all in the same code segment. But what if there is more than 64KB of code, so that one segment cannot hold it? Information found online says the following:

Turbo C provides six compilation modes (memory models):

tiny mode (Tiny), small mode (Small), medium mode (Medium), compact mode (Compact), large mode (Large), and huge mode (Huge). The relationship between them is shown in the table. Users can choose according to the size and needs of their program.
           │ Small program │ Large program
━━━━━━━━━━━┿━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━
Small data │ Tiny, Small   │ Medium
Large data │ Compact       │ Large, Huge


A small program is one whose code fits in a single code segment of no more than 64KB; its default code (function) pointers are near. A large program is one whose code spans several code segments; each segment is at most 64KB, but the total code size can exceed 64KB, and its default code pointers are far. Small data means the data fits in a single data segment and the default data pointers are near. Large data means there is more than one data segment and the default data pointers are far.
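The table and the near/far defaults can be collected into one lookup structure. This is a sketch with hypothetical names (memory_model, find_model, pointer_size); the only fact added from outside the text is that a near pointer holds a 2-byte offset while a far pointer holds a 4-byte segment:offset pair.

```c
#include <stddef.h>

typedef struct {
    char letter;       /* model letter: t, s, m, c, l, h */
    const char *name;
    int far_code;      /* 1 if default code pointers are far */
    int far_data;      /* 1 if default data pointers are far */
} memory_model;

static const memory_model MODELS[] = {
    { 't', "Tiny",    0, 0 },   /* small program, small data */
    { 's', "Small",   0, 0 },   /* small program, small data */
    { 'm', "Medium",  1, 0 },   /* large program, small data */
    { 'c', "Compact", 0, 1 },   /* small program, large data */
    { 'l', "Large",   1, 1 },   /* large program, large data */
    { 'h', "Huge",    1, 1 },   /* large program, large data */
};

/* Returns the table entry for a model letter, or NULL if unknown. */
const memory_model *find_model(char letter)
{
    size_t i;
    for (i = 0; i < sizeof MODELS / sizeof MODELS[0]; i++)
        if (MODELS[i].letter == letter)
            return &MODELS[i];
    return NULL;
}

/* Size in bytes of a pointer: near = 2 (offset only),
 * far = 4 (segment and offset). */
size_t pointer_size(int is_far)
{
    return is_far ? 4 : 2;
}
```

For instance, find_model('m') reports far code pointers and near data pointers, so in medium mode a function pointer takes 4 bytes and a data pointer 2.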

From the above, a program with only one code segment is a small program: its code does not exceed 64KB, and by default it is compiled in small mode. This explains why the EXE file generated by tcc a.c in the previous study had one code segment and one combined stack-and-data segment. With TC2.0 the indispensable mode-related files are c0s.obj and cs.lib, so the default compilation mode of TC2.0 is small mode. A default build can therefore only handle programs with less than 64KB of code.

On the differences between the modes, I found the following information:

C-language compilation mode - tiny mode (Tiny)
In tiny mode the program's data and code are placed in the same segment, so together they cannot exceed 64KB. The code, stack, data, and extra segments all have the same segment address, that is, CS=DS=SS=ES, and data pointers are near. Ordinary small programs can be compiled in this mode, and the DOS EXE2BIN utility can then convert the .EXE program into a .COM program. Because code, data, and stack share one segment, all of them are addressed as offsets from the same base; segments that share a base in this way are said to belong to the same group (DGROUP). The stack grows toward lower addresses: each push subtracts 2 from the stack pointer SP, which initially points to the bottom of the stack at 0xFFFF (64KB). The heap grows in the opposite direction, toward higher addresses. The heap and the stack thus grow toward each other, and as long as they have not met there is free space between them. This is the normal state of a program. If the stack grows too large, the two may overlap and the stack will overwrite part of the heap.
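The way the heap and the stack approach each other inside one 64KB segment can be modeled in a few lines. This is a simplified sketch, not Turbo C's runtime: the type tiny_segment and its functions are hypothetical, and the starting values are left to the caller.

```c
#include <stdint.h>

/* One 64KB segment shared by heap and stack, as in tiny mode. */
typedef struct {
    uint16_t heap_top;  /* first free offset above the heap; grows upward */
    uint16_t sp;        /* stack pointer; grows toward lower addresses */
} tiny_segment;

/* Bytes still free between the heap and the stack. */
uint16_t free_gap(const tiny_segment *s)
{
    return s->sp > s->heap_top ? (uint16_t)(s->sp - s->heap_top) : 0;
}

/* Pushes one 16-bit word: SP decreases by 2.
 * Returns 0 on success, -1 if the push would run into the heap. */
int push_word(tiny_segment *s)
{
    if (free_gap(s) < 2)
        return -1;
    s->sp = (uint16_t)(s->sp - 2);
    return 0;
}

/* Grows the heap by n bytes toward higher addresses.
 * Returns 0 on success, -1 if it would run into the stack. */
int grow_heap(tiny_segment *s, uint16_t n)
{
    if (free_gap(s) < n)
        return -1;
    s->heap_top = (uint16_t)(s->heap_top + n);
    return 0;
}
```

Unlike this model, the real hardware performs no such check on a push, which is why an oversized stack can silently overwrite heap data.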

C-language compilation mode - small mode (Small)
In small mode the program's code is placed in one code segment of up to 64KB and the data in one data segment of up to 64KB. The stack segment, the extra segment, and the data segment all have the same segment address, that is, DS=SS=ES, and pointers are near. Ordinary programs are compiled in small mode. The memory layout in small mode is as shown. The data segment, stack segment, and extra segment form one group, so their offsets are all measured from the same base.

C-language compilation mode - medium mode (Medium)
In medium mode all data is placed in one 64KB data segment, so data pointers are near. The code can exceed 64KB (up to 1MB) and is spread over several code segments, so code pointers are far. This mode suits large programs with a lot of code and little data. The memory layout in medium mode is as shown.

C-language compilation mode - compact mode (Compact)

In compact mode the data can exceed 64KB and is placed in multiple data segments, so data pointers are far. The code does not exceed 64KB and fits in one segment, so code pointers are near. Even in this mode, static data still cannot exceed 64KB, and the heap is accessed through far pointers. The memory layout in compact mode is as shown.


C-language compilation mode - large mode (Large)

In large mode both code and data use far pointers, and each can reach 1MB. As in compact mode, static data still cannot exceed 64KB. The memory layout in large mode is as shown.

C-language compilation mode - huge mode (Huge)

In huge mode both code and data use far pointers. The code is distributed over multiple code segments and the data over multiple data segments, which come from different source files, while there is only one large stack. Static data is allowed to exceed 64KB. The memory layout in huge mode is as shown.

That is, the modes differ in the amount of code and the amount of data they can handle. So how do you choose a compilation mode for a program larger than 64KB?

Whichever compilation mode is used, the code and the data generated from a single C source file cannot each exceed 64KB. A program larger than that can be split into two or more source files, according to the amount of code or data, and the files compiled separately. A program with a lot of code should use a large-code mode (medium, large, or huge), and a program with a lot of data should use a large-data mode (compact, large, or huge). The .obj files produced by the compiler then carry information to the linker, so that code and data are placed in different segments. The resulting .exe file tells DOS at load time how to load the code and data segments and how to initialize the registers. In this way the compilation mode determines whether the data area can exceed 64KB or not.
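The selection rule described above can be summarized as a small decision function. It is a sketch under stated assumptions: the name choose_model and its three size parameters are hypothetical, and for the small-code, small-data case it returns the default small mode even though tiny mode would also fit when code and data together stay within 64KB.

```c
/* Size limit of one 8086 segment, in bytes. */
#define SEGMENT_LIMIT 65536UL

/* Suggests a memory-model letter from the total code size, total data
 * size, and static data size of a program (all in bytes).
 * Returns one of 's', 'm', 'c', 'l', 'h'. */
char choose_model(unsigned long code, unsigned long data,
                  unsigned long static_data)
{
    int big_code = code > SEGMENT_LIMIT;
    int big_data = data > SEGMENT_LIMIT;

    if (static_data > SEGMENT_LIMIT)
        return 'h';              /* only huge allows static data > 64KB */
    if (big_code && big_data)
        return 'l';              /* large program, large data */
    if (big_code)
        return 'm';              /* large program, small data */
    if (big_data)
        return 'c';              /* small program, large data */
    return 's';                  /* default: small program, small data */
}
```

The returned letter matches the letter used in the c0?.obj and c?.lib file names for that mode.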

2. Problems solved

(1) TC2.0 integrates the functions of tcc.exe and tlink.exe and adds more features. tcc.exe, tlink.exe, c0s.obj, cs.lib, emu.lib, and maths.lib are the essential files for compiling C files from the command line.

(2) How can the segment address and offset address of any function be printed?

A: You can use printf("%lx", (long) function_name), which prints the segment address and offset address joined together, or printf("%x %x", _CS, function_name), which prints them separately.

(3) Why does the printed segment address differ from the segment address seen in Debug?

A: Debug loads more code, so the segment address of main is larger.

(4) The file generated by tcc hello.c has two segments, one for code and one shared by the stack and data, so the child functions and main are all in the same code segment. What if there is more than 64KB of code, so that one segment cannot hold it?

A: Then another compilation mode should be used. The default mode in TC is small mode, which supports only code and data below 64KB each. If the code and data exceed 64KB, large mode or huge mode can be used.

3. Issues discussed in the seminar

(1) If a support file is missing, does tcc.exe still call tlink.exe? Does the generated obj file contain any of the other support files?

A: According to the discussion, if a support file is missing, tcc.exe still calls tlink.exe, but the missing file cannot be found, and tlink.exe has to be run separately to link the obj file into an EXE file. In that case TLINK links the support files that are present. In one experiment, the program still printed HelloWorld when maths.lib was missing, while the EXE file built without c0s.obj or cs.lib failed when run. The explanation offered was that maths.lib is a library of mathematical operations, so a program that performs none can run without it, whereas c0s.obj and cs.lib are needed to start and run any program, so leaving either one out causes an error.

Correction: after another experiment I found that without maths.lib the EXE file produced by compiling and linking also fails, so the previous conclusion does not hold. The job of tcc.exe is to compile the C source file into a binary obj file and then call tlink.exe to generate the EXE file; that is, a single command compiles and links the C source into an executable. We did not change tcc.exe itself, so TCC still calls TLINK; the link simply fails because the set of support files is incomplete. The experiment also showed that compiling the source to an obj file with tcc -c hello.c or tcc -linclude hello.c and then linking that obj file by itself produces a faulty EXE file. Some sources online say that compiling and linking this way causes an error, but they do not give the reason. My guess is that when TLINK is run on hello.obj alone it does not pull in the support files by itself; only when TCC calls it is it told which files to link and in what order, so only then does the link work properly. I could not find a method or data to verify this conjecture or to answer the question above, and I hope the senior students can address it in the seminar. The specific question is this:

If a support file is missing, does the generated obj file contain any of the other support files? Why must tlink.exe be called by tcc.exe in order to generate a correct executable file?

(2) Why can the original platform (TC2.0) compile and link successfully even without tcc.exe and tlink.exe?

A: TC2.0 has its own compiler built in, and C and assembly language can be compiled together. If assembly language is present, TC2.0 calls tcc.exe to compile. The Turbo C package has two compilers: TC.EXE, used in the integrated development environment, and TCC.EXE, used from the command line. The integrated development environment includes an integrated editor, compiler, linker, and debugger.

(3) How are the library files located?

A: TURBOC.CFG lets you specify the directories that TCC searches for library files. However, a path changed from within TC2.0 is not saved in TURBOC.CFG; it is written to a different configuration file.

(4) Can the three modes be substituted for one another?

A: There are in fact six compilation modes. A program behaves the same whichever of the six it is compiled in; what differs is the data size and program size that each mode supports.

(5) Why is the printed segment address different from the address seen when debugging with Debug?

A: The offset address is fixed at compile time, while the segment address is assigned when the program is loaded, whether from CMD or by Debug, so the segment addresses differ.

(6) What happens if the amount of code exceeds 64KB?

A: Then the program cannot be compiled in the default mode (small mode), and an error occurs. It should be compiled in a mode that supports large programs (medium, large, or huge mode). The compiler then places the code in several segments of no more than 64KB each.

(7) Why did the printed offset address change? Normally the offset address of main is 1FA, but this time it is not.

A: The offset address of main is not necessarily 1FA; rather, the offset address of the first function in the source file is 1FA. Each later function is shifted by the length of the code before it, just as the addresses of successive pieces of code are determined in assembly language. After the code from c0s.obj has been placed, the contents of the source file start at 1FA, so whichever function comes first is placed at 1FA.
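The rule that each function starts where the previous one ends can be sketched as a running sum. The function name layout_offsets is hypothetical, and the function sizes passed to it are inputs chosen by the caller, not values read from a real obj file.

```c
#include <stddef.h>
#include <stdint.h>

/* Given the offset at which the first function of the source file is
 * placed and the size in bytes of each function in link order, stores
 * the offset of every function in offsets[]. Returns the offset just
 * past the last function. */
uint16_t layout_offsets(uint16_t first, const uint16_t *sizes, size_t n,
                        uint16_t *offsets)
{
    uint16_t next = first;
    size_t i;

    for (i = 0; i < n; i++) {
        offsets[i] = next;                    /* starts where the previous ended */
        next = (uint16_t)(next + sizes[i]);
    }
    return next;
}
```

With a first offset of 1FA and a first function of 0B bytes, the second function lands at 205, which matches the offsets printed earlier for f1 and f2.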

(8) How can the segment address and offset address be printed with a single statement?

A: You can use printf("%lx", (long) main) to print the segment address and offset address together.

4. Learning Thoughts

In fact, TC2.0 is itself just a program that someone wrote to make developers' work easier. It relies on the functions of tcc.exe and tlink.exe and on some library files to compile and link programs. More advanced development tools simply add more features, use more files, and use more advanced compilers and linkers; they all grow out of something like this little tcc.exe. In large, complex things, the central role may in fact be played by something very small.

This study drew on a lot of knowledge from previous studies, especially assembly language. Mastering assembly language has thus laid a good foundation for learning C.
