COFF (Common Object File Format) is a popular object file format. We call it an "object file" format rather than tying it to the files a compiler produces (*.o / *.obj), because the format is used not only for those files but also for library files and executable files. Do you use VC often? The object files (*.obj) it generates are in this format. Other compilers, such as GCC (GNU Compiler Collection), ICL (Intel C/C++ Compiler) and VectorC, also produce object files in this format. And it is not just C/C++: many other languages also use this object file format. A unified object file format makes mixed-language programming much more convenient.
Of course, COFF is not the only object file format. Other common formats include OMF (Object Module Format) and ELF (Executable and Linking Format). OMF was developed years ago by a group of IT giants and is common on Windows platforms; Borland's object files use it. Microsoft and Intel used the same format years ago, but both have since switched camps and now use COFF. ELF is widely used on non-Windows platforms and is essentially never seen on Windows. As a programmer, it pays to get to know these formats you deal with all the time! This time, though, let me introduce COFF first.
COFF File Structure
Let's first take a look at the overall structure of a COFF file and see what it looks like!
| File header           |
| Optional header       |
| Section header 1      |
| ......                |
| Section header n      |
| Section data          |
| Relocation directives |
| Line numbers          |
| Symbol table          |
| String table          |
As the figure above shows, a COFF file contains eight kinds of data, from top to bottom: 1. file header, 2. optional header, 3. section headers, 4. section data, 5. relocation directives, 6. line numbers, 7. symbol table, 8. string table.
Apart from the section headers, of which there can be several (because a file can have several sections), each of the other kinds appears only once.
File header: As the name suggests, this is the header of the COFF file. It stores basic information about the file, such as the file identifier and the locations of the various tables.
Optional header: As the name suggests, this is also a header, and an optional one that may be absent. It is basically never found in object files, but in other files (such as executables) it holds information that the file header does not describe.
Section header: Yes, yet another header (why are there so many headers?!). It describes a section; every section has its own section header, and the number of sections is given in the file header.
Section data: This is usually the largest part of a COFF file; the actual data of every section is stored here. As for which section a given piece of data belongs to, don't ask me, ask the section headers.
Relocation table: This table usually exists only in object files and describes the relocation information for the symbols in the COFF file. As for why relocation is needed, go home and check your operating systems textbook.
Symbol table: This table holds information about all the symbols used in the COFF file. When several COFF files are linked, it helps us relocate symbols. It is also used when debugging a program.
String table: This stores strings. Whose strings? Let me tell you! The symbol table describes each symbol with a fixed-size record, but it reserves only 8 characters for the symbol name. Early small programs could get by with that, but in today's programs a single symbol name easily runs to dozens of characters, so 8 is not enough. These names therefore have to be stored in the string table, and the symbol table records only where each string is.
That is basically the structure of the file. It is not pretty, but its designers were fairly far-sighted: it is extensible enough that it is still in use today. Now that we know the overall structure, let's analyze the parts one by one.
File Header
The file header starts at offset zero of the file. Its structure is very simple; in C it is described as follows:
typedef struct {
    unsigned short usmagic;        // magic number
    unsigned short usnumsec;       // number of sections
    unsigned long  ultime;         // time stamp
    unsigned long  ulsymboloffset; // symbol table offset
    unsigned long  ulnumsymbol;    // number of symbols
    unsigned short usopthdrsz;     // optional header length
    unsigned short usflags;        // file flags
} FILEHDR;
In this structure, usmagic is a magic number. In COFF files for the i386 platform its value is 0x014c. If the magic number in the file header is not 0x014c, there is no need to read any further: this is not an i386 COFF file. The magic number is really a platform identifier.
The second member, usnumsec, is an unsigned short giving the number of sections, which is also the number of section headers.
ultime is a time stamp recording when the COFF file was created. When the COFF file is an executable, this time stamp is often used as an identifier for comparison in encryption schemes.
ulsymboloffset is the offset of the symbol table in the file. It is an absolute offset, counted from the beginning of the file. The offsets stored elsewhere in the COFF file are likewise absolute offsets.
ulnumsymbol gives the number of symbol records in the symbol table.
usopthdrsz is the length of the optional header, which is usually 0. The length also tells you which kind of optional header is present, and different lengths call for different handling.
usflags holds the attribute flags of the COFF file. They identify the type of the file, what kind of data it contains, and other information.
Its possible values are as follows:
| Value  | Name     | Description |
|--------|----------|-------------|
| 0x0001 | F_RELFLG | No relocation information: the COFF file contains no relocation information. This flag is usually 0 in object files and 1 in executables. |
| 0x0002 | F_EXEC   | Executable: all symbols in the COFF file have been resolved, and the file should be considered executable. |
| 0x0004 | F_LNNO   | All line numbers have been stripped from the file. |
| 0x0008 | F_LSYMS  | The symbol information has been stripped from the file. |
| 0x0100 | F_AR32WR | The file is a 32-bit little-endian COFF file. |
Note: Little-endian is a byte-ordering convention. For example, in little-endian mode the hexadecimal value 0x1234 is stored in memory in the order 0x34 0x12. The opposite is big-endian, in which the order in memory is 0x12 0x34.
This table is not exhaustive; it lists only the flags commonly used in object files. Other flags will be covered later when we introduce the PE format.
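To make the layout concrete, here is a minimal sketch in C that decodes the 20-byte file header from a buffer holding the whole file. It is an illustration under stated assumptions, not a reference implementation: the names FileHdr, rd16, rd32 and parse_file_header are hypothetical, fields are read byte by byte in little-endian order (as on i386), and fixed-width types from <stdint.h> stand in for unsigned short/unsigned long so the fields stay 2 and 4 bytes even on hosts where unsigned long is 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-width view of the 20-byte file header (same field order as FILEHDR). */
typedef struct {
    uint16_t usmagic;
    uint16_t usnumsec;
    uint32_t ultime;
    uint32_t ulsymboloffset;
    uint32_t ulnumsymbol;
    uint16_t usopthdrsz;
    uint16_t usflags;
} FileHdr;

#define COFF_MAGIC_I386 0x014c
#define F_RELFLG 0x0001
#define F_EXEC   0x0002
#define F_LNNO   0x0004
#define F_LSYMS  0x0008
#define F_AR32WR 0x0100

/* Little-endian readers: the low byte comes first in the file. */
static uint16_t rd16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the header at offset 0. Returns 0 on success, or -1 if the
   buffer is too short or the magic number is not the i386 value. */
int parse_file_header(const uint8_t *buf, size_t len, FileHdr *h) {
    if (len < 20) return -1;
    h->usmagic        = rd16(buf + 0);
    h->usnumsec       = rd16(buf + 2);
    h->ultime         = rd32(buf + 4);
    h->ulsymboloffset = rd32(buf + 8);
    h->ulnumsymbol    = rd32(buf + 12);
    h->usopthdrsz     = rd16(buf + 16);
    h->usflags        = rd16(buf + 18);
    return h->usmagic == COFF_MAGIC_I386 ? 0 : -1;
}
```

A caller would then test individual flags with expressions such as (h.usflags & F_RELFLG).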
Optional Header
The optional header immediately follows the file header, that is, it starts at offset 0x0014 of the COFF file. Its length may be 0, and optional headers of different lengths have different structures. The standard optional header is 24 or 28 bytes long, usually 28; here I will introduce only the 28-byte version. (Because the length of this header is defined by whoever produces the file, different producers use different layouts. I can only pick the most commonly used one, and I don't know the others.)
The structure of this header is as follows:
typedef struct {
    unsigned short usmagic;        // magic number
    unsigned short usversion;      // version
    unsigned long  ultextsize;     // size of the text section
    unsigned long  ulinitdatasz;   // size of the initialized data section
    unsigned long  uluninitdatasz; // size of the uninitialized data section
    unsigned long  ulentry;        // entry point
    unsigned long  ultextbase;     // base address of the text section
    unsigned long  uldatabase;     // base address of the data section (PE32 only)
} OPTHDR;
The first member, usmagic, is again a magic number, but this time its value should be 0x010b or 0x0107. A value of 0x010b means the COFF file is an ordinary executable; 0x0107 means it is a ROM image.
usversion is the version of the COFF file, and ultextsize is the length of the text section of the executable. ulinitdatasz and uluninitdatasz are the lengths of the initialized and uninitialized data sections, respectively.
ulentry is the program's entry point, that is, the position in the text section where execution starts (the initial EIP register value) when the COFF file is loaded into memory. When the COFF file is a dynamic library, the entry point is the library's entry function.
ultextbase is the base address of the text section.
uldatabase is the base address of the data section.
In practice, among these members you only need to pay attention to usmagic and ulentry.
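A similar sketch for the optional header, assuming buf already points at file offset 0x0014 and optsz is the usopthdrsz value from the file header. Only the 28-byte layout described above is handled; other lengths are rejected rather than guessed at. The names OptHdr and parse_opt_header are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-width view of the 28-byte optional header. */
typedef struct {
    uint16_t usmagic;        /* 0x010b: executable, 0x0107: ROM image */
    uint16_t usversion;
    uint32_t ultextsize;
    uint32_t ulinitdatasz;
    uint32_t uluninitdatasz;
    uint32_t ulentry;
    uint32_t ultextbase;
    uint32_t uldatabase;
} OptHdr;

/* Little-endian readers, as in the file header sketch. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if there is no optional header, 0 on success, -1 otherwise. */
int parse_opt_header(const uint8_t *buf, size_t len, uint16_t optsz, OptHdr *o) {
    if (optsz == 0) return 1;
    if (optsz != 28 || len < 28) return -1;   /* only the 28-byte layout is handled */
    o->usmagic        = rd16(buf + 0);
    o->usversion      = rd16(buf + 2);
    o->ultextsize     = rd32(buf + 4);
    o->ulinitdatasz   = rd32(buf + 8);
    o->uluninitdatasz = rd32(buf + 12);
    o->ulentry        = rd32(buf + 16);
    o->ultextbase     = rd32(buf + 20);
    o->uldatabase     = rd32(buf + 24);
    return (o->usmagic == 0x010b || o->usmagic == 0x0107) ? 0 : -1;
}
```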
Section Header
The section headers follow the optional header (if the optional header's length is 0, they follow the file header directly). Each section header is 40 bytes long, as shown below:
typedef struct {
    char           cname[8];    // section name
    unsigned long  ulvsize;     // virtual size
    unsigned long  ulvaddr;     // virtual address
    unsigned long  ulsize;      // section length
    unsigned long  ulsecoffset; // offset of the section data
    unsigned long  ulreloffset; // offset of the section's relocation records
    unsigned long  ullnoffset;  // offset of the section's line number records
    unsigned short ulnumrel;    // number of relocation records
    unsigned short ulnumln;     // number of line number records
    unsigned long  ulflags;     // section flags
} SECHDR;
This is an important header: it describes the information we ultimately need. A COFF file can do without some of its other parts, but not without these two: the section headers and the section data.
cname stores the section name. Common section names include .text, .data, .comment and .bss. .text is the text section, usually the code section; .data is the data section, which holds initialized data; .bss also holds data, but uninitialized data, so it is an empty section; .comment, as the name suggests, is a comment section that holds some compiler information, a comment on the COFF file.
ulvsize is the size of the section when it is loaded into memory. It is meaningful only in executables and is always 0 in object files. If it is larger than the section's actual length, the extra part is filled with zeros.
ulvaddr is the virtual address at which the section is loaded or linked. For an executable, this address is relative to the program's address space: when the executable is loaded into memory, it is the address of the first byte of the section's data. For an object file, it is only an offset giving the current position of the section data during relocation; to simplify relocation calculations, it is usually set to 0.
ulsize is the actual length of the section data. It determines how many bytes to read when reading the section's data.
ulsecoffset is the offset of the section data in the COFF file.
ulreloffset is the offset of the section's relocation information. It points to a record in the relocation table.
ullnoffset is the offset of the section's line number information. It points to a record in the line number table.
ulnumrel is the number of relocation records: the ulnumrel records starting at the one pointed to by ulreloffset are this section's relocation information.
ulnumln is similar to ulnumrel, but it gives the number of line number records.
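Each section header occupies 40 bytes (8 + 6*4 + 2*2 + 4 for the fields listed above), and the headers start right after the optional header, so header number i (counting from 0) begins at file offset 20 + usopthdrsz + i * 40. The sketch below decodes one header under the same little-endian assumption as before; the names SecHdr and parse_section_header are hypothetical. An 8-character name fills cname completely and has no terminating NUL in the file, so the copy adds one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILEHDR_SIZE 20
#define SECHDR_SIZE  40

typedef struct {
    char     cname[9];      /* 8 name bytes plus a terminator we add */
    uint32_t ulvsize, ulvaddr, ulsize;
    uint32_t ulsecoffset, ulreloffset, ullnoffset;
    uint16_t ulnumrel, ulnumln;
    uint32_t ulflags;
} SecHdr;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode section header idx from the whole-file buffer. Returns 0 or -1. */
int parse_section_header(const uint8_t *file, size_t len, uint16_t usopthdrsz,
                         unsigned idx, SecHdr *s) {
    size_t off = FILEHDR_SIZE + (size_t)usopthdrsz + (size_t)idx * SECHDR_SIZE;
    if (off + SECHDR_SIZE > len) return -1;
    const uint8_t *p = file + off;
    memcpy(s->cname, p, 8);
    s->cname[8] = '\0';
    s->ulvsize     = rd32(p + 8);
    s->ulvaddr     = rd32(p + 12);
    s->ulsize      = rd32(p + 16);
    s->ulsecoffset = rd32(p + 20);
    s->ulreloffset = rd32(p + 24);
    s->ullnoffset  = rd32(p + 28);
    s->ulnumrel    = rd16(p + 32);
    s->ulnumln     = rd16(p + 34);
    s->ulflags     = rd32(p + 36);
    return 0;
}
```

In practice a reader would loop idx from 0 to usnumsec - 1.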
ulflags holds the attribute flags of the section. Its values are as follows:
| Value  | Name      | Description |
|--------|-----------|-------------|
| 0x0020 | STYP_TEXT | Text section: this section contains code. |
| 0x0040 | STYP_DATA | Data section: this section holds initialized data. |
| 0x0080 | STYP_BSS  | This section also holds data, but uninitialized data. |
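As a sketch of how these flags might be used, the hypothetical helpers below classify a section and locate its raw bytes inside a whole-file buffer. They assume, in line with the note that follows, that a .bss section stores no data in the file (its ulsecoffset is 0), so no pointer is returned for it.

```c
#include <stddef.h>
#include <stdint.h>

#define STYP_TEXT 0x0020
#define STYP_DATA 0x0040
#define STYP_BSS  0x0080

/* Describe a section by its flags, following the table above. */
const char *section_kind(uint32_t ulflags) {
    if (ulflags & STYP_TEXT) return "code";
    if (ulflags & STYP_DATA) return "initialized data";
    if (ulflags & STYP_BSS)  return "uninitialized data";
    return "other";
}

/* Return a pointer to the section's bytes, or NULL when the section
   has no data in the file (.bss) or the range is out of bounds. */
const uint8_t *section_data(const uint8_t *file, size_t len,
                            uint32_t ulsecoffset, uint32_t ulsize,
                            uint32_t ulflags) {
    if ((ulflags & STYP_BSS) || ulsecoffset == 0) return NULL;
    if (ulsecoffset > len || ulsize > len - ulsecoffset) return NULL;
    return file + ulsecoffset;
}
```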
Note: In the .bss section, the values of ulvsize, ulvaddr, ulsize, ulsecoffset, ulreloffset, ullnoffset, ulnumrel and ulnumln are all 0. (The table above lists only some of the values; the others will be described together with the PE format, and the same applies to the tables below.)
Section Data
As the name suggests, this is where the data of each section is stored. Different kinds of sections differ in data content and structure, but in an object file the data is simply raw data with no special format.
Relocation table
This table stores the relocation information of every section. It is one large table, because the relocation information of all sections lives here; each section header records the offset and count of its own relocation records, which are used to read that section's entries from this table. You can of course also think of the relocation table as several tables, one per section. This table exists only in object files, not in executables.
Where there is a table, there are records. Each record in the relocation table is one piece of relocation information, and the record structure is very simple:
typedef struct {
    unsigned long  uladdr;   // offset of the location to relocate
    unsigned long  ulsymbol; // symbol index
    unsigned short ustype;   // relocation type
} RELOC;
Simple enough: only three members in total! uladdr is the offset, within the section, of the content to be relocated. For example, if a text section starts at 0x010 and uladdr is 0x05, your relocation information should be written at 0x15. How many bytes are written depends on the kind of code: 32-bit code needs four bytes, while 16-bit code needs only two.
ulsymbol is a symbol index that points to a record in the symbol table. Note: this is an index, not an offset! It is simply the number of a record in the symbol table. This member specifies which symbol the relocation refers to.
ustype identifies the relocation type. In 32-bit code there are usually only two relocation methods: absolute relocation and relative relocation. Their codes are as follows:
| Value | Name         | Description |
|-------|--------------|-------------|
| 6     | RELOC_ADDR32 | 32-bit absolute relocation. |
| 20    | RELOC_REL32  | 32-bit relative relocation. |
These values vary between processors; the two most common relocation methods on the i386 platform are given here.
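A relocation record occupies 10 bytes on disk (4 + 4 + 2). Compilers typically pad the RELOC structure above to a larger size, so reading records with a single fread into an array of structures can misalign them unless the structure is packed. The hypothetical sketch below sidesteps this by decoding each record byte by byte, using ulreloffset and ulnumrel from the section header.

```c
#include <stddef.h>
#include <stdint.h>

#define RELOC_SIZE   10   /* 4 + 4 + 2 bytes on disk, no padding */
#define RELOC_ADDR32 6
#define RELOC_REL32  20

typedef struct {
    uint32_t uladdr;
    uint32_t ulsymbol;
    uint16_t ustype;
} Reloc;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read record i (0-based) of a section's relocation entries. Returns 0 or -1. */
int read_reloc(const uint8_t *file, size_t len, uint32_t ulreloffset,
               uint16_t ulnumrel, unsigned i, Reloc *r) {
    if (i >= ulnumrel) return -1;
    size_t off = (size_t)ulreloffset + (size_t)i * RELOC_SIZE;
    if (off + RELOC_SIZE > len) return -1;
    r->uladdr   = rd32(file + off);
    r->ulsymbol = rd32(file + off + 4);
    r->ustype   = rd16(file + off + 8);
    return 0;
}
```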
The two relocation methods work as follows:
Absolute relocation
In absolute relocation you must supply the absolute address of the symbol (note: sometimes this is not an address but a value; for a constant, you supply its value rather than its location). Of course, this address is not ready-made: you obtain it by adding the symbol's relative address to the relative address of its section.
Formula: absolute symbol address = section offset + symbol offset
You can get these offsets from the section header and the symbol table. To relocate the symbols of a section, the section itself must be positioned first.
Relative relocation
Relative relocation is a little more complicated. The address it requires is an offset relative to the current location. The current location is the address just past the field at the absolute address pointed to by uladdr (the field is four bytes in 32-bit code and two bytes in 16-bit code), that is, relocation offset + current section offset + machine word size.
Formula: current address = relocation offset + current section offset + machine word size
With the current address in hand, the relative address is easy: subtract the current address from the symbol's absolute address.
Formula: relative address = absolute symbol address - current address
Write the computed address to the location pointed to by uladdr, and that's it! You have completed the relocation.
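The formulas above can be sketched in C as follows. This is a minimal illustration for 32-bit code, assuming the section's data has already been copied into a writable buffer and that secbase is the section's final address (the "current section offset"). It follows the text literally and overwrites the 4-byte field; some real toolchains also add a value already stored in that field, which this sketch ignores. All names are hypothetical.

```c
#include <stdint.h>

#define RELOC_ADDR32 6
#define RELOC_REL32  20

/* Store a 32-bit value in little-endian order. */
static void wr32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* absolute symbol address = section offset + symbol offset */
uint32_t abs_symbol_address(uint32_t sec_offset, uint32_t sym_offset) {
    return sec_offset + sym_offset;
}

/* Patch the field at sec + uladdr. Returns 0, or -1 for unknown types. */
int apply_reloc(uint8_t *sec, uint32_t secbase, uint32_t uladdr,
                uint16_t ustype, uint32_t symaddr) {
    switch (ustype) {
    case RELOC_ADDR32:
        wr32(sec + uladdr, symaddr);
        return 0;
    case RELOC_REL32: {
        /* current address = relocation offset + section offset + word size (4) */
        uint32_t cur = uladdr + secbase + 4;
        /* unsigned subtraction wraps to the two's-complement displacement */
        wr32(sec + uladdr, symaddr - cur);
        return 0;
    }
    default:
        return -1;
    }
}
```

With the numbers from the uladdr example earlier (section at 0x010, uladdr 0x05), a REL32 fix-up would measure from 0x05 + 0x010 + 4 = 0x19.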
Line number table
The line number table is useful for debugging. It establishes a mapping between the executable binary code and the source code line numbers. That way, when the program misbehaves (it can be used when the program behaves correctly, too), we can find the source line from the position of the code currently being executed, and then fix it. Without it, who knows which line is at fault!
Its format is also very simple, with only two members:
typedef struct {
    unsigned long  uladdrorsymbol; // code address or symbol index
    unsigned short uslineno;       // line number
} LINENO;
Let's look at the second member, uslineno, first. It is a counter starting at 1 that represents the source code line number. The first member, uladdrorsymbol, holds a code address when the line number is greater than 0; when the line number is 0, it instead holds the index, in the symbol table, of the symbol this line number entry maps to. Now let's take a look at the symbol table!
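As a sketch of how a debugger might use this table, the hypothetical function below returns the source line for a code address by picking the entry with the largest address not exceeding it, skipping the uslineno == 0 records whose first member is a symbol index. It treats uslineno exactly as described above; real toolchains may store line numbers relative to the start of a function, which this sketch does not handle.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t uladdrorsymbol; /* code address, or symbol index when uslineno == 0 */
    uint16_t uslineno;
} LineNo;

/* Return the line of the closest entry at or below addr, or 0 if none.
   Entries need not be sorted. */
unsigned line_for_address(const LineNo *tab, size_t n, uint32_t addr) {
    unsigned best = 0;
    uint32_t best_addr = 0;
    for (size_t i = 0; i < n; i++) {
        if (tab[i].uslineno == 0) continue;   /* symbol-index record, not an address */
        uint32_t a = tab[i].uladdrorsymbol;
        if (a <= addr && (best == 0 || a >= best_addr)) {
            best = tab[i].uslineno;
            best_addr = a;
        }
    }
    return best;
}
```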
Symbol table
The symbol table stores the symbol information of an object file, and it is also the most complex table in a COFF file. The symbols used by all sections are in this table. It consists of many records, each stored in the following structure:
typedef struct {
    union {
        char cname[8];               // symbol name
        struct {
            unsigned long ulzero;    // 0 when the name is in the string table
            unsigned long uloffset;  // offset of the name in the string table
        } e;
    } e;
    unsigned long  ulvalue;  // symbol value
    short          isection; // section number of the symbol
    unsigned short ustype;   // symbol type
    unsigned char  usclass;  // symbol storage class
    unsigned char  usnumaux; // number of auxiliary symbol records
} SYMENT;
cname is the symbol name. Like all the names before it, it is 8 bytes, but this time it sits in a union whose other members, sharing the same storage, are ulzero and uloffset. If the symbol name is no longer than 8 characters, fine: you can put it directly into cname. If the name is longer than 8 bytes, it does not fit, so it has to go into the string table. In that case ulzero is 0, and uloffset gives the offset of the symbol's name in the string table.
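The name lookup just described can be sketched as below. The function name symbol_name is hypothetical, rec points at one raw 18-byte symbol record (8 + 4 + 2 + 2 + 1 + 1), and the sketch assumes, as in common COFF implementations, that uloffset is counted from the very start of the string table (its leading 4-byte length field included), so strtab must point at that start.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copy the symbol's name into out (NUL-terminated). Returns 0 or -1. */
int symbol_name(const uint8_t *rec, const uint8_t *strtab, size_t strtablen,
                char *out, size_t outlen) {
    if (rd32(rec) != 0) {
        /* Short name: up to 8 bytes inline, NUL-padded, no NUL if exactly 8. */
        size_t n = 0;
        while (n < 8 && rec[n] != 0) n++;
        if (n >= outlen) return -1;
        memcpy(out, rec, n);
        out[n] = '\0';
        return 0;
    }
    /* Long name: ulzero == 0 and uloffset locates a NUL-terminated string. */
    uint32_t off = rd32(rec + 4);
    if (off >= strtablen) return -1;
    const uint8_t *start = strtab + off;
    const uint8_t *end = memchr(start, 0, strtablen - off);
    if (end == NULL) return -1;
    size_t n = (size_t)(end - start);
    if (n >= outlen) return -1;
    memcpy(out, start, n);
    out[n] = '\0';
    return 0;
}
```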
A name alone is not enough; a symbol also has a value! ulvalue is the value this symbol represents.
The isection member specifies the section in which the symbol is located. A value of 0 means the symbol is external and must be resolved from other COFF files (when several object files are linked, such symbols are the ones that get resolved). A value of -1 means the symbol's value is a constant rather than an offset within a section. A value of -2 means the symbol is only a debugging symbol.