Analysis of DOS executable program structure [reprint]

Source: Internet
Author: User
Wang Defang, Zhu Feng, Wang Defang

Abstract: This article analyzes the structure of .COM and .EXE programs and the different memory images obtained when each is loaded. A .COM program has only one physical segment, whose maximum length is 64 KB, and it can only start at offset address 100H. DOS places no limit on the length of an .EXE file, so large applications are easy to organize. In an .EXE file, the END statement with a start label marks the entry point, PUSH DS saves the segment address of the program segment prefix (PSP), and the SUB AX, AX and PUSH AX instructions save the offset of the INT 20H instruction at the start of the PSP. Because a .COM program is smaller and simpler than an .EXE program, it also loads faster. The analysis provides a reference for making fuller use of the high execution speed of assembly language programs.

MS-DOS has two types of executable program, .COM and .EXE. 8086/8088 assembly language correspondingly has two source program structures, which are assembled separately to produce a .COM program or an .EXE program. Assembly language is the fastest and most efficient language a computer offers its users, so it is essential where a program has strict space and time requirements. To truly understand assembly language programming, it is therefore necessary to be clear about the structural characteristics of .EXE and .COM files, the differences between them, how each is loaded, and the structure of the program segment prefix.

1 Basic components
A small assembly language program may omit the data segment, extra segment, and stack segment, but it cannot omit the code segment: without a code segment there is no program. We therefore look first at the basic composition of the code segment.
1.1 Structure of the .EXE file ("Standard Order") of an 8086/8088 assembly language program (Figure 1)

Figure 1. Structure of the EXE file

1.1.1 To assemble a program correctly, the assembler must know the program's segment structure and which segment each instruction accesses during execution, that is, the relationship between the segments and the segment registers. The ASSUME pseudo-operation provides this information.
Format: ASSUME segment register 1: segment name 1, segment register 2: segment name 2, ...
For example: ASSUME CS: code, DS: data, SS: stack
This tells the assembler to assume that CS holds the segment address of the code segment, DS holds the segment address of the data segment, and SS holds the segment address of the stack segment.
1.1.2 ASSUME only declares the relationship between segment registers and segments; it does not load any segment address into a register. Therefore, if the program has data, extra, or stack segments, their segment addresses must be loaded into the segment registers explicitly (for the stack segment this is needed only when the loader has not already set SS, see 2.3.2). The code segment address does not need to be loaded, because that is done during program initialization.
Format: MOV AX, segment name
        MOV segment register, AX
For example: MOV AX, data
             MOV DS, AX
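To make concrete what a segment address held in DS means, here is a minimal Python sketch of how the 8086 forms a physical address from a segment register and an offset. The function name `physical_address` and the sample values are hypothetical; the rule itself (segment shifted left by four bits, plus the offset, kept to 20 bits) is the standard 8086 real-mode rule and is not spelled out in the article.

```python
def physical_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for an 8086 segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # The CPU shifts the segment left by 4 bits (multiplies it by 16),
    # adds the offset, and keeps only 20 address bits.
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    # Hypothetical values: DS = 1234H after "MOV DS, AX", variable at offset 0010H.
    print(hex(physical_address(0x1234, 0x0010)))
```

Because the segment value is only the upper part of the address, every segment register must hold a valid segment address before data in that segment is accessed.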
1.1.3 An assembly language program runs as a subprogram of DOS. The main part of the program must therefore be defined as a far procedure so that DOS can call it, and the program ends with a RET instruction that returns to DOS.
Format: procedure name PROC FAR
        .
        .
        .
        RET
        procedure name ENDP
1.1.4 For the RET instruction at the end of the program to return to DOS correctly, the return address must be saved on the stack at the beginning of the code segment. The instructions that save the return address are:
        PUSH DS
        SUB AX, AX
        PUSH AX
1.2 Structure of the .COM file of an 8086/8088 assembly language program (Figure 2)

In a .COM file the program is not divided into segments. The entry point must be at offset 0100H, no stack segment needs to be defined, and all procedures should be given the NEAR attribute.

Figure 2. Structure of the COM file

2 Program loading
Program loading means reading an .EXE or .COM file into memory and executing it. A program is loaded by the DOS 4BH system function. When a program is loaded into memory, DOS also builds a 256-byte (100H) program segment prefix (PSP) in front of it, which serves as the software interface between DOS and the running program.
2.1 Program segment prefix (PSP)
In short, the program segment prefix (PSP) is a data structure that lies in the same memory allocation block as the user program and forms a single unit with it. It is the software interface between DOS (the parent program that does the loading) and the loaded program. It mainly stores control information related to the user program and provides a way to return to DOS when the program ends, normally or abnormally. It is 256 bytes long and sits at the start of the program's memory block. The structure of the PSP is shown in Figure 3.
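As a rough illustration of the PSP as a data structure, the following Python sketch builds a simplified 256-byte PSP image. Only the size and the INT 20H bytes at offset 00H come from the article; the command-tail fields at offsets 80H and 81H are well-known PSP fields that the article does not describe, and the function name `make_psp` is hypothetical. All other fields are left as zero, so this is not a complete PSP.

```python
PSP_SIZE = 0x100                 # 256 bytes, as stated in the text
INT_20H = bytes([0xCD, 0x20])    # machine code of "INT 20H"


def make_psp(command_tail: bytes = b"") -> bytearray:
    """Build a simplified 256-byte PSP image (most fields left as zero)."""
    if len(command_tail) > 126:
        raise ValueError("command tail too long")
    psp = bytearray(PSP_SIZE)
    psp[0:2] = INT_20H                          # offset 00H: INT 20H, the exit path
    psp[0x80] = len(command_tail)               # offset 80H: length of the command tail
    psp[0x81:0x81 + len(command_tail)] = command_tail
    psp[0x81 + len(command_tail)] = 0x0D        # the tail ends with a carriage return
    return psp


if __name__ == "__main__":
    psp = make_psp(b" /x")
    print(len(psp), psp[:2].hex())
```

The program itself is placed immediately after these 256 bytes, which is why offset 100H keeps appearing in the sections that follow.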

Figure 3. Structure of the program segment prefix

2.2 Memory image of the .COM file
The content of a .COM program on disk is exactly the same as its image in memory. When it is executed, the .COM program is simply copied into memory immediately after the program segment prefix, starting at address 100H. The memory image after a .COM file is loaded is shown in Figure 4.

Figure 4. Memory image of the COM file

As the memory image shows:
(1) When a .COM program starts to run, all four segment registers (CS, DS, SS, and ES) contain the same value: the segment address of the segment in which the program segment prefix (PSP) resides. In other words, the four segments are actually the same physical segment. Its maximum length is 64 KB, which therefore bounds the size of a .COM program.
(2) A .COM program starts executing at offset 100H. DOS forces IP = 0100H, so the initial CS:IP must point to the first byte after the PSP. Therefore, when writing the source of a .COM file, the ORG 100H pseudo-instruction is used to reserve space for the PSP, and what follows it must be an executable instruction.
(3) The entire .COM program must fit within one physical segment (64 KB), and one word at the end of the segment is used as the stack. Before handing control to a .COM program, MS-DOS always pushes one word of zeros onto the stack. Subtracting this word and the 256 bytes occupied by the PSP leaves the space available to the .COM program: all of its code and data together can use at most 65,536 - 2 - 256 = 65,278 bytes.
(4) Do not define a stack segment in the source of a .COM file. The stack pointer SP points to the highest word of memory available to the .COM program: if at least 64 KB of memory is available, SP is 0FFFEH. DOS forces SS = PSP segment address and SP = 0FFFEH.

To sum up, a .COM program file must have the structure shown in Figure 2.
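The four points above can be sketched as a small Python model of the state DOS sets up for a .COM program. This is a simplified illustration, not DOS's actual loader: the function name `load_com`, the dictionary layout, and the sample segment value are hypothetical, and the PSP is reduced to its first two bytes. The one-byte sample program is a near RET (C3H), which pops the zero word and therefore continues at offset 0 of the PSP, where INT 20H sits.

```python
SEGMENT_SIZE = 0x10000   # one 64 KB physical segment
PSP_SIZE = 0x100         # program segment prefix
STACK_WORD = 2           # the zero word DOS pushes before transferring control

MAX_COM_SIZE = SEGMENT_SIZE - STACK_WORD - PSP_SIZE


def load_com(program: bytes, psp_segment: int) -> dict:
    """Model the memory image and registers of a freshly loaded .COM program."""
    if len(program) > MAX_COM_SIZE:
        raise ValueError("program does not fit in one segment")
    memory = bytearray(SEGMENT_SIZE)
    memory[0:2] = b"\xCD\x20"                            # PSP starts with INT 20H
    memory[PSP_SIZE:PSP_SIZE + len(program)] = program   # disk image copied verbatim
    # memory[0xFFFE:0x10000] stays zero: this is the word DOS pushes on the stack.
    registers = {
        "CS": psp_segment, "DS": psp_segment,
        "ES": psp_segment, "SS": psp_segment,
        "IP": 0x0100, "SP": 0xFFFE,
    }
    return {"memory": memory, "registers": registers}


if __name__ == "__main__":
    state = load_com(b"\xC3", psp_segment=0x1000)   # hypothetical load segment
    print(MAX_COM_SIZE, state["registers"])
```

Keeping all four segment registers equal is what lets a .COM program ignore segmentation entirely, at the cost of the single-segment size limit computed in `MAX_COM_SIZE`.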
2.3 Memory image of the .EXE file
2.3.1 Structure of the .EXE file on disk
The source program of an .EXE file is translated by the assembler into an object file; the linker then combines, classifies, and locates the object files to produce the .EXE file, which is an executable, relocatable code file. On disk the file consists of two parts: a header information block and a relocatable load module. The header is divided into a formatted area (30 bytes) at the front and a relocation item table behind it, and it mainly describes how the code and data segments are to be loaded. Although the header is read into a buffer in high memory when the file is loaded, it is discarded once it has been analyzed and used. Only the load module actually stays in memory. The disk structure of the .EXE file is shown in Figure 5.
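As an illustration of the formatted area, the following Python sketch reads the start of an .EXE file. It assumes the commonly documented layout of the "MZ" signature followed by thirteen little-endian 16-bit words (28 bytes), which is an assumption relative to the 30-byte figure given in the article. The field names and the function name `parse_exe_header` are hypothetical labels chosen for readability.

```python
import struct

# Assumed layout: "MZ" signature plus thirteen little-endian 16-bit words.
HEADER_FORMAT = "<2s13H"
FIELD_NAMES = (
    "signature", "bytes_in_last_page", "pages", "relocation_count",
    "header_paragraphs", "min_alloc", "max_alloc", "initial_ss",
    "initial_sp", "checksum", "initial_ip", "initial_cs",
    "relocation_table_offset", "overlay_number",
)


def parse_exe_header(data: bytes) -> dict:
    """Decode the formatted area at the start of an .EXE file."""
    size = struct.calcsize(HEADER_FORMAT)
    if len(data) < size:
        raise ValueError("file too short for an .EXE header")
    header = dict(zip(FIELD_NAMES, struct.unpack(HEADER_FORMAT, data[:size])))
    if header["signature"] != b"MZ":
        raise ValueError("not an .EXE file")
    # The load module begins right after the header, whose size is
    # recorded in 16-byte paragraphs.
    header["load_module_offset"] = header["header_paragraphs"] * 16
    return header


if __name__ == "__main__":
    # Hypothetical header: one relocation item, 32-byte header, SS:SP = 0005:0100.
    sample = struct.pack(HEADER_FORMAT, b"MZ", 0x50, 2, 1, 2, 0, 0xFFFF,
                         0x0005, 0x0100, 0, 0x0000, 0x0000, 0x1E, 0)
    print(parse_exe_header(sample))
```

The initial CS, IP, SS, and SP values in the header are relative to the load module, which is why the loader, not the source program, finishes setting these registers.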

Figure 5. Disk structure of the EXE file

2.3.2 Memory image of the .EXE file
When an .EXE file is loaded into memory, only the load module itself is loaded. DOS also builds a 256-byte program segment prefix for the user program, and the .EXE program is likewise loaded immediately after it. Exactly where in memory the code, data, and stack segments are placed, however, is determined by the MS-DOS 4BH system function call from the information in the .EXE file header. The memory image of the .EXE file is shown in Figure 6.
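How the header information is used can be sketched as follows. This is a simplified Python model under stated assumptions, not the actual behavior of function 4BH in every detail: the function name `load_exe`, the header dictionary keys, and the (segment, offset) tuple form of the relocation items are hypothetical, and the load module is assumed to start one PSP length (10H paragraphs) above the PSP segment.

```python
def load_exe(load_module: bytes, relocations, header: dict, psp_segment: int):
    """Simplified model of placing an .EXE load module after its PSP."""
    start_segment = psp_segment + 0x10           # PSP is 100H bytes = 10H paragraphs
    image = bytearray(load_module)
    for seg, off in relocations:                 # each item locates one word in the module
        pos = seg * 16 + off
        word = int.from_bytes(image[pos:pos + 2], "little")
        word = (word + start_segment) & 0xFFFF   # add the actual load segment
        image[pos:pos + 2] = word.to_bytes(2, "little")
    registers = {
        "DS": psp_segment, "ES": psp_segment,    # both point at the PSP
        "CS": (header["initial_cs"] + start_segment) & 0xFFFF,
        "IP": header["initial_ip"],
        "SS": (header["initial_ss"] + start_segment) & 0xFFFF,
        "SP": header["initial_sp"],
    }
    return image, registers


if __name__ == "__main__":
    # Hypothetical module: MOV AX, 0001H / MOV DS, AX, where the word 0001H is
    # the data segment relative to the module start and needs relocation.
    module = bytes([0xB8, 0x01, 0x00, 0x8E, 0xD8])
    header = {"initial_cs": 0, "initial_ip": 0, "initial_ss": 1, "initial_sp": 0x100}
    image, regs = load_exe(module, [(0, 1)], header, psp_segment=0x2000)
    print(image.hex(), regs)
```

The relocated word is the operand of the MOV AX, segment name instruction from section 1.1.2, which is the kind of location the relocation item table exists to fix up.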

Figure 6. Memory image of the EXE file

As the memory image shows:
(1) An .EXE file can contain four segments: code segment, data segment, extra segment, and stack segment. This segmented structure best reflects the segmented memory addressing of Intel 80x86 systems and makes large applications easier to organize.
(2) DOS places no limit on the length of an .EXE file, which can occupy all of the user space.
(3) The source program of an .EXE file should define a stack segment. This segment may be given the STACK combine type, be associated with the SS segment register by an ASSUME statement, or both. When the user program is loaded into memory, the loader automatically points SS:SP to the bottom of the stack area, so the source program does not need to set SS and SP.
(4) The source program of an .EXE file should not set the initial CS and IP values. Instead, a pseudo-instruction of the form END start-label marks the entry point. During loading, CS:IP is automatically pointed at the first executable instruction of the program.
(5) After an .EXE program is loaded into memory, DS and ES hold the segment address of the program segment prefix (PSP). Figure 3 (structure of the program segment prefix) also shows that the first two bytes of the PSP are hexadecimal CD 20, the machine code of the 8086/8088 instruction INT 20H. This instruction ends a program and exits without leaving it resident, but when the software interrupt is issued, CS must hold the segment address of the PSP. For this reason, the .EXE ("Standard Order") structure first uses PUSH DS to save the PSP segment address on the stack, and then uses SUB AX, AX and PUSH AX to save the offset of the INT 20H instruction within the PSP.
        PUSH DS      ; save the PSP segment address
        SUB AX, AX
        PUSH AX      ; the offset of the first byte of the INT 20H instruction is 0
Second, the main procedure is defined as a far procedure, so its RET is a far (intersegment) return. When the program finally executes RET, CS is set to the PSP segment address and IP to 0000H. Control thus transfers to PSP:0000H, where the INT 20H instruction executes and the program ends and exits correctly.
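The return path described above can be traced with a short Python sketch of the stack. It is only an illustration of the order of pushes and pops: the function name `run_standard_sequence` and the sample segment value are hypothetical, and the body of the main procedure is assumed to leave the stack balanced.

```python
def run_standard_sequence(psp_segment: int):
    """Trace the stack through PUSH DS / SUB AX, AX / PUSH AX ... far RET."""
    stack = []                 # the top of the stack is the end of the list
    ds = psp_segment           # on entry DS holds the PSP segment address

    stack.append(ds)           # PUSH DS
    ax = 0                     # SUB AX, AX
    stack.append(ax)           # PUSH AX

    # ... the body of the main procedure runs here and leaves the stack balanced ...

    ip = stack.pop()           # a far RET pops the offset first ...
    cs = stack.pop()           # ... and then the segment
    return cs, ip


if __name__ == "__main__":
    cs, ip = run_standard_sequence(0x2000)   # hypothetical PSP segment
    print(f"CS:IP = {cs:04X}:{ip:04X}")
```

Because the far RET consumes both words, the three-instruction sequence must come before anything else is pushed, and the procedure must be declared FAR so that the assembler emits the intersegment form of RET.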
The Standard Order has another advantage: the main body of the program takes the form of a far procedure, which helps program modularization.
To sum up, an .EXE program file must have the structure shown in Figure 1.

3 Comparison of .COM and .EXE programs
A comparison of the structures of .COM and .EXE programs shows the following. A .COM program consists only of the program's binary code; it has no formatted area or relocation item table, which an .EXE program carries so that the machine code at certain locations in the load module can be modified at load time, so it occupies less storage than the corresponding .EXE program. A .COM program cannot be segmented and cannot occupy more than 64 KB, so it is suitable only for small programs. Because it is small and simple, a .COM program loads faster than an .EXE program. DOS places no limit on the length of an .EXE program, which can occupy all of the user program space. An .EXE program can contain a code segment, data segment, extra segment, and stack segment, which makes large applications easier to organize, and its main body is structured as a far procedure, which helps program modularization.
