Use a simple C program to analyze the EXE files under DOS

Http://blog.chinaunix.net/u3/104230/showart_2082499.html

The EXE file format under DOS is relatively simple, so let's set aside the more complicated Windows EXE format for now and start with the easy target. (Unless stated otherwise, "EXE file" below means the DOS EXE format.)

There are plenty of descriptions of the EXE format on the Internet, but most of them just list a long table of fields, which leaves the reader dizzy. Worse, some of the usual explanations are imprecise, and the resulting misunderstandings waste time. Working through a real file yourself makes the format much easier to understand. As for why this analysis uses C: along the way we can also look at the relationship between C and assembly, in particular how subroutines are called and which conventions govern the call. Enough chatter, let's get started!

1. Software preparation

① Since we are dealing with DOS, we need a DOS system. Virtual machines are now widely used: installation is simple, and if the system crashes, your real hardware is not affected, which makes this both safe and convenient. For installation instructions, see my articles:

Installing MS-DOS 6.22 on VMware, part 1: installing the base system

Installing MS-DOS 6.22 on VMware, part 2: drivers and their installation

② You also need Turbo C, which most people have probably used when first learning C. Install it under DOS. I am using Turbo C++ 3.

③ You need a program that can display a file in hexadecimal, for example UltraEdit.

2. Generate the EXE file

Let's start with the simplest possible case: the C program should be as simple as possible and contain only one function call, as follows:

int sum(int x,int y)
{
    return x+y;
}

main()
{
    int x,y,s;
    x=1;
    y=2;
    s=sum(x,y);
}

Simple enough. Compile and link it to generate the EXE file.

3. Analyzing the EXE file

① Open the generated EXE file with UltraEdit (only the beginning of the file is shown).

The following describes several important offsets.

② Note the two letters circled in red, MZ. They are the signature of an EXE file and occupy the first two bytes of the file. Wondering why MZ? Look it up online.

③ Next, 000Ch is stored at offset 02-03h (the bytes appear in the file as 0C 00 because words are stored low byte first, so don't read it as 0C00h), and 0009h is stored at offset 04-05h. Together these two values give the file size. 0009h means the file occupies 9 blocks (1 block is 512 bytes), and 000Ch means that 0Ch bytes of the last (9th) block are used. So (9 - 1) × 512 = 4096 bytes, plus 12 (000Ch), equals 4108 bytes, which matches the size DOS reports.
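The size calculation above can be sketched as a small C function. This is only an illustration meant for a modern C99 compiler on the host machine (it uses `<stdint.h>`, which Turbo C does not provide). The function name `exe_file_size` is made up for this example, and the rule that a last-block value of 0 means "the last block is completely full" is the usual convention for this header, not something shown in the file analyzed here.

```c
#include <stdint.h>

/* Compute the EXE image size from two header words.
 * pages:      word at offset 04-05h, number of 512-byte blocks
 *             (the last one may be only partly used)
 * last_bytes: word at offset 02-03h, bytes used in the last block
 *             (assumption: 0 means the last block is completely full)
 */
uint32_t exe_file_size(uint16_t pages, uint16_t last_bytes)
{
    if (pages == 0)
        return 0;
    if (last_bytes == 0)
        return (uint32_t)pages * 512u;
    return (uint32_t)(pages - 1) * 512u + last_bytes;
}
```

With the values from this file, `exe_file_size(0x0009, 0x000C)` gives 4108.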

④ Offset 06-07h holds the number of relocation entries. What is relocation? Simply put, an EXE file must be loaded into memory to run, but the segment addresses stored in the file are not the ones the program will actually have in memory, because the load address is only known at load time. Relocation patches those addresses after loading. We can see that this file has 1 relocation entry. Related to this is the word at offset 18-19h, which gives the file offset of the first relocation entry; here it is 003Eh. That is, the first relocation entry is stored at offset 003Eh in this file. Its struct declaration is:

struct EXE_RELOC
{
  unsigned short offset;
  unsigned short segment;
};
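To make the idea of relocation more concrete, here is a rough sketch of what a loader does with one such entry: it finds the word that the entry points to inside the load module and adds the segment at which the module was loaded. The names `exe_reloc`, `apply_reloc`, and `start_seg` are invented for this sketch, and it assumes the load module has already been copied into a byte buffer. The contents of the real relocation entry in this file are not shown in the article, so no real values are used.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as EXE_RELOC above, with fixed-width types. */
struct exe_reloc {
    uint16_t offset;
    uint16_t segment;
};

/* Patch the word addressed by one relocation entry.
 * image:     the load module (the file contents after the header)
 * start_seg: segment at which the load module starts in memory
 * Returns 0 on success, -1 if the entry points outside the image.
 */
int apply_reloc(uint8_t *image, size_t image_len,
                struct exe_reloc r, uint16_t start_seg)
{
    size_t pos = (size_t)r.segment * 16u + r.offset;
    uint16_t value;

    if (pos + 2 > image_len)
        return -1;

    /* Words are stored low byte first. */
    value = (uint16_t)(image[pos] | (image[pos + 1] << 8));
    value = (uint16_t)(value + start_seg);
    image[pos]     = (uint8_t)(value & 0xFF);
    image[pos + 1] = (uint8_t)(value >> 8);
    return 0;
}
```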

⑤ Offset 08-09h: the size of the EXE header, which is followed by the program data. In this file it is 0020h. Note that the unit is the paragraph, and one paragraph is 16 bytes, so the program data starts at file offset 0020h × 16 = 0200h.

⑥ Offset 0A-0Bh: the minimum amount of memory the program needs in order to run (in paragraphs, in addition to the load module itself). If less memory than this is available, the program is not loaded.

⑦ Offset 0C-0Dh: the maximum amount of memory the program requests, which is generally FFFFh.

⑧ Offset 0E-0Fh: the offset of the stack segment within the load module, which is 00E5h in this file.
Offset 10-11h: the initial value of SP, which is 0080h in this file.
That is, SS:SP = 00E5:0080.

Offset 14-15h: the initial value of IP, which is 0 in this file.
Offset 16-17h: the offset of the code segment (CS) within the load module, which is 0 in this file.
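The fields discussed so far can be gathered into one structure and decoded from the first 1Ah bytes of the file. The following is a sketch for a modern C99 compiler, not a definitive header definition: the struct and field names are invented here, and the word at offset 12-13h, which the walkthrough above skips, is assumed to be the checksum field as in the usual descriptions of this header.

```c
#include <stdint.h>

/* Field names are illustrative only. */
struct exe_header {
    uint16_t signature;       /* 00-01h: "MZ", read as the word 5A4Dh */
    uint16_t last_page_bytes; /* 02-03h: bytes used in the last block */
    uint16_t pages;           /* 04-05h: number of 512-byte blocks    */
    uint16_t reloc_count;     /* 06-07h: number of relocation entries */
    uint16_t header_paras;    /* 08-09h: header size in paragraphs    */
    uint16_t min_alloc;       /* 0A-0Bh: minimum memory required      */
    uint16_t max_alloc;       /* 0C-0Dh: maximum memory requested     */
    uint16_t init_ss;         /* 0E-0Fh: stack segment offset         */
    uint16_t init_sp;         /* 10-11h: initial SP                   */
    uint16_t checksum;        /* 12-13h: assumed, not covered above   */
    uint16_t init_ip;         /* 14-15h: initial IP                   */
    uint16_t init_cs;         /* 16-17h: code segment offset          */
    uint16_t reloc_offset;    /* 18-19h: offset of first reloc entry  */
};

/* Read a 16-bit word stored low byte first. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode the header from at least 26 (1Ah) bytes in buf.
 * Returns 0 on success, -1 if the signature is not "MZ". */
int exe_parse_header(const uint8_t *buf, struct exe_header *h)
{
    h->signature       = read_le16(buf + 0x00);
    h->last_page_bytes = read_le16(buf + 0x02);
    h->pages           = read_le16(buf + 0x04);
    h->reloc_count     = read_le16(buf + 0x06);
    h->header_paras    = read_le16(buf + 0x08);
    h->min_alloc       = read_le16(buf + 0x0A);
    h->max_alloc       = read_le16(buf + 0x0C);
    h->init_ss         = read_le16(buf + 0x0E);
    h->init_sp         = read_le16(buf + 0x10);
    h->checksum        = read_le16(buf + 0x12);
    h->init_ip         = read_le16(buf + 0x14);
    h->init_cs         = read_le16(buf + 0x16);
    h->reloc_offset    = read_le16(buf + 0x18);
    return h->signature == 0x5A4D ? 0 : -1;
}

/* File offset at which the program data starts (paragraphs to bytes). */
uint32_t exe_code_offset(const struct exe_header *h)
{
    return (uint32_t)h->header_paras * 16u;
}
```

Fed the header bytes of the file analyzed here, `exe_code_offset` would return 0200h, in line with item ⑤.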

Let's see how SS, SP, CS, and IP are actually set once the program is in memory.

Load the EXE file with DEBUG under DOS: enter debug casm.exe (CASM is the name of my EXE file) and use the R command to display the registers, as shown:

Analysis: DS and ES both point to the PSP segment, so the PSP segment address is 27F2h. How is CS = 2802h obtained? The PSP is 100h bytes long, which is 10h when expressed in paragraphs, so CS = PSP segment address + 10h + CS offset within the load module = 27F2h + 10h + 0 = 2802h. IP is the same as the initial value in the file. Now the stack segment: SS = PSP segment address + 10h + stack segment offset within the load module = 27F2h + 10h + 00E5h = 28E7h, and SP is likewise the same as the initial value in the file.
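The same arithmetic, written out as a tiny C helper. The names `PSP_PARAGRAPHS` and `initial_segment` are made up for this sketch; it simply adds the PSP segment, the 10h paragraphs occupied by the PSP, and the value taken from the header.

```c
#include <stdint.h>

/* The 100h-byte PSP occupies 10h paragraphs. */
#define PSP_PARAGRAPHS 0x10u

/* Segment register value after loading:
 * PSP segment + 10h + offset stored in the EXE header. */
uint16_t initial_segment(uint16_t psp_seg, uint16_t header_value)
{
    return (uint16_t)(psp_seg + PSP_PARAGRAPHS + header_value);
}
```

With the values above, `initial_segment(0x27F2, 0x0000)` gives 2802h for CS and `initial_segment(0x27F2, 0x00E5)` gives 28E7h for SS.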

4. Assembly and C

We have analyzed the format of the EXE file header. Next, let's take a brief look at how C code is compiled into machine code.

The generated EXE file is quite large, a full 4 KB, even though we wrote only a few lines of code. Why is so much code and data generated? Because the linker adds the default library, the C runtime library. We can choose not to link this library, as shown below:

Now check the file in UltraEdit again: it has slimmed down considerably. The load module as it appears in memory is shown in the figure.

Disassemble it with DEBUG under DOS, as shown in the figure.

The assembly code of the main function runs from 2802:000D to 2802:002E. We can see that all the declared variables are placed on the stack, as shown in the figure below. (The direction of stack growth in the figure is drawn incorrectly and should be reversed: addresses increase upward, but the stack grows downward. My apologies.)

 

Before the subroutine is called, the arguments are pushed onto the stack, which prepares for parameter passing.

Next comes CALL 0000. This is a near (intra-segment) call, so only IP is pushed onto the stack. The pushed IP is the address of the instruction following the CALL, so 0026 is pushed.

IP is then set to 0 and execution transfers to the subroutine. In the subroutine, the first instruction is PUSH BP and the second is MOV BP,SP. After these two instructions, [BP] holds the saved BP, [BP+02] holds the return address 0026, and the two arguments sit at [BP+04] and [BP+06].

The passed parameters are then read from the stack:

MOV AX,[BP+04]
ADD AX,[BP+06]

These two instructions fetch the arguments passed by the main function, add them, and leave the sum in the AX register. Next,

POP BP

restores the value of the BP register. The RET instruction then pops the IP value saved on the stack into the IP register, and execution returns to the main function, which continues with:

POP CX
POP CX

What do these two instructions do? To make the analysis easier, let's draw the stack, paying attention to where SP points at this moment.

(Data beyond the current SP is still drawn here because it is still physically present in memory.)

The purpose of these two instructions is to release the stack space used for passing the arguments.

MOV [BP-06],AX

This assigns the sum to the variable s.
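The whole call sequence can be replayed with a toy model of a 16-bit stack. This is only a sketch for a modern C99 compiler: the `cpu16` structure, the helper names, the 64-byte stack, and the starting BP value 1234h are all invented, local variables of main are left out, and it assumes the arguments are pushed right to left (y first, then x), as cdecl does.

```c
#include <stdint.h>

/* Toy model: a small stack plus the registers used in the walkthrough. */
struct cpu16 {
    uint8_t  stack[64];
    uint16_t sp, bp, ax, cx, ip;
};

static uint16_t read16(const struct cpu16 *c, uint16_t addr)
{
    return (uint16_t)(c->stack[addr] | (c->stack[addr + 1] << 8));
}

static void push16(struct cpu16 *c, uint16_t v)
{
    c->sp = (uint16_t)(c->sp - 2);          /* the stack grows downward */
    c->stack[c->sp]     = (uint8_t)(v & 0xFF);
    c->stack[c->sp + 1] = (uint8_t)(v >> 8);
}

static uint16_t pop16(struct cpu16 *c)
{
    uint16_t v = read16(c, c->sp);
    c->sp = (uint16_t)(c->sp + 2);          /* the data itself stays in memory */
    return v;
}

/* Replays the instructions discussed above and returns the value in AX. */
uint16_t simulate_sum_call(uint16_t x, uint16_t y, struct cpu16 *c)
{
    c->sp = (uint16_t)sizeof c->stack;      /* empty stack */
    c->bp = 0x1234;                         /* arbitrary caller BP */

    push16(c, y);                           /* PUSH second argument */
    push16(c, x);                           /* PUSH first argument  */
    push16(c, 0x0026);                      /* CALL 0000: push return IP */
    c->ip = 0x0000;

    push16(c, c->bp);                       /* PUSH BP */
    c->bp = c->sp;                          /* MOV BP,SP */
    c->ax = read16(c, (uint16_t)(c->bp + 4));                      /* MOV AX,[BP+04] */
    c->ax = (uint16_t)(c->ax + read16(c, (uint16_t)(c->bp + 6)));  /* ADD AX,[BP+06] */
    c->bp = pop16(c);                       /* POP BP */
    c->ip = pop16(c);                       /* RET */

    c->cx = pop16(c);                       /* POP CX: discard first argument  */
    c->cx = pop16(c);                       /* POP CX: discard second argument */
    return c->ax;
}
```

After `simulate_sum_call(1, 2, &c)`, AX holds 3, IP holds 0026h, and SP is back where it started, which is exactly the effect of the two POP CX instructions.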

This completes our analysis of the subroutine call. As we have seen, the stack plays an important role in allocating space for variables, passing parameters, saving register values, and so on. There are also several conventions for passing parameters in C; the one we just analyzed should be cdecl. For more information, search online.
