Assembly Language Primer: For Those Meeting Assembly for the First Time (4)

Source: Internet
Author: User

Assembly Analysis of High-Level Language Programs

In high-level languages such as C and Pascal, we do not operate on hardware resources directly; we concentrate on solving the problem. This shows up mainly as data abstraction and structured programming. For example, we access data through a variable name and do not care where in memory the data is stored. The use of hardware resources is left entirely to the compiler. Still, some basic rules hold, and most compilers follow certain conventions, which makes disassembled code easier to read. This part covers how several high-level language constructs appear in assembly code.

1. Ordinary variables. Declared variables are normally stored in memory, and the compiler associates each variable name with a memory address. (Note that this so-called "fixed address" is a provisional address computed by the compiler at compile time. The actual run-time address is produced only after further adjustments such as relocation, when the program is linked into an executable file and loaded into memory for execution. None of this affects the program logic, so do not worry about the details; it is enough to know that every function name and variable name corresponds to a memory address.) A variable name therefore appears in assembly code as an effective address, that is, an operand in square brackets. For example, suppose a C file declares:

int my_age;

This integer variable has a specific memory location. The statement my_age = 32; might appear in the disassembly as follows (the word ptr operand means int is taken to be 16 bits in this example, and 20 is hexadecimal for 32):

mov word ptr [007e85da], 20

Therefore, the effective address in square brackets corresponds to the variable name. Another example:

char my_name[11] = "lianzi2000";

This statement also fixes an address, which corresponds to my_name. If that address is 007e85dc, then [007e85dc] = 'l', [007e85dd] = 'i', and so on. Accessing my_name means accessing the data at this address.

A pointer variable also corresponds to an address, because it too is a variable. For example:

char *your_name;

Here the variable your_name corresponds to a memory address, say 007e85f0. The statement your_name = my_name; might appear as:

mov dword ptr [007e85f0], 007e85dc ; the content of your_name is the address of my_name
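
The relationship between names and addresses can also be checked from C itself. The following is a minimal sketch that reuses the three declarations from this section; the helper name assign_examples is hypothetical and exists only for illustration. The actual addresses depend on the compiler, linker, and loader, so the code compares addresses with each other and does not assume the values shown in the listings.

```c
#include <assert.h>
#include <stddef.h>

int my_age;
char my_name[11] = "lianzi2000";   /* 10 characters plus the terminating NUL */
char *your_name;

/* Hypothetical helper: performs the two assignments discussed in the text
   and returns the pointer value now stored in your_name. */
char *assign_examples(void)
{
    my_age = 32;           /* a store to the address bound to my_age */
    your_name = my_name;   /* stores the address of my_name[0] in your_name */
    assert(your_name != NULL);
    return your_name;
}
```

After the call, your_name holds the address of the first byte of my_name, while your_name itself lives at a different address, which is the distinction the mov instruction above makes.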

2. Register variables

C and C++ allow variables to be declared with the register keyword. register int i; asks the compiler to keep the integer variable i in a register. Compilers that honor the request commonly use ESI and EDI for register variables, although the keyword is only a hint and the compiler is free to ignore it. Registers are part of the CPU itself and are much faster to access than memory, so keeping frequently used variables in registers can speed up the program.

3. Arrays

No matter how many dimensions an array has, all of its elements are stored contiguously, because memory itself is always one-dimensional. For example, int i_array[2][3]; fixes an address in memory, and the 12 bytes starting at that address hold the elements of the array (six elements of 2 bytes each, with the 16-bit int assumed above). The name i_array therefore corresponds to the starting address of the array, that is, it points to the first element. The elements are stored in the order i_array[0][0], [0][1], [0][2], [1][0], [1][1], [1][2], so the rightmost subscript varies fastest. To access an element, the program converts the multi-dimensional index into a one-dimensional index. For i_array[1][1], the one-dimensional index is 1*3 + 1 = 4. The conversion may be done at compile time or at run time. Either way, once the address of i_array is loaded into a general-purpose register as the base address, accessing an array element is just a matter of computing an effective address:

; i_array[1][1] = 0x16

lea ebx, [XXXXXXXX] ; load the address of i_array into EBX
mov edx, 04 ; index of i_array[1][1], determined at compile time
mov word ptr [ebx + edx*2], 16 ; scale factor 2 is the size of one element

Of course, the details vary with the compiler and the surrounding code, but the basic form is always the same. Here we can also see what the scale factor is for (recall that its value can be 1, 2, 4, or 8). On current systems, simple variables always occupy 1, 2, 4, or 8 bytes, so the scale factor makes table lookups in memory very convenient.
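
The index conversion and the base + index * scale computation can be written out in C. This is a sketch under stated assumptions: flat_index and store_flat are hypothetical helper names, and the element size is taken from sizeof(int) on the machine that compiles it, so it need not be the 2 bytes used in the listing.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 2
#define COLS 3

int i_array[ROWS][COLS];

/* Converts a (row, col) pair to the one-dimensional, row-major index. */
size_t flat_index(size_t row, size_t col)
{
    assert(row < ROWS && col < COLS);
    return row * COLS + col;
}

/* Stores value at i_array[row][col] by computing
   base address + index * element size, which is the same arithmetic
   the scaled addressing mode [ebx + edx*scale] performs. */
void store_flat(size_t row, size_t col, int value)
{
    unsigned char *base = (unsigned char *)i_array;
    int *slot = (int *)(base + flat_index(row, col) * sizeof(int));
    *slot = value;
}
```

For example, store_flat(1, 1, 0x16) writes the same element as i_array[1][1] = 0x16, because flat_index(1, 1) is 1*3 + 1 = 4.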

4. Structures and objects

The members of a structure or object are stored contiguously in memory, although the compiler may insert some padding to align members on word or double-word boundaries. To find the size of an object, therefore, use the sizeof operator instead of adding up the member sizes. When we declare a structure variable or instantiate an object, its name likewise corresponds to a memory address. For example:

struct tag_info_struct
{
    int age;
    int sex;
    float height;
    float weight;
} marry;

The variable marry corresponds to a memory address, and starting at that address there are enough bytes (sizeof(marry)) to hold all the members. Each member corresponds to an offset from this address. If all members of this structure are stored consecutively with no padding (and int is 16 bits, as above), the offset of age is 0, sex is 2, height is 4, and weight is 8.

; marry.sex = 0;

lea ebx, [XXXXXXXX] ; memory address of marry
mov word ptr [ebx + 2], 0
......
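
Member offsets can be inspected with the standard offsetof macro, without assuming a particular int size or padding. The sketch below reuses the structure from the text; set_sex_by_offset is a hypothetical helper that writes marry.sex through base address plus offset, the same computation as the bracketed operand in the listing.

```c
#include <assert.h>
#include <stddef.h>

struct tag_info_struct {
    int age;
    int sex;
    float height;
    float weight;
} marry;

/* Hypothetical helper: writes to marry.sex through base address + offset.
   The offset comes from the compiler, so padding is accounted for. */
void set_sex_by_offset(int value)
{
    unsigned char *base = (unsigned char *)&marry;
    size_t off = offsetof(struct tag_info_struct, sex);
    assert(off + sizeof(int) <= sizeof marry);
    *(int *)(base + off) = value;
}
```

On a compiler with a 32-bit int the offset of sex will usually be 4, not 2, which is one more reason to rely on offsetof and sizeof and not on hand-computed values.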

Objects are laid out in much the same way. Note that the code of a member function lives in the code segment, not in the object. An object with virtual functions typically holds a pointer to a table of function addresses, not the functions themselves.

5. Function calls

Defining a function also fixes a memory address that corresponds to the function name. For example:

long comb(int m, int n)
{
    long temp;
    .....

    return temp;
}

In this way, the function comb corresponds to a memory address. A call to it appears as:

call XXXXXXXX ; address corresponding to comb

This function takes two integer parameters, which are passed on the stack:

; lresult = comb(2, 3);

push 3
push 2
call XXXXXXXX
mov dword ptr [yyyyyyyy], eax ; yyyyyyyy is the address of the long integer variable lresult

Note two points here. First, in C the parameters are pushed in the reverse of their declared order, that is, the last parameter is pushed first, so push 3 executes first. Second, in the 32-bit system under discussion, a push with no explicit operand size pushes a 32-bit double word. The two push instructions therefore push two double words, 8 bytes of data in total. The call instruction then executes: it pushes the 32-bit return address (the address of the next instruction, mov dword ptr ...) onto the stack and jumps to XXXXXXXX.

On entry to the comb subroutine (XXXXXXXX), the stack looks like this:

03000000 (recall the little-endian format)
02000000
return address <-- ESP points to the return address

As mentioned earlier, the standard entry code of a subroutine is:

push ebp ; save the original EBP
mov ebp, esp ; establish the frame pointer
sub esp, xxx ; reserve space for temporary variables
.....

After push ebp executes, the stack looks like this:

03000000
02000000
return address
old EBP <-- ESP points to the saved EBP

After mov ebp, esp executes, both EBP and ESP point to the saved EBP. Then sub esp, xxx reserves space for the temporary variables. Here there is only one temporary variable, temp, a long integer that needs 4 bytes, so xxx = 4. The frame of the subroutine is now established:

03000000
02000000
return address
old EBP <-- EBP now points here
temp <-- ESP now points here

The subroutine can therefore use [ebp+8] to reach the first parameter (m), [ebp+0C] to reach the second parameter (n), and so on. Temporary variables all lie below EBP; here temp corresponds to [ebp-4].
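
The frame layout can be replayed on a small simulated stack to confirm the offsets. This is only an illustrative model, not real machine state: stack_mem, push32, load32, and build_comb_frame are hypothetical names, and OLD_EBP and RETURN_ADDR are made-up values standing in for the caller's EBP and the return address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64u
#define OLD_EBP     0xAAAAAAAAu   /* made-up value of the caller's EBP */
#define RETURN_ADDR 0x00001000u   /* made-up return address */

uint32_t stack_mem[STACK_BYTES / 4];
uint32_t esp = STACK_BYTES;       /* empty stack: ESP just past the top */
uint32_t ebp = OLD_EBP;

/* push: decrement ESP by 4, then store the value at [ESP]. */
void push32(uint32_t value)
{
    assert(esp >= 4);
    esp -= 4;
    stack_mem[esp / 4] = value;
}

/* Reads the 32-bit value at simulated byte address addr. */
uint32_t load32(uint32_t addr)
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return stack_mem[addr / 4];
}

/* Replays the caller's pushes and the entry code of comb for comb(2, 3). */
void build_comb_frame(void)
{
    esp = STACK_BYTES;
    ebp = OLD_EBP;
    push32(3);            /* push 3: second parameter n */
    push32(2);            /* push 2: first parameter m */
    push32(RETURN_ADDR);  /* call: pushes the return address */
    push32(ebp);          /* push ebp */
    ebp = esp;            /* mov ebp, esp */
    esp -= 4;             /* sub esp, 4: room for temp */
}
```

After build_comb_frame(), load32(ebp + 8) yields 2, load32(ebp + 12) yields 3, load32(ebp + 4) yields the return address, and ebp - 4 equals esp, the slot reserved for temp.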

When the subroutine reaches its end, it returns the value of temp in EAX:

mov eax, [ebp-04]

It then reverses the earlier steps to tear down the frame:

mov esp, ebp ; now both ESP and EBP point to old EBP, and the temporary variable is gone
pop ebp ; discard the frame pointer and restore the original EBP

ESP now points to the return address. The following retn instruction returns to the main program:

retn 8

This instruction pops the return address off the stack into EIP, so execution resumes in the main program at the instruction after the call. At the same time it adjusts ESP (ESP = ESP + 4*2) to discard the two parameters, restoring the stack to its state before the parameters were pushed. This is called stack balance: the stack must be in the same state before and after a subroutine call. (Under the cdecl convention, commonly the default for C compilers on 32-bit x86, the subroutine ends with a plain ret and the caller discards the parameters itself with add esp, 8; the callee-cleanup form shown here is what stdcall uses.) The temporary variable temp vanishes when the subroutine returns, which is why returning a pointer to a temporary variable is invalid.
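
Stack balance can be demonstrated with a toy model of the whole call sequence. The sketch below is an assumption-laden illustration: struct cpu and the functions comb_sim and call_comb_sim are hypothetical, the return address is a made-up number, and because the body of comb is elided in the text, temp = m + n is used purely as a placeholder computation.

```c
#include <assert.h>
#include <stdint.h>

struct cpu {
    uint32_t mem[16];             /* 64 bytes of simulated stack */
    uint32_t esp, ebp, eax, eip;
};

void push32(struct cpu *c, uint32_t v)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

uint32_t pop32(struct cpu *c)
{
    assert(c->esp + 4 <= sizeof c->mem);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* Callee: entry code, placeholder body, exit code, then retn 8. */
void comb_sim(struct cpu *c)
{
    push32(c, c->ebp);                     /* push ebp */
    c->ebp = c->esp;                       /* mov ebp, esp */
    c->esp -= 4;                           /* sub esp, 4 */
    uint32_t m = c->mem[(c->ebp + 8) / 4];
    uint32_t n = c->mem[(c->ebp + 12) / 4];
    c->mem[(c->ebp - 4) / 4] = m + n;      /* placeholder: temp = m + n */
    c->eax = c->mem[(c->ebp - 4) / 4];     /* mov eax, [ebp-4] */
    c->esp = c->ebp;                       /* mov esp, ebp */
    c->ebp = pop32(c);                     /* pop ebp */
    c->eip = pop32(c);                     /* retn 8: pop the return address */
    c->esp += 8;                           /* ... and discard 8 parameter bytes */
}

/* Caller: push the parameters right to left, then "call". */
uint32_t call_comb_sim(struct cpu *c, uint32_t m, uint32_t n, uint32_t ret)
{
    push32(c, n);
    push32(c, m);
    push32(c, ret);                        /* call pushes the return address */
    comb_sim(c);
    return c->eax;
}
```

If ESP and EBP are recorded before call_comb_sim and compared afterwards, both are unchanged. Replacing the final adjustment of 8 with 4 leaves ESP four bytes short, which is the imbalance a wrong retn operand would cause.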

To better support high-level languages, Intel also provides the enter and leave instructions, which build and tear down the frame automatically. enter takes two operands: the first is the number of bytes to reserve for temporary variables, and the second is the nesting level of the subroutine, which is usually 0. enter xxx, 0 is equivalent to:

push ebp
mov ebp, esp
sub esp, xxx

leave is equivalent to:

mov esp, ebp
pop ebp
