An operand of an assembly instruction can be data in memory. Memory addressing is the mechanism by which a program locates the data it needs in memory.
Intel CPUs can address memory in two modes: real mode and protected mode. Real mode is obsolete. Windows is now a 32-bit protected-mode operating system, and PE files essentially run in a 32-bit linear address space, so only addressing in the 32-bit linear address space is covered here.
The idea of a linear address is quite intuitive. Imagine all the bytes lined up in one long row. The first byte is numbered 0, the second 1, and so on up to 4294967295 (hexadecimal FFFFFFFF, the largest value a 32-bit binary number can represent). That is already 4 GB, enough to hold all the code and data of a program. This does not mean your machine actually has that much memory. Managing and allocating physical memory is complicated, and beginners need not worry about it. In short, from the program's own point of view, it appears to run in one such large memory.
On Intel systems, a memory address is always given as "segment selector:effective address". The segment selector is held in a segment register, and the effective address can be specified in several ways.

A segment's start address (base), length (the segment limit), granularity, access permissions and type are defined by its segment descriptor. You don't need to go into this. You only need to know that the selector determines the properties of the segment. Once the selector has fixed the segment, the effective address is counted relative to the segment's base address.

For example, suppose the data segment selected by selector 1a7 has base address 400000, and 1a7 is loaded into DS so that this data segment is used. Then DS:0 refers to linear address 400000, and DS:1f5278 refers to linear address 5f5278 (400000 + 1f5278).

Usually we don't need to see a segment's start address and only care about the effective address within the segment. In a 32-bit system the effective address is also a 32-bit number, so a single segment is enough to cover the whole 4 GB linear address space.

Why, then, have different segment selectors at all? As mentioned above, they let the program access data with different properties. An illegal access raises an exception. This is the core of protected mode and the basis for building privilege levels and multitasking systems. Much deeper material is involved here, and beginners need not worry about it.
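The selector:offset translation in the example above can be sketched in a few lines of Python. This is a simplified model, not how the CPU implements it. It assumes the segment base has already been read from the descriptor and ignores the limit, granularity and permission checks. The function name `linear_address` is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # the 32-bit linear address space spans 0 .. 0xFFFFFFFF

def linear_address(segment_base: int, offset: int) -> int:
    """Return the linear address for segment_base:offset.

    Simplified model: the segment limit and access checks that the CPU
    performs in protected mode are not modelled here.
    """
    # The offset is counted from the segment base; the sum is kept to 32 bits.
    return (segment_base + offset) & MASK32

# Example from the text: the data segment selected by 1a7 has base 400000.
DS_BASE = 0x400000
print(hex(linear_address(DS_BASE, 0x0)))       # 0x400000
print(hex(linear_address(DS_BASE, 0x1F5278)))  # 0x5f5278
```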
An effective address is calculated as base + index * scale + displacement. The result is an offset relative to the start of the segment, and the segment's start address itself does not enter into this calculation. For example, suppose the base is 100000, the index is 400, the scale factor is 4 and the displacement is 20000. The effective address is then:
100000 + 400*4 + 20000 = 100000 + 1000 + 20000 = 121000. With the segment base of 400000 from the earlier example, the linear address is 400000 + 121000 = 521000. (Note: all numbers are hexadecimal.)
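The formula and the worked example can be checked with a short Python sketch. The function and parameter names are illustrative only. The segment base 400000 is the one assumed in the earlier DS example, and all values are hexadecimal as in the text.

```python
MASK32 = 0xFFFFFFFF

def effective_address(base: int = 0, index: int = 0, scale: int = 1,
                      displacement: int = 0) -> int:
    """Effective address = base + index * scale + displacement (kept to 32 bits)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & MASK32

ea = effective_address(base=0x100000, index=0x400, scale=4, displacement=0x20000)
linear = (0x400000 + ea) & MASK32  # add the segment base to get the linear address
print(hex(ea))      # 0x121000
print(hex(linear))  # 0x521000
```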
The base can be any 32-bit general-purpose register. The index can be any 32-bit general-purpose register except ESP. The scale factor can be 1, 2, 4 or 8, and the displacement is an immediate constant. For example, [EBP + EDX * 8 + 200] is a valid address expression. In most cases, though, such a complex combination is unnecessary, and any of the base, the scaled index or the displacement can be omitted.
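The register rules for the base and index can be written as a small validity check. `is_valid_address` is a hypothetical helper for illustration and is not part of any assembler. It checks only the constraints stated above plus the fact that a displacement must fit in 32 bits.

```python
GPR32 = {"EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "EBP", "ESP"}

def is_valid_address(base=None, index=None, scale=1, displacement=0):
    """Check a 32-bit [base + index*scale + displacement] memory operand.

    Every component is optional; the displacement is an immediate constant.
    """
    if base is not None and base.upper() not in GPR32:
        return False
    if index is not None:
        # Any 32-bit general-purpose register may be the index, except ESP.
        if index.upper() not in GPR32 or index.upper() == "ESP":
            return False
    if scale not in (1, 2, 4, 8):
        return False
    # The displacement must fit in 32 bits (signed or unsigned form).
    return -(1 << 31) <= displacement <= 0xFFFFFFFF

print(is_valid_address("EBP", "EDX", 8, 0x200))  # True:  [EBP + EDX*8 + 200]
print(is_valid_address("EAX", "ESP", 1, 0))      # False: ESP cannot be an index
print(is_valid_address("EAX", "ECX", 3, 0))      # False: scale 3 is not allowed
```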
The basic unit of memory is the byte. Each byte has eight binary bits, so the largest value a byte can hold is 11111111b, which is 255 in decimal. Hexadecimal is usually more convenient, because every four binary bits correspond to exactly one hexadecimal digit: 11111111b = 0xFF. Bytes in memory are stored contiguously. Two bytes form a word, and two words form a double word (DWORD). Intel architecture uses little-endian byte order: in memory, the high-order byte comes after the low-order byte, at a higher address. For example, the hexadecimal number 803e7d0c is stored in memory byte by byte as 0C 7D 3E 80. In a 32-bit register the value appears in its normal order, for example 803e7d0c in EAX. An address that refers to this number actually refers to its first byte, 0C. We can specify whether an access is a byte, a word or a double word. Assume DS:[EDX] points to the first byte, 0C:
mov al, byte ptr ds:[edx]      ; load the byte 0C into AL
mov ax, word ptr ds:[edx]      ; load the word 7D0C into AX
mov eax, dword ptr ds:[edx]    ; load the double word 803E7D0C into EAX
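The little-endian layout and the three access widths can be imitated in Python with the standard `struct` module. Memory is modelled as a plain byte string and EDX as an index into it. This is a simplified illustration under those assumptions and does not model segmentation.

```python
import struct

# The four bytes of the dword 803E7D0Ch as they sit in memory (little-endian):
# the lowest-addressed byte holds the least significant byte.
memory = bytes([0x0C, 0x7D, 0x3E, 0x80])
edx = 0  # assume DS:[EDX] points at the first byte, 0Ch

al = memory[edx]                               # byte ptr:  1 byte
ax = struct.unpack_from("<H", memory, edx)[0]  # word ptr:  2 bytes, little-endian
eax = struct.unpack_from("<I", memory, edx)[0] # dword ptr: 4 bytes, little-endian

print(f"{al:02X} {ax:04X} {eax:08X}")  # 0C 7D0C 803E7D0C

# The other direction: storing the register value writes the bytes low byte first.
assert struct.pack("<I", 0x803E7D0C) == memory
```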
One of a segment's attributes is its default access width. If the default width is a double word (usually the case in 32-bit systems), byte or word accesses must be marked explicitly with byte ptr or word ptr.
Default segment selection: if an instruction gives only an effective address (the offset within a segment) without naming a segment, the following rules apply:
If EBP or ESP is used as the base register, the address is taken to be in the segment selected by SS;
Otherwise, the address is taken to be in the segment selected by DS.
To override this rule, use a segment override prefix. Example:
mov eax, dword ptr [edx]       ; DS by default: load the double word at DS:[EDX] into EAX
mov ebx, dword ptr es:[edx]    ; ES override prefix: load the double word at ES:[EDX] into EBX
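The default-segment rule and the override prefix can be summarised as a tiny decision function. `default_segment` is a hypothetical helper. It covers only the rules given above: only the base register is considered, so EBP used as an index still defaults to DS.

```python
def default_segment(base=None, override=None):
    """Pick the segment register for a memory operand (simplified 32-bit rules).

    An explicit override prefix (e.g. "ES") always wins; otherwise EBP or ESP
    as the base register selects SS, and everything else selects DS.
    """
    if override is not None:
        return override.upper()
    if base is not None and base.upper() in ("EBP", "ESP"):
        return "SS"
    return "DS"

print(default_segment("EDX"))        # DS  -> dword ptr [edx]
print(default_segment("EDX", "ES"))  # ES  -> dword ptr es:[edx]
print(default_segment("EBP"))        # SS  -> e.g. [ebp + 8]
```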
STACK:
Strictly speaking, the stack is a data structure in its own right, simply called a "stack". The "heap" is a similar but different structure. SS and ESP are Intel's hardware support for the stack. The PUSH and POP instructions operate on it: SS selects the stack segment, and ESP points to the current top of the stack. The instruction PUSH xxx performs the following operations:
Subtracts 4 from ESP;
Stores xxx in the memory location pointed to by SS:[ESP].
As a result, ESP is 4 lower and SS:[ESP] points to the newly pushed xxx. The stack therefore grows "upside down", from higher addresses toward lower ones. POP yyy does the reverse. It copies the double word at SS:[ESP] into the register or memory location specified by yyy, then adds 4 to ESP. The value is then considered popped and no longer on the stack. It still sits just below the new top of the stack for the moment, but the next push will overwrite it. For this reason, data in the stack segment at addresses lower than ESP is considered undefined.
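The PUSH/POP steps can be modelled in Python to show the downward growth and why data below ESP is stale. `Stack32` is a hypothetical toy class. A bytearray stands in for the stack segment, SS is implicit, and only 32-bit pushes and pops are modelled.

```python
import struct

class Stack32:
    """Toy model of a 32-bit stack: a byte array addressed through ESP."""

    def __init__(self, size: int = 64):
        self.memory = bytearray(size)
        self.esp = size  # empty stack: ESP starts just past the highest byte

    def push(self, value: int) -> None:
        self.esp -= 4                                  # 1. ESP = ESP - 4
        struct.pack_into("<I", self.memory, self.esp,  # 2. store at SS:[ESP]
                         value & 0xFFFFFFFF)

    def pop(self) -> int:
        value = struct.unpack_from("<I", self.memory, self.esp)[0]  # read SS:[ESP]
        self.esp += 4                                  # then ESP = ESP + 4
        return value

s = Stack32()
s.push(0x11111111)
s.push(0x22222222)
print(s.esp)         # 56: two pushes moved ESP down by 8
print(hex(s.pop()))  # 0x22222222: last in, first out
print(s.esp)         # 60
# The popped bytes remain below ESP until the next push overwrites them.
print(s.memory[56:60].hex())  # 22222222
```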
Finally, note that assembly language is machine-oriented. Its instructions correspond essentially one-to-one to machine code, so what can be written depends on the hardware. Some instructions that look reasonable do not actually exist, for example:
mov ds:[edx], ds:[ecx]   ; invalid: no direct transfer between two memory locations
mov ds, 1a7              ; invalid: a segment register cannot be loaded with an immediate
mov eip, 3d4e7           ; invalid: the instruction pointer cannot be manipulated directly