Learning Guide for Assembly Language (II)

Source: Internet
Author: User

The operands of an assembly instruction can be data held in memory. Memory addressing is how a program correctly locates the data it needs in memory.

Intel CPUs can work in two addressing modes: real mode and protected mode. Real mode is outdated and is not covered here. Windows is now a 32-bit protected-mode system, and a PE file basically runs in a 32-bit linear address space, so only addressing in the 32-bit linear address space is introduced here.

The concept of a linear address is intuitive. Imagine all bytes laid out in one long row: the first byte is numbered 0, the second is numbered 1, and so on up to 4294967295 (hexadecimal FFFFFFFF, the largest value a 32-bit binary number can express). That is 4 GB of capacity, enough to hold all the code and data of a program. Of course, this does not mean your machine has that much memory. How physical memory is managed and allocated is a complex topic that beginners need not worry about; in short, from the program's own point of view, it appears to live in a memory of that size.
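The size of the address space can be checked with a little arithmetic. The sketch below uses Python purely as a calculator; the names ADDRESS_BITS, highest_address and size_in_gb are illustrative and not part of the original text.

```python
# A 32-bit address can take 2**32 distinct values, numbered from 0.
ADDRESS_BITS = 32

highest_address = 2**ADDRESS_BITS - 1      # number of the last byte
size_in_gb = 2**ADDRESS_BITS // 2**30      # 2**30 bytes make up 1 GB here

print(highest_address, hex(highest_address))  # 4294967295 0xffffffff
print(size_in_gb)                             # 4
```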

In Intel systems, a memory address is always given in the form segment selector : effective address. The segment selector is stored in a segment register, and the effective address (the offset within the segment) can be given in several ways. Through the selector, the CPU looks up the segment descriptor, which determines the segment's starting (base) address, its length (also known as the segment limit), granularity, access rights, access properties, and so on. There is no need to delve into these; it is enough to know that the segment selector determines the properties of the segment. Once the selector has determined the segment, the effective address is counted from the base address of that segment. For example, suppose selector 1A7 selects a data segment whose base address is 400000. Loading 1A7 into DS selects that data segment; ds:0 then points to linear address 400000, and ds:1F5278 points to linear address 5F5278.

In general, we do not see, and do not need to see, the starting address of the segment; we only care about the effective address within it. In a 32-bit system the effective address is itself a 32-bit number, so a single segment is enough to cover the whole 4 GB linear address space. Why, then, are there different segment selectors? As mentioned above, they allow data with different properties to be accessed in different ways. An illegal access raises an exception, which is the core of protected mode and the basis for building privilege levels and multitasking systems. A lot of deeper material is involved here, which beginners can ignore for now.
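The relationship between a segment's base address and a linear address can be sketched in a few lines of Python. This is only a model under stated assumptions: SEGMENT_BASES and linear_address are hypothetical names, the table stands in for the descriptor lookup the hardware performs, and limit and access-rights checks are left out entirely.

```python
# Hypothetical stand-in for the descriptor lookup: selector -> segment base.
# The single entry uses the selector and base address from the example above.
SEGMENT_BASES = {0x1A7: 0x400000}

def linear_address(selector, effective_address):
    """Return segment base + effective address, wrapped to 32 bits."""
    base = SEGMENT_BASES[selector]
    return (base + effective_address) & 0xFFFFFFFF

print(hex(linear_address(0x1A7, 0x0)))       # 0x400000
print(hex(linear_address(0x1A7, 0x1F5278)))  # 0x5f5278
```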

The effective address is calculated as: base + index * scaling factor + displacement. These quantities are all measured relative to the starting address of the segment and do not depend on where the segment itself starts. For example, with base = 100000, index = 400, scaling factor = 4, and displacement = 20000, the effective address is:

100000 + 400*4 + 20000 = 100000 + 1000 + 20000 = 121000. With a segment base of 400000, the corresponding linear address is 400000 + 121000 = 521000. (Note that all numbers are hexadecimal.)
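The worked example can be reproduced with a short Python sketch. The function name effective_address and its parameter names are illustrative only, and the segment base 400000 is assumed from the data-segment example earlier in the text.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Compute base + index * scale + displacement, wrapped to 32 bits."""
    return (base + index * scale + displacement) & 0xFFFFFFFF

# Values from the worked example; every literal is hexadecimal.
ea = effective_address(base=0x100000, index=0x400, scale=4, displacement=0x20000)
segment_base = 0x400000  # assumed base of the example data segment

print(hex(ea))                 # 0x121000
print(hex(segment_base + ea))  # 0x521000
```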

The base can be held in any 32-bit general-purpose register, and the index can be held in any general-purpose register other than ESP. The scaling factor can be 1, 2, 4, or 8. The displacement is an immediate value. For example, [ebp+edx*8+200] is a valid effective-address expression. In most cases nothing this complicated is needed: the index, scaling factor, and displacement do not all have to appear.
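The register and scaling rules just listed can be written down as a small checker. This is a rough Python sketch, not an assembler: is_valid_operand and GENERAL_REGISTERS are hypothetical names, and only the rules stated in this paragraph are checked.

```python
# The eight 32-bit general-purpose registers.
GENERAL_REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}

def is_valid_operand(base=None, index=None, scale=1):
    """Check the base, index, and scaling-factor rules for a memory operand."""
    if base is not None and base not in GENERAL_REGISTERS:
        return False
    # The index may be any general-purpose register except ESP.
    if index is not None and (index not in GENERAL_REGISTERS or index == "esp"):
        return False
    return scale in (1, 2, 4, 8)

print(is_valid_operand(base="ebp", index="edx", scale=8))  # True
print(is_valid_operand(base="eax", index="esp", scale=1))  # False
print(is_valid_operand(base="eax", index="ecx", scale=3))  # False
```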

The basic unit of memory is the byte. Each byte has 8 bits, so the largest value a byte can express is 11111111 in binary, or 255 in decimal. Hexadecimal is generally more convenient, because every 4 bits correspond to exactly one hexadecimal digit: 11111111b = 0xFF. Bytes are stored contiguously in memory; two bytes make up a word, and two words make up a double word (DWORD). The Intel architecture uses the little-endian format: in memory, the high-order byte comes after the low-order byte. For example, in the hexadecimal number 803E7D0C every two digits form a byte, and in memory it is laid out as 0C 7D 3E 80. In a 32-bit register it appears in normal order, so EAX holds 803E7D0C. An address that points to this number actually points to its first byte, 0C. The access width can be specified as byte, word, or double word. Suppose ds:[edx] points to the first byte, 0C:

mov al, byte ptr ds:[edx]    ; load the byte 0C into AL
mov ax, word ptr ds:[edx]    ; load the word 7D0C into AX
mov eax, dword ptr ds:[edx]  ; load the double word 803E7D0C into EAX
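The byte order can be checked outside the CPU as well. The following Python sketch mimics the three loads by reading 1, 2, and 4 bytes from the start of a little-endian byte string; the variable names al, ax, and eax only mirror the registers and are ordinary Python integers.

```python
value = 0x803E7D0C

# Lay the double word out in memory the way a little-endian CPU stores it.
memory = value.to_bytes(4, byteorder="little")
print(memory.hex(" "))  # 0c 7d 3e 80

# Read 1, 2, and 4 bytes starting at the same address (offset 0).
al = memory[0]
ax = int.from_bytes(memory[:2], byteorder="little")
eax = int.from_bytes(memory[:4], byteorder="little")
print(hex(al), hex(ax), hex(eax))  # 0xc 0x7d0c 0x803e7d0c
```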

One of the properties of a segment is its default access width. If the default access width is a double word (the usual case in a 32-bit system), a byte or word access must be indicated explicitly with byte ptr or word ptr.

Default segment selection: if an instruction gives only the effective address (the offset within a segment) and does not say which segment it is in, the following rules apply:

If EBP or ESP is used as the base register, the address is taken to be in the segment determined by SS;
In all other cases, the address is taken to be in the segment determined by DS.

To break this rule, a segment override prefix must be used. For example:

mov eax, dword ptr [edx]     ; DS is used by default: the double word at ds:[edx] is loaded into EAX
mov ebx, dword ptr es:[edx]  ; the ES: segment override prefix is used: the double word at es:[edx] is loaded into EBX
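The default-segment rule and the override can be summarised as a tiny decision function. This Python sketch is illustrative only (default_segment is a hypothetical name); it keys the choice on the base register, which is how Intel's manuals state the rule, and it ignores special cases such as string instructions.

```python
def default_segment(base=None, override=None):
    """Return the segment register used for a memory operand."""
    if override is not None:
        return override  # an explicit segment override prefix always wins
    # EBP or ESP as the base register selects SS; everything else uses DS.
    return "ss" if base in ("ebp", "esp") else "ds"

print(default_segment(base="edx"))                 # ds
print(default_segment(base="ebp"))                 # ss
print(default_segment(base="edx", override="es"))  # es
```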

Stack:

The stack is a data structure. It should not be confused with the heap, which is a similar-sounding but different structure. SS and ESP are the hardware support that Intel provides for the stack, and the PUSH and POP instructions are specific to it. SS designates a segment as the stack segment, and ESP indicates the current top of the stack. The instruction push xxx works as follows:

Subtract 4 from the value of ESP;
Store xxx in the memory location that ss:[esp] points to.

In this way, the value of ESP is reduced by 4, and ss:[esp] points to the newly pushed xxx. The stack therefore grows "upside down", extending from high addresses toward low addresses. The instruction pop yyy does the opposite: it copies the double word at ss:[esp] into the register or memory location specified by yyy, and then adds 4 to the value of ESP. At that point the value is considered popped and no longer on the stack: although it still exists at the old top-of-stack position, the next push will overwrite it. For this reason, data in the stack segment at addresses lower than ESP is considered undefined.
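The push and pop steps described above can be modelled in a few lines of Python. This is a toy sketch under stated assumptions: Stack32 is a hypothetical class, memory is a dictionary of byte values rather than a real stack segment, and the starting ESP value 0x12FFC0 is made up for illustration.

```python
class Stack32:
    """Toy model of a 32-bit stack: ESP plus a dict mapping address -> byte."""

    def __init__(self, esp):
        self.esp = esp
        self.memory = {}

    def push(self, value):
        # Step 1: subtract 4 from ESP. Step 2: store the value at [ESP].
        self.esp = (self.esp - 4) & 0xFFFFFFFF
        stored = (value & 0xFFFFFFFF).to_bytes(4, byteorder="little")
        for i, byte in enumerate(stored):
            self.memory[self.esp + i] = byte

    def pop(self):
        # Read the double word at [ESP], then add 4 to ESP.
        data = bytes(self.memory[self.esp + i] for i in range(4))
        self.esp = (self.esp + 4) & 0xFFFFFFFF
        return int.from_bytes(data, byteorder="little")

stack = Stack32(esp=0x12FFC0)  # hypothetical starting top of stack
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.esp))    # 0x12ffb8
print(hex(stack.pop()))  # 0x22222222
print(hex(stack.esp))    # 0x12ffbc
```

After the pop, the bytes of 0x22222222 are still present in stack.memory below ESP, which matches the remark that popped data remains in place until the next push overwrites it.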

Finally, note that assembly language is machine-oriented: instructions and machine code correspond essentially one to one, so what can be written depends on the hardware. Some instructions that look reasonable do not actually exist, for example:

mov ds:[edx], ds:[ecx]  ; data cannot be moved directly from memory to memory
mov ds, 1A7             ; a segment register cannot be loaded directly from an immediate value
mov eip, 3D4E7          ; the instruction pointer cannot be manipulated directly
