My Assembly Learning Path (2): Key Terms and Concepts

Source: Internet
Author: User

Many of the concepts in the first part may not be very clear to assembly beginners, so I decided to write this article to explain them. So, let's start the second part of My Assembly Learning Path.

Terminology and concepts

After I published the first part, I got a lot of feedback from different readers, and some parts of it were unclear to them. That is why this article and the next few start by describing some terms.

Register: registers are small storage locations inside the processor. The main job of the processor is to process data. It can fetch data from memory, but that is a slow operation, which is why the processor has its own small, fast data storage, called "registers".

Little-endian: we can think of memory as one large array of bytes. Each address stores one element of this memory "array", and each element is one byte. For example, suppose we have 4 bytes: AA 56 AB FF. In little-endian mode, the least significant byte is stored at the lowest address:

    0  FF
    1  AB
    2  56
    3  AA

Here, 0, 1, 2, 3 are memory addresses.

Big-endian: big-endian stores bytes in the opposite order. The same bytes in big-endian order are:

    0  AA
    1  56
    2  AB
    3  FF
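To make the two byte orders concrete, here is a small C sketch that lays out the 32-bit value 0xAA56ABFF both ways. The function names store_le32 and store_be32 are hypothetical helpers for illustration; they use shifts so the result does not depend on the byte order of the machine running them.

```c
#include <stdint.h>

/* Little-endian: the least significant byte goes to the lowest address (index 0). */
void store_le32(uint32_t value, uint8_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}

/* Big-endian: the most significant byte goes to the lowest address (index 0). */
void store_be32(uint32_t value, uint8_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * (3 - i)));
}
```

Calling store_le32(0xAA56ABFF, buf) fills buf with FF AB 56 AA, matching the first table, while store_be32 produces AA 56 AB FF, matching the second. x86_64 itself is little-endian.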

System call (syscall): a system call is the way a user program asks the operating system to do some work on its behalf. You can find the system call table here.
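As a rough illustration, C on Linux can make the same request without going through higher-level I/O functions, using glibc's syscall() wrapper with the SYS_write number. This is a Linux-specific sketch, and raw_write_stdout is a hypothetical helper name; the assembly later in this article does the same thing by loading the syscall number and arguments into registers and executing the syscall instruction.

```c
#define _GNU_SOURCE       /* exposes syscall() in glibc's <unistd.h> */
#include <string.h>
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall() */

/* Ask the kernel to write msg to standard output (file descriptor 1)
   via the write system call. Returns the byte count the kernel reports,
   or -1 on error. */
long raw_write_stdout(const char *msg) {
    return syscall(SYS_write, 1, msg, strlen(msg));
}
```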

Stack: the processor has only a very limited number of registers, so programs also use the stack, a contiguous region of memory that is addressed through special registers such as RSP (the stack pointer) and SS. I will cover the stack in detail in the next article.

Section: every assembly program is made up of sections, such as:

    • .data - used to declare initialized data or constants
    • .bss - used to declare uninitialized variables
    • .text - used to store code
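A C compiler produces the same three kinds of sections. In this sketch, GCC on Linux would typically put initialized_counter into .data, scratch_buffer into .bss and read_counter's machine code into .text; the exact placement is up to the compiler, and the names are hypothetical. You can inspect the result of compiling it with a tool such as objdump -t.

```c
#include <stdint.h>

/* A global with an initial value: typically placed in .data. */
int initialized_counter = 42;

/* A global without an initializer: typically placed in .bss.
   C guarantees it starts out as all zeros. */
uint8_t scratch_buffer[64];

/* Function bodies are machine code, which goes into .text. */
int read_counter(void) {
    return initialized_counter;
}
```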

General-purpose registers: there are 16 general-purpose registers: rax, rbx, rcx, rdx, rbp, rsp, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15.

Of course, these are not all the terms and concepts associated with assembly language. If we run into unfamiliar terms in later articles, we will explain them there.

Data types

The fundamental data types are byte, word, doubleword, quadword and double quadword, which are 8 bits, 2 bytes, 4 bytes, 8 bytes and 16 bytes (128 bits) long, respectively.

For now we only work with integers, so let's look at how they are represented. Integers come in two kinds: unsigned and signed. An unsigned integer is an unsigned binary number stored in a byte, word, doubleword or quadword, with the ranges 0 to 255, 0 to 65,535, 0 to 2^32 - 1 and 0 to 2^64 - 1, respectively. A signed integer is a signed binary number stored in a byte, word, doubleword or quadword. The sign bit is set for negative numbers and cleared for positive numbers and zero. Signed integers have these ranges: byte -128 to 127, word -32,768 to 32,767, doubleword -2^31 to 2^31 - 1, quadword -2^63 to 2^63 - 1.
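The ranges above follow directly from the bit width n: unsigned values span 0 to 2^n - 1, and signed (two's-complement) values span -2^(n-1) to 2^(n-1) - 1. The sketch below computes them; the function names are hypothetical, and bits is assumed to be between 1 and 64.

```c
#include <stdint.h>

/* Largest unsigned value that fits in `bits` bits: 2^bits - 1. */
uint64_t unsigned_max(unsigned bits) {
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Signed two's-complement range: -2^(bits-1) .. 2^(bits-1) - 1. */
int64_t signed_max(unsigned bits) {
    return (int64_t)unsigned_max(bits - 1);
}
int64_t signed_min(unsigned bits) {
    return -signed_max(bits) - 1;   /* written this way to avoid overflow at 64 bits */
}

/* The sign bit is the top bit: 1 for negative numbers, 0 for zero and positives. */
int sign_bit_of_byte(int8_t value) {
    return ((uint8_t)value >> 7) & 1;
}
```

For example, unsigned_max(16) gives 65,535 and signed_min(8) gives -128, matching the word and byte ranges listed above.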

Sections

As I mentioned above, every assembly program is made up of sections: the data section, the text (code) section and the bss section. Let's look at the data section first. Its main purpose is to declare initialized constants. For example:

    section .data
        num1:   equ 100
        num2:   equ 50
        msg:    db "Sum is correct", 10

Okay, this is almost self-explanatory. We have three names, num1, num2 and msg, with the values 100, 50 and "Sum is correct", 10 respectively (the trailing 10 is the newline character). But what are db and equ? NASM supports a number of pseudo-instructions:

    • DB, DW, DD, DQ, DT, DO, DY and DZ - used to declare initialized data. For example:
        ;; Initialize 4 bytes: 1h, 2h, 3h, 4h
        db 0x01, 0x02, 0x03, 0x04
        ;; Initialize a word to 0x1234 (stored in memory as 0x34, 0x12)
        dw 0x1234
    • RESB, RESW, RESD, RESQ, REST, RESO, RESY and RESZ - used to declare uninitialized variables
    • INCBIN - includes an external binary file
    • EQU - defines a constant. For example:
        ;; Now one is 1
        one equ 1
    • TIMES - repeats instructions or data (described in the next article)

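Some of these pseudo-instructions have rough C analogues, sketched below under the assumption of an x86 (little-endian) target. The names ONE, four_bytes, reserved and dw_bytes are hypothetical. The key difference to notice is that equ creates no storage at all (like a macro), while db and resb reserve bytes in the program image.

```c
#include <stdint.h>

/* one equ 1: an assemble-time constant with no storage, much like a macro. */
#define ONE 1

/* db 0x01, 0x02, 0x03, 0x04: four initialized bytes. */
static const uint8_t four_bytes[] = {0x01, 0x02, 0x03, 0x04};

/* resb 16: 16 reserved bytes with no initial value (zero-filled at load time). */
static uint8_t reserved[16];

/* The bytes NASM emits for `dw word` on x86: low byte first, then high byte. */
void dw_bytes(uint16_t word, uint8_t out[2]) {
    out[0] = (uint8_t)(word & 0xFF);
    out[1] = (uint8_t)(word >> 8);
}
```

So dw 0x1234 ends up in memory as 0x34 followed by 0x12, which is the little-endian order described earlier.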
Arithmetic operations

Here is a short list of arithmetic instructions:

    • add - integer addition
    • sub - subtraction
    • mul - unsigned multiplication
    • imul - signed multiplication
    • div - unsigned division
    • idiv - signed division
    • inc - increment by one
    • dec - decrement by one
    • neg - negation (two's complement)

We will use some of these in this article; the rest will be covered in later articles.
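The signed/unsigned pairs (mul/imul, div/idiv) differ only in how they interpret the same bits. This C sketch models that difference with 32-bit operands; the function names are hypothetical. For example, -10 has the 32-bit pattern 0xFFFFFFF6, which div reads as 4,294,967,286, while idiv reads it as -10. Like C99, idiv truncates toward zero, and the remainder takes the sign of the dividend.

```c
#include <stdint.h>

/* mul: unsigned multiply. The hardware keeps the full double-width product
   (rdx:rax for 64-bit operands); a 32x32 -> 64-bit product models that here. */
uint64_t mul_u32(uint32_t a, uint32_t b) { return (uint64_t)a * b; }

/* imul: the operands are treated as two's-complement signed numbers. */
int64_t imul_s32(int32_t a, int32_t b) { return (int64_t)a * b; }

/* div: unsigned division of the raw bit pattern. */
uint32_t div_u32(uint32_t a, uint32_t b) { return a / b; }

/* idiv: signed division, truncating toward zero; remainder follows the dividend's sign. */
int32_t idiv_s32(int32_t a, int32_t b) { return a / b; }
int32_t irem_s32(int32_t a, int32_t b) { return a % b; }

/* neg: two's-complement negation, i.e. 0 - x. */
int32_t neg_s32(int32_t x) { return 0 - x; }
```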

Control flow

Programming languages commonly use if, case, goto and similar statements to change the order in which a program runs, and assembly can do this too. Here we look at a few of the tools it offers. The cmp instruction compares two values; it is used together with a conditional jump instruction that decides, based on the result, whether to jump. For example:

    ;; Compare rax with 50
    cmp rax, 50

The cmp instruction only compares two values: it does not change them (it only updates the processor's status flags), and it does not do anything based on the result of the comparison. To act on the result, cmp is followed by a conditional jump instruction, which can be one of the following:

    • je - jump if equal
    • jz - jump if zero
    • jne - jump if not equal
    • jnz - jump if not zero
    • jg - jump if the first operand is greater than the second
    • jge - jump if the first operand is greater than or equal to the second
    • ja - the same as jg, but for an unsigned comparison
    • jae - the same as jge, but for an unsigned comparison
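Under the hood, cmp a, b computes a - b, discards the result and keeps only status flags; each conditional jump then tests a combination of those flags. The C sketch below is a simplified model of the four relevant flags (zero, sign, carry, overflow) for 64-bit operands, not a full emulation; the type and function names are hypothetical. It shows why jg and ja can disagree on the same bits: compared with 1, the pattern 0xFFFFFFFFFFFFFFFF is -1 when signed but a huge number when unsigned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the flags set by `cmp a, b`. */
typedef struct {
    bool zf; /* zero flag: a - b == 0 */
    bool sf; /* sign flag: top bit of a - b */
    bool cf; /* carry flag: unsigned borrow, i.e. a < b as unsigned numbers */
    bool of; /* overflow flag: a - b overflowed as a signed subtraction */
} Flags;

Flags cmp64(uint64_t a, uint64_t b) {
    uint64_t r = a - b;  /* wraps modulo 2^64, like the hardware */
    Flags f;
    f.zf = (r == 0);
    f.sf = (r >> 63) & 1;
    f.cf = a < b;
    f.of = (((a ^ b) & (a ^ r)) >> 63) & 1;
    return f;
}

/* The conditions tested by the jumps listed above. */
bool je_taken(Flags f)  { return f.zf; }                    /* also jz  */
bool jne_taken(Flags f) { return !f.zf; }                   /* also jnz */
bool jg_taken(Flags f)  { return !f.zf && f.sf == f.of; }   /* signed >    */
bool jge_taken(Flags f) { return f.sf == f.of; }            /* signed >=   */
bool ja_taken(Flags f)  { return !f.cf && !f.zf; }          /* unsigned >  */
bool jae_taken(Flags f) { return !f.cf; }                   /* unsigned >= */
```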

For example, suppose we want an if/else statement like this one in C:

    if (rax != 50) {
        exit();
    } else {
        right();
    }

In assembly, this becomes:

    ;; Compare rax with 50
    cmp rax, 50
    ;; Jump to .exit if rax is not equal to 50
    jne .exit
    jmp .right
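The assembly above can be mirrored line by line in C with goto, which makes the jump structure explicit: one conditional jump on "not equal", followed by an unconditional jump. This is only an illustration; branch_for and its labels are hypothetical names, and it returns a string naming the branch instead of calling exit() or right().

```c
/* cmp rax, 50 / jne .exit / jmp .right, written with explicit jumps. */
const char *branch_for(long rax) {
    if (rax != 50) goto exit_branch;   /* cmp rax, 50 ; jne .exit */
    goto right_branch;                 /* jmp .right              */

right_branch:
    return "right";
exit_branch:
    return "exit";
}
```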

There is also an unconditional jump instruction, with this syntax:

    jmp label

For example:

    _start:
        ;; ....
        ;; do something and jump to the .exit label
        ;; ....
        jmp .exit

    .exit:
        mov    rax, 60
        mov    rdi, 0
        syscall

Here, whatever code follows the _start label is executed first; then control is transferred to the .exit label, and the code after .exit: starts executing.

Unconditional jumps are often used in loops. For example, we have a label followed by some code; after that code runs, we check a condition, and if it is not yet met, we jump back to the start of the code. Loops will be covered in a later article.
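A loop built only from a label, a conditional jump and an unconditional jump back can be sketched in C with goto, mirroring how an assembly loop is laid out. sum_to is a hypothetical example that adds 1 + 2 + ... + n; real assembly loops may arrange the jumps differently (for instance, with the conditional jump at the bottom).

```c
/* Sums 1 + 2 + ... + n using only labels and jumps. */
unsigned long sum_to(unsigned long n) {
    unsigned long total = 0;
    unsigned long i = 1;

loop_start:                    /* label at the top of the loop body */
    if (i > n) goto loop_end;  /* conditional jump: leave when done */
    total += i;
    i++;
    goto loop_start;           /* unconditional jump back to the top */

loop_end:
    return total;
}
```

For example, sum_to(10) returns 55.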

Example

Let's look at a simple example: add two numbers, get their sum, and compare it with a predefined number. If they are equal, print a message to the screen; if not, just exit. Here is the source code for the example:

    ; initialised data section
    section .data
        ; Define constants
        num1:   equ 100
        num2:   equ 50
        ; Initialize message
        msg:    db "Sum is correct", 10

    section .text

        global _start

    ;; Entry point
    _start:
        ; set num1's value to rax
        mov rax, num1
        ; set num2's value to rbx
        mov rbx, num2
        ; get the sum of rax and rbx, and store its value in rax
        add rax, rbx
        ; compare rax and 150
        cmp rax, 150
        ; go to the .exit label if rax and 150 are not equal
        jne .exit
        ; go to the .rightSum label if rax and 150 are equal
        jmp .rightSum

    ; Print a message that the sum is correct
    .rightSum:
        ;; write syscall
        mov     rax, 1
        ;; file descriptor, standard output
        mov     rdi, 1
        ;; message address
        mov     rsi, msg
        ;; length of message ("Sum is correct" plus the newline)
        mov     rdx, 15
        ;; call write syscall
        syscall
        ; exit from program
        jmp .exit

    ; exit procedure
    .exit:
        ; exit syscall
        mov    rax, 60
        ; exit code
        mov    rdi, 0
        ; call exit syscall
        syscall

Let's walk through this code. First, the data section defines two constants, num1 and num2, and a variable msg holding "Sum is correct" followed by a newline. Now look at the _start label, which is the program's entry point. We move the values of num1 and num2 into the general-purpose registers rax and rbx and add them with the add instruction. After add executes, the sum of rax and rbx is stored in rax, so rax now holds the sum of num1 and num2.

num1 is 100 and num2 is 50, so the sum should be 150. We check it with the cmp instruction. After comparing rax with 150, we check the result: if rax and 150 are not equal (checked with jne), we jump to the .exit label; if they are equal, we jump to the .rightSum label.

That leaves two labels: .exit and .rightSum. The first simply sets rax to 60, the exit system call number, and rdi to 0, the exit code. The second, .rightSum, is fairly simple: it just prints "Sum is correct". If you don't understand how it works, see the first article.
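For comparison, here is the same logic as a C sketch. check_sum is a hypothetical name; it uses the POSIX write() function, a thin wrapper over the same write system call (number 1 in rax above), with the same three arguments that the assembly puts in rdi, rsi and rdx. Unlike the assembly, it returns to its caller instead of invoking the exit system call.

```c
#include <string.h>
#include <unistd.h>

/* Add two numbers, compare the sum with 150, and print the message only
   if they are equal. Returns the number of bytes written (0 if nothing
   was printed). */
long check_sum(long num1, long num2) {
    static const char msg[] = "Sum is correct\n";
    long rax = num1 + num2;          /* add rax, rbx */
    if (rax != 150)                  /* cmp rax, 150 ; jne .exit */
        return 0;
    /* write(1, msg, 15): rdi = 1, rsi = msg, rdx = length */
    return (long)write(STDOUT_FILENO, msg, strlen(msg));
}
```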

Summary

This is the second article in the "My Assembly Learning Path" series. If you have any questions or suggestions, leave me a message.

