Disclaimer (mandatory reading): All the tutorials provided by this blog are translated from the Internet and are intended only for learning and exchange, not for commercial use. Please do not remove this declaration when reproducing them. If any dispute arises, it has nothing to do with the owner of the blog or the person who published the translation. Thank you for your cooperation.
If you plan to build your own operating system, you will need to become familiar with assembly programming. Once you understand an assembly language, you could even use it to write a complete OS. Whichever path you choose, this post introduces x86-64 assembly language, the language closest to machine code.
Preparation
Before we start, you need an x86_64 Linux machine with NASM installed. On a system that uses apt, you can install it by entering the following command at the terminal.
x@x-macbook:~/asm-tutorial$ sudo apt install nasm
NASM stands for the Netwide Assembler, an assembler for the 80x86 and x86-64 platforms.
"Hello world!"
As with most programming languages, we'll start with a very basic Hello World program. I'll show you the code, and I suggest you type it in by hand rather than copying and pasting, so that you remember it better.
First, let's create a directory to store our working files.
$ mkdir asm-tutorial
$ cd asm-tutorial
$ gedit hello-world.asm
In the example above, I opened hello-world.asm with gedit, a handy general-purpose text editor, but if you prefer emacs, vim, or another text editor, feel free to use it.
OK, now let's enter the code for our Hello World program. I'll explain how the code works once you have typed it in and successfully assembled and run it.
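[bits 64]
global _start
section .data
message db "Hello, world!"
section .text
_start:
    mov rax, 1          ; system call number 1: write
    mov rdx, 13         ; number of bytes to write
    mov rsi, message    ; address of the bytes to write
    mov rdi, 1          ; file descriptor 1: stdout
    syscall
    mov rax, 60         ; system call number 60: exit
    mov rdi, 0          ; exit status 0
    syscall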
Creating an executable file
Once you have finished typing, save the file, and then enter the following commands at the terminal.
$ nasm -f elf64 hello-world.asm
$ ld hello-world.o -o hello-world
$ ./hello-world
Hello, world!
The first command, nasm -f elf64 hello-world.asm, tells NASM to assemble our file. The -f elf64 option says that we want NASM to generate an object file in the elf64 format.
NASM, as we know, is an assembler. An assembler converts a file written in assembly language, such as our hello-world.asm, into machine code, and machine code tells the computer what to do. The file NASM generates is called an object file; in our little example, NASM produces a file called hello-world.o.
Let's use the hexdump tool to look at the contents of our object file.
x@x-macbook:~/asm-tutorial$ hexdump hello-world.o
0000000 457f 464c 0102 0001 0000 0000 0000 0000
0000010 0001 003e 0001 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0040 0000 0000 0000
0000030 0000 0000 0040 0000 0000 0040 0007 0003
0000040 0000 0000 0000 0000 0000 0000 0000 0000
*
0000080 0001 0000 0001 0000 0003 0000 0000 0000
0000090 0000 0000 0000 0000 0200 0000 0000 0000
00000a0 000d 0000 0000 0000 0000 0000 0000 0000
00000b0 0004 0000 0000 0000 0000 0000 0000 0000
00000c0 0007 0000 0001 0000 0006 0000 0000 0000
00000d0 0000 0000 0000 0000 0210 0000 0000 0000
00000e0 0027 0000 0000 0000 0000 0000 0000 0000
00000f0 0010 0000 0000 0000 0000 0000 0000 0000
0000100 000d 0000 0003 0000 0000 0000 0000 0000
0000110 0000 0000 0000 0000 0240 0000 0000 0000
0000120 0032 0000 0000 0000 0000 0000 0000 0000
0000130 0001 0000 0000 0000 0000 0000 0000 0000
0000140 0017 0000 0002 0000 0000 0000 0000 0000
0000150 0000 0000 0000 0000 0280 0000 0000 0000
0000160 0090 0000 0000 0000 0005 0000 0005 0000
0000170 0004 0000 0000 0000 0018 0000 0000 0000
0000180 001f 0000 0003 0000 0000 0000 0000 0000
0000190 0000 0000 0000 0000 0310 0000 0000 0000
00001a0 0020 0000 0000 0000 0000 0000 0000 0000
00001b0 0001 0000 0000 0000 0000 0000 0000 0000
00001c0 0027 0000 0004 0000 0000 0000 0000 0000
00001d0 0000 0000 0000 0000 0330 0000 0000 0000
00001e0 0018 0000 0000 0000 0004 0000 0002 0000
00001f0 0004 0000 0000 0000 0018 0000 0000 0000
0000200 6548 6c6c 2c6f 5720 726f 646c 0021 0000
0000210 01b8 0000 ba00 000d 0000 be48 0000 0000
0000220 0000 0000 01bf 0000 0f00 b805 003c 0000
0000230 00bf 0000 0f00 0005 0000 0000 0000 0000
0000240 2e00 6164 6174 2e00 6574 7478 2e00 6873
0000250 7473 7472 6261 2e00 7973 746d 6261 2e00
0000260 7473 7472 6261 2e00 6572 616c 742e 7865
0000270 0074 0000 0000 0000 0000 0000 0000 0000
0000280 0000 0000 0000 0000 0000 0000 0000 0000
0000290 0000 0000 0000 0000 0001 0000 0004 fff1
00002a0 0000 0000 0000 0000 0000 0000 0000 0000
00002b0 0000 0000 0003 0001 0000 0000 0000 0000
00002c0 0000 0000 0000 0000 0000 0000 0003 0002
00002d0 0000 0000 0000 0000 0000 0000 0000 0000
00002e0 0011 0000 0000 0001 0000 0000 0000 0000
00002f0 0000 0000 0000 0000 0019 0000 0010 0002
0000300 0000 0000 0000 0000 0000 0000 0000 0000
0000310 6800 6c65 6f6c 772d 726f 646c 612e 6d73
0000320 6d00 7365 6173 6567 5f00 7473 7261 0074
0000330 000c 0000 0000 0000 0001 0000 0002 0000
0000340 0000 0000 0000 0000 0000 0000 0000 0000
0000350
As you can see, machine code is meant for the machine, not for people to read.
The last step before running the program is linking, which is done by the system linker. On Linux this tool is called ld. Linking converts object files into an executable file.
The -o hello-world option tells ld that we want to generate an executable file named hello-world.
Finally, we run our program by putting "./" before the file name, and the program prints Hello, world!
How the CPU executes instructions
Before we dive into analyzing our program, it's good to know how the CPU executes instructions. In general, the purpose of the CPU is to execute a meaningful sequence of instructions, and each instruction is usually handled in four steps: fetch, decode, execute, and write back.
1. Fetch
The first step, fetch, retrieves an instruction from program memory. The location of the instruction in memory is given by the program counter (PC). Once an instruction has been fetched, the program counter is advanced to point to the next instruction.
2. Decode
Decoding determines what the CPU will do. An instruction is usually divided into two parts: the opcode, which specifies the operation to perform, and the operands, which supply the information the operation needs, such as a constant, a register, or a memory address.
3. Execute
Once decoding is complete, the execute step begins. The relevant parts of the CPU are connected together so that they can carry out the action specified by the opcode, and the operation is then performed.
4. Write back
The last step, write back, simply writes the result of the execution to a storage location, such as a register or a memory address. Not every instruction produces an output value. Some instructions manipulate the program counter instead; these are called jump instructions, and they make loops, conditional statements, and function calls possible.
Understanding Registers
A register is a small amount of storage inside the CPU. We are mainly concerned with three types of registers: data registers, address registers, and general-purpose registers. Data registers hold numeric values such as integers and floating-point values. Address registers hold memory addresses. General-purpose registers can be used either as data registers or as address registers.
Most of an assembly programmer's work consists of manipulating these registers.
Analysis of our source code
With that background, we can now analyze our code. I will break the program into small pieces and explain what each one does.
[bits 64]
The first line of our program is an assembler directive. As you might guess, it tells NASM that we want code that runs on a 64-bit processor.
global _start
This line is another assembler directive. It tells NASM that the label _start should be treated as global. A global symbol can be referenced from other object files; in our case, we mark _start as global so that the linker knows where our program starts.
section .data
message db "Hello, world!"
The first line above is also an assembler directive. It tells NASM that what follows belongs to the data section, which contains global and static variables.
On the next line we have one such static variable: message db "Hello, world!". db is used to declare initialized data, and message is a name (label) associated with the string "Hello, world!".
section .text
This is another section directive, but this time it tells NASM to place the following code in the text section, sometimes called the code segment, which is the part of the object file that contains the executable code.
Finally, we come to the most important part of the program.
_start:
    mov rax, 1
    mov rdx, 13
    mov rsi, message
    mov rdi, 1
    syscall
    mov rax, 60
    mov rdi, 0
    syscall
The first line, _start:, associates the code that follows it with the _start label.
    mov rax, 1
    mov rdx, 13
    mov rsi, message
    mov rdi, 1
The four lines above load values into different registers. rax and rdx are general-purpose registers, and we use them to hold 1 and 13. rsi and rdi are the source and destination index registers; we point the source register rsi at message and set the destination register rdi to 1.
Now that the registers are loaded, we reach the syscall instruction, which tells the computer that we want to perform a system call using the values we have loaded into the registers. The value in rax tells the computer which system call we want to use. System calls and their corresponding numbers can be looked up in a Linux x86-64 system call table.
As that table shows, the 1 in rax means we want to call write(int fd, const void* buf, size_t bytes). The instruction mov rdx, 13 supplies the last parameter of write(), size_t bytes, which is the size of the message; 13 is the length of "Hello, world!".
The other two instructions, mov rsi, message and mov rdi, 1, supply the remaining two parameters. Putting it all together, when syscall executes we are telling the computer to run write(1, message, 13). 1 is the standard output, stdout, so in essence we are telling the computer to write 13 bytes from message to stdout.
    mov rax, 60
    mov rdi, 0
    syscall
Now you may wonder why there is another syscall when everything we wanted has already been done. There is no need to guess: looking at the table mentioned above, we see that 60 refers to exit(). So this syscall is actually calling exit(0).
To sum up
So, the first part of our assembly program associates the name message with the string "Hello, world!". The key part of the program then makes two system calls, one to write() and one to exit(). Put together, if you translated the assembly program into C, it would look roughly like this.
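#include <stdlib.h>  /* exit() */
#include <unistd.h>  /* write() */

int main()
{
    const char* message = "Hello, world!";
    write(1, message, 13);  /* 13 is the length of the message */
    exit(0);
}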
Copyright statement: This article was translated by http://www.cnblogs.com/lazycoding. You are welcome to reprint and share it. Please respect the author's work by including this statement and a link to the author's blog when reprinting. Thank you.