Linux Jianghu 08: Write 16-bit code that can run in x86 real mode using GCC and GNU binutils


Admittedly, the title is a bit long this time. I wrote it in such detail mainly so that search engines can bring readers who want to know how to make GCC generate 16-bit real-mode code to my blog. Some background first: writing 16-bit code that runs in x86 real mode is rather retro, so there is not much material to be found. I know of only two ways to run an x86 real-mode program: run it under DOS, or write it as boot-sector code that runs directly when the machine starts. Many books about implementing your own operating system discuss x86 real mode, but only people writing their own operating system need it for booting, so this article will certainly have few readers, even though I believe it fills a gap in the information available online on this topic. So, to everyone who reads this article, please click "Recommend". Thank you.

Why do I say this post fills a gap? Because whenever books or online articles need 16-bit real-mode code, they all reach for NASM and tend to ignore GNU as. There are historical reasons for this: Linux has been 32-bit since its inception and is a multi-user, multitasking operating system, so GCC and GNU as on Linux were used to write 32-bit protected-mode code. Moreover, the ELF executable format only comes in ELF32 and ELF64; nobody has heard of ELF16. Even Linux itself, when it was born (1991), used the as86 assembler for its 16-bit boot code; it was not until 1995 that GNU as gradually gained the ability to assemble 16-bit code.

Now let's begin the tour of 16-bit code with GCC and GNU binutils. I chose DOS as my test environment, so the final executable I generate is a plain binary that can run under DOS. The first step is to install the QEMU virtual machine to run FreeDOS. On Ubuntu this takes a single command, sudo apt-get install qemu, so I won't go into it. The FreeDOS floppy image, however, has to be downloaded from the QEMU official website, as shown:

Run the QEMU virtual machine and boot FreeDOS with qemu-system-i386 -fda freedos.img, as shown:

Because assembly language is closer to the hardware and C is higher-level, I start with assembly and gradually move to C. First, I write a simple assembly program that prints "Hello, world!" under DOS. Since I will later use this program to call the C main function, and it is also responsible for returning to DOS when the program ends, I named it test_code16_startup.s. The code is as follows:

Here is a simple explanation of the above code:

1. The GNU as assembler uses AT&T syntax, which differs from Intel syntax. I prefer AT&T syntax for two reasons: it is the common standard in the Linux world, and some of its concepts (memory addressing, for example) are actually easier to understand. Anyone with some assembly background can pick up AT&T syntax quickly. The main points are: ① instructions carry an operand-size suffix. For MOV, use movb for 8-bit operands, movw for 16-bit, movl for 32-bit and movq for 64-bit, and likewise for the other instructions; ② the source operand comes first and the destination second, so movw %cs,%ax copies the value in CS into AX, the reverse of Intel syntax; ③ every register takes a % prefix, e.g. %ax, %di, %esp.
④ Immediates take a $ prefix, e.g. $4, $0x0c. A number starting with 0 is octal, one starting with 0x is hexadecimal, and any other number is decimal. A label used as an immediate also needs the $ prefix, as in pushw $message above, whereas a label used as a function name does not, as in callw display_str above. ⑤ Memory addressing: x86 has direct, indirect, base, indexed and other addressing modes, and AT&T syntax unifies them into a single form, section:displacement(base,index,scale), where section is the segment register, displacement is the offset, base is the base register, index is the index register and scale is the scale factor. The offset (effective address) is displacement + base + index * scale, and it is taken within the segment given by section (in real mode, the linear address is segment * 16 + offset). Best of all, any of these parts can be omitted: movw 4,%ax loads the value at memory address 4 into AX, movw 4(%esp),%ax loads the value at address esp+4 into AX, and so on. Isn't this the most concise AT&T syntax tutorial on the whole web?

2. The code above uses only 16-bit instructions, such as movw, pushw and callw, and the string "Hello, world!" is defined directly in the code.

3. The code uses a function, display_str. Before calling it, I use pushw $15 and pushw $message to push the arguments from right to left and then call it with callw, which matches the C function calling convention. callw automatically pushes %ip, and at the start of the function pushw %bp pushes %bp, so the stack pointer moves down by 4 bytes and the two parameters can be accessed at 0x4(%esp) and 0x6(%esp). In 32-bit code, %eip and %ebp are pushed instead, so the parameters have to be accessed at 0x8(%esp) and 0xc(%esp). A good book on the details of function calls in assembly language is "Programming from the Ground Up", a free English e-book (my PDF copy is titled "The Linux Assembler Programming Guide").

4. The code prints the string with BIOS interrupt int 0x10 and returns to DOS with DOS interrupt int 0x21.

5. Most important of all, the .code16 directive makes the assembler generate 16-bit code.
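The addressing arithmetic in point ⑤ of item 1 can be checked with a few lines of ordinary hosted C. This is only a sketch of the arithmetic, not code that runs in real mode; the function names are made up for illustration, and it ignores 16-bit offset wrap-around and the 1 MB address limit.

```c
#include <assert.h>
#include <stdint.h>

/* Offset (effective address) of an AT&T memory operand
 * displacement(base,index,scale) = displacement + base + index * scale.
 * Omitted parts count as 0; an omitted scale counts as 1. */
static uint32_t effective_address(int32_t displacement, uint32_t base,
                                  uint32_t index, uint32_t scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return (uint32_t)displacement + base + index * scale;
}

/* In real mode a segment register holds a paragraph number, so the
 * linear address is segment * 16 + offset. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For example, movw 4(%esp),%ax with %esp = 0xfff0 uses offset 0xfff4; with %ss = 0x1000 that is linear address 0x1fff4. Likewise 0x07c0:0x0000 and 0x0000:0x7c00 name the same linear address, 0x7c00.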
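The article does not show which BIOS service display_str calls. Assuming it is INT 0x10 function AH=0x13 ("write string"), which fits the way a length of 15 is passed, the register values it needs can be modeled in hosted C as below, together with the usual DOS exit through INT 0x21, AH=0x4C. struct regs16, the helper names and the sample string offset are hypothetical; in the real program these values are loaded with movw/movb before int $0x10 and int $0x21.

```c
#include <assert.h>
#include <stdint.h>

/* Combine an 8-bit high half and low half into a 16-bit register (AH:AL -> AX). */
static uint16_t make_reg(uint8_t high, uint8_t low)
{
    return (uint16_t)((high << 8) | low);
}

/* Hypothetical register image for one BIOS or DOS call. */
struct regs16 {
    uint16_t ax, bx, cx, dx, bp;
};

/* INT 0x10, AH=0x13 "write string": AL = write mode, BH = page,
 * BL = attribute, CX = length, DH = row, DL = column, ES:BP = string. */
static struct regs16 bios_write_string(uint16_t str_offset, uint16_t len,
                                       uint8_t row, uint8_t col, uint8_t attr)
{
    struct regs16 r = {0};
    r.ax = make_reg(0x13, 0x01); /* mode 1: attribute taken from BL, cursor updated */
    r.bx = make_reg(0x00, attr); /* display page 0 */
    r.cx = len;
    r.dx = make_reg(row, col);
    r.bp = str_offset;           /* ES must also hold the string's segment */
    return r;
}

/* INT 0x21, AH=0x4C: terminate the program, return code in AL. */
static uint16_t dos_exit_ax(uint8_t return_code)
{
    return make_reg(0x4C, return_code);
}
```

With a 15-character string, row 0, column 0 and attribute 0x0c, this gives AX = 0x1301, BX = 0x000c, CX = 15, DX = 0, and the exit call uses AX = 0x4c00.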

Once the code is written, the following command sequence assembles and links it and converts it into a plain binary that DOS can run; the result is then copied into freedos.img, FreeDOS is booted in the QEMU virtual machine, and the 16-bit real-mode program is run. The commands and their results are as follows:

I have deliberately highlighted the more important options in these commands. Because I am working in a 64-bit environment, I need to pass --32 to as and -m elf_i386 to ld. With these options, 32-bit ELF object files are produced; otherwise 64-bit ELF objects are produced by default, and a 64-bit object file would cause problems when linking with the object file generated from C. If you work in a 32-bit environment, you do not need these two options. Since DOS always loads a plain binary (.com) program at offset 0x100 of its segment, ld needs the -Ttext 0x100 option. ld produces an ELF executable, test.elf, and finally objcopy produces the plain binary: -j .text means only the code section is needed, because I also defined "Hello, world!" in the code section, and -O binary sets the output format to plain binary, with test.com as the output file. Finally, mount freedos.img in Ubuntu, copy test.com into it, unmount it, start the virtual machine, and run test in DOS to see the result.
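Why -Ttext 0x100 matters can be seen in a small hosted C sketch: a .com file has no header, so the byte at file offset k runs at CS:0x100+k, and every absolute label address ld computes must use that base. The helper names are illustrative, and the five-byte program (movw $0x4c00,%ax; int $0x21, which simply exits to DOS) is a stand-in, not the article's test program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* DOS copies a .com file verbatim to offset 0x100 of its segment and
 * starts executing at CS:0x100. */
enum { COM_LOAD_OFFSET = 0x100 };

/* Offset the running code sees for a byte at the given file offset;
 * this is the base that ld -Ttext 0x100 builds into label addresses. */
static uint16_t com_runtime_offset(size_t file_offset)
{
    return (uint16_t)(COM_LOAD_OFFSET + file_offset);
}

/* Write a minimal headerless .com image into buf:
 *   B8 00 4C   movw $0x4c00,%ax
 *   CD 21      int  $0x21          (DOS terminate, return code 0)
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t build_exit_com(uint8_t *buf, size_t cap)
{
    static const uint8_t code[] = {0xB8, 0x00, 0x4C, 0xCD, 0x21};
    if (cap < sizeof code)
        return 0;
    memcpy(buf, code, sizeof code);
    return sizeof code;
}
```

Writing buf to a file named test.com with fwrite would give a program that returns to DOS immediately. If the code were linked at 0 instead of 0x100, an instruction such as pushw $message would push an offset 0x100 bytes lower than where the string actually sits.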

Besides as and ld, the other programs in GNU binutils are also good helpers for writing and analyzing programs. readelf -S lists all the sections in test.elf, and objdump -s dumps the contents of test.elf in hexadecimal, as shown:

Of course, you can also disassemble the program with objdump -d or objdump -D to check that 16-bit code really was generated, as shown below (be sure to pass -m i8086 when disassembling):

You can also disassemble a plain binary file, but then you must specify the -b binary option. For example, to disassemble test.com:

When disassembling, always specify -m i8086; otherwise objdump does not know that it is disassembling 16-bit code. (As mentioned earlier, Linux has been 32-bit since birth, so ELF only comes in 32-bit and 64-bit variants; there is no 16-bit ELF.) For example, if you disassemble with -m i386, the result is unintelligible:
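The garbled -m i386 output comes down to operand size: the same opcode takes a 2-byte immediate in 16-bit mode and a 4-byte immediate in 32-bit mode, so a decoder using the wrong mode falls out of step with the instruction boundaries. The toy length decoder below knows only a handful of opcodes; it is an illustrative sketch, not how objdump works internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length of one instruction from a tiny subset of x86, decoded with a 16-bit
 * (i8086) or 32-bit (i386) default operand size. Returns 0 for opcodes
 * outside the subset or when the bytes run out. Subset:
 *   66        operand-size prefix (flips 16/32-bit for this instruction)
 *   50..57    push reg         1 byte
 *   B8..BF    mov $imm, reg    1 + 2 (16-bit) or 1 + 4 (32-bit) bytes
 *   CD ib     int $imm8        2 bytes
 *   C3        ret              1 byte */
static size_t insn_length(const uint8_t *code, size_t avail, int default_32bit)
{
    size_t prefix = 0;
    int op32 = default_32bit;
    if (avail > 0 && code[0] == 0x66) {
        prefix = 1;
        op32 = !op32;
    }
    if (avail <= prefix)
        return 0;
    uint8_t op = code[prefix];
    size_t len;
    if (op >= 0x50 && op <= 0x57)
        len = 1;
    else if (op >= 0xB8 && op <= 0xBF)
        len = 1 + (op32 ? 4 : 2);
    else if (op == 0xCD)
        len = 2;
    else if (op == 0xC3)
        len = 1;
    else
        return 0;
    len += prefix;
    return len <= avail ? len : 0;
}

/* Number of instructions the byte stream splits into; stops at the first
 * byte the subset cannot decode. */
static size_t count_insns(const uint8_t *code, size_t n, int default_32bit)
{
    size_t count = 0, pos = 0;
    while (pos < n) {
        size_t len = insn_length(code + pos, n - pos, default_32bit);
        if (len == 0)
            break;
        pos += len;
        count++;
    }
    return count;
}
```

Decoded as 16-bit, the bytes B8 00 4C CD 21 are two instructions (movw $0x4c00,%ax and int $0x21); decoded as 32-bit, the same five bytes become a single movl $0x21cd4c00,%eax. The 0x66 case also shows why pushl %ebp inside .code16 output costs an extra prefix byte.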

Now let's enter the world of C. To find out what is special about 16-bit code compiled from C, I wrote a simple C program to investigate:

The program has the following characteristics:

1. At the top of the program, __asm__(".code16\n") embeds an assembler directive telling as to generate 16-bit code;

2. The display_str function has the same signature as the one in the assembly program, so we can observe how code generated from C passes parameters.

Compile and disassemble the program with the following commands:

As can be seen, although the code generated from C is 16-bit, it has the following characteristics: ① the generated display_str function begins with push %ebp, not push %bp; ② display_str fetches its parameters from 0x8(%ebp) and 0xc(%ebp), not from the offsets 0x4 and 0x6 I used in the assembly version; ③ the generated main does not push the arguments before calling display_str; instead, sub $0x18,%esp adjusts %esp and mov instructions place the arguments at the required positions, which has the same effect as push; ④ although I deliberately declared the length parameter of display_str as short, the generated code still places one parameter every 4 bytes.
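The offsets in ② and ④ follow from simple arithmetic: the first argument sits just above the return address and the saved frame pointer, and each later argument is one slot further up. The hypothetical helper below, in hosted C, reproduces both the hand-written 16-bit layout and the one GCC uses with -m32, where, under the i386 calling convention, each stack argument is padded to a multiple of 4 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Frame-pointer-relative offset of each stack argument.
 *   ret_size   bytes pushed by the call (2 for callw, 4 for calll)
 *   fp_size    bytes pushed by the prologue (2 for pushw %bp, 4 for pushl %ebp)
 *   slot_align each argument is rounded up to a multiple of this many bytes
 *   sizes[i]   declared size of argument i; arguments are pushed right to
 *              left, so argument 0 ends up closest to the return address */
static void arg_offsets(size_t ret_size, size_t fp_size, size_t slot_align,
                        const size_t *sizes, size_t n, size_t *offsets)
{
    assert(slot_align > 0);
    size_t off = fp_size + ret_size;
    for (size_t i = 0; i < n; i++) {
        offsets[i] = off;
        off += (sizes[i] + slot_align - 1) / slot_align * slot_align;
    }
}
```

For the hand-written version (callw, pushw %bp, two 2-byte arguments) the offsets come out as 4 and 6; for GCC's -m32 frame (a 4-byte pointer followed by a 2-byte short) they come out as 8 and 12, because the short still occupies a full 4-byte slot.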

Note also that when calling GCC, besides the -c option you must specify -m32, so that GCC generates 32-bit assembly; only 32-bit assembly can be assembled into 16-bit machine code under the .code16 directive. Without -m32, GCC generates 64-bit assembly, which then fails to assemble. With -m32, the resulting object file is in ELF32 format, and an ELF32 object file can only be linked with other ELF32 object files, which is why as and ld earlier needed the --32 and -m elf_i386 options.
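To check which kind of object file a command produced, readelf -h prints the ELF header. The same information can be read by hand: byte 4 of the header is the class (1 = ELF32, 2 = ELF64), byte 5 is the data encoding (1 = little-endian), and the 16-bit field at offset 18 is the machine (3 = i386, 62 = x86-64). The sketch below is illustrative and deliberately avoids <elf.h> so it stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF header fields used here (System V gABI layout). */
enum {
    EI_CLASS_OFFSET = 4,   /* e_ident[EI_CLASS] */
    EI_DATA_OFFSET = 5,    /* e_ident[EI_DATA], 1 = little-endian */
    E_MACHINE_OFFSET = 18, /* e_machine, 16-bit */
    CLASS_ELF32 = 1,
    MACHINE_386 = 3
};

/* Returns 1 if hdr starts with a little-endian ELF32 i386 header, the kind
 * produced by as --32, gcc -m32 -c and ld -m elf_i386. */
static int is_elf32_i386(const uint8_t *hdr, size_t len)
{
    /* "\x7f" "ELF" is split so that 'E' is not parsed as a hex digit. */
    if (len < 20 || memcmp(hdr, "\x7f" "ELF", 4) != 0)
        return 0;
    if (hdr[EI_CLASS_OFFSET] != CLASS_ELF32 || hdr[EI_DATA_OFFSET] != 1)
        return 0;
    uint16_t machine = (uint16_t)(hdr[E_MACHINE_OFFSET] |
                                  (hdr[E_MACHINE_OFFSET + 1] << 8));
    return machine == MACHINE_386;
}
```

An object built without --32 or -m32 on a 64-bit system fails this check (class 2, machine 62), which is the mismatch ld complains about when the two kinds are mixed.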

From the analysis above, one might conclude that it is enough to change pushw %bp to pushl %ebp in the assembly code and move the parameter offsets to 0x8(%ebp) and 0xc(%ebp) to call the assembly function from C successfully. In fact, something is still slightly off. The disassembly above shows that the function is called with a 16-bit call instruction, which pushes %ip rather than %eip, whereas the function frame generated from C assumes that %eip was pushed; the two differ by two bytes.

To verify this reasoning, I modified the C program and the assembly program above and compiled and linked them into a complete program to see whether it runs correctly:

The change to the C program is simple: remove the implementation of display_str and keep only its declaration. The assembly code is:

The assembly changes are: export display_str, change pushw %bp to pushl %ebp, and adjust the parameter offsets. The commands to compile, link and run the program are:

As you can see, "Hello World from C language" is not displayed correctly. These commands were all used before and need no explanation; the only difference is that the program written in C has an extra .rodata section, so objcopy must include that section as well (-j .rodata).
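The need for .rodata follows from how objcopy -O binary works: it writes a memory image that starts at the load address of the lowest section copied and fills gaps between sections with zeros, discarding everything else. The sketch below models that in hosted C; struct section, flatten() and the sample addresses are made up for illustration, and it treats each section's address as its load address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A section as placed by the linker: its address and contents. */
struct section {
    const char *name;
    uint32_t addr;
    const uint8_t *data;
    size_t size;
};

/* Lay the selected sections out as one memory image starting at the lowest
 * section address, zero-filling gaps. Returns the image size, or 0 if there
 * are no sections or the image would not fit in out[cap]. */
static size_t flatten(const struct section *secs, size_t n,
                      uint8_t *out, size_t cap)
{
    if (n == 0)
        return 0;
    size_t lo = secs[0].addr, hi = secs[0].addr + secs[0].size;
    for (size_t i = 1; i < n; i++) {
        if (secs[i].addr < lo)
            lo = secs[i].addr;
        if (secs[i].addr + secs[i].size > hi)
            hi = secs[i].addr + secs[i].size;
    }
    size_t total = hi - lo;
    if (total > cap)
        return 0;
    memset(out, 0, total);
    for (size_t i = 0; i < n; i++)
        memcpy(out + (secs[i].addr - lo), secs[i].data, secs[i].size);
    return total;
}
```

With .text at 0x100 (5 bytes) and .rodata at 0x108 ("Hi"), the image is 10 bytes and the string lands at file offset 8, i.e. runtime offset 0x108. With only .text selected, the image stops after 5 bytes, so code that refers to the string at 0x108 reads whatever happens to be in memory there.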

Because the function frame generated from C starts fetching arguments at 0x8(%ebp), it assumes that 0x0(%ebp) holds the old %ebp and 0x4(%ebp) holds %eip. In fact, when the function is called with a 16-bit call instruction, 0x4(%ebp) holds the 2-byte %ip rather than the 4-byte %eip, so the arguments actually start at 0x6(%ebp). Since we cannot change the function frame generated from C, let's see whether the 16-bit call can be turned into a 32-bit call.

The way to do this is to use .code16gcc instead of .code16. The difference between .code16gcc and .code16 is that under .code16gcc, instructions such as call and ret (as well as push, pop and leave) default to 32-bit operand size, so call and ret behave like calll and retl. That is also why the directive is called .code16gcc: it is meant to be used together with the function frames GCC generates.
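The two-byte discrepancy and the reason .code16gcc removes it can be written as a small calculation. The sketch assumes GCC's -m32 prologue (pushl %ebp; movl %esp,%ebp), which always reads the first argument at 8(%ebp); the constant and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes a near call pushes as its return address. */
enum {
    CALLW_PUSH = 2, /* callw under .code16 pushes %ip */
    CALLL_PUSH = 4  /* calll, or call under .code16gcc, pushes %eip */
};

/* GCC's -m32 frame reads the first argument at 8(%ebp). */
enum { GCC_FIRST_ARG = 8 };

/* Where the first argument really is, relative to %ebp, after a call that
 * pushed call_push bytes followed by a 4-byte pushl %ebp.
 * (The matching return must pop the same size: retw pops 2, retl pops 4.) */
static size_t actual_first_arg(size_t call_push)
{
    return 4 + call_push;
}

/* How many bytes too high GCC's code reads; 0 means it hits the right slot. */
static long first_arg_error(size_t call_push)
{
    return (long)GCC_FIRST_ARG - (long)actual_first_arg(call_push);
}
```

After a callw the first argument is at 6(%ebp) and GCC's code reads 2 bytes too high; after a calll the error is 0, which is exactly what switching to .code16gcc achieves.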

Now let's change the code. The change to the C code is very simple: just replace .code16 with .code16gcc:

The disassembly shows that it now uses the 32-bit calll and retl:

The main change to the assembly program is to replace .code16 with .code16gcc and then manually change callw to calll and retw to retl:

Finally, compile and link the program, copy it into freedos.img, start the virtual machine, and check the result:

Success! The result is as follows:

Summary:

Writing 16-bit code that runs in x86 real mode is a very retro topic, and writing a plain binary executable that runs under DOS is even more retro. In the past, authors who needed x86 16-bit real mode liked to program with NASM. Books such as "30 Days of Homemade Operating System", "Orange'S: An Operating System Implementation" and "x86 Assembly Language: From Real Mode to Protected Mode" all use the NASM assembler and Intel assembly syntax in their examples, and they only combine assembly language with C after switching to 32-bit protected mode.

I use Linux, so whether I am writing 32-bit or 16-bit code, I just want to use GCC and GNU as. I also think that even in 16-bit mode we should use as little assembly and as much C as possible. After some effort, this article is the result. The process of writing 16-bit code that runs in x86 real mode with GCC and GNU binutils can be summarized as follows:

1. If you write a 16-bit program in assembly only, use the .code16 directive and make sure only 16-bit instructions and registers are used. If you want to work with C, use the .code16gcc directive, use pushl, calll, retl, leavel and jmpl in the function frame, and access function parameters starting at 0x8(%ebp). Note that a program mixing C and assembly this way can run in real mode, but not on a real CPU older than the 386, because CPUs before the 386 do not support 32-bit instructions such as pushl, calll, retl, leavel and jmpl.

2. Pass --32 to as, -m32 to gcc and -m elf_i386 to ld. When disassembling 16-bit code with objdump, pass -m i8086.

3. A .com file running under DOS is loaded at offset 0x100, so link it with ld -Ttext 0x100; boot-sector code is loaded at 0x7c00, so link it with ld -Ttext 0x7c00.

4. The programs generated by GCC, as and ld are in ELF format by default, whereas a .com program running under DOS and code running in the boot sector are both plain binaries. Therefore objcopy with the -O binary option is used to copy the code and data sections of the ELF file into a plain binary file. Plain binary files can also be disassembled; specify the -b binary option when running objdump.

(Published by Jingshan Ranger on Blog Park (cnblogs) on 2014-08-24. Please credit the source when reposting.)
