AT&T Assembly Format

Source: Internet
Author: User

Much of the Linux kernel code that touches the underlying hardware is written in assembly language. Linux does not use a single assembly syntax, however: alongside Intel syntax it also uses AT&T syntax, so both are worth knowing. Many people who have studied computing have learned Intel assembly, but not necessarily AT&T. In my view the two express the same ideas and differ only in grammar. Beginners can start with the article below (reposted from http://blog.chinaunix.net/u1/59572/showart_1148334.html).

I. AT&T assembly syntax on Linux
  
In AT&T syntax, register names are prefixed with '%', whereas in Intel syntax register names take no prefix. For example:

AT&T format                      Intel format
pushl %eax                       push eax

  
In AT&T syntax, an immediate operand is marked with a '$' prefix, whereas in Intel syntax an immediate operand takes no prefix. For example:

AT&T format                      Intel format
pushl $1                         push 1

  
The order of the source and destination operands is exactly opposite in the two syntaxes. In Intel syntax the destination operand is to the left of the source operand, while in AT&T syntax the destination operand is to the right of the source operand. For example:

AT&T format                      Intel format
addl $1, %eax                    add eax, 1

  
In AT&T syntax, the operand size is given by the last letter of the mnemonic: the suffixes 'b', 'w', and 'l' mark the operand as a byte (8 bits), a word (16 bits), and a long (32 bits) respectively. In Intel syntax, the operand size is expressed with a prefix on the operand such as "byte ptr", "word ptr", or "dword ptr". For example:

AT&T format                      Intel format
movb val, %al                    mov al, byte ptr val

  
In AT&T syntax, the operand of an absolute (as opposed to PC-relative) jump or call instruction is prefixed with '*'; Intel syntax needs no such prefix.
The mnemonics for far jumps and far calls are "ljmp" and "lcall" in AT&T syntax, and "jmp far" and "call far" in Intel syntax:

AT&T format                      Intel format
ljmp $section, $offset           jmp far section:offset
lcall $section, $offset          call far section:offset

  
The corresponding far return instruction is:

AT&T format                      Intel format
lret $stack_adjust               ret far stack_adjust

  
In AT&T syntax, a memory operand is written as:
section:disp(base, index, scale)

In Intel syntax, the same memory operand is written as:
section:[base + index*scale + disp]

  
Because Linux runs in protected mode and uses 32-bit linear addresses, the segment base and offset do not enter into the address calculation. The address is computed as:
disp + base + index * scale
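
As a rough illustration of the formula above, the following Python sketch computes the address that an operand of the form disp(base, index, scale) refers to. The function name effective_address and the sample register and array values are made up for this example; the 32-bit wraparound is an assumption that mirrors the 32-bit linear addresses mentioned in the text.

```python
def effective_address(disp=0, base=0, index=0, scale=1):
    """Return disp + base + index * scale, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        # x86 addressing only encodes these four scale factors.
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (disp + base + index * scale) & 0xFFFFFFFF


# -4(%ebp), with a hypothetical %ebp value of 0xBFFFF000
print(hex(effective_address(disp=-4, base=0xBFFFF000)))

# array(,%eax,4), with a hypothetical array address and %eax = 3
print(hex(effective_address(disp=0x0804A000, index=3, scale=4)))
```

Omitted components simply default to zero (or to 1 for the scale), which is why forms such as (%eax) and array(,%eax,4) are both valid.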

  
Here are some examples of memory operands:

AT&T format                      Intel format
movl -4(%ebp), %eax              mov eax, [ebp - 4]
movl array(, %eax, 4), %eax      mov eax, [eax*4 + array]
movw array(%ebx, %eax, 4), %cx   mov cx, [ebx + 4*eax + array]
movb $4, %fs:(%eax)              mov byte ptr fs:[eax], 4
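
The mapping between the two notations is mechanical, which a small Python sketch can make concrete. The function att_to_intel below is a hypothetical helper written for this article, not part of any assembler. It handles only memory operands of the shape shown above and performs no validation of register names.

```python
import re

# An AT&T memory operand: optional "%seg:" override, optional displacement,
# then "(base, index, scale)" where each field may be empty.
_MEM = re.compile(
    r"^(?:%(?P<seg>\w+):)?"      # optional segment override, e.g. %fs:
    r"(?P<disp>[^()]*)"          # displacement: number or symbol, may be empty
    r"\((?P<inner>[^()]*)\)$"    # base, index, scale
)


def att_to_intel(operand):
    """Translate an AT&T memory operand into Intel bracket notation."""
    m = _MEM.match(operand.replace(" ", ""))
    if m is None:
        raise ValueError("not a memory operand: " + operand)
    # Pad to three fields so "(%ebp)" and "(,%eax,4)" parse the same way.
    parts = (m.group("inner").split(",") + ["", ""])[:3]
    base, index, scale = (p.lstrip("%") for p in parts)
    terms = []
    if base:
        terms.append(base)
    if index:
        terms.append(index + "*" + (scale or "1"))
    if m.group("disp"):
        terms.append(m.group("disp"))
    # Render "ebp + -4" as "ebp - 4".
    text = " + ".join(terms).replace("+ -", "- ")
    seg = m.group("seg")
    return (seg + ":" if seg else "") + "[" + text + "]"


for operand in ("-4(%ebp)", "array(,%eax,4)", "array(%ebx,%eax,4)", "%fs:(%eax)"):
    print(operand, "->", att_to_intel(operand))
```

Only the operand is translated; a complete instruction would also need its operand order reversed and its size suffix dropped, as described earlier.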

  
II. Hello world!

Since the first example in almost every programming language prints the string "Hello world!" on the screen, we introduce assembly programming under Linux the same way.

Linux offers many ways to display a string on the screen, but the simplest is to use the system calls provided by the Linux kernel. The main benefit of this approach is that the program talks to the kernel directly: it does not need to link against libraries such as libc and does not need the ELF interpreter, so the code is small and runs fast.

On 32-bit x86, Linux runs in protected mode and uses a flat memory model, and the most commonly used binary format is ELF. An ELF executable is typically divided into the sections .text, .data, and .bss: .text is the read-only code section, .data is a readable and writable data section, and .bss is a readable and writable section for uninitialized data. Depending on your needs you can use other standard sections or add custom ones, but an ELF executable should have at least a .text section. Here is our first assembly program, written in AT&T syntax:
  
Example 1. AT&T format

    # hello.s

    .data                               # data section
    msg:    .ascii "Hello, world!\n"    # string to output (.ascii adds no trailing NUL)
    len = . - msg                       # string length

    .text                               # code section
    .global _start                      # export the entry point

    _start:
            # display a string on the screen
            movl $len, %edx             # argument 3: string length
            movl $msg, %ecx             # argument 2: address of the string
            movl $1, %ebx               # argument 1: file descriptor (stdout)
            movl $4, %eax               # system call number (sys_write)
            int  $0x80                  # call into the kernel

            # exit the program
            movl $0, %ebx               # argument 1: exit code
            movl $1, %eax               # system call number (sys_exit)
            int  $0x80                  # call into the kernel

  
Many programmers find AT&T syntax obscure the first time they see it. That is not a problem: on Linux you can also write assembly programs in Intel syntax:
  
Example 2. Intel format

    ; hello.asm

    section .data                       ; data section
    msg     db  "Hello, world!", 0xA    ; string to output
    len     equ $ - msg                 ; string length

    section .text                       ; code section
    global _start                       ; export the entry point

    _start:
            ; display a string on the screen
            mov edx, len                ; argument 3: string length
            mov ecx, msg                ; argument 2: address of the string
            mov ebx, 1                  ; argument 1: file descriptor (stdout)
            mov eax, 4                  ; system call number (sys_write)
            int 0x80                    ; call into the kernel

            ; exit the program
            mov ebx, 0                  ; argument 1: exit code
            mov eax, 1                  ; system call number (sys_exit)
            int 0x80                    ; call into the kernel

  
The two programs look completely different, but they do the same thing: both invoke sys_write, provided by the Linux kernel, to display a string, and then invoke sys_exit to end the program. The definitions of all system calls can be found in the Linux kernel source file include/asm-i386/unistd.h.
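
For readers more at home in a high-level language, the same kernel request can be sketched in Python. This is only an analogy: os.write is a thin wrapper around the write system call, and the names MSG, LEN, STDOUT, and say_hello are invented for this example. Unlike the assembly programs, it runs inside the Python interpreter, which is linked against libc.

```python
import os

MSG = b"Hello, world!\n"   # the bytes to output
LEN = len(MSG)             # plays the role of "len = . - msg"
STDOUT = 1                 # file descriptor 1 is standard output


def say_hello(fd=STDOUT):
    """Ask the kernel to write MSG to fd; return the number of bytes written."""
    return os.write(fd, MSG)


if __name__ == "__main__":
    say_hello()
    # The assembly programs finish with sys_exit. The closest Python call is
    # os._exit(0); it is left out here so the script ends normally.
```

The three values passed to the kernel are the same as in the assembly versions: a file descriptor, the address of the data, and its length. Python supplies the last two implicitly from the bytes object.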
  
III. Linux assembly tools

There are many assembly tools for Linux, but as on DOS/Windows, the most basic ones are the assembler, the linker, and the debugger.
  
  
1. Assembler

The role of the assembler is to convert a source program written in assembly language into object code in binary form. The standard assembler on Linux is GAS, the back-end assembler that GCC relies on, usually shipped as part of the binutils package. GAS uses the standard AT&T assembly syntax and can assemble programs written in AT&T format:

[xiaowp@gary code]$ as -o hello.o hello.s
  

  
  
Another frequently used assembler on Linux is NASM, which provides good macro support and can produce a considerable number of object formats, including bin, a.out, COFF, ELF, RDF, and more. NASM uses a hand-written parser, so it runs much faster than GAS. More importantly, it uses Intel assembly syntax and can assemble programs written in Intel format:

[xiaowp@gary code]$ nasm -f elf hello.asm
  

  
  
2. Linker

The object code generated by the assembler cannot be run directly; it must first be processed by the linker to produce an executable. The linker is typically used to combine several object files into one executable, so that a program can be developed as separate modules that are then linked into a single application. Linux uses ld as its standard linker, which is also part of the binutils package. Once GAS or NASM has assembled the source and produced the object code, ld can link it into an executable program:

[xiaowp@gary code]$ ld -s -o hello hello.o
  

  
  
3. Debugger

Some people say that programs are not written but debugged into existence, which shows how important debugging is in software development, and in assembly language programming in particular. To debug assembly code under Linux you can use a general-purpose debugger such as GDB or DDD, or ALD (Assembly Language Debugger), which is designed specifically for debugging assembly code.

From a debugging standpoint, the advantage of GAS is that it can include a symbol table in the generated object code, so that GDB and DDD can be used for source-level debugging. To include the symbol table in the generated executable, assemble and link as follows:

[xiaowp@gary code]$ as --gstabs -o hello.o hello.s

[xiaowp@gary code]$ ld -o hello hello.o

When running the as command, the --gstabs option tells the assembler to add a symbol table to the generated object code. Note that the -s option must not be passed to ld when linking; otherwise the symbol table in the object code is stripped during linking.
