Getting Started with AT&T Syntax on Linux (GNU as Assembly Syntax)


Last Update:2016-03-30 Source: Internet

Author: User


Http://blogold.chinaunix.net/u3/105209/showart_2085748.html

I have been studying for a long time, always researching and struggling at the level of the C language, and over time I accumulated many doubts about C whose answers are hard to find in books and other materials. Programmers are a group that pursues perfection; even one small blind spot in their understanding leaves them restless. Not long ago, on the Itput forum, I obtained the classic "Computer Systems: A Programmer's Perspective" (hereinafter CS:APP) and read it through the night, looking for answers. Although it did not directly answer some of my questions, it showed me a way out: opening the door to assembly.

Assembly language is very close to machine language; the correspondence between its statements and machine instructions is simple and clear. Opening the door to assembly not only resolves the doubts that high-level languages leave you with, it also deepens your understanding of modern computers and operating systems. More importantly, it gives you confidence and eases the fear of standing on a shaky height, answering Hou Jie's call: "do not build a high tower on drifting sand." The purpose of learning assembly today is quite different from what it used to be. As CS:APP puts it, "the need for programmers to learn assembly has changed over time: at first programmers were required to write programs directly in assembly; now they are required to be able to read and understand the code generated by an optimizing compiler." Being able to read and understand is exactly my need and my goal.

I had contact with assembly before, mainly Microsoft MASM (Macro Assembler), but at that time my understanding was shallow and my attitude was not right, so I missed a good opportunity to learn. Now I mostly work with GCC on UNIX-family platforms, so the assembly language I chose is GNU assembly; conveniently, CS:APP also uses GNU assembly syntax. Since my main purpose in learning assembly is to "dispel doubts", much of what follows compares C code with the corresponding assembly code.

1, Assembly lets you see more
The higher the level of the language you use, the more blurred the computer becomes in your eyes; your focus moves away from the machine and toward the "problem domain" at the other end. For example, through Java you mostly see its virtual machine rather than the real computer; through C you see only as far as the memory level; in assembly language you can go down to the register level and work there freely. The "unique landscape" that an assembly programmer sees includes:
a) The program counter (%eip): a special register that always holds the address of the next instruction to be executed.
b) Integer registers: eight in total, namely %eax, %ebx, %ecx, %edx, %esi, %edi, %esp and %ebp. They can hold integer data, hold addresses, and record program state. In the early days each register had its own special purpose; now, because platforms such as Linux use "flat addressing [1]", the special roles of the registers are much less pronounced.
c) The condition flags register: holds status information about the most recently executed arithmetic instruction, used to implement conditional changes in control flow.
d) Floating-point registers: as the name implies, used to store floating-point numbers.
Although the special roles of the registers have weakened, each compiler still follows certain conventions in how it uses them; more on that later.

2, A first glimpse of assembly
The following is a simple C function:
void dummy() {
    int a = 1234;
    int b = a;
}
Using GCC with the -S option, we convert it to the following assembly code (some content omitted):
    movl $1234, -4(%ebp)
    movl -4(%ebp), %eax
    movl %eax, -8(%ebp)
At first glance it is still hard to understand; only a few things look familiar, such as the %ebp and %eax mentioned above. This is just a primer, to give us an intuitive feel for what assembly "looks like". Let us look a little closer. The lines of assembly code all look much alike: indeed, assembly code is a sequence of "instruction + operands" statements. The set of instructions is fixed and each instruction has its own fixed purpose, while operands can be expressed in several forms.

1) Operand notation
Most assembly instructions have one or more operands, which specify the source and the destination of the operation. A standard instruction has roughly the form "instruction source-operand, destination-operand", where the source operand can be an immediate value, a value read from a register, or a value read from memory, and the destination operand can be a register or a memory location. By this classification there are roughly three operand notations:
a) Immediate notation: in "movl $1234, -4(%ebp)", "$1234" is an immediate operand. In GNU assembler syntax an immediate is written as "$" followed by an integer. Immediates usually represent constants in the code, as "$1234" does in the example. Note that an immediate cannot be used as a destination operand.
b) Register notation: this is the simplest; it denotes the contents of a register. In "movl -4(%ebp), %eax" above, %eax is a register operand used as the destination, while in "movl %eax, -8(%ebp)" %eax is a register operand used as the source.
c) Memory reference notation: the value computed from the operand is a memory address, and the instruction accesses the memory location at that address. In "movl -4(%ebp), %eax" above, "-4(%ebp)" denotes the memory address (contents of %ebp) - 4, and the value is read from that address.

2) Data transfer instructions
The data transfer instruction is the most commonly used instruction in assembly language, and it is the first kind of instruction we meet. Its format is "mov source-operand, destination-operand".
The mov family supports transfers from a single byte up to a double word: movb transfers one byte, movw transfers two bytes (one word), and movl transfers a double word (four bytes). These need little explanation. In addition, the mov family provides two extending variants: movsbl, which sign-extends a byte to a double word, and movzbl, which zero-extends a byte to a double word.
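The difference between the two extending moves can be modelled in C. The following is only a sketch of the arithmetic; the function names sign_extend_byte and zero_extend_byte are made up for illustration and are not part of any library.

```c
#include <stdint.h>

/* Models movsbl: the top bit of the byte is copied into the
   upper 24 bits, so 0x80..0xff become negative values. */
int32_t sign_extend_byte(uint8_t b) {
    return (b & 0x80) ? (int32_t)b - 256 : (int32_t)b;
}

/* Models movzbl: the upper 24 bits are filled with zeros,
   so the result is always in the range 0..255. */
uint32_t zero_extend_byte(uint8_t b) {
    return (uint32_t)b;
}
```

For the byte 0x80, for instance, the sign-extending version gives -128 while the zero-extending version gives 128.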

==============================================================

As an efficient programming language closely tied to the hardware platform, assembly plays a very important role in operating systems, embedded development and other fields. Because assembly depends on the hardware architecture (the CPU's instruction set), assembly languages on different architectures differ widely. This article briefly introduces AT&T syntax on Linux (that is, GNU as assembler syntax) and the basic method of assembling programs on Linux.

AT&T syntax originated at AT&T Bell Laboratories, where it grew out of the opcode syntax of the processors used to implement UNIX. The main differences between AT&T syntax and Intel syntax are:
AT&T uses $ to mark immediate values and Intel does not; so decimal 2 is written $2 in AT&T and 2 in Intel.
AT&T prefixes register names with %; for example, the EAX register is written %eax.
The operand order is the opposite of Intel's: AT&T movl %eax, %ebx copies the value in EAX to EBX, which Intel writes as mov ebx, eax.
AT&T appends a single character to the mnemonic to indicate the size of the data being operated on; for example, movl foo, %eax is equivalent to Intel's mov eax, dword ptr foo.
The long jump and long call formats differ: AT&T uses ljmp $section, $offset, while Intel uses jmp section:offset.
These are the main differences; there are many other details. A concrete example follows.

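#cpuid.s Sample Program

.section .data
output:
    .ascii "The processor Vendor ID is 'xxxxxxxxxxxx'\n"

.section .text
.globl _start
_start:
    movl $0, %eax          # cpuid function 0: request the vendor ID
    cpuid
    movl $output, %edi     # %edi points to the start of output
    movl %ebx, 28(%edi)    # low 4 bytes of the vendor ID
    movl %edx, 32(%edi)    # middle 4 bytes
    movl %ecx, 36(%edi)    # high 4 bytes
    movl $4, %eax          # system call 4: write
    movl $1, %ebx          # file descriptor 1: standard output
    movl $output, %ecx     # start address of the string
    movl $42, %edx         # length of the string
    int $0x80
    movl $1, %eax          # system call 1: exit
    movl $0, %ebx          # exit code 0
    int $0x80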

The function of this program is to query the CPU's vendor ID, where:

.ascii defines a string (the notation is completely different from the Intel format). .section is the directive that declares a section; .data and .text are section names, for the data section and the code section respectively. _start is the default entry label for gas (the GNU assembler) and marks where the program begins executing; .globl declares _start as a label that is visible to external programs.

The cpuid instruction requests information about the CPU; it takes its input in EAX and returns its output in EBX, EDX and ECX. Here 0 is the input, which asks cpuid to return the CPU's vendor ID string. The result, a 12-byte string, is returned in three registers: EBX holds the low 4 bytes, EDX the middle 4 bytes, and ECX the high 4 bytes (note the order!). Next, %edi is loaded with the start address of output, and the following three instructions replace the x's in output with the vendor information. The 28 in 28(%edi) is an offset: the effective address is the address in %edi plus 28 bytes, which is exactly the address of the first x in output.

Then the result is printed with a Linux system call (int $0x80). The system call's parameters are: EAX, the system call number; EBX, the file descriptor to write to; ECX, the start address of the string; EDX, the length of the string. In the program these are 4, 1 (standard output), the address of output, and 42. Finally, system call number 1, the exit function, is invoked to return to the shell; the value in EBX is returned to the shell as the exit code, and 0 means no error.
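The register order (EBX, then EDX, then ECX) and the byte order within each register can be illustrated in C. This is a sketch only: the helper names store_le32 and vendor_string are hypothetical, and the register values are passed in as plain integers rather than read from the CPU.

```c
#include <stdint.h>

/* Store a 32-bit value at dst in little-endian byte order,
   the way "movl %reg, offset(%edi)" does on x86. */
static void store_le32(char *dst, uint32_t value) {
    for (int i = 0; i < 4; i++) {
        dst[i] = (char)((value >> (8 * i)) & 0xff);
    }
}

/* Build the 12-character vendor string from the three register
   values returned by cpuid with EAX = 0. out must hold 13 bytes. */
void vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13]) {
    store_le32(out + 0, ebx);  /* low 4 bytes    */
    store_le32(out + 4, edx);  /* middle 4 bytes */
    store_le32(out + 8, ecx);  /* high 4 bytes   */
    out[12] = '\0';            /* C terminator; the assembly version has none */
}
```

Calling vendor_string(0x756e6547, 0x49656e69, 0x6c65746e, buf) fills buf with "GenuineIntel": each register carries four ASCII characters, lowest byte first.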

Then assemble, link and run the program:
# as -o cpuid.o cpuid.s
# ld cpuid.o -o cpuid
# ./cpuid
The processor Vendor ID is 'GenuineIntel'

My computer has a Pentium M CPU, so the result returned is GenuineIntel.

A few notes:

1) The standard assembly environment on Linux consists of GNU development and debugging tools such as as, ld, gdb, gprof and objdump; all of these except gdb are distributed in the binutils package. as uses AT&T syntax. NASM can also be used on Linux for assembly programming in Intel format.

2) Assembly programs on Linux make system calls through int 0x80, which is similar to INT 21h under DOS, but the way parameters are passed is different.

3) The section declaration .section does not need an end-of-segment marker (SEGMENT/ENDS) as in the Intel format; the start of the next section automatically marks the end of the previous one.

4) A simple program may leave the entry label undefined; ld will then choose the entry point itself, but it issues a warning.

===========================================

Example 2. An assembly program that finds the maximum of a set of numbers

#PURPOSE: This program finds the maximum number of a
#         set of data items.
#
#VARIABLES: The registers have the following uses:
#
# %edi - holds the index of the data item being examined
# %ebx - largest data item found
# %eax - current data item
#
# The following memory locations are used:
#
# data_items - contains the item data. A 0 is used
#              to terminate the data
#
.section .data
data_items:                         # these are the data items
    .long 3,67,34,222,45,75,54,34,44,33,22,11,66,0

.section .text
.globl _start
_start:
    movl $0, %edi                   # move 0 into the index register
    movl data_items(,%edi,4), %eax  # load the first data item
    movl %eax, %ebx                 # since this is the first item, %eax is
                                    # the biggest

start_loop:                         # start loop
    cmpl $0, %eax                   # check to see if we've hit the end
    je loop_exit
    incl %edi                       # load next value
    movl data_items(,%edi,4), %eax
    cmpl %ebx, %eax                 # compare values
    jle start_loop                  # jump to loop beginning if the new
                                    # one isn't bigger
    movl %eax, %ebx                 # move the value as the largest
    jmp start_loop                  # jump to loop beginning

loop_exit:
    # %ebx is the status code for the exit system call
    # and it already has the maximum number
    movl $1, %eax                   # 1 is the exit() syscall
    int $0x80

Assemble, link, execute:

$ as max.s -o max.o
$ ld max.o -o max
$ ./max
$ echo $?

This program finds the maximum of a set of numbers and uses it as the program's exit status. The set of numbers is given in the .data section:

data_items:
    .long 3,67,34,222,45,75,54,34,44,33,22,11,66,0

.long declares a group of numbers, each occupying 32 bits, which corresponds to an array in C. The array begins with the label data_items; the assembler uses the start address of the array as the address denoted by the symbol data_items, so data_items is similar to an array name in C. The label data_items is not declared with .globl because it is used only within this assembly program, and the linker does not need to know that the name exists. Besides .long, common data declarations include:

.byte, which also declares a group of numbers, each occupying 8 bits
.ascii, for example .ascii "Hello World", which declares 11 numbers whose values are the ASCII codes of the corresponding characters. Note that, unlike in C, the string declared this way has no '\0' character at the end; if a terminating '\0' is required, declare it as .ascii "Hello World\0".

The last number in the data_items array is 0. We compare the numbers one by one in a loop, and the loop terminates when it reaches 0. In this loop:

The EDI register holds the current position in the array; each time a number has been compared, EDI is incremented by 1 so that it refers to the next number in the array.
The EBX register holds the maximum value found so far and is updated whenever a larger number is found.
The EAX register holds the number currently being compared; each time EDI is updated, the next number is read into EAX.

_start:
    movl $0, %edi

This initializes EDI to refer to element 0 of the array.

    movl data_items(,%edi,4), %eax

This instruction loads element 0 of the array into the EAX register. data_items is the start address of the array, the value of EDI is the array index, and 4 means that each element occupies 4 bytes, so the address of element number EDI is data_items + EDI * 4. Reading the data from that address is written as the instruction above. This address notation is explained in more detail in the next section.
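The address arithmetic behind data_items(,%edi,4) can be spelled out in C with explicit byte offsets. This is an illustrative sketch; load_item is a made-up name, and the array simply repeats the values from the .data section.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const int32_t data_items[] = {3, 67, 34, 222, 45, 75, 54, 34, 44, 33, 22, 11, 66, 0};

/* Models the operand data_items(,%edi,4): base + index * scale. */
int32_t load_item(size_t index) {
    const unsigned char *base = (const unsigned char *)data_items;
    const unsigned char *addr = base + index * 4;  /* data_items + EDI * 4 */
    int32_t value;
    memcpy(&value, addr, sizeof value);            /* read 4 bytes, like movl */
    return value;
}
```

In ordinary C one would just write data_items[index]; the compiler performs the same multiplication by the element size.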

    movl %eax, %ebx

The initial value of EBX is also element 0 of the array. Next we enter a loop: the label start_loop marks the beginning of the loop, and the label loop_exit marks the point just after its end.

start_loop:
    cmpl $0, %eax
    je loop_exit

This checks whether the value of EAX is 0; if it is, we have reached the end of the array and must leave the loop. The cmpl instruction subtracts one operand from the other but does not save the result; it only sets the flag bits in the EFLAGS register according to the result. If the two operands are equal, the result is 0 and the ZF bit in EFLAGS is set to 1. je is a conditional jump instruction: it checks the ZF bit in EFLAGS, jumps if ZF is 1, and otherwise does not jump, so execution continues with the next instruction. Comparison instructions and conditional jump instructions are therefore used together: the former sets the flags and the latter decides based on the flags. Here the jump is taken if the two numbers compared are equal; the e in je stands for equal.

    incl %edi
    movl data_items(,%edi,4), %eax

This adds 1 to EDI and loads the next number in the array into EAX.

    cmpl %ebx, %eax
    jle start_loop

This compares the current array element in EAX with the maximum found so far in EBX. If the former is less than or equal to the latter, the maximum does not change and we jump back to the beginning of the loop to compare the next number; otherwise execution continues with the next instruction. jle is also a conditional jump instruction; le stands for less than or equal.
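How a compare instruction and a conditional jump cooperate through the flags can be modelled in C. The sketch below is a simplified model, not a description of real hardware: the struct and function names are hypothetical, and only the three flags that je and jle consult (ZF, SF, OF) are tracked.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool zf; bool sf; bool of; };

/* Models "cmpl src, dst": computes dst - src, discards the
   result, and keeps only the flags. */
struct flags cmpl(int32_t src, int32_t dst) {
    uint32_t d = (uint32_t)dst;
    uint32_t s = (uint32_t)src;
    uint32_t result = d - s;
    struct flags f;
    f.zf = (result == 0);                     /* operands were equal     */
    f.sf = (result >> 31) != 0;               /* sign bit of the result  */
    f.of = (((d ^ s) & (d ^ result)) >> 31) != 0;  /* signed overflow    */
    return f;
}

/* je jumps when ZF is 1. */
bool je_taken(struct flags f) { return f.zf; }

/* jle jumps when ZF is 1 or SF differs from OF (signed dst <= src). */
bool jle_taken(struct flags f) { return f.zf || (f.sf != f.of); }
```

With this model, jle_taken(cmpl(222, 45)) is true because 45 is not greater than 222, which corresponds to "cmpl %ebx, %eax; jle start_loop" jumping back when EAX holds 45 and EBX holds 222.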

    movl %eax, %ebx
    jmp start_loop

This updates the maximum in EBX and then jumps to the beginning of the loop to compare the next number. jmp is an unconditional jump instruction: it tests no condition and simply jumps. The instructions after the loop_exit label end the program with the exit system call.
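For comparison with C, the whole loop can be written as the following sketch. The function name find_max is an assumption made for illustration; the local variables are named after the roles the registers play in the assembly program.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the largest item of a 0-terminated array, mirroring max.s:
   index plays the role of %edi, largest of %ebx, current of %eax. */
int32_t find_max(const int32_t *items) {
    size_t index = 0;                 /* movl $0, %edi                  */
    int32_t current = items[index];   /* movl data_items(,%edi,4), %eax */
    int32_t largest = current;        /* movl %eax, %ebx                */

    while (current != 0) {            /* cmpl $0, %eax ; je loop_exit   */
        index++;                      /* incl %edi                      */
        current = items[index];       /* movl data_items(,%edi,4), %eax */
        if (current > largest) {      /* cmpl %ebx, %eax ; jle skips    */
            largest = current;        /* movl %eax, %ebx                */
        }
    }
    return largest;                   /* value left in %ebx at loop_exit */
}
```

Applied to the numbers from the .data section, the function returns 222, the value the assembly program passes to the exit system call.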
