Introduction to Assembly in Linux

Source: Internet
Author: User
Tags: volatile

GNU as assembly syntax

The GNU assembler uses AT&T syntax, which differs from Intel assembler syntax mainly in the following ways:

    • In AT&T syntax an immediate operand is preceded by '$', a register operand by '%', and an absolute (indirect) jump or call operand by '*'; Intel syntax uses none of these prefixes;
    • AT&T syntax orders the source and destination operands the opposite way from Intel syntax: AT&T puts the source first and the destination last, while Intel puts the destination first. For example, Intel's add eax, 4 is addl $4, %eax in AT&T syntax;
    • In AT&T syntax the size of a memory operand is determined by the last character of the opcode: the suffixes b, w, and l denote 8-bit, 16-bit, and 32-bit memory references. Intel syntax instead places byte ptr, word ptr, or dword ptr before the memory operand. For example, Intel's mov al, byte ptr foo corresponds to movb foo, %al in AT&T syntax;
    • Immediate-form long (far) jumps and calls are ljmp/lcall $section, $offset in AT&T syntax and jmp/call far section:offset in Intel syntax; the AT&T far return instruction lret $stack-adjust corresponds to Intel's ret far stack-adjust;
    • The AT&T assembler does not support multi-segment programs; UNIX-like systems expect all code to be in one segment.
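The operand-order and suffix rules above can be tried from C through GCC inline assembly. The following is a minimal sketch, assuming GCC or Clang on an x86 target with the default AT&T dialect; the function name add_four is hypothetical, and a plain C fallback is used on other targets.

```c
#include <stdint.h>

/* Hypothetical helper: adds 4 to x using the AT&T form of Intel's
   "add eax, 4". */
uint32_t add_four(uint32_t x)
{
#if defined(__i386__) || defined(__x86_64__)
    /* AT&T order: the source ($4, an immediate) comes first and the
       destination (a register chosen by the compiler) comes last.
       The "l" suffix marks a 32-bit operation. */
    __asm__("addl $4, %0" : "+r"(x) : : "cc");
    return x;
#else
    return x + 4;   /* portable fallback for non-x86 targets */
#endif
}
```

Calling add_four(1) is expected to return 5 on either path, since both perform the same 32-bit addition.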

Preprocessing: The as assembler performs only simple preprocessing of assembly source: it removes extra spaces and tabs, removes comments, and converts character constants to their numeric values. This preprocessing does not perform macro processing and does not handle file inclusion; if you need these features, you can use the GCC preprocessor cpp.

Symbols: Symbols in GNU assembly language are identifiers made up of uppercase and lowercase letters, digits, and the three characters '_', '.', and '$'. A symbol may not begin with a digit, and symbols are case-sensitive. The assembler places no limit on symbol length, and whitespace marks where a symbol starts and ends. A statement ends at a newline or at a line separator character (;), and the file must end with a newline. A statement begins with zero or more labels (a label is a symbol immediately followed by a colon), followed by a key symbol that determines the kind of statement and the meaning of the rest of the statement. If the key symbol begins with a dot (.), the statement is an assembler directive; if it begins with a letter, the statement is an assembly language instruction. The general format of a statement is:

    label: directive                       # comment
or
    label: mnemonic operand1, operand2     # comment

Constants: A constant is a fixed value, either a character constant or a numeric constant. Character constants are single characters or strings, and numeric constants are integers or floating-point numbers. In assembly language a single character is written with a single quotation mark in front of it (for example 'a), and a string is enclosed in double quotation marks.

Instruction: An instruction is an operation performed by the CPU. The instruction itself is usually called the opcode, the operands are the objects the instruction operates on, and an address is the location of an instruction or of data in memory. An instruction statement generally consists of a label, an opcode (instruction mnemonic), operands, and a comment.

Operand: An operand can be an immediate value, a register, or a memory location. An indirect operand contains the address of the actual operand; AT&T syntax marks an indirect operand by prefixing it with *, and only jump and call instructions can use indirect operands. An immediate operand is preceded by a $ sign and a register name by a % sign. A memory operand is specified by a variable name or by a register that holds the variable's address; a bare variable name implicitly denotes the variable's address and tells the CPU to reference the contents of memory at that address.

Opcode: In AT&T syntax the last character of the opcode indicates the operand size; the suffixes b, w, and l specify byte, word, and long operands respectively. If the mnemonic has no suffix and the statement contains no memory operand, as tries to determine the operand size from the register operands (by convention, the destination register). Opcode prefixes modify the opcode that follows them; they are used to repeat string instructions, to provide segment overrides, to perform a bus lock operation, or to specify the operand and address size.

Memory reference: The indirect memory reference form in AT&T syntax is section:disp(base, index, scale), where base and index are the 32-bit base and index registers, disp is an optional displacement, and scale is the scale factor that multiplies index. The operand address is disp + base + index*scale, and section is the segment register used for the memory operand.

    movl var, %eax                   # load the contents of memory address var into %eax
    movl %cs:var, %eax               # load the contents of memory address var in the code segment into %eax
    movb $0xa0, %es:(%ebx)           # store 0xa0 at the offset given by %ebx in the es segment
    movl $var, %eax                  # put the address of var into %eax
    movl array(%esi), %eax
    movl array(%ebx,%esi,4), %eax    # load the contents at address array+ebx+esi*4 into %eax
    movl -4(%ebp), %eax              # load the contents at address ebp-4 into %eax
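The disp(base, index, scale) form can also be exercised from C. The sketch below assumes GCC or Clang on an x86 target; load_element and load_previous are hypothetical names, and on x86-64 the compiler picks 64-bit base and index registers rather than the 32-bit ones described above. Other targets take a plain C fallback.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns base[index] via (base, index, 4). */
uint32_t load_element(const uint32_t *base, size_t index)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t value;
    /* Address = base + index*4, no displacement. */
    __asm__("movl (%1,%2,4), %0"
            : "=r"(value)
            : "r"(base), "r"(index)
            : "memory");
    return value;
#else
    return base[index];   /* portable fallback */
#endif
}

/* Hypothetical helper: returns base[index - 1] via a -4 displacement. */
uint32_t load_previous(const uint32_t *base, size_t index)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t value;
    /* Address = -4 + base + index*4. */
    __asm__("movl -4(%1,%2,4), %0"
            : "=r"(value)
            : "r"(base), "r"(index)
            : "memory");
    return value;
#else
    return base[index - 1];   /* portable fallback */
#endif
}
```

The scale of 4 matches the size of a 32-bit element, so index counts elements rather than bytes.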

Jump instructions: A jump instruction transfers execution to another location in the program, and the destination is usually given as a label. jmp is the unconditional jump and has a direct form and an indirect form: a direct jump names the target label in the instruction, while an indirect jump prefixes its operand with *.

Segment: A segment represents an address range and is used mainly to represent the different regions of the object file that the assembler generates. Examples include the code segment, the data segment, and the BSS segment.

Symbols: Symbols serve several purposes: they name objects, the linker uses them to perform linking, and the debugger uses them for debugging. A label is a symbol followed by a colon and represents the current location in the code. The special symbol . (dot) refers to the current address being assembled. Besides its name, each symbol has a value and a type attribute. The value of a symbol is typically 32 bits. The linker treats undefined symbols specially: an undefined symbol with the value 0 means that the symbol is not defined in this assembly source file, and the linker tries to determine its value from the other files being linked.

The as assembler

The command-line format of the as assembler is: as [option] [-o objfile] [srcfile.s]. If no output file name is specified, the output is written to a.out by default. You can give zero or more input file names on the as command line; as reads these files from left to right, and any command-line argument that has no special meaning is treated as an input file name. If no file name is given on the command line, as tries to read its input from the terminal (standard input).

as output file: The object file is the binary that as produces from the input assembly file, and it ultimately serves as the input to the linker ld. The object file contains the assembled code, information that helps ld generate the executable file, and symbol information for debugging.

Inline assembly

The basic format for inline assembly is:

    asm("assembler statements"
        : output registers
        : input registers
        : modified registers);

Only the assembler statements are required; the other parts can be omitted if they are not used. asm is the keyword that introduces inline assembly, and the assembler statements contain the assembly instructions. The output registers specify which registers hold the output data after the assembly statements have executed. The input registers specify the values that must be placed in particular registers before the code starts. The modified registers list the registers whose values the code changes.
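As an illustration of the three colon-separated parts, here is a minimal sketch, assuming GCC or Clang on an x86 target; the function name add_pair is hypothetical. It uses the generic "r" constraint so the compiler chooses the registers, and falls back to plain C on other targets.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b computed with two instructions. */
uint32_t add_pair(uint32_t a, uint32_t b)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t sum;
    __asm__("movl %1, %0\n\t"    /* sum = a  */
            "addl %2, %0"        /* sum += b */
            : "=&r"(sum)         /* output: written before b is read,
                                    hence the early-clobber "&" */
            : "r"(a), "r"(b)     /* inputs */
            : "cc");             /* modified: the condition flags */
    return sum;
#else
    return a + b;   /* portable fallback */
#endif
}
```

The operands are numbered in order of appearance, so %0 is sum, %1 is a, and %2 is b.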

Common register load codes

    Code  Description                                          Code  Description
    a     use register eax                                     m     use a memory address
    b     use register ebx                                     o     use a memory address that can take an offset
    c     use register ecx                                     I     use a constant 0~31
    d     use register edx                                     J     use a constant 0~63
    S     use register esi                                     K     use a constant 0~255
    D     use register edi                                     L     use a constant 0~65535
    q     use a dynamically allocated byte-addressable         M     use a constant 0~3
          register
    r     use any dynamically allocated register               N     use a 1-byte constant
    g     use any general valid operand                        O     use a constant 0~31
    A     use eax and edx combined                             =     output operand; the output value replaces the previous value
    +     operand is both read and written                     &     operand is modified before the inputs have been used
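The fixed-register codes are useful when an instruction implicitly uses particular registers. The sketch below assumes GCC or Clang on an x86 target; mul_wide is a hypothetical name. It relies on the one-operand mull instruction, which multiplies eax by its operand and leaves the 64-bit product in edx:eax.

```c
#include <stdint.h>

/* Hypothetical helper: 32x32 -> 64-bit unsigned multiply. */
uint64_t mul_wide(uint32_t x, uint32_t y)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__("mull %3"                /* edx:eax = eax * y */
            : "=a"(lo), "=d"(hi)     /* outputs pinned to eax and edx */
            : "a"(x), "r"(y)         /* x must start in eax; y in any register */
            : "cc");
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)x * y;   /* portable fallback */
#endif
}
```

Here "a" and "d" pin operands to eax and edx, while "r" leaves the choice of register to the compiler.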

If you do not want GCC to optimize away or move the assembly statement, add the keyword volatile after asm. In older versions of GCC, volatile could also qualify a function declaration to tell GCC that the function does not return; current code uses __attribute__((noreturn)) for this purpose.
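Both uses can be sketched as follows, assuming GCC or Clang; the names spin and die are hypothetical. In spin, the result of the assembly statement is never used, so without volatile the compiler would be free to delete the statement. The die function uses the noreturn attribute rather than a volatile-qualified declaration.

```c
#include <stdlib.h>

/* Hypothetical helper: runs one instruction per iteration and returns
   the number of iterations performed. */
unsigned spin(unsigned iterations)
{
    unsigned done = 0;
    for (unsigned i = 0; i < iterations; ++i) {
#if defined(__i386__) || defined(__x86_64__)
        unsigned scratch;
        /* The output is unused; volatile keeps the statement in place. */
        __asm__ volatile("movl $0, %0" : "=r"(scratch));
        (void)scratch;
#else
        volatile unsigned scratch = 0;   /* portable fallback */
        (void)scratch;
#endif
        ++done;
    }
    return done;
}

/* Hypothetical helper: tells the compiler that control never returns. */
__attribute__((noreturn)) void die(int status)
{
    exit(status);
}
```

A caller might use spin(1000) as a crude delay and call die(1) on an unrecoverable error.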
