AS86 Assembler
1. Origin and use in Linux
as86 comes from MINIX-386. It is an assembler for the Intel 8086 and 80386 and is accompanied by the linker ld86. In Linux it is used mainly to build the binary executable code of the 16-bit boot sector program boot/bootsect.s and the real-mode initial setup program boot/setup.s.
2. Syntax
The as86 syntax is an assembly language syntax based on the MINIX system and is incompatible with the syntax of the GNU as assembler.
Basic form of the assembler command: as [options] -o objfile srcfile
3. Statements
An assembly language program srcfile is a text file made up of a series of statements, each terminated by a newline character. There are three kinds of statement: assignment statements, pseudo-operator statements, and machine instruction statements.
An assignment statement assigns a value to a symbol or identifier.
A pseudo-operator statement is a directive to the assembler and usually generates no code. It consists of a pseudo-operator and zero or more operands. Every pseudo-operator begins with a dot '.'. The dot character by itself is a special symbol that represents the location counter during assembly; its value is the address of the first byte of the machine instruction in which the dot appears.
A machine instruction statement is the mnemonic form of an executable machine instruction and consists of an opcode and zero or more operands. Any statement may be preceded by a label. The format is: [label:] mnemonic [operands] ...
4. Object file
The object file produced by the assembler usually contains at least three segments or sections: the text segment (.text), the data segment (.data), and the uninitialized data segment (.bss).
Text segment: code and read-only data
Data segment: readable and writable data
Uninitialized data segment: uninitialized data (occupies no space in the object file); it is initialized to 0 when the program is loaded
5. Assembling and linking as86 assembly language programs
as86 usage and options
as86 [-03agjuw] [-b [bin]] [-lm [list]] [-n name] [-o objfile] [-s sym] infile
Default settings (apart from the defaults below, all other options are off or unset by default)
-3 use 80386 32-bit output
list: the listing is displayed on standard output
name: the base name of the source file
Meaning of each option
-0 use 16-bit code segments
-3 use 32-bit code segments
-a enable partial compatibility with GNU as and ld
-b produce a binary file; may be followed by a file name
-g store only global symbols in the object file
-j make all jump statements long jumps
-l produce a listing file; may be followed by the listing file name
-m expand macro definitions in the listing
-n followed by a module name, which is stored in the object file in place of the source file name
-o produce an object file; followed by the object file name
-s produce a symbol file; followed by the symbol file name
-u treat undefined symbols as imported symbols with an unspecified segment
-w do not display warning messages
ld86 linker usage and options
ld86 [-03Mimrs[-]] [-T textaddr] [-llib_extension] [-o outfile] infile
Default options
-3 32-bit output
outfile: output in a.out format
Meaning of each option
-0 produce a header with a 16-bit magic number and use the i86 subdirectory for the -lx option
-3 produce a header with a 32-bit magic number and use the i386 subdirectory for the -lx option
-M display the linked symbols on standard output
-T followed by the base address of the text segment
-i produce output with separate instruction and data segments (I&D)
-lx add the library /local/lib/subdir/libx.a to the list of files to link
-m display the linked modules on standard output
-o specify the output file name; followed by the output file name
-r produce output suitable for further relocation
-s strip all symbols from the object file
GNU as Assembler
The as86 assembler is used only to assemble the boot sector program boot/bootsect.s and the real-mode setup program boot/setup.s in the kernel. All other assembly language programs in the kernel (including the assembly code generated from C) are assembled with gas and linked with the modules compiled from C.
1. Assembling as assembly language programs
Command format:
as [options] [-o objfile] [srcfile.s ...]
Here objfile defaults to a.out. Options may appear anywhere on the command line, but the order of the file names directly affects the result.
The source of a program may be placed in one or more files; how the source is divided among files does not change the meaning of the program. The program source is the concatenation of all these files in order. Each run of as assembles exactly one source program, but that source program may consist of several text files.
The output of as is the binary file produced by assembling the input, that is, the object file.
The object file serves mainly as input to the linker ld. It contains the assembled program code and information that helps ld produce an executable file, and it may also contain debugging information.
2. as assembly syntax
Assembler preprocessing: as has a simple built-in preprocessor that removes extra whitespace, blank lines, and comments, but it does not process macros or include files.
Symbols: a symbol is an identifier made up of characters. Valid characters are uppercase and lowercase letters, digits, and the three characters '_', '.', and '$'. A symbol may not begin with a digit, and case is significant. A symbol is delimited by any character outside this set (such as a space, carriage return, or newline) or by the beginning of the file.
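The symbol rules can be checked mechanically. The following is a minimal Python sketch, not part of as itself; the names SYMBOL_RE and is_valid_symbol are hypothetical and exist only for this illustration.

```python
import re

# Letters, digits, and the three characters '_', '.', '$' are allowed;
# the first character must not be a digit.
SYMBOL_RE = re.compile(r"[A-Za-z_.$][A-Za-z0-9_.$]*")

def is_valid_symbol(name: str) -> bool:
    """Return True if name is well formed under the rules described above."""
    return SYMBOL_RE.fullmatch(name) is not None
```

Because case is significant, a checker like this treats "Mylab" and "mylab" as two different, equally valid names.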
Statements: a statement ends at a newline character or at the line separator character ';'. The last statement in a file must end with a newline. A backslash '\' placed at the end of a line lets the statement continue on the next line. A statement begins with zero or more labels, followed by a key symbol that determines the type of the statement. A label consists of a symbol immediately followed by a colon ':'. The key symbol determines the semantics of the rest of the statement: if it begins with a dot '.', the statement is an assembler directive; if it begins with a letter, the statement is an assembly language instruction. The general format is:
label: mnemonic operand1, operand2  comment
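To make the role of labels and the key symbol concrete, here is a rough Python sketch of how one statement could be classified. It is a simplification under stated assumptions: classify_statement is a hypothetical helper, every label is assumed to be followed by whitespace, and comments and line continuations are not handled.

```python
def classify_statement(line: str):
    """Split one statement into (labels, kind, key_symbol).

    kind is 'directive' if the key symbol starts with '.', 'instruction'
    if it starts with a letter, 'empty' if only labels are present, and
    'unknown' otherwise.
    """
    labels = []
    rest = line.strip()
    while True:
        parts = rest.split(None, 1)
        # A leading token that ends with ':' is a label.
        if parts and parts[0].endswith(":"):
            labels.append(parts[0][:-1])
            rest = parts[1] if len(parts) > 1 else ""
        else:
            break
    if not rest:
        return labels, "empty", None
    key = rest.split(None, 1)[0]
    if key.startswith("."):
        kind = "directive"
    elif key[0].isalpha():
        kind = "instruction"
    else:
        kind = "unknown"
    return labels, kind, key
```

For example, "mylab: .long ." is classified as a directive with one label, while "start: movl $1, %eax" is classified as an instruction.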
Constants: a constant is a number. Constants are divided into character constants and numeric constants. Character constants are further divided into strings and single characters; numeric constants are divided into integers, bignums, and floating-point numbers. A string must be enclosed in double quotation marks. Within a string, a backslash that does not introduce a recognized escape sequence has no effect, and as issues a warning. A single character is written with a single quote in front of it; for example, 'A denotes the value 65.
3. Instruction statements, operands and addressing
An instruction is an action performed by the CPU, and an instruction statement is a statement that is executed when the program runs.
For instructions with two operands, the first is the source operand and the second is the destination operand; that is, the result of the operation is stored in the second operand.
An operand can be an immediate value, a register (the value is in a CPU register), or memory (the value is in memory). An indirect operand contains the address of the actual operand value. AT&T syntax marks an indirect operand with '*', and only jump and call instructions may use indirect operands. An immediate operand is prefixed with '$' and a register name with '%'; a memory operand is specified by a variable name or by a register that holds the address of the variable.
Naming of instruction opcodes
Prefixes of instruction opcodes
Memory reference
Jump Instructions
Sections and relocation
A section represents a range of addresses; the operating system treats and handles the data in that range in the same way.
The concept of a section is used mainly to denote the different areas of information in the object file (or executable program) produced by the compiler, such as the text section and the data section of an object file.
During linking, the linker ld rearranges the contents of the intermediate object files; this operation is called relocation.
An object file output by as has at least three sections: text, data, and bss. Any of them may be empty. Within an object file, the text section starts at address 0, followed by the data section and then the bss section.
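The layout inside a single object file can be sketched in a few lines of Python. This is only an illustration: layout_sections is a hypothetical helper, the sizes are made-up inputs, and any alignment padding between sections is ignored.

```python
def layout_sections(text_size: int, data_size: int, bss_size: int) -> dict:
    """Return the start address of each section inside one object file."""
    text_start = 0                        # text always starts at address 0
    data_start = text_start + text_size   # data follows text
    bss_start = data_start + data_size    # bss follows data
    return {
        "text": text_start,
        "data": data_start,
        "bss": bss_start,
        "end": bss_start + bss_size,
    }
```

An empty section simply contributes a size of 0, so the section after it starts at the same address.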
When a section is relocated, ld needs to know which data will change and how to modify it, so as also writes the required relocation information into the object file. To perform relocation, ld must know the following each time an address is referenced in the object file:
Where in the object file does the reference to the address begin?
How long is the reference, in bytes?
Which section does the address refer to? What is the value of (address) - (start address of the section)?
Is the reference to the address relative to the program counter (PC)?
Every address that as uses can be expressed as (section) + (offset into the section), and most expressions that as computes carry this section-relative attribute. The notation {secname N} denotes offset N into the section secname.
Besides the text, data, and bss sections, there is also an absolute section. When the linker combines object files, addresses in the absolute section never change. The absolute sections of different object files therefore overlap, whereas their text, data, and bss sections must not overlap.
There is also a section called the undefined section. Any address whose section cannot be determined at assembly time is set to {undefined U}, where U is filled in later. Because numeric values are always defined, the only way to produce an undefined address is to refer to an undefined symbol. A reference to a common block is one such symbol: its value is unknown at assembly time, so it belongs to the undefined section.
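The (section) + (offset) representation can be modeled as follows. This is a hedged Python sketch rather than how ld is implemented: the Address class, the resolve function, and the base addresses in the usage note are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Address:
    section: str   # 'text', 'data', 'bss', 'absolute', or 'undefined'
    offset: int    # the N in {secname N}

def resolve(addr: Address, section_bases: dict) -> int:
    """Turn a section-relative address into a final address at link time."""
    if addr.section == "absolute":
        # Absolute addresses are never changed by relocation.
        return addr.offset
    if addr.section == "undefined":
        # The value is only known once the symbol is defined elsewhere.
        raise ValueError("address refers to an undefined symbol")
    return section_bases[addr.section] + addr.offset
```

With assumed bases such as {"text": 0x1000, "data": 0x2000, "bss": 0x3000}, the address {data 8} resolves to 0x2008, while {absolute 239} stays 239 regardless of the bases.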
4. How the ld linker works
Subsections
bss section
Symbols
Symbols are an important concept: programmers use symbols to name objects, the linker uses them to perform linking, and the debugger uses them for debugging.
1. The special dot symbol
The special symbol '.' represents the current address that as is assembling into. The expression "mylab: .long ." therefore defines mylab to contain the value of its own address. Assigning a value to '.' has the same effect as the .org directive.
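The effect of "mylab: .long ." can be illustrated with a small Python model. The function assemble_long_dot is hypothetical, the counter is an offset within the current section before relocation, and little-endian byte order is assumed because the target is x86.

```python
def assemble_long_dot(counter: int):
    """Model 'mylab: .long .' assembled at the given location counter.

    Returns (value of mylab, emitted bytes, new location counter).
    """
    mylab = counter                        # the label takes the current address
    data = counter.to_bytes(4, "little")   # '.' evaluates to that same address
    return mylab, data, counter + 4        # .long advances the counter by 4
```

The label and the stored 32-bit value are equal, which is what "contains its own address" means.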
2. Symbol attributes
Besides its name, every symbol has a value attribute and a type attribute. If you use a symbol without defining it, as assumes that all of its attributes are 0, which marks the symbol as externally defined.
The value of a symbol is usually 32 bits wide. Depending on whether a symbol belongs to the absolute section or to the text, data, or bss section, it is either an absolute symbol or a relative symbol.
ld treats the value of an undefined symbol specially. If the value is 0, the symbol is not defined in this assembly source, and ld tries to determine its value from the other files being linked. If the value is not 0, it gives the length of the common storage that a .comm declaration needs to reserve, and the symbol refers to the first address of that storage.
as assembler directives
Assembler directives are pseudo-instructions that tell the assembler how to operate. They are used to make the assembler allocate space for a variable, set the start address of the program, select the section currently being assembled, change the location counter, and so on. The name of every directive begins with a dot '.'; the remaining characters are letters and are case-insensitive (though lowercase is customary).
1. .align abs-expr1, abs-expr2, abs-expr3
.align is the storage alignment directive. It advances the location counter of the current subsection to the next specified storage boundary. For the a.out format, the first argument n specifies a boundary of 2^n; for the ELF format, the boundary is n itself. The second argument is the byte value used for padding, which defaults to 0 if omitted. The third argument is the maximum number of bytes that the alignment is allowed to skip.
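The arithmetic behind .align can be sketched in Python. This is an illustrative model, not the assembler's code: align_location and its fmt parameter are hypothetical, and the handling of the third argument (alignment is skipped entirely when it would exceed the maximum) follows the GNU as documentation.

```python
def align_location(counter, n, fmt="a.out", fill=0, max_skip=None):
    """Return (new_counter, padding_bytes) for a modeled '.align n, fill, max_skip'."""
    # a.out: n is a power of two exponent; ELF: n is the boundary in bytes.
    boundary = (1 << n) if fmt == "a.out" else n
    if boundary <= 1:
        return counter, b""
    padding = (-counter) % boundary        # bytes needed to reach the boundary
    if max_skip is not None and padding > max_skip:
        return counter, b""                # too many bytes: do not align at all
    return counter + padding, bytes([fill & 0xFF]) * padding
```

Under this model, ".align 2" in a.out format and ".align 4" in ELF format both move a counter of 5 to 8, because each requests a 4-byte boundary.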
2. .ascii "string" ...
Allocates space for the strings starting at the current value of the location counter and stores them there. Several strings may be given, separated by commas. as assembles the strings into consecutive addresses and does not automatically append a 0 (NUL) byte to each string.
3. .asciz "string" ...
This directive is like .ascii, but a NUL byte is automatically appended to each string.
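The difference between the two directives is easiest to see in the emitted bytes. The following Python sketch uses two hypothetical helpers, emit_ascii and emit_asciz, and assumes plain ASCII strings without escape sequences.

```python
def emit_ascii(*strings: str) -> bytes:
    """Model '.ascii': strings are stored back to back with no terminator."""
    return b"".join(s.encode("ascii") for s in strings)

def emit_asciz(*strings: str) -> bytes:
    """Model '.asciz': each string is followed by one NUL byte."""
    return b"".join(s.encode("ascii") + b"\x00" for s in strings)
```

For the operands "ab" and "c", the first helper yields three bytes and the second yields five, because each string gains its own terminator.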
4. .byte expressions
Defines zero or more byte values separated by commas; the value of each expression occupies one byte.
5. .comm symbol, length
Declares a common symbol named symbol in the bss section. During linking, a common symbol in one object file can be merged with common symbols of the same name in other object files. If ld finds no definition of the symbol but only one or more common symbols, it allocates length bytes of uninitialized memory. length must be an absolute expression. If ld finds several common symbols with the same name but different lengths, it allocates the largest size.
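The merging rule for common symbols can be expressed as a short Python sketch. The function common_sizes and its input shapes are hypothetical; it only models the size decision, not address assignment.

```python
def common_sizes(common_decls, defined):
    """Decide how many bytes to allocate for each common symbol.

    common_decls: iterable of (symbol, length) pairs gathered from all
                  object files being linked.
    defined:      set of symbols that have a real definition somewhere.
    """
    sizes = {}
    for symbol, length in common_decls:
        if symbol in defined:
            continue                      # a real definition wins; nothing to allocate
        # Same name, different lengths: keep the largest.
        sizes[symbol] = max(sizes.get(symbol, 0), length)
    return sizes
```

If "buf" is declared with lengths 16 and 64 in two files and never defined, this model allocates 64 bytes for it.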
6. .data subsection
Tells as to assemble the following statements into the data subsection numbered subsection; if subsection is omitted, it defaults to 0.
7. .desc symbol, abs-expr
Sets the 16-bit descriptor field (n_desc) of symbol to the value of the absolute expression.
8. .fill repeat, size, value
Emits repeat copies of size bytes. If size is greater than 8, it is treated as 8. The contents of each copy are taken from an 8-byte number whose high 4 bytes are 0 and whose low 4 bytes are value.
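The byte pattern produced by .fill can be reproduced with Python's struct module. This is a sketch under assumptions: emit_fill is a hypothetical helper, little-endian order is assumed because the target is x86, and the defaults of size 1 and value 0 follow the GNU as documentation.

```python
import struct

def emit_fill(repeat: int, size: int = 1, value: int = 0) -> bytes:
    """Model '.fill repeat, size, value' for a little-endian target."""
    size = min(size, 8)                   # sizes above 8 are treated as 8
    # 8-byte number: low 4 bytes hold value, high 4 bytes are zero.
    pattern = struct.pack("<Q", value & 0xFFFFFFFF)
    # Each copy is the lowest-order 'size' bytes of that number.
    return pattern[:size] * repeat
```

For instance, two copies of size 2 with value 0x1234 give the bytes 34 12 34 12, and one copy of size 8 with value 0x11223344 ends in four zero bytes.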
9. .global symbol
Makes symbol visible to the linker ld. If symbol is defined in this object file, its value becomes available to the other object files during linking. If this object file does not define symbol, its attributes are taken from a symbol of the same name in another object file being linked. (This is implemented by setting the N_EXT field of the symbol.)
10. .int expressions
Emits zero or more integer values in a section, separated by commas; each value is the run-time value of the corresponding expression.
11. .lcomm symbol, length
Reserves length bytes of space for the local common area denoted by symbol.
12. .long expressions
Same as .int.
13. .octa bignums
Specifies 16-byte bignums (.byte, .word, .long, .quad, and .octa correspond to 1, 2, 4, 8, and 16 bytes respectively).
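The sizes of the data directives can be tabulated and exercised with a short Python sketch. DIRECTIVE_SIZES and emit_value are hypothetical names; the sketch assumes an x86 little-endian target and silently truncates values that do not fit, which a real assembler may instead report.

```python
# Size in bytes of the value emitted by each data directive (x86 assumed).
DIRECTIVE_SIZES = {".byte": 1, ".word": 2, ".long": 4, ".quad": 8, ".octa": 16}

def emit_value(directive: str, value: int) -> bytes:
    """Encode one value as the given directive would, in little-endian order."""
    size = DIRECTIVE_SIZES[directive]
    # Reduce modulo 2**(8*size) so negative and oversized values wrap.
    return (value % (1 << (8 * size))).to_bytes(size, "little")
```

As a usage example, emit_value(".word", 0xAA55) yields the bytes 55 AA, the byte order in which a 16-bit value of that kind is stored on x86.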
14. .org new_lc, fill
Advances the location counter of the current section to new_lc. The location counter is relative to a section, so .org cannot cross sections. The bytes that are skipped over are filled with fill, which defaults to 0 if omitted.
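A minimal Python model of .org is shown here. The function org is hypothetical, and the rule that the location counter may not move backwards comes from the GNU as documentation rather than from the text above.

```python
def org(counter: int, new_lc: int, fill: int = 0):
    """Model '.org new_lc, fill' within one section.

    Returns (new_counter, bytes emitted to fill the gap).
    """
    if new_lc < counter:
        raise ValueError(".org cannot move the location counter backwards")
    gap = new_lc - counter
    return new_lc, bytes([fill & 0xFF]) * gap
```

Advancing from offset 3 to offset 8 with the default fill emits five zero bytes.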
15. .quad bignums
16. .short expressions (same as .word expressions)
17. .space size, fill
Emits size bytes, each filled with the value fill; if fill is omitted, the bytes are filled with 0.
18. .string "string"
Defines one or more comma-separated strings. Escape characters may be used in the strings, and a NUL byte is automatically appended to each string.
19. .text subsection
20. .word expressions
Command-line options of the as assembler
-a enable program listings
-f fast operation
-o specify the name of the output object file
-R merge the data section into the text section
-W suppress warning messages
Linux Kernel Fully Annotated: Programming Language and Environment (I)