Linux Kernel Fully Annotated: Programming Languages and Environment (I)

Source: Internet
Author: User

AS86 Assembler

1. Origin and use in Linux

as86 originates from MINIX-386 and is an assembler, with an accompanying linker, for the Intel 8086 and 80386. In Linux it is used mainly to build the binary executable code of the 16-bit boot sector program boot/bootsect.s and the real-mode initial setup program boot/setup.s.

2. Grammar

The as86 syntax is an assembly language syntax based on the MINIX system and is incompatible with the syntax of the GNU as assembler.

Basic format of the assembler command: as [options] -o objfile srcfile

3. Statements

The assembly language program srcfile is a text file made up of a series of statements, each terminated by a newline character. There are three kinds of statement: assignment statements, pseudo-operator statements, and machine instruction statements.

An assignment statement assigns a value to a symbol or identifier.

A pseudo-operator statement is a directive to the assembler and usually generates no code. It consists of a pseudo-operator and zero or more operands. Every pseudo-operator begins with a dot '.'. The dot character '.' by itself is a special symbol that represents the location counter during assembly; its value is the address of the first byte of the machine instruction in which the dot appears.

A machine instruction statement is the mnemonic form of an executable machine instruction and consists of an opcode and zero or more operands. Any statement may be preceded by a label. The format is: [label:] mnemonic [operands] ...

4. Object file

The object file produced by the assembler usually contains at least three segments or sections: the text segment (.text), the data segment (.data), and the uninitialized data segment (.bss).

Text segment: code and read-only data

Data segment: readable and writable data

Uninitialized data segment: uninitialized data (occupies no space in the object file) that is initialized to 0 when loaded

5. Assembling and linking as86 assembly language programs

as86 usage and options

as86 [-03agjuw] [-b [bin]] [-lm [list]] [-n name] [-o objfile] [-s sym] infile

Default settings (apart from the defaults below, all other options are off or unset by default)

-3 use 80386 32-bit output

list: the listing is displayed on standard output

name: the base name of the source file

Meaning of each option

-0 use 16-bit code segments

-3 use 32-bit code segments

-a enable partial compatibility with GNU as and ld

-b produce a binary file; followed by the file name

-g store only global symbols in the object file

-j treat all jump instructions as long jumps

-l produce a listing file; followed by the listing file name

-m expand macro definitions in the listing

-n followed by the module name, which is placed in the object file in place of the source file name

-o produce an object file; followed by the object file name

-s produce a symbol file; followed by the symbol file name

-u treat undefined symbols as imported symbols with an unspecified segment

-w do not display warning messages

ld86 linker usage and options

ld86 [-03Mimrs[-]] [-T textaddr] [-llib_extension] [-o outfile] infile

Default options

-3 32-bit output

outfile: a.out format output

Meaning of each option

-0 produce a header with a 16-bit magic number and use the i86 subdirectory for the -lx option

-3 produce a header with a 32-bit magic number and use the i386 subdirectory for the -lx option

-M display linked symbols on standard output

-T followed by the text base address

-i separate instruction and data segments (I&D) output

-lx add the library /local/lib/subdir/libx.a to the list of files to link

-m display linked modules on standard output

-o specify the output file name; followed by the output file name

-r produce output suitable for further relocation

-s strip all symbols from the object file

GNU as Assembly

The as86 assembler is used only to assemble the boot sector program boot/bootsect.s and the real-mode setup program boot/setup.s in the kernel. All other assembly language programs in the kernel (including the assembly code generated from C) are assembled with gas and linked with the modules produced by compiling C code.

1. Assembling an as assembly language program

Command format:

as [options] [-o objfile] [srcfile.s ...]

Here objfile defaults to a.out. Options may appear in any position, but the order of the file names directly affects the result of assembly.

The source of a program may be placed in one or more files; how the source is divided among files does not change the meaning of the program. The program source is the concatenation of all these files in order. Each run of as assembles exactly one source program, but a source program may consist of several text files.

The output of as is the binary file produced by assembling the input, that is, the object file.

The object file serves primarily as input to the linker ld. It contains the assembled program code and information that helps ld produce an executable file, and it may also contain debugging information.

2. as assembly syntax

Assembler preprocessing: the as assembler has a simple built-in preprocessor that removes extra whitespace, blank lines, and comments, but it does not process macros or include files.

Symbols: a symbol is an identifier made up of characters. The valid characters are uppercase and lowercase letters, digits, and the three characters '_', '.', and '$'. A symbol may not begin with a digit, and case is significant. The beginning and end of a symbol are delimited by other characters (such as a space, carriage return, or newline) or by the beginning of the file.

Statements: a statement ends with a newline or the line separator ';'. The last statement in a file must end with a newline. A backslash '\' at the end of a line lets a statement continue on the next line. A statement begins with zero or more labels, followed by a key symbol that determines the kind of statement. A label consists of a symbol followed by a colon ':'. The key symbol determines the semantics of the rest of the statement: if it begins with '.', the statement is an assembler directive; if it begins with a letter, the statement is an assembly language instruction. The general format is:

label: mnemonic operand1, operand2    comment
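
As a rough illustration of the statement format above, the following Python sketch splits off the leading labels and classifies the key symbol. The function name classify_statement is hypothetical and is not part of as. The sketch assumes that tokens are separated by whitespace, and it ignores comments and line continuation.

```python
def classify_statement(stmt):
    """Return (labels, kind) for one statement, following the rules above."""
    tokens = stmt.split()
    labels = []
    # A statement starts with zero or more labels, each a symbol followed by ':'.
    while tokens and tokens[0].endswith(":"):
        labels.append(tokens.pop(0)[:-1])
    if not tokens:
        kind = "empty"
    elif tokens[0].startswith("."):
        kind = "directive"      # key symbol begins with '.'
    elif tokens[0][0].isalpha():
        kind = "instruction"    # key symbol begins with a letter
    else:
        kind = "unknown"
    return labels, kind
```

For example, classify_statement("start: movl $1, %eax") yields one label, start, and the kind "instruction".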

Constants: a constant is a number. Constants fall into two classes, character constants and numeric constants. Character constants are further divided into strings and single characters, and numeric constants into integers, bignums, and floating-point numbers. A string must be enclosed in double quotation marks. Within a string, if a backslash is followed by a character that does not form a recognized escape sequence, the backslash has no effect and as issues a warning. A single character is written as a single quote followed by the character; for example, 'A denotes the value 65.

3. Instruction statements, operands, and addressing

An instruction is an action performed by the CPU, and an instruction statement is one that is executed when the program runs.

For instructions with two operands, the first is the source operand and the second is the destination operand; that is, the result of the operation is stored in the second operand.

An operand can be an immediate value, a register (the value is in a CPU register), or memory (the value is in memory). An indirect operand holds the address of the actual operand value. AT&T syntax marks an indirect operand with '*', and only jump and call instructions can use indirect operands. Immediate operands are prefixed with '$' and registers with '%'. A memory operand is specified by a variable name or by a register containing the variable's address.
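
The prefix rules above can be sketched as a small classifier. This is only an illustration: the function name operand_kind is hypothetical, and the sketch looks at the first character alone, without validating the rest of the operand.

```python
def operand_kind(operand):
    """Classify an AT&T-syntax operand by its prefix character."""
    if operand.startswith("*"):
        return "indirect"    # only valid for jump and call instructions
    if operand.startswith("$"):
        return "immediate"
    if operand.startswith("%"):
        return "register"
    # Anything else is a memory operand: a variable name, or a register
    # holding the address, written in parentheses such as (%ebx).
    return "memory"
```

Under these rules, in "movl $8, %eax" the first operand is the immediate source and the second is the register destination.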

Naming of instruction opcodes

Instruction opcode prefixes

Memory references

Jump instructions

Sections and relocation

A section denotes a range of addresses; the operating system treats and handles all the data in that range in the same way.

The concept of a section is used mainly to denote the different areas of information in the object file (or executable program) generated by the compiler, such as the text section and the data section of an object file.

During linking, the linker ld rearranges the sections of the object files and assigns them run-time addresses; this operation is known as relocation.

An object file output by as has at least three sections: text, data, and bss. Any of them may be empty. Within an object file, the text section starts at address 0, followed by the data section and then the bss section.
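
The back-to-back layout described above can be sketched in Python. The function name layout_sections is hypothetical, and the sketch assumes that no alignment padding is inserted between sections.

```python
def layout_sections(text_size, data_size, bss_size):
    """Return the start address of each section inside one object file."""
    text_start = 0                       # text always starts at address 0
    data_start = text_start + text_size  # data follows text
    bss_start = data_start + data_size   # bss follows data
    return {
        "text": text_start,
        "data": data_start,
        "bss": bss_start,
        "end": bss_start + bss_size,
    }
```

An empty section simply has size 0, so the following section starts at the same address.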

When a section is relocated, the linker ld must know which data will change and how to modify it, so the as assembler also writes the required relocation information to the object file. To perform relocation, each time an address in the object file is referenced, ld must know:

Where in the object file does the reference to the address begin?

What is the byte length of the reference?

Which section does the address refer to? What is the value of (address) - (start address of the section)?

Is the reference to the address relative to the program counter (PC)?
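
The four pieces of information listed above can be modeled as a small record. The Python sketch below is an illustration only and does not reproduce the a.out relocation format: the names Reloc and apply_reloc are hypothetical, every section is assumed to start at base 0 before linking, and values are assumed to be stored little-endian as on x86.

```python
from dataclasses import dataclass

@dataclass
class Reloc:
    where: int         # offset in the image where the reference begins
    length: int        # byte length of the reference
    section: str       # section that the referenced address belongs to
    pc_relative: bool  # whether the reference is relative to the PC

def apply_reloc(image, reloc, new_bases, own_section):
    """Patch one reference in place after the sections move from base 0."""
    end = reloc.where + reloc.length
    value = int.from_bytes(image[reloc.where:end], "little")
    # The stored value is (address) - (section start); add the new start.
    value += new_bases[reloc.section]
    if reloc.pc_relative:
        # The referencing location moved as well, so subtract its shift.
        value -= new_bases[own_section]
    mask = (1 << (8 * reloc.length)) - 1
    image[reloc.where:end] = (value & mask).to_bytes(reloc.length, "little")
```

Here new_bases maps each section name to the address the linker assigned to it, and own_section names the section that contains the reference being patched.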

All addresses used by as can be expressed as (section) + (offset into the section), and most expressions that as computes have this section-relative nature. The notation {secname N} denotes offset N into section secname. Besides the text, data, and bss sections, there is also an absolute section. When the linker combines object files, addresses in the absolute section always stay unchanged. The absolute sections of different files may therefore overlap, whereas the text, data, and bss sections must never overlap. There is also an undefined section: any address whose section cannot be determined at assembly time is set to {undefined U}, where U is filled in later. Because numbers are always defined, the only way to produce an undefined address is to refer to an undefined symbol. A reference to a common block is such a symbol: its value is unknown at assembly time, so it belongs to the undefined section.
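
A minimal Python sketch of the (section) + (offset) idea follows. The function name resolve_address and the section_bases mapping are hypothetical. The mapping stands for the run-time base addresses that the linker has chosen.

```python
def resolve_address(secname, offset, section_bases):
    """Turn {secname offset} into a run-time address."""
    if secname == "absolute":
        # Absolute addresses are never changed by the linker.
        return offset
    if secname == "undefined":
        # The value is only known once the undefined symbol is resolved.
        raise ValueError("address depends on an undefined symbol")
    return section_bases[secname] + offset
```

With the data section placed at 0x2000, {data 0x10} resolves to 0x2010, while {absolute 0x10} stays 0x10.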

4. Working process of the ld linker

Subsections

bss section

Symbols

Symbols are an important concept: programmers use symbols to name objects, the linker uses symbols to link, and the debugger uses symbols to debug.

1. The special dot symbol

The special symbol '.' represents the address that as is currently assembling. Thus the expression 'mylab: .long .' defines mylab to contain its own address. Assigning a value to '.' has the same effect as the '.org' directive.

2. Symbol attributes

Besides its name, every symbol has value and type attributes. If you use a symbol without defining it, as assumes that all its attributes are 0, which indicates that the symbol is defined externally.

The value of a symbol is usually 32 bits. Symbols are divided into relative symbols, which belong to the text, data, or bss section, and absolute symbols, which belong to the absolute section.

If the value of an undefined symbol is 0, the symbol is not defined in this assembly source, and ld tries to determine its value from the other files being linked. If the value of an undefined symbol is not 0, the value is the length of the common storage that a .comm declaration needs to reserve, and the symbol refers to the first address of that storage.

as assembler directives

Assembler directives are pseudo-instructions that tell the assembler how to operate. They are used to ask the assembler to allocate space for a variable, set the program start address, select the section currently being assembled, change the location counter, and so on. The names of all assembler directives begin with '.'; the rest of the name consists of letters and is case-insensitive (but usually written in lowercase).

1. .align abs-expr1, abs-expr2, abs-expr3

.align is the storage alignment directive. It advances the location counter in the current subsection to the next specified storage boundary. The first argument specifies the boundary: for a.out format, a value n means a boundary of 2^n; for ELF format files, the boundary is n itself. The second argument is the byte value used for padding and defaults to 0 if omitted. The third argument is the maximum number of bytes that the alignment is allowed to skip.
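
The effect of .align on the location counter can be sketched in Python. The function name align_location and its fmt parameter are hypothetical. The sketch follows the a.out and ELF interpretations described above and, in line with the GNU as documentation, does no alignment at all when more than the maximum number of bytes would have to be skipped.

```python
def align_location(lc, n, fmt="elf", fill=0, max_skip=None):
    """Return (new location counter, padding bytes) for '.align n, fill, max_skip'."""
    # a.out: n is an exponent (boundary 2**n); ELF: n is the boundary itself.
    boundary = (1 << n) if fmt == "aout" else n
    if boundary <= 1:
        return lc, b""
    skip = (-lc) % boundary          # bytes needed to reach the next boundary
    if max_skip is not None and skip > max_skip:
        return lc, b""               # too far: alignment is not done at all
    return lc + skip, bytes([fill & 0xFF]) * skip
```

For a location counter of 0x1003, ".align 2" in a.out format and ".align 4" in ELF format both advance to 0x1004 in this model.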

2. .ascii "string" ...

Starting at the current position indicated by the location counter, allocates space for the strings and stores them. Several strings may be given, separated by commas. The directive makes as assemble the strings into consecutive addresses, and no 0 (NUL) byte is added automatically after each string.

3. .asciz "string" ...

This directive is like .ascii, except that a NUL byte is added automatically after each string.
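
The difference between the two directives shows up in the bytes they emit. In the Python sketch below, the function names ascii_directive and asciz_directive are hypothetical, and escape sequences are assumed to have been processed already.

```python
def ascii_directive(*strings):
    """Bytes emitted by .ascii: strings back to back, no terminator."""
    return b"".join(s.encode("ascii") for s in strings)

def asciz_directive(*strings):
    """Bytes emitted by .asciz: each string followed by a NUL byte."""
    return b"".join(s.encode("ascii") + b"\x00" for s in strings)
```

In this model, two strings given to .asciz occupy two bytes more than the same strings given to .ascii, one terminator per string.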

4. .byte expressions

This directive defines zero or more byte values separated by commas; the value of each expression occupies one byte.

5. .comm symbol, length

Declares a common area named symbol in the bss section. A common symbol in one object file is merged with common symbols of the same name in other object files. If ld finds no definition of the symbol, only one or more common symbols, it allocates length bytes of uninitialized memory. length must be an absolute expression. If ld finds several common symbols with the same name but different lengths, it allocates space for the largest.
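
The merging rule for common symbols can be sketched as follows. This models the rule only and does not describe how ld is implemented; the function name resolve_common and its arguments are hypothetical.

```python
def resolve_common(declarations, defined):
    """Return the bytes to allocate for each common symbol.

    declarations: (name, length) pairs from .comm in all object files.
    defined: names that have a real definition in some object file.
    """
    sizes = {}
    for name, length in declarations:
        if name in defined:
            continue  # a real definition wins; nothing is allocated here
        # Same name with different lengths: keep the largest.
        sizes[name] = max(sizes.get(name, 0), length)
    return sizes
```

If two files declare ".comm buf, 16" and ".comm buf, 64" and neither defines buf, the model reserves 64 bytes for it.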

6. .data subsection

This directive tells as to assemble the following statements into the data subsection numbered subsection; if the number is omitted, it defaults to 0.

7. .desc symbol, abs-expr

Sets the descriptor field (n_desc) of symbol to the low 16 bits of the absolute expression.

8. .fill repeat, size, value

This directive emits repeat copies of size bytes. If size is greater than 8, it is treated as 8. The bytes of each copy are taken from an 8-byte number whose high 4 bytes are 0 and whose low 4 bytes are value.
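
The byte pattern produced by .fill can be sketched in Python. The function name fill_directive is hypothetical, and the sketch assumes a little-endian target such as x86, so the low-order bytes of the 8-byte number come first.

```python
def fill_directive(repeat, size=1, value=0):
    """Bytes emitted by '.fill repeat, size, value' on a little-endian target."""
    size = min(size, 8)  # a size above 8 is treated as 8
    # 8-byte number: low 4 bytes hold value, high 4 bytes are zero.
    number = (value & 0xFFFFFFFF).to_bytes(8, "little")
    return number[:size] * repeat
```

For example, fill_directive(3, 2, 0x1234) gives the two bytes 0x34 0x12 repeated three times.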

9. .global symbol

Makes symbol visible to the linker ld. If the object file defines symbol, its value can be used by other object files during linking. If the object file does not define symbol, its attributes are taken from a symbol of the same name in another object file during linking. (This is implemented by setting the N_EXT bit of symbol.)

10. .int expressions

Emits zero or more integer values into the current section; each comma-separated expression yields the value it has at run time.

11. .lcomm symbol, length

Reserves length bytes for the local common area denoted by symbol.

12. .long expressions

Same as .int.

13. .octa bignums

Defines 16-byte bignums (.byte, .word, .long, .quad, and .octa correspond to 1, 2, 4, 8, and 16 bytes respectively).

14. .org new-lc, fill

Sets the location counter of the current section to new-lc. The location counter is section-relative, so .org cannot cross sections. fill is the value used for the bytes that are skipped; if omitted, it defaults to 0.
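
A small Python sketch of .org within a single section follows. The function name org_directive is hypothetical. The sketch also rejects moving backwards, which matches the GNU as documentation: .org may only increase the location counter or leave it unchanged.

```python
def org_directive(lc, new_lc, fill=0):
    """Return (new location counter, fill bytes) for '.org new_lc, fill'."""
    if new_lc < lc:
        raise ValueError(".org cannot move the location counter backwards")
    skipped = new_lc - lc
    # Every skipped byte is filled with the fill value (0 by default).
    return new_lc, bytes([fill & 0xFF]) * skipped
```

Advancing from location 3 to location 8 in this model emits five fill bytes.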

15. .quad bignums

16. .short expressions (same as .word expressions)

17. .space size, fill

Emits size bytes, each with the value fill; if fill is omitted, it defaults to 0.

18. .string "string"

Defines one or more comma-separated strings. Escape characters may be used in the strings, and a NUL byte is added automatically after each string.

19. .text subsection

20. .word expressions

Command options of the as assembler

-a turn on program listings

-f fast operation

-o specify the name of the output object file

-R combine the data section and the text section

-W suppress warning messages
