Basic assembly language knowledge

Source: Internet
Author: User

I. Overview
The assembler translates the source program into an object program in binary code, that is, the OBJ file; the linker then links this object file with library files and other object files to form an executable file (EXE file). Assembly language source programs have poor generality and are usually not portable. Compared with high-level languages, however, assembly language produces very compact code that runs fast, letting you get the most out of the computer hardware. Figure 1 shows the relationship between the assembler, the object program, and the executable program.

Figure 1 Relationship between the assembler, the object program, and the executable program

II. Assembly Language Specifications

1. The source program consists of two kinds of statements
(1) Instruction statements
(2) Pseudo-instructions (directive statements)

2. Statement format
(1) Format of an instruction statement:
Label: mnemonic operand, ..., operand ; comment
(2) Format of a directive statement:
Name directive operand, ..., operand ; comment

3. Features
(1) Instruction statements are translated into machine code, for example MOV and ADD.
(2) Directive statements are not translated into machine code. They tell the assembler what to do while it assembles the source program, such as defining symbols, allocating storage units, and initializing storage. The directive itself occupies no storage; it is processed entirely by the assembler at assembly time.
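
The difference between the two kinds of statements can be seen in a small program. The following is a minimal sketch, assuming a 16-bit DOS target and MASM-style syntax; the names data_seg, stk_seg, code_seg, value, and start are illustrative and not taken from the original text.

```asm
; Directives guide the assembler; instructions become machine code.
data_seg SEGMENT                ; directive: open a data segment
value    DB 5                   ; directive: reserve one byte, initial value 5
data_seg ENDS                   ; directive: close the segment

stk_seg  SEGMENT STACK          ; directive: stack segment for the linker
         DW 64 DUP (?)
stk_seg  ENDS

code_seg SEGMENT
         ASSUME CS:code_seg, DS:data_seg   ; directive: emits no code
start:   MOV AX, data_seg       ; instruction: translated to machine code
         MOV DS, AX             ; instruction
         MOV AL, value          ; instruction: AL = 5
         ADD AL, 3              ; instruction: AL = 8
         MOV AH, 4Ch            ; DOS function 4Ch: terminate program
         INT 21h
code_seg ENDS
         END start              ; directive: end of source, entry point
```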

III. Glossary

1. Variables
An operand stored in a storage unit is a variable. It has three attributes:
(1) Segment attribute (SEGMENT)
The segment address of the segment in which the variable is stored.
(2) Offset attribute (OFFSET)
The distance between the address of the variable's storage unit and the start address of its segment.
(3) Type attribute (TYPE)
A variable has one of three types: byte, word, or doubleword.

2. Labels
A label is the name of the address of a storage unit that holds an instruction. Labels also have three attributes: segment, offset, and type.
The type attribute of a label differs from that of a variable: it is either NEAR or FAR, and the default is NEAR.
A label can be used as the target operand of the JMP and CALL instructions.
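
The three attributes of a variable can be read back with the SEG, OFFSET, and TYPE operators. The sketch below assumes MASM-style syntax and a 16-bit DOS target; counter, flag, and done are hypothetical names used only for illustration.

```asm
data_seg SEGMENT
counter  DW 0                   ; word variable at offset 0 of data_seg
flag     DB 0                   ; byte variable at offset 2
data_seg ENDS

stk_seg  SEGMENT STACK
         DW 64 DUP (?)
stk_seg  ENDS

code_seg SEGMENT
         ASSUME CS:code_seg, DS:data_seg
start:   MOV AX, SEG counter    ; segment attribute of counter
         MOV DS, AX
         MOV BX, OFFSET counter ; offset attribute: BX = 0
         MOV CX, TYPE counter   ; type attribute: CX = 2 (word)
         MOV DX, TYPE flag      ; type attribute: DX = 1 (byte)
         JMP done               ; a NEAR label used as a jump target
         MOV CX, 0              ; skipped by the jump
done:    MOV AH, 4Ch            ; terminate program
         INT 21h
code_seg ENDS
         END start
```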

IV. Introduction to the Two Kinds of Statements

(1) Instruction statements
For the instructions, see http://blog.csdn.net/scucj/archive/2009/06/19/4281659.aspx.

(2) Pseudo-instructions (directive statements)
1. Symbol definition statements
(1) Equate statement EQU
Format: Name EQU expression
Function: assigns a symbolic name to a value, to another symbolic name, or even to an executable instruction.
A symbol defined with EQU cannot be redefined until it has been released. The release statement is PURGE; a symbol released with it can be defined again.
In addition, the equal-sign statement "=" is similar to EQU, but it can redefine a symbol at any time.
Example:
NEW_PORT EQU PORT_VAL
PURGE NEW_PORT
NEW_PORT EQU PORT_VAL + 10
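
The contrast between EQU and "=" can be sketched as follows, assuming MASM-style syntax and a 16-bit DOS target. The value 3F8h and the names PORT_VAL, NEW_PORT, and count are illustrative assumptions. PURGE is left out because assemblers differ in how they treat it.

```asm
PORT_VAL EQU 3F8h               ; fixed symbol, cannot be reassigned
NEW_PORT EQU PORT_VAL + 1       ; expression built from another symbol
count    =   1                  ; "=" symbol ...
count    =   count + 1          ; ... may be redefined: count is now 2

stk_seg  SEGMENT STACK
         DW 64 DUP (?)
stk_seg  ENDS

code_seg SEGMENT
         ASSUME CS:code_seg
start:   MOV DX, NEW_PORT       ; DX = 3F9h
         MOV CX, count          ; CX = 2
         MOV AH, 4Ch            ; terminate program
         INT 21h
code_seg ENDS
         END start
```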

2. Data definition statements
A data definition statement allocates storage for a data item and associates a symbolic name with it.
(1) DB (define byte)
Defines one or more bytes of data or ASCII characters starting at the current storage unit.
Example:
Thing DB ? ; define one byte
Thing is a symbolic name associated with one byte in memory; that is, it is a byte variable.
"?" means that storage is allocated for the data item, but no object code is generated to initialize it.

(2) DW (define word) and DD (define doubleword)
DW defines data or ASCII characters as words (2 bytes each) starting at the current storage unit; DD defines data as doublewords (4 bytes each) starting at the current storage unit.
Example:
Big_thing DW 1234H
12H and 34H are placed in two consecutive bytes: 34H in the low-address byte and 12H in the high-address byte.
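
The byte order described above can be checked by reading the individual bytes back. This is a minimal sketch assuming MASM-style syntax and a 16-bit DOS target; it uses the PTR operator to access part of a variable, and the name dbl_thing is hypothetical.

```asm
data_seg  SEGMENT
big_thing DW 1234h              ; stored as 34h, 12h
dbl_thing DD 12345678h          ; stored as 78h, 56h, 34h, 12h
data_seg  ENDS

stk_seg   SEGMENT STACK
          DW 64 DUP (?)
stk_seg   ENDS

code_seg  SEGMENT
          ASSUME CS:code_seg, DS:data_seg
start:    MOV AX, data_seg
          MOV DS, AX
          MOV BL, BYTE PTR big_thing       ; low-address byte:  BL = 34h
          MOV BH, BYTE PTR [big_thing + 1] ; high-address byte: BH = 12h
          MOV CX, WORD PTR dbl_thing       ; low word:  CX = 5678h
          MOV DX, WORD PTR [dbl_thing + 2] ; high word: DX = 1234h
          MOV AH, 4Ch                      ; terminate program
          INT 21h
code_seg  ENDS
          END start
```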

(3) DUP
Repeats an initial value, or a group of initial values, to initialize a run of storage units.
Example:
DB 100 DUP (0) ; 100 bytes, all initialized to 0
DW 10 DUP (?) ; reserve 10 words
DB 10 DUP (10 DUP (0)) ; 10 copies of 10 copies of 0
DB 5 DUP (1, 2, 4 DUP (3), 2 DUP (...)) ; 5 copies of 1, 2, 3, 3, 3, 3, ...
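
A short program can show how DUP lays out memory. The sketch below assumes MASM-style syntax and a 16-bit DOS target and uses MASM's LENGTH and SIZE operators; the variable names and the pattern values are illustrative, not from the original.

```asm
data_seg SEGMENT
zeros    DB 100 DUP (0)             ; 100 bytes of 0
wbuf     DW 10 DUP (?)              ; 10 uninitialized words
grid     DB 10 DUP (10 DUP (0))     ; 100 bytes of 0, nested form
pattern  DB 2 DUP (1, 2, 3 DUP (9)) ; 1,2,9,9,9,1,2,9,9,9
data_seg ENDS

stk_seg  SEGMENT STACK
         DW 64 DUP (?)
stk_seg  ENDS

code_seg SEGMENT
         ASSUME CS:code_seg, DS:data_seg
start:   MOV AX, data_seg
         MOV DS, AX
         MOV CX, LENGTH zeros       ; DUP count: CX = 100
         MOV DX, SIZE wbuf          ; bytes allocated: DX = 20
         MOV BL, pattern[2]         ; third byte of the pattern: BL = 9
         MOV AH, 4Ch                ; terminate program
         INT 21h
code_seg ENDS
         END start
```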

 

3. Segment definition statements
The 8086 organizes programs and uses memory by segment. The main directives for segment definition are SEGMENT, ENDS, ASSUME, and ORG.
(1) SEGMENT and ENDS. The format of a segment definition is as follows:
segment_name SEGMENT
...
...
segment_name ENDS

In data, extra, and stack segments, the body generally consists of directives that define data and allocate storage units. In code segments, it generally consists of instructions.

(2) ASSUME states which segment each segment register addresses. The format is as follows:
ASSUME assignment, ..., assignment
Each assignment has the form:
segment_register_name: segment_name
The segment register must be one of CS, DS, ES, SS, FS, and GS (FS and GS exist only on the 80386 and later), and the segment name must be a segment defined in the program.
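
The following sketch shows SEGMENT, ENDS, and ASSUME working together, assuming MASM-style syntax and a 16-bit DOS target. The segment and variable names are hypothetical. Because ASSUME ties extra_seg to ES, the assembler is expected to generate an ES segment override when dst_byte is referenced.

```asm
data_seg  SEGMENT
src_byte  DB 41h
data_seg  ENDS

extra_seg SEGMENT
dst_byte  DB ?
extra_seg ENDS

stk_seg   SEGMENT STACK
          DW 64 DUP (?)
stk_seg   ENDS

code_seg  SEGMENT
          ASSUME CS:code_seg, DS:data_seg, ES:extra_seg
start:    MOV AX, data_seg      ; ASSUME loads nothing, so the program
          MOV DS, AX            ; must load DS and ES itself
          MOV AX, extra_seg
          MOV ES, AX
          MOV AL, src_byte      ; addressed through DS
          MOV dst_byte, AL      ; addressed through ES (override prefix)
          MOV AH, 4Ch           ; terminate program
          INT 21h
code_seg  ENDS
          END start
```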

(3) ORG specifies a starting address within the segment. The format is as follows:
ORG <expression>
This statement sets the address of the next byte to the value of the constant expression. For example:

Vectors SEGMENT
ORG 10
Vect1 DW 1234H
...
Vectors ENDS

The offset address of Vect1 is 0AH, that is, 10 in decimal.
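
The effect of ORG can be confirmed by loading the offset of the variable at run time. This is a minimal sketch assuming MASM-style syntax and a 16-bit DOS target; vect2 is a hypothetical second variable added for illustration.

```asm
vectors  SEGMENT
         ORG 10                 ; next byte goes at offset 000Ah
vect1    DW 1234h               ; offset 000Ah
vect2    DW 5678h               ; offset 000Ch
vectors  ENDS

stk_seg  SEGMENT STACK
         DW 64 DUP (?)
stk_seg  ENDS

code_seg SEGMENT
         ASSUME CS:code_seg, DS:vectors
start:   MOV AX, vectors
         MOV DS, AX
         MOV BX, OFFSET vect1   ; BX = 000Ah
         MOV CX, [BX]           ; CX = 1234h
         MOV AH, 4Ch            ; terminate program
         INT 21h
code_seg ENDS
         END start
```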

4. Procedure definition statements
A procedure is a part of a program that the program can call. The instruction that calls a procedure is CALL, and the instruction that returns from it is RET. Calls and returns come in two kinds: intra-segment and inter-segment.
For an intra-segment call, CALL and RET push and pop only the offset address within the segment. For an inter-segment call, both the segment address and the offset address are pushed and popped.
The format of a procedure definition is:
procedure_name PROC [NEAR]
...
RET
procedure_name ENDP
or
procedure_name PROC FAR
...
RET
procedure_name ENDP
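
A NEAR procedure in the same code segment as its caller might look like the sketch below, which assumes MASM-style syntax and a 16-bit DOS target. The procedure name double_al is hypothetical.

```asm
stk_seg   SEGMENT STACK
          DW 64 DUP (?)
stk_seg   ENDS

code_seg  SEGMENT
          ASSUME CS:code_seg
start:    MOV AL, 5
          CALL double_al        ; intra-segment call: pushes only the offset
          MOV BL, AL            ; BL = 10 after the procedure returns
          MOV AH, 4Ch           ; terminate program
          INT 21h

double_al PROC NEAR
          ADD AL, AL            ; AL = AL * 2
          RET                   ; pops the return offset
double_al ENDP

code_seg  ENDS
          END start
```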

To call a procedure, write the name of the procedure after the CALL instruction. The RET instruction is placed at the end of the procedure body to return to the main program. When the procedure and the main program are in the same code segment, the procedure can be defined with the NEAR attribute; otherwise it is defined with the FAR attribute. Before calling a procedure, the main program must first set up the entry parameters. After the procedure finishes, its results are returned to the main program. There are four main ways to pass parameters between the main program and a procedure [3]:
(1) Pass parameters in the CPU's internal registers.
(2) When the main program and the procedure are in the same code segment, the procedure can directly access the variables in the code segment.
(3) Pass variable addresses through an address table. Store the offset addresses of all the variables in an address table, then pass the address of the table to the procedure in a register. Inside the procedure, use register indirect addressing to fetch each variable address from the table and access the required variable.
(4) Pass parameters or parameter addresses on the stack. Before calling the procedure, the main program pushes the parameter addresses onto the stack with PUSH. Inside the procedure, use the base pointer register BP to fetch these addresses from the stack, load them into a register, and access the required variables with indirect addressing.
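
Method (4) can be sketched as follows, assuming MASM-style syntax, a 16-bit DOS target, and a NEAR call, so that the return offset occupies one word on the stack. The names num and load_word are hypothetical.

```asm
data_seg  SEGMENT
num       DW 1234h
data_seg  ENDS

stk_seg   SEGMENT STACK
          DW 64 DUP (?)
stk_seg   ENDS

code_seg  SEGMENT
          ASSUME CS:code_seg, DS:data_seg
start:    MOV AX, data_seg
          MOV DS, AX
          MOV AX, OFFSET num
          PUSH AX               ; pass the address of num on the stack
          CALL load_word        ; returns AX = 1234h
          MOV AH, 4Ch           ; terminate program
          INT 21h

load_word PROC NEAR
          PUSH BP
          MOV BP, SP            ; [BP] = old BP, [BP+2] = return offset
          PUSH BX
          MOV BX, [BP+4]        ; [BP+4] = address parameter
          MOV AX, [BX]          ; register indirect access to the variable
          POP BX
          POP BP
          RET 2                 ; return and discard the 2-byte parameter
load_word ENDP

code_seg  ENDS
          END start
```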

5. End statements
Except for END, every ending statement is paired with a starting statement, for example SEGMENT with ENDS and PROC with ENDP.

END marks the end of the entire source program. The format of the END statement is as follows:

END <expression>

The expression must evaluate to a memory address, which is the address of the first instruction executed when the program runs. Example:

START:
...
END START

The program then starts running at START.

V. Framework Structure and Basic Programming of Assembly Language Source Programs [3]
1. Framework structure of an assembly language source program
A well-formed assembly language source program has a segmented structure made up of a code segment, a data segment, and a stack segment. Every source program has at least one code segment, which holds the program's instruction statements. Whether stack and data segments are present depends on actual requirements. Data segments define variables and required constants and allocate storage space; they consist entirely of directive statements. Stack segments are set up as needed: a program that uses stack operations can define its own stack space, and if no stack segment is defined, the system-defined stack area is used automatically. A procedure can be placed in the code segment, or in a separate procedure segment (another code segment). EQU can appear in data segments and code segments. Macros are generally placed at the beginning of the program. PUBLIC can appear on any line of the program.
Because each logical segment is addressed through a segment register, the corresponding segment registers must be given values (loaded with segment addresses) at the beginning of the code. The ASSUME directive only states the correspondence between each logical segment and a segment register, that is, which kind of segment each logical segment is; it does not assign values to the segment registers. Therefore the program must load the base addresses of the DS, SS, and ES segments (ES only if an extra segment exists) at its start.

The basic framework of a single-module program is as follows:

NAME module name (may be omitted)
EQU symbol definitions (as needed)
EXTRN external symbol declarations (as needed)
PUBLIC public symbol declarations (as needed)

data segment name (such as DATA) SEGMENT parameters (as needed; may be omitted)
variable definitions
reserved storage space
data segment name (such as DATA) ENDS

stack segment name (such as STACK) SEGMENT parameters (as needed)
reserved stack space
stack segment name (such as STACK) ENDS

code segment name (such as CODE) SEGMENT parameters (as needed)
ASSUME segment register assignments

START: MOV AX, DATA ; load the segment addresses
MOV DS, AX
MOV AX, STACK
MOV SS, AX
......
main program body
MOV AH, 4CH
INT 21H

procedure name 1 PROC type
procedure body 1
procedure name 1 ENDP

procedure name 2 PROC type
procedure body 2
procedure name 2 ENDP

code segment name (such as CODE) ENDS

END START
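
As one possible concrete instance of this framework, the sketch below prints a message through DOS function 09h. It assumes MASM-style syntax and a 16-bit DOS target; all names and the message text are illustrative. The stack segment is declared with the STACK combine type, in which case the DOS loader is expected to set SS and SP, so the sketch does not load SS by hand.

```asm
CR       EQU 0Dh                ; symbol definitions
LF       EQU 0Ah

data_seg SEGMENT
msg      DB 'Hello from the framework', CR, LF, '$'
data_seg ENDS

stk_seg  SEGMENT STACK
         DW 128 DUP (?)         ; reserved stack space
stk_seg  ENDS

code_seg SEGMENT
         ASSUME CS:code_seg, DS:data_seg, SS:stk_seg
start:   MOV AX, data_seg       ; load the data segment address
         MOV DS, AX
         CALL show_msg          ; main program body
         MOV AH, 4Ch            ; return to DOS
         INT 21h

show_msg PROC NEAR
         MOV DX, OFFSET msg     ; DS:DX points to a '$'-terminated string
         MOV AH, 09h            ; DOS function 09h: display string
         INT 21h
         RET
show_msg ENDP

code_seg ENDS
         END start
```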

2. Basic program design
(1) Sequential programs
A program executed in sequence has no branches or loops.
(2) Branch programs
Branching is implemented mainly with conditional jump instructions.
(3) Loop programs
Loops are used for a section of code that must be executed repeatedly. They are implemented mainly with LOOP, LOOPZ, LOOPNZ, or conditional jump instructions. A loop structure consists of the following parts:
1) Loop body: the section of code to be repeated.
2) Loop termination condition: every loop must specify when it ends. A common form is the counting loop.
3) Loop initialization: set the initial values before the loop starts.
(4) Subroutines
A subroutine is an independent section of code that performs a specific function and can be called by other programs.
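
The branch and loop structures above can be combined in one small program. The sketch below, assuming MASM-style syntax and a 16-bit DOS target, sums only the non-negative elements of an array; the array contents and all names are illustrative.

```asm
ITEMS     EQU 5

data_seg  SEGMENT
values    DW 3, -1, 4, -5, 9
total     DW ?
data_seg  ENDS

stk_seg   SEGMENT STACK
          DW 64 DUP (?)
stk_seg   ENDS

code_seg  SEGMENT
          ASSUME CS:code_seg, DS:data_seg
start:    MOV AX, data_seg
          MOV DS, AX
          MOV SI, OFFSET values ; loop initialization
          MOV CX, ITEMS         ; counting loop: 5 iterations
          XOR AX, AX            ; running sum = 0
next_item: MOV DX, [SI]         ; loop body
          CMP DX, 0
          JL  skip              ; branch: ignore negative elements
          ADD AX, DX
skip:     ADD SI, 2             ; advance to the next word
          LOOP next_item        ; CX = CX - 1, repeat while CX <> 0
          MOV total, AX         ; 3 + 4 + 9 = 16
          MOV AH, 4Ch           ; terminate program
          INT 21h
code_seg  ENDS
          END start
```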

VI. Others
1. Address counter $
While assembling the source program, the assembler uses the address counter to hold the offset address of the statement currently being assembled. You can refer to its value directly with $, which in an instruction denotes the address of the first byte of that instruction. For example:
Array DW 1, 2, $ + 4, ...
If Array is allocated offset address 0074H, the two words 1 and 2 occupy 0074H through 0077H, so $ is 0078H at the third item and $ + 4 has the value 007CH.
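
A common use of the address counter is to let the assembler compute the length of a data item. The sketch below assumes MASM-style syntax and a 16-bit DOS target and writes the string with DOS function 40h; msg and MSG_LEN are hypothetical names.

```asm
data_seg SEGMENT
msg      DB 'Hello'
MSG_LEN  EQU $ - msg            ; current offset minus start of msg = 5
data_seg ENDS

stk_seg  SEGMENT STACK
         DW 64 DUP (?)
stk_seg  ENDS

code_seg SEGMENT
         ASSUME CS:code_seg, DS:data_seg
start:   MOV AX, data_seg
         MOV DS, AX
         MOV BX, 1              ; handle 1: standard output
         MOV CX, MSG_LEN        ; number of bytes to write
         MOV DX, OFFSET msg     ; DS:DX points to the data
         MOV AH, 40h            ; DOS function 40h: write to handle
         INT 21h
         MOV AH, 4Ch            ; terminate program
         INT 21h
code_seg ENDS
         END start
```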

2. Processor selection directives
All x86 processors support the 8086/8088 instruction set, but more advanced processors add new instructions. You therefore need to tell the assembler which processor's instruction set the program uses. The following directives are available:
.8086
.286
.286P
.386
.386P
.486
.486P
.586
.586P
The assembler's default is .8086.
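
As an illustration, shifting by an immediate count greater than 1 is not part of the 8086 instruction set, so a program that uses it must select a later processor first. The sketch below assumes MASM-style syntax and a 16-bit DOS target; under the default .8086 setting the SHL line would be expected to be rejected.

```asm
         .286                   ; accept 80286 (and 80186) instructions

stk_seg  SEGMENT STACK
         DW 64 DUP (?)
stk_seg  ENDS

code_seg SEGMENT
         ASSUME CS:code_seg
start:   MOV BL, 09h
         SHL BL, 4              ; immediate count of 4: BL = 90h
         MOV AH, 4Ch            ; terminate program
         INT 21h
code_seg ENDS
         END start
```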

3. Memory model and simplified segment definition directives
(1) The .MODEL directive
Format: .MODEL memory_model [, model_options]
Function: specifies the memory model, that is, how the segments are arranged in memory. Under Win32, memory_model can only be FLAT.
(2) Simplified segment definition directives
.CODE [name] ; for a model with a single code segment, the segment name is optional
; for a model with multiple code segments, specify a segment name for each code segment
.DATA
.DATA?
.FARDATA [name] ; a segment name can be specified
.FARDATA? [name]
.CONST
.STACK [size] ; specifies the size of the stack segment; the default is 1 KB
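
With the simplified directives, a complete program becomes noticeably shorter. The sketch below assumes MASM and a 16-bit DOS target, so it uses the SMALL model rather than the Win32 FLAT model mentioned above; @DATA is MASM's predefined symbol for the data segment, and the message text is illustrative.

```asm
         .MODEL SMALL           ; one code segment, one data segment
         .STACK 100h            ; 256-byte stack
         .DATA
msg      DB 'Simplified segments', 0Dh, 0Ah, '$'
         .CODE
start:   MOV AX, @DATA          ; segment address of the data group
         MOV DS, AX
         MOV DX, OFFSET msg
         MOV AH, 09h            ; DOS function 09h: display string
         INT 21h
         MOV AX, 4C00h          ; terminate with return code 0
         INT 21h
         END start
```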

4. Attribute operator PTR
Format: type PTR expression
Function: gives an already allocated storage address a different type attribute. "type" is the new type attribute and "expression" is the symbolic address. The type can be BYTE, WORD, DWORD, FAR, or NEAR, and the expression can be a variable or a label (BYTE, WORD, and DWORD apply to variables; FAR and NEAR apply to labels). PTR does not allocate storage for the newly defined variable or label.
Example:

TWOB DW ?
ONEB DB ?

The type attribute of the symbolic address TWOB is word, and that of ONEB is byte.
The following definitions use the PTR operator:
VARB EQU BYTE PTR TWOB
VARW EQU WORD PTR ONEB

VARB and TWOB have the same segment address and offset address but different type attributes: TWOB is a word and VARB is a byte. Likewise, VARW and ONEB share the same segment address and offset address, but ONEB is a byte and VARW is a word.
As the example shows, PTR can be used for temporary type conversion, much like a cast in C.
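
PTR is also needed when an operand carries no type of its own, such as a memory reference through a register. The sketch below assumes MASM-style syntax and a 16-bit DOS target; the initial values are illustrative, and an extra byte is defined after oneb so that the word access stays inside defined data.

```asm
data_seg SEGMENT
twob     DW 1234h
oneb     DB 56h
         DB 78h                 ; byte following oneb
data_seg ENDS

stk_seg  SEGMENT STACK
         DW 64 DUP (?)
stk_seg  ENDS

code_seg SEGMENT
         ASSUME CS:code_seg, DS:data_seg
start:   MOV AX, data_seg
         MOV DS, AX
         MOV BL, BYTE PTR twob  ; treat the word as a byte: BL = 34h
         MOV CX, WORD PTR oneb  ; treat the byte as a word: CX = 7856h
         MOV SI, OFFSET twob
         MOV BYTE PTR [SI], 0FFh ; [SI] has no type, so PTR supplies one
                                 ; twob is now 12FFh
         MOV AH, 4Ch            ; terminate program
         INT 21h
code_seg ENDS
         END start
```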

5. Macro assembly and conditional assembly [3]
(1) Macro assembly
1) Macro instructions, macro definitions, and macro calls
A macro instruction is defined by the user in the source program. Once defined, it can be called any number of times later in the program.
The format of a macro definition is as follows:
macro_name MACRO [formal parameters]
...
ENDM

Format of a macro call: macro_name [actual parameters]
2) Macro expansion. When a source program containing macro calls is assembled, MASM expands each macro call. Expansion replaces the macro name with the body of the macro definition and replaces the formal parameters with the actual parameters one by one.
3) Macro nesting. A macro definition may contain macro calls, but the macros called must already be defined. A macro definition may also contain another macro definition.
For example, converting between ASCII and BCD codes requires shifting the contents of AL left or right by four bits. With 8086 instructions, the left shift is written as:
MOV CL, 4
SAL AL, CL
Using a macro instead, the definition is:
SHIFT MACRO
MOV CL, 4
SAL AL, CL
ENDM
From then on, shifting the contents of AL left by four bits takes a single statement:
SHIFT
Note: MACRO and ENDM must appear in pairs.
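
The SHIFT macro above has no parameters. A variant with formal parameters might look like the sketch below, which assumes MASM-style syntax and a 16-bit DOS target; the macro name shift_left and its parameters reg and count are hypothetical. Note that the macro overwrites CL.

```asm
shift_left MACRO reg, count     ; formal parameters: reg, count
         MOV CL, count
         SAL reg, CL
         ENDM

stk_seg  SEGMENT STACK
         DW 64 DUP (?)
stk_seg  ENDS

code_seg SEGMENT
         ASSUME CS:code_seg
start:   MOV AL, 09h
         shift_left AL, 4       ; expands to MOV CL, 4 / SAL AL, CL: AL = 90h
         MOV BL, 03h
         shift_left BL, 2       ; expands to MOV CL, 2 / SAL BL, CL: BL = 0Ch
         MOV AH, 4Ch            ; terminate program
         INT 21h
code_seg ENDS
         END start
```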

(2) Conditional assembly
Conditional assembly tests a given condition; depending on the result, the assembler either assembles a block of statements into the program or leaves it out. The general format is as follows:
IF condition (expression)
(statement block 1) ; assembled if the condition is true
ELSE
(statement block 2) ; assembled if the condition is false
ENDIF

Here IF and ENDIF are the conditional assembly directives; they must appear in pairs and cannot be omitted.
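
One typical use is to select between two versions of the same data or code at assembly time. The sketch below assumes MASM-style syntax and a 16-bit DOS target; the symbol DEBUG_BUILD and the message texts are illustrative. Only one of the two msg definitions reaches the object file.

```asm
DEBUG_BUILD EQU 1               ; set to 0 to assemble the other branch

data_seg SEGMENT
IF DEBUG_BUILD
msg      DB 'debug build', 0Dh, 0Ah, '$'    ; assembled when nonzero
ELSE
msg      DB 'release build', 0Dh, 0Ah, '$'  ; assembled when zero
ENDIF
data_seg ENDS

stk_seg  SEGMENT STACK
         DW 64 DUP (?)
stk_seg  ENDS

code_seg SEGMENT
         ASSUME CS:code_seg, DS:data_seg
start:   MOV AX, data_seg
         MOV DS, AX
         MOV DX, OFFSET msg
         MOV AH, 09h            ; DOS function 09h: display string
         INT 21h
         MOV AH, 4Ch            ; terminate program
         INT 21h
code_seg ENDS
         END start
```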

6. Building a simple assembly environment
32-bit Windows programs: MASM32 v9.0 or NASM, plus EditPlus 2 and a resource editor

 

References
[1] Zhou Mingde, Principles and Applications of Microcomputer Systems (Fourth Edition)
[2] Shen Meiming, IBM PC Assembly Language Programming
[3] Chapter 5, Assembly Language Programming (downloaded PDF)
