Steps for assembling a program:
1. Use an editor to create the .ASM source file.
2. Use the MASM program to assemble the .ASM file into an .OBJ file.
3. Use the LINK program to link the .OBJ file into an .EXE file, and, if required, use the EXE2BIN program to convert the .EXE file into a .COM file.
Structure of assembly language source program
The source program consists of segments. Each segment has a segment name. The SEGMENT pseudo operation marks the start of a segment and the ENDS pseudo operation marks its end.
Each segment consists of several statement lines, and the source program ends with END.
An assembly language source program is therefore segment-structured: it can contain a code segment, a data segment, a stack segment, and an extra segment.
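As a rough illustration of this segment structure, the sketch below lays out a minimal program with a data segment, a stack segment, and a code segment. The names dseg, sseg, cseg, msg, and start are hypothetical, and the INT 21h calls assume the program runs under MS-DOS, which the text does not state.

```asm
; Minimal segment-structured program (sketch; all names are hypothetical).
dseg    segment                         ; data segment starts here
msg     db      'Hello', 13, 10, '$'    ; '$' ends the string for DOS function 09h
dseg    ends                            ; data segment ends here

sseg    segment stack                   ; stack segment
        dw      64 dup (?)              ; reserve 64 uninitialized words
sseg    ends

cseg    segment                         ; code segment
        assume  cs:cseg, ds:dseg, ss:sseg
start:  mov     ax, dseg                ; the program loads DS itself
        mov     ds, ax
        mov     dx, offset msg          ; DS:DX points at the string
        mov     ah, 09h                 ; assumed MS-DOS service: print string
        int     21h
        mov     ax, 4c00h               ; assumed MS-DOS service: exit, code 0
        int     21h
cseg    ends
        end     start                   ; end of source; execution begins at start
```

Every segment is bracketed by a SEGMENT/ENDS pair carrying the same name, and the single END statement closes the whole source file.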
A statement in the source program belongs to one of three categories:
1. Machine instructions: statements that make the CPU perform an action. They are processed only when the program is executed. These are the processor instructions covered in the previous chapter.
2. Pseudo instructions (directives): descriptive statements that are processed by the assembler before the program is executed, such as data descriptions and variable definitions.
Pseudo instructions do not depend on the specific processor type, but they do depend on the assembler version.
3. Macro instructions: an instruction made up of a series of machine instructions or pseudo instructions. The assembler expands it into several machine instructions during assembly, which improves programming efficiency.
Pseudo instructions and macro instructions are both processed by the assembler during assembly.
1. A variable has three attributes:
Segment value (SEG): the segment address of the segment that contains the variable.
Offset: the number of bytes between the start of the segment and the storage unit of the variable.
Type: the number of bytes occupied by each element of the variable (1 for DB, 2 for DW, and so on).
2. Label: the symbolic address of a storage unit that holds an instruction. It is often used as the target of a jump, and it is defined by a name followed by a colon, as in next:
A label has the same segment and offset attributes as a variable, but its type is NEAR or FAR.
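The following sketch shows one way these attributes might be read with the SEG, OFFSET, and TYPE operators, and how a label serves as a jump target. The names dseg, cseg, table, start, and next are hypothetical, and the final INT 21h exit assumes MS-DOS.

```asm
; Sketch: attributes of a variable and use of a label (names are hypothetical).
dseg    segment
table   dw      4 dup (?)               ; word variable, so TYPE table = 2
dseg    ends

cseg    segment
        assume  cs:cseg, ds:dseg
start:  mov     ax, seg table           ; segment value of table
        mov     ds, ax
        mov     bx, offset table        ; offset of table inside dseg (0 here)
        mov     cx, 4                   ; number of words in table
next:   mov     word ptr [bx], 0        ; next is a NEAR label in cseg
        add     bx, type table          ; advance by 2 bytes, the type of table
        loop    next                    ; jump back to the label until CX = 0
        mov     ax, 4c00h               ; assumed MS-DOS service: exit
        int     21h
cseg    ends
        end     start
```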
Introduction to pseudo operation symbols:
DB pseudo operation: defines bytes. Each subsequent operand occupies one byte.
DW pseudo operation: defines words. Each subsequent operand occupies one word (the low byte is stored at the first byte address and the high byte at the second).
DD pseudo operation: defines doublewords. Each subsequent operand occupies two words (four bytes).
DQ pseudo operation: defines quadwords. Each subsequent operand occupies four words (eight bytes).
DT pseudo operation: defines ten-byte values. Each subsequent operand occupies ten bytes and forms a packed BCD number.
The operand can be a constant or an expression that evaluates to a constant, for example:
Data_byte DB 10, 4, 10h
Data_word DW 100, 100h, -5
Data_dw DD 3*20, 0fffdh
Message DB 'hello' ; defines a string
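To make the storage rules concrete, the sketch below repeats these definitions with the resulting bytes written out in the comments. The byte values were worked out by hand from the rules above, and the segment name dseg is hypothetical.

```asm
; Sketch: memory layout produced by the data definitions (bytes shown in hex).
dseg      segment
data_byte db    10, 4, 10h              ; 0A 04 10
data_word dw    100, 100h, -5           ; 64 00  00 01  FB FF  (low byte first)
data_dw   dd    3*20, 0fffdh            ; 3C 00 00 00  FD FF 00 00
message   db    'hello'                 ; 68 65 6C 6C 6F  (ASCII codes)
dseg      ends
          end
```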
The operand ? reserves storage space without storing any data in it. For example:
ABC DB 0, ?, ?, ?, 0
Dff DW ?, 52, ?
The operand field can also use the DUP operator to repeat one or more operands. The format is:
repeat_count DUP (operand, operand, ...)
repeat_count can be an expression whose value is a positive integer. It specifies how many times the operands in parentheses are repeated. For example:
Array1 DB 2 DUP (0, 1, 2, ?)
is equivalent to the following statement:
Array1 DB 0, 1, 2, ?, 0, 1, 2, ?
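DUP operations can be nested: an operand inside the parentheses can itself be a DUP expression, as in the following outline (inner operands omitted):
Array3 DB 100 DUP (..., count DUP (...), ...)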
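A small nested case can be expanded by hand to see how the repetition works. The example below is an illustration only; the names dseg, array_a, and array_b and the operand values are hypothetical and are not taken from the original text.

```asm
; Sketch: a nested DUP and its hand-expanded equivalent.
dseg    segment
array_a db      2 dup (0, 2 dup (1, 2), 3)
array_b db      0, 1, 2, 1, 2, 3, 0, 1, 2, 1, 2, 3    ; the same 12 bytes
dseg    ends
        end
```

The inner 2 DUP (1, 2) expands to 1, 2, 1, 2 first, and the outer count then repeats the whole six-byte group twice.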
The EQU pseudo operation gives a name to an expression. Like EQU, the = pseudo operation assigns a value to a name. The difference between them is that a name defined with EQU cannot be redefined, while a name defined with = can be redefined. For example:
EMP = 7
EMP = EMP + 1
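The difference can be seen in a short sketch. The names count, emp, dseg, buffer, and level are hypothetical.

```asm
; Sketch: EQU versus = (names are hypothetical).
count   equ     10                      ; fixed: count cannot be given a new value later
emp     =       7
emp     =       emp + 1                 ; allowed: emp is now 8

dseg    segment
buffer  db      count dup (?)           ; reserves 10 bytes
level   db      emp                     ; stores 8
dseg    ends
        end
```

Neither pseudo operation allocates storage by itself; the names are replaced by their values while the assembler processes the source.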
The physical address of a memory location is formed from a segment address and an offset address. When the assembler converts the source program into the object program, it must determine the offset addresses of labels and variables and pass the relevant information to the linker through the object module, so that the linker can combine the different segments and modules into an executable program. This is done with the segment definition pseudo operations.
The relationship between segments and segment registers must also be stated, which is done with the ASSUME pseudo operation. Its general form is ASSUME segment_register:segment_name, for example:
ASSUME DS:Data
The segment register name must be one of CS, DS, ES, and SS, and the segment name must be a name defined by SEGMENT.
ASSUME NOTHING cancels the segment register assignments made by earlier ASSUME statements.
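ASSUME does not load the segment registers. The values of DS and ES must be loaded by the program, and a segment address cannot be moved directly into a segment register, so it is transferred through a general-purpose register. For example:
mov ax, data1
mov ds, ax
mov ax, data2
mov es, ax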
The END pseudo operation tells the assembler MASM to stop assembling at this point (note that it ends the assembly process, not the execution of the program).
The source program must end with an END statement.
The SEGMENT pseudo operation can also carry descriptions of types and attributes. The format is as follows:
segment_name SEGMENT [align_type] [combine_type] [use_type] ['class']
segment_name ENDS
Generally, these descriptions are not required. However, if the linker is used to link the program with other program modules, they are needed. The descriptions are as follows:
1. The positioning type (align_type) constrains where a segment may start relative to the segment before it. It can be one of the following:
PARA: the segment starts on a paragraph boundary, that is, its starting address must be a multiple of 16 bytes. This is the default. It means that the segment addresses of two adjacent non-empty segments differ by at least 1.
PAGE: the segment starts on a page boundary, that is, its starting address must be a multiple of 256 bytes.
BYTE: the segment starts on a byte boundary, that is, it can start at any address.
WORD: the segment starts on a word boundary, that is, it can only start at an even address.
If the positioning type specified in the source program is PARA or PAGE, adjacent segments have different segment addresses in the executable file.
If the positioning type is BYTE or WORD, adjacent segments may have the same segment address in the executable file.
2. The combination type (combine_type) tells the linker how to combine segments of the same name that are scattered across different modules, which makes the structure of the final executable file clearer. It can be one of the following:
PUBLIC: the segment is concatenated with other segments of the same name. The order of concatenation is determined by the link command.
COMMON: the segment has the same starting address as other segments of the same name, so they overlay one another. The length of the combined segment is that of the longest individual segment.
AT expression: the starting address of the segment is the 16-bit segment address computed from the expression. It cannot be used to specify a code segment.
STACK: the segment is part of the run-time stack segment.
MEMORY: the segment is placed above all other linked segments (at the highest addresses). If several linked segments specify MEMORY, the first one is used as the MEMORY segment and the others are treated as COMMON segments.
3. The use type (segment word size, or addressing type) is an attribute that applies to processors with 32-bit segments. USE16 selects 16-bit addressing and USE32 selects 32-bit addressing. When assembling for a 16-bit x86 CPU, the default is USE16. When assembling for a 32-bit x86 CPU (80386 or later), the default is USE32, but USE16 can still be specified to obtain a standard 16-bit segment. A program that runs in real-address mode (the 8086 working mode) must use 16-bit segments. CPUs earlier than the 80386 have no 32-bit addressing mode, so only USE16 is available.
4. The class ('class'): when the linker arranges segments, it places all segments of the same class next to each other. The class name can be chosen freely, but it must be enclosed in single quotes. Most MASM programs use 'code', 'data', and 'stack' for the code segment, data segment, and stack segment respectively, so that code and data stay together in sequence.
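The attributes can be combined on one SEGMENT line, as in the sketch below. The segment names dseg, sseg, and cseg, the variable value, and the label start are hypothetical, the chosen alignments are just one plausible combination, and the exit call assumes MS-DOS.

```asm
; Sketch: SEGMENT statements with align, combine, and class attributes.
dseg    segment para public 'data'      ; paragraph-aligned, merged with other dseg parts
value   dw      ?
dseg    ends

sseg    segment para stack 'stack'      ; becomes part of the run-time stack
        dw      128 dup (?)
sseg    ends

cseg    segment word public 'code'      ; may start at any even address
        assume  cs:cseg, ds:dseg, ss:sseg
start:  mov     ax, dseg
        mov     ds, ax
        mov     ax, 4c00h               ; assumed MS-DOS service: exit
        int     21h
cseg    ends
        end     start
```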
V. Program start and end
At the beginning of the program, the NAME or TITLE pseudo operation can be used to give the module a name. The format of NAME is:
NAME module_name
The assembler uses the given module_name as the module name. If the program has no NAME pseudo operation, the TITLE pseudo operation can be used instead. Its format is:
TITLE text
The TITLE pseudo operation specifies the title printed on each page of the listing. If the program does not use NAME, the first six characters of text are used as the module name. text can contain up to 60 characters. If the program has neither a NAME nor a TITLE pseudo operation, the source file name is used as the module name. NAME and TITLE are therefore optional.
The format of the pseudo operation that marks the end of the source program is:
END [label]
The label specifies the address at which program execution starts. If several program modules are linked together, only the main module specifies a label. The other subprogram modules use END without a label.
VI. Alignment pseudo operations
1. EVEN makes the address of the next byte an even number. A word is best stored starting at an even address, so to make sure that a word array starts at an even address, place the EVEN pseudo operation before it.
2. ORG constant_expression
If the value of the constant expression is n, the ORG pseudo operation sets the address of the next byte to n. For example:
ORG 10
Vect1 DW 47a5h
ORG 20
Vect2 DW 0c596h
Then the offset address of Vect1 is 0Ah and that of Vect2 is 14h.
While the source program is being assembled, the address counter holds the address of the instruction currently being assembled. The value of the address counter is written as $, and assembly language allows $ to be used directly to refer to that value. Therefore,
ORG $ + 8
skips an 8-byte storage area.
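The effect of EVEN and ORG on offsets can be traced in a small data segment. In this sketch the names dseg, flag, warray, and after are hypothetical, and the offsets in the comments assume the segment starts at offset 0.

```asm
; Sketch: offsets produced by EVEN and ORG (names are hypothetical).
dseg    segment
flag    db      1                       ; offset 0
        even                            ; pads one byte so the next offset is even
warray  dw      100 dup (?)             ; starts at offset 2, occupies 200 bytes
        org     $ + 8                   ; skips 8 bytes: offset 202 becomes 210
after   db      ?                       ; offset 210 (0D2h)
dseg    ends
        end
```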
VII. Radix control pseudo operations
The assembler treats numbers as decimal by default. Constants written in another base must therefore be marked as follows:
Binary: followed by the letter B
Decimal: the default; may optionally be followed by the letter D
Hexadecimal: followed by the letter H. If the first character is a letter A-F, it must be preceded by the digit 0
Octal: followed by the letter O or Q
The .RADIX pseudo operation can change the default base to any base in the range 2-16. The format is as follows:
.RADIX expression
A string can be used as a string constant. Enclose it in single or double quotation marks, and the assembler stores the ASCII code of each character, for example 'abcd'.
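The notations above can be compared side by side. In this sketch the names dseg, tens, maxw, str1, and hexval are hypothetical, and the comment on the second .RADIX line reflects the usual MASM rule that the .RADIX operand itself is read as decimal, which should be checked against the assembler version in use.

```asm
; Sketch: numeric and string constants in different notations.
dseg    segment
tens    db      1010b, 12o, 12q, 10, 10d, 0ah   ; six bytes, each equal to 10
maxw    dw      0ffffh                  ; leading 0 needed because the number starts with F
str1    db      'abcd'                  ; stores 61h 62h 63h 64h
        .radix  16                      ; unsuffixed numbers are now hexadecimal
hexval  dw      0ff                     ; same as 0ffh, that is 255
        .radix  10                      ; back to decimal (operand assumed read as decimal)
dseg    ends
        end
```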