For beginners, assembly language has so many instructions that it looks too complex. People often study for a long time and still cannot write a decent program, which kills their interest, and many give up. My personal view is that you can study assembly without writing programs at first; writing programs is not really assembly's strong point. You might as well play with DEBUG; sometimes cracking a program gives a greater sense of accomplishment than writing one (just as many people first learn to use a computer by playing games). Some advanced instructions are really only useful to experienced assembly programmers and are too advanced for us. To get off to a good start, first set aside the fancy, complicated instructions and focus on the most important ones (CMP, LOOP, MOV, JNZ ...). That is hard to do with long-winded textbooks, so I have put together this ultra-condensed tutorial (squeezed through WinZip, WinRAR ... in turn, heh!). Without exaggeration, after reading this article you will be able to "casually" show off DEBUG in front of your seniors and juniors, which feels great. Try it! So what comes next? Here we go! (It does not matter if you do not understand something on first reading; it will be broken down in later sections.)

Because assembly language talks to the hardware through the CPU and memory, we first have to understand the CPU and memory. (Number bases such as hexadecimal are not covered here.)
The CPU is a chip that carries out all of the computer's arithmetic and logical operations and its basic I/O control. An assembly language works only for one particular CPU; in other words, different CPUs have different assembly instruction syntax. Since the personal computer was launched in 1981, its CPUs have developed as follows: 8086 → 80286 → 80386 → 80486 → Pentium → ..., along with compatible chips from AMD and Cyrix. Each later CPU stays compatible with the earlier ones but adds instructions (such as the MMX instruction set), widens registers (such as the 32-bit EAX of the 386), and adds registers (such as FS, introduced with the 386). To make sure an assembly program runs on every model, we recommend 8086 assembly language, which has the best compatibility. Everything in this article is 8086 assembly language.

Registers are components inside the CPU, so moving data between registers is very fast. They are used to: 1. perform arithmetic and logical operations on the data they hold; 2. hold an address that points to a location in memory, that is, addressing; 3. read data from and write data to the computer's peripheral devices.

The 8086 has eight 8-bit data registers, which pair up to form four 16-bit registers:
AH & AL = AX: accumulator, commonly used in arithmetic;
BH & BL = BX: base register, often used to index addresses;
CH & CL = CX: count register, often used for counting;
DH & DL = DX: data register, often used to pass data.

To reach the whole memory space, the 8086 has four segment registers that hold segment addresses:
CS (Code Segment): code segment register;
DS (Data Segment): data segment register;
SS (Stack Segment): stack segment register;
ES (Extra Segment): extra segment register.
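To make the pairing of 8-bit halves into a 16-bit register concrete, here is a minimal sketch. It is a Python model rather than 8086 code, and the function names combine and split are made up for this illustration.

```python
def combine(high: int, low: int) -> int:
    """Join two 8-bit halves (for example AH and AL) into one 16-bit value (AX)."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def split(value: int) -> tuple:
    """Split a 16-bit value (for example AX) into its high and low bytes."""
    return (value >> 8) & 0xFF, value & 0xFF


ax = combine(0x12, 0x34)                  # AH = 12h, AL = 34h
print(f"AX = {ax:04X}")                   # AX = 1234
high, low = split(0xABCD)
print(f"AH = {high:02X}, AL = {low:02X}")  # AH = AB, AL = CD
```

Writing to AL therefore changes only the low byte of AX, and writing to AH only the high byte.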
Before a program can run, the memory locations used by its code, data, and stack have to be decided. Setting the three segment registers CS, DS, and SS makes them point to these starting locations. Usually DS is kept fixed and CS is changed as needed, so a program can be any size as long as the space it addresses at one time is smaller than 64 KB. The combined size of a program and its data, however, is limited to 64 KB, which is why a COM file cannot exceed 64 KB. The 8086 uses memory as its battlefield and the registers as its military bases to speed up its work.

Besides the registers described above, there are some special-purpose registers:
IP (Instruction Pointer): instruction pointer register, used together with CS to track the progress of program execution;
SP (Stack Pointer): stack pointer, used together with SS to point to the current stack position;
BP (Base Pointer): base pointer register, which can serve as a base address relative to SS;
SI (Source Index): source index register, which can hold a source pointer relative to the DS segment;
DI (Destination Index): destination index register, which can hold a destination pointer relative to the ES segment.
There is also a flag register FR (Flag Register), which has nine meaningful flags; they are described in detail later.
Memory is a key part of how a computer works; it is where the computer keeps information while it is running. Memory is organized as many storage locations that can hold numbers, and each location is identified by its "address". The 8086 address bus has 20 bits, so the CPU has an address space of up to 1 MB, which is also the range that DOS can effectively manage. The 8086 can only handle 16-bit values, that is, 0 to 64K, so segmented addressing is needed to reach the whole memory. A complete 20-bit address is made of two parts: 1. the segment base address (segment): a 16-bit binary number with four binary zeros (one hexadecimal 0) appended, making a 20-bit number. It can select any 64 KB segment within the 1 MB space and is usually written as a 16-bit number; 2. the offset: a 16-bit binary number that points to any address within the segment. For example, for 2222 (segment base address):3333 (offset), the actual 20-bit address is 25553.

Beyond the basics above, you also need to know what DOS and BIOS function calls are. Put simply, function calls are like the Win95 API; they are ready-made subroutines. In practice you cannot write assembly programs without using the subroutines supplied by MS or IBM. (For more about function calls, see issue 98-11 of "Computer Enthusiast".)
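Returning to the segment:offset arithmetic described above, the calculation can be checked with a few lines of code. This is only a Python model of what the 8086 does in hardware; the name physical_address is invented for the sketch.

```python
def physical_address(segment: int, offset: int) -> int:
    """Shift the segment left by four bits (one hex digit), add the offset,
    and keep 20 bits, because the 8086 address bus has 20 lines."""
    return ((segment << 4) + offset) & 0xFFFFF


# The example from the text: 2222:3333
print(f"{physical_address(0x2222, 0x3333):05X}")   # 25553

# Two different segment:offset pairs can name the same byte
print(physical_address(0x2222, 0x3333) == physical_address(0x2555, 0x0003))  # True
```

The second line of output shows why a segment is best thought of as a movable 64 KB window onto the 1 MB space rather than a fixed block.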
There are two main ways to produce an assembly language program: 1. use an assembler such as MASM or TASM; 2. use the debugger DEBUG.COM. DEBUG is not really an assembler; its main purpose is to find and fix errors in assembly programs. It can, however, also be used to write short assembly programs, and for beginners in particular it is the best tool to start with. DEBUG is easy to use: just type DEBUG and press Enter, and you can start assembling. With an assembler, by contrast, you need a text editor, the assembler itself, LINK, EXE2BIN, and other programs, each of which needs its own rather complicated commands to process the source program. The source program must also contain many directives that have nothing to do with the instructions themselves, just so the assembler can make sense of it. Using DEBUG lets us avoid many obscure program lines at the start. Besides assembling, DEBUG can be used to examine and modify memory, to load, save, and run programs, and to examine and modify registers. In other words, DEBUG was designed to put us in touch with the hardware. (The usage of common 8086 instructions is explained as they appear in each program; for reasons of space, not all instructions can be listed.)
DEBUG's A command can assemble a simple COM file, so a program assembled with DEBUG must start at address 100h (a requirement of COM files). Follow me, step by step (press Enter after each line):
1. Enter a100 ; assemble starting at DS:100
2. Enter mov dl,1 ; load the value 01h into the DL register
3. Enter mov ah,2 ; load the value 02h into the AH register
4. Enter int 21 ; call function 2 of DOS interrupt 21h, which displays the character whose code is in DL
5. Enter int 20 ; call DOS interrupt 20h, which terminates the program and returns control to DEBUG
6. Press Enter on an empty line to leave the A command.
7. The assembly language program is now in memory; enter g (go) to run it.
8. Result: a symbol is displayed.
(Word 97 cannot reproduce the original character here, so a stand-in has to do.)
Program terminated normally
We can use the U (unassemble) command to turn hexadecimal machine code back into assembly instructions. You will see that the assembly instruction on the right of each line has been assembled into the machine code to its left; machine code is what the 8086 actually executes.
1. Enter u100,106
1FED:0100 B201    MOV DL,01
1FED:0102 B402    MOV AH,02
1FED:0104 CD21    INT 21
1FED:0106 CD20    INT 20
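The listing above also shows how much memory the program occupies. The sketch below is a Python model, not part of DEBUG: it takes the machine code column from the listing, recomputes the offset of each instruction, and totals the length. The helper name layout is made up for this example.

```python
# (mnemonic, machine code) pairs copied from the U listing above
listing = [
    ("MOV DL,01", "B201"),
    ("MOV AH,02", "B402"),
    ("INT 21", "CD21"),
    ("INT 20", "CD20"),
]


def layout(instructions, start=0x100):
    """Return (offset, mnemonic, size) rows and the total length in bytes."""
    rows, offset = [], start
    for mnemonic, code in instructions:
        size = len(bytes.fromhex(code))   # two hex digits per byte
        rows.append((offset, mnemonic, size))
        offset += size
    return rows, offset - start


rows, length = layout(listing)
for offset, mnemonic, size in rows:
    print(f"{offset:04X}  {mnemonic:<10} {size} bytes")
print("last byte at", f"{0x100 + length - 1:04X}", "total", length, "bytes")
```

Every instruction here is two bytes long, so the INT 20 that starts at 0106 ends at 0107 and the program is 8 bytes in all.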
DEBUG's R command displays and changes the contents of the registers. CS:IP holds the address of the next instruction to be executed.
1. Enter r
AX=0000  BX=0000  CX=0000  DX=0000  SP=FFEE  BP=0000  SI=0000  DI=0000
DS=1FED  ES=1FED  SS=1FED  CS=1FED  IP=0100   NV UP EI PL NZ NA PO NC
1FED:0100 B201    MOV DL,01
The program is executed starting at offset 100, and when it terminates DEBUG automatically resets IP to 100. To turn this program into a stand-alone executable file, give it a name with the N command. It must be a COM file; otherwise it cannot be loaded with DEBUG.
1. Enter n smile.com ; we must tell DEBUG how long the program is. It runs from 100 to 107 (the two-byte INT 20 at 106 ends at 107), so it occupies 8 bytes. BX holds the high part of the length and CX the low part.
2. Enter rbx to view the contents of the BX register. This program is only 8 bytes long, so this step can be skipped.
3. Enter rcx to view the contents of the CX register.
4. Enter 8 ; the number of bytes in the program
5. Enter w ; the W (write) command writes the program to disk.
At this point we can really get to grips with 8086 assembly instructions. When writing assembly language programs we do not usually put machine code into memory directly; instead we type a series of mnemonics. These symbols are easier to remember than hexadecimal machine code, and they are the assembly instructions. Each mnemonic stands for an operation the CPU should perform. In other words, assembly language, made up of mnemonics, is designed for people, and machine language is designed for the PC.
Now let's analyze a program that displays all the ASCII codes.
1. Enter debug
2. Enter a100
3. Enter mov cx,0100 ; load the loop count
         mov dl,00   ; load the first ASCII code; a new code is loaded on every pass
         mov ah,02
         int 21
         inc dl      ; INC: the increment instruction; adds 1 to the value in DL each time
         loop 0105   ; LOOP: the loop instruction; each time it executes, CX is decreased by 1 and
                     ; control jumps to the start of the loop at 105; when CX reaches 0 the loop stops
         int 20
4. Enter g to display all the ASCII codes
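If the interplay of CX, DL, and LOOP is hard to picture, it can be traced away from the PC. What follows is a rough Python model of the program above, not 8086 code; run_ascii_loop is a name invented for the sketch, and the list shown stands in for the characters that INT 21h would put on the screen.

```python
def run_ascii_loop(count=0x100):
    """Model of: MOV CX,count / MOV DL,0 / then INT 21h, INC DL, LOOP."""
    cx, dl = count, 0
    shown = []
    while True:
        shown.append(dl)          # INT 21h with AH=2 displays the character in DL
        dl = (dl + 1) & 0xFF      # INC DL; DL is 8 bits wide, so it wraps after FFh
        cx = (cx - 1) & 0xFFFF    # LOOP first decreases CX by 1 ...
        if cx == 0:               # ... and falls through to INT 20 once CX is 0
            break
    return shown


codes = run_ascii_loop()
print(len(codes), f"{codes[0]:02X}", f"{codes[-1]:02X}")   # 256 00 FF
```

With CX loaded with 100h (256), each of the codes 00h to FFh is displayed exactly once.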
When we want to display a whole string, such as understand?, we can use function 9h of DOS interrupt 21h. Enter the following program, save it to disk, and run it:
1. Enter a100
   mov dx,109        ; DS:DX = start address of the string
   mov ah,9          ; DOS function 09h
   int 21            ; string output
   int 20
   db 'understand?$' ; define the string
Assembly language has two different kinds of commands: 1. ordinary instructions, such as MOV, are CPU instructions that tell the CPU what to do while the program is running, so they are stored in memory as op-codes; 2. pseudo-instructions (directives), such as DB, are commands to DEBUG or another assembler that tell it what to do while assembling. The DB (define byte) directive tells DEBUG to put the ASCII codes of all the characters between the single quotes into memory. A string used with function 9h must end with $. Use the D command to see what the DB directive placed in memory.
2. Enter d100
1975:0100  BA 09 01 B4 09 CD 21 CD-20 75 6E 64 65 72 73 74   ......!. underst
1975:0110  61 6E 64 24 8B 46 F8 89-45 04 8B 46 34 00 64 19   and$.F..E..F4.d.
1975:0120  89 45 02 33 C0 5E 5F C9-C3 00 C8 04 00 00 57 56   .E.3.^_.......WV
1975:0130  6B F8 0E 81 C7 FE 53 8B-DF 8B C2 E8 32 FE 0B C0   k.....S.....2...
1975:0140  74 05 33 C0 99 EB 17 8B-45 0C E8 D4 97 8B F0 89   t.3.....E.......
1975:0150  56 FE 0B D0 74 EC 8B 45-08 03 C6 8B 56 FE 5E 5F   V...t..E....V.^_
1975:0160  C9 C3 C8 02 00 00 6B D8-0E 81 C3 FE 53 89 5E FE   ......k.....S.^.
1975:0170  8B C2 E8 FB FD 0B C0 75-09 8B 5E FE 8B 47 0C E8   .......u..^..G..
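The dump can be read by hand, but a short script makes the structure easier to see. The sketch below is Python, not DEBUG, and uses only the first two rows of the dump above; dollar_string is a hypothetical helper that mimics how function 9h walks memory until it meets the $ terminator.

```python
# First two rows of the dump above; the first byte sits at offset 0100h
dump = bytes.fromhex(
    "BA 09 01 B4 09 CD 21 CD 20 75 6E 64 65 72 73 74 "
    "61 6E 64 24 8B 46 F8 89 45 04 8B 46 34 00 64 19"
)
BASE = 0x100


def dollar_string(memory: bytes, address: int, base: int = BASE) -> str:
    """Return the characters from `address` up to, but not including, the '$'."""
    start = address - base
    end = memory.index(b"$", start)
    return memory[start:end].decode("ascii")


# MOV DX,0109 is stored as BA 09 01: the opcode, then the operand low byte first
dx = dump[1] | (dump[2] << 8)
print(f"DX = {dx:04X}")            # DX = 0109
print(dollar_string(dump, dx))     # understand
```

Everything after the 24h ($) byte at 0113 is leftover memory contents that have nothing to do with the program.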
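Now let's look at another program: it reads an arbitrary string from the keyboard and displays it. DB 20 puts the byte 20h at the start of the buffer, which tells function 0Ah that the buffer may hold up to 20h characters; the unused memory that follows serves as the buffer itself.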
Enter a100
   mov dx,0116 ; DS:DX = address of the buffer, fixed by where the DB directive falls
   mov ah,0a   ; function 0Ah
   int 21      ; buffered keyboard input
   mov dl,0a   ; function 0Ah leaves a carriage return code (0Dh, produced by Enter) at the end
   mov ah,02   ; of the string, so the cursor returns to the start of the input line. To keep the
   int 21      ; output string from overwriting the input, function 2h is used to send a
               ; line feed code (0Ah), which moves the cursor down to the next line.
   mov dx,0118 ; start of the typed string
   mov ah,09   ; function 9h stops only when it meets a $ sign, so the typed string must end
   int 21      ; with $; otherwise function 9h goes on displaying whatever junk is in memory.
   int 20
   db 20       ; define the buffer
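Why does the program point DX at 0116 for input but at 0118 for output? Function 0Ah uses the first two bytes of the buffer for bookkeeping, so the typed characters begin two bytes in. The sketch below is a Python model of that layout under this assumption; buffered_input and the sample text are invented for the illustration and are not DOS code.

```python
def buffered_input(buffer: bytearray, typed: str) -> None:
    """Model of INT 21h function 0Ah filling the buffer that DS:DX points to."""
    max_len = buffer[0]                           # byte 0: capacity, set by the program (DB 20)
    text = typed.encode("ascii")[: max_len - 1]   # one place is kept for the carriage return
    buffer[1] = len(text)                         # byte 1: number of characters typed
    buffer[2 : 2 + len(text)] = text              # byte 2 onward: the characters themselves
    buffer[2 + len(text)] = 0x0D                  # Enter is stored as 0Dh after them


BUFFER_AT = 0x116                    # where DB 20 falls in the program above
buffer = bytearray(2 + 0x20)         # two bookkeeping bytes plus room for the text
buffer[0] = 0x20                     # the byte defined by DB 20
buffered_input(buffer, "hello$")     # sample input; note the $ typed at the end

text_address = BUFFER_AT + 2         # the value loaded by MOV DX,0118
end = buffer.index(b"$", 2)          # function 9h stops at the first $
print(f"{text_address:04X}", buffer[1], buffer[2:end].decode("ascii"))   # 0118 6 hello
```

If the $ is left out, the search for the terminator simply runs on into whatever follows the buffer, which is the "junk" the comments in the program warn about.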
A word of advice: do not get flustered when learning assembly.
Enough chatter. To do a job well, you must first sharpen your tools. DEBUG is less an assembler than a "line-by-line interpreter": its A command can only convert one line of assembly at a time into machine language, and it does so at once. A true assembler (MASM) works on a separate .ASM text file created with a text editor (EDIT); this file, called the source program, is the input to MASM. MASM assembles the ASM file into an .OBJ file, called the object program. The OBJ file only records where each part of the program is to be loaded and how it is to be merged with other programs; it cannot be loaded into memory and run directly. The LINK program converts the OBJ file into an EXE file that can be loaded into memory and executed. EXE2BIN can then convert a qualifying EXE file into a COM file (a COM file not only takes up the least memory but also runs the fastest).
Next we will use MASM to write a program that does the same thing as the first program we wrote with DEBUG.
Use EDIT to create the source file smile.ASM.
Source program                DEBUG program
prognam segment
        assume cs:prognam
        org 100h              a100
        mov dl,1              mov dl,1
        mov ah,2              mov ah,2
        int 21h               int 21
        int 20h               int 20
prognam ends
        end
Comparison: 1. MASM assumes that all numbers are decimal, whereas DEBUG uses only hexadecimal. In the source program we therefore have to append a letter that names the base: h for hexadecimal, d for decimal. A hexadecimal number that begins with a letter must be given a leading 0 to show that it is a number, for example 0ah. 2. The source program has five extra lines. prognam segment and prognam ends come as a pair and tell MASM and LINK that the program is to be placed in a segment called prognam (program name); the segment name can be anything you like, but the position of the two lines is fixed. assume cs:prognam must appear at the start of the program; it tells the assembler that the program's segment is the one addressed through the CS register. end tells MASM that the program ends here. org 100h does the job of DEBUG's a100: assembly starts at offset 100h. Every source program for a COM file must contain these five lines, in the same order and position. Just memorize this. Next, we use MASM to assemble smile.ASM.
1. Enter masm smile ; the extension .ASM does not need to be typed.
Microsoft (R) Macro Assembler Version 5.10
Copyright (C) Microsoft Corp 1981, 1988. All rights reserved.
Object filename [smile.OBJ]: ; asks whether to change the name of the output OBJ file; if not, press Enter
Source listing [NUL.LST]: ; asks whether a listing file (LST) is wanted; if not, press Enter
Cross-reference [NUL.CRF]: ; asks whether a cross-reference file (CRF) is wanted; if not, press Enter
50162 + 403867 Bytes symbol space free
0 Warning Errors ; warning errors mean the assembler did not understand certain statements, usually because of typing mistakes.
0 Severe Errors ; severe errors will make the program fail to run, usually because of syntax errors.
If there are no errors, the OBJ file is generated. The OBJ file contains the assembled binary result, but DOS cannot yet load it into memory; it must first be linked. LINK links the OBJ file (smile.OBJ) into an EXE file (smile.EXE).
1. Enter link smile ; the extension .OBJ does not need to be typed.
Microsoft (R) Overlay Linker Version 3.64
Copyright (C) Microsoft Corp 1981, 1988. All rights reserved.
Run File [smile.EXE]: ; asks whether to change the name of the output EXE file; if not, press Enter
List File [NUL.MAP]: ; asks whether a list file (MAP) is wanted; if not, press Enter
Libraries [.LIB]: ; asks whether library files are needed; if so, type the file name, otherwise press Enter
LINK : warning L4021: no stack segment ; this warning appears because a COM file does not use a stack segment;
                                       ; "no stack segment" does not affect normal program execution
Now that the EXE file exists, we still need EXE2BIN to convert the EXE file (smile.EXE) into a COM file (smile.com). Entering exe2bin smile generates a BIN file (smile.BIN). The BIN file is in fact identical to the COM file, but DOS only runs COM, EXE, and BAT files, so the BIN file cannot be executed as it is. You can either rename it or enter exe2bin smile smile.com directly. There should now be a smile.com file on the disk; just type the file name smile at the C:> prompt to run the program.
Does producing a program with an assembler seem like much more trouble than DEBUG? For small programs it is, but for large programs you will see its advantages. Let's redo the ASCII program with the assembler and see what differs. First, use EDIT.COM to create the file ASCII.ASM.
prognam segment            ; define the segment
        assume cs:prognam  ; tell the assembler that CS addresses the segment defined above
        org 100h           ; COM programs start at offset 100h
        mov cx,100h        ; load the loop count
        mov dl,0           ; load the first ASCII code; a new code is loaded on every pass
next:   mov ah,2
        int 21h
        inc dl             ; INC: the increment instruction; adds 1 to the value in DL each time
        loop next          ; loop instruction: each pass decreases CX by 1; when CX is 0 the loop stops
        int 20h
prognam ends               ; end of the segment
        end                ; end of assembly
In an assembly language source program, each program line can contain three elements:
start:       mov dl,1      ; load the first ASCII code, and then load the new code every cycle
Identifier   Expression    Comment
Adding comments to the source file makes the program easier to understand and easier to come back to later. Each comment is set off from the program line by a semicolon (;). The assembler ignores comments, and their text does not appear in the OBJ, EXE, or COM file. Because we do not know the address of each program line while writing the source, we use a symbolic name to stand for the relative address; this name is called an "identifier" (label). We usually type the identifier at the start of the relevant line. An identifier can be up to 31 characters long, so we try to use short, clear names as identifiers. Now you can turn ASCII.ASM into ASCII.COM: 1. masm ascii, 2. link ascii, 3. exe2bin ascii ascii.com.
Note: when you build your own programs with an assembler, typing mistakes, misspelled identifiers, hexadecimal numbers missing their h, and logic errors are all common. The advice assembly veterans usually give newcomers (someone else told me this) is: expect the program you have written to contain some errors; and if it produces the expected result the very first time it runs, check it again, because it may still be wrong. As a rule, as long as the overall logic is sound, hunting down the errors is more fun than writing the program itself. When writing a large program, it is best to divide it into many modules. This keeps each piece simple and easy to write and debug, makes the boundaries between the parts of the program clearer, and saves assembly time. If something in a program you are reading does not make sense, write the contents of the registers and memory down on paper and trace through the program slowly; it will suddenly become clear.

Next we will write a "large program" that reads a decimal number from the keyboard, converts it to hexadecimal, and displays it on the screen. To make the 8086 do this, we must first break the problem down into a series of steps, which is called program planning. Start with a flowchart to make sure the whole program is logically correct (needless to say, every language requires this step). This modular way of planning is called "top-down program planning". The writing itself starts with the smallest unit modules (subroutines); as each module is completed, it is merged into the larger program. Starting from the small pieces in this way is called "bottom-up programming".
Our first module is binihex. Its job is to take the binary number in the 8086's BX register and display it on the screen in hexadecimal. Note: it is normal that the subroutines cannot run on their own.
binihex segment
        assume cs:binihex
        mov ch,4        ; number of hexadecimal digits to produce (four)
rotate: mov cl,4        ; CL is the counter that records how many bits to rotate
        rol bx,cl       ; rotate BX left so that its 4 hexadecimal digits are handled in order
        mov al,bl       ; copy the low eight bits of BX into AL
        and al,0fh      ; clear the unwanted bits
        add al,30h      ; add 30h to AL; the result is stored in AL
        cmp al,3ah      ; compare with 3Ah
        jl printit      ; jump if less than 3Ah
        add al,7h       ; add 7h to AL; the result is stored in AL
printit: mov dl,al      ; load the ASCII code into DL
        mov ah,2
        int 21h
        dec ch          ; CH minus one; when it reaches zero, the zero flag is set to 1
        jnz rotate      ; JNZ: jump to the given address while the zero flag is not 1, that is, jump if not zero
        int 20h         ; return from the subroutine to the main program
binihex ends
        end
ROL rotates register BX (whose contents will be supplied by the second subroutine) so that its 4 hexadecimal digits are processed in turn: 1. CL serves as the counter that records how many bits to rotate. 2. The first hexadecimal digit of BX is rotated into the rightmost position. AND (the logical "and" operation: a result bit is 1 only when both corresponding bits are 1, and 0 otherwise) then clears the unwanted part: first the value of BL is copied into AL, then AND with 0Fh (00001111) clears the left four bits of AL. The ASCII codes of 0 to 9 are 30h to 39h and those of A to F are 41h to 46h, so there is a gap of 7h between the two ranges: if AL is less than 3Ah after 30h has been added, only the 30h is added; otherwise a further 7h is added. The ADD instruction adds two operands and stores the result in the left operand.

The flag register is a separate 16-bit register containing 9 flags. When certain instructions (mostly comparisons, arithmetic, or logical operations) execute, the corresponding flags are set to 1 or cleared to 0. The common ones are the zero flag (ZF), the sign flag (SF), the overflow flag (OF), and the carry flag (CF). The flags record the effect of an instruction after it has executed; other instructions can then test the flags and act on their state. CMP works like subtraction: it subtracts one operand from the other but changes neither registers nor memory, only the relevant flags. If AL is less than 3Ah, the result is negative and the sign flag is set to 1; otherwise it is cleared to 0. The JL instruction can be read as: if less, jump to the given location; if greater than or equal, carry on with the next instruction. CMP combined with conditional jumps such as JG and JL forms the branch structures of a program, a technique used all the time in assembly programming.
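The rotate, mask, and adjust steps of binihex are easy to try out on paper or in a few lines of code. The following is a Python model of the routine, not a replacement for it; rol16 is a helper written for this sketch because Python has no rotate operator, and the function returns the characters instead of printing them through INT 21h.

```python
def rol16(value: int, count: int) -> int:
    """Rotate a 16-bit value left by `count` bits, as ROL BX,CL does."""
    value &= 0xFFFF
    return ((value << count) | (value >> (16 - count))) & 0xFFFF


def binihex(bx: int) -> str:
    """Return the four hexadecimal digits of BX, leftmost digit first."""
    digits = []
    for _ in range(4):            # MOV CH,4 ... DEC CH / JNZ rotate
        bx = rol16(bx, 4)         # ROL BX,CL with CL = 4
        al = bx & 0x0F            # MOV AL,BL / AND AL,0Fh
        al += 0x30                # ADD AL,30h
        if al >= 0x3A:            # CMP AL,3Ah / JL printit
            al += 0x07            # ADD AL,7h skips the gap between '9' and 'A'
        digits.append(chr(al))    # the character function 2h would display
    return "".join(digits)


print(binihex(0x1A2F))   # 1A2F
print(binihex(255))      # 00FF
```

After four rotations of four bits each, BX is back to its original value, which is why the routine does not need to save it.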
The second module, decibin, receives the decimal number typed on the keyboard, converts it into a binary number, and leaves it in the BX register for module 1, binihex, to use.
decibin segment
        assume cs:decibin
        mov bx,0        ; clear BX
newchar: mov ah,1
        int 21h         ; read one typed character into AL and display it
        sub al,30h      ; AL minus 30h, result in AL: converts the ASCII code to a binary value
        jl exit         ; jump if less than zero
        cmp al,9d
        jg exit         ; jump if the left operand is greater than the right one
        cbw             ; extend 8-bit AL to 16-bit AX
        xchg ax,bx      ; swap the contents of AX and BX
        mov cx,10d      ; decimal 10
        mul cx          ; multiply AX by CX; the result is stored in AX (high word in DX)
        xchg ax,bx
        add bx,ax
        jmp newchar     ; unconditional jump
exit:   int 20h         ; return to the main program
decibin ends
        end
What CBW actually does is this: if the value in AL is positive, AH is filled with 00h; otherwise AH is filled with FFh. XCHG is usually used when the contents of a register need to be kept temporarily.
We also need a subroutine (CRLF) so that the hexadecimal number displayed afterward does not overwrite the decimal number typed before it.
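The digit-by-digit accumulation in decibin comes down to "multiply what you have by ten, then add the new digit". Below is a rough Python model of that loop, with cbw modelled separately; it simplifies the real routine by taking the keystrokes as a string instead of reading them through INT 21h.

```python
def cbw(al: int) -> int:
    """Extend 8-bit AL to 16-bit AX: AH becomes 00h if AL is positive, FFh otherwise."""
    al &= 0xFF
    return al | 0xFF00 if al & 0x80 else al


def decibin(keys: str) -> int:
    """Accumulate typed decimal digits into a 16-bit value, as the routine does in BX."""
    bx = 0                                 # MOV BX,0
    for key in keys:
        al = ord(key) - 0x30               # SUB AL,30h turns '0'..'9' into 0..9
        if al < 0 or al > 9:               # JL exit / CMP AL,9 / JG exit
            break                          # any non-digit key ends the input
        bx = (bx * 10 + cbw(al)) & 0xFFFF  # XCHG, MUL CX, XCHG, ADD BX,AX
    return bx


print(decibin("255\r"))     # 255
print(f"{cbw(0x80):04X}")   # FF80
```

Because BX is only 16 bits wide, anything above 65535 wraps around; the routine does not check for that.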
crlf    segment
        assume cs:crlf
        mov dl,0dh      ; load the carriage return ASCII code 0Dh into DL
        mov ah,2
        int 21h
        mov dl,0ah      ; load the line feed ASCII code 0Ah into DL
        mov ah,2
        int 21h
        int 20h         ; return to the main program
crlf    ends
        end
Now we can combine the binihex, decibin, and CRLF modules into one large program. First the three modules have to be modified slightly to turn them into subroutines. Then we write a program that calls each subroutine.
crlf    proc near
        mov dl,0dh
        mov ah,2
        int 21h
        mov dl,0ah
        mov ah,2
        int 21h
        ret
crlf    endp
Like the SEGMENT and ENDS directives, PROC and ENDP come in pairs; they mark and define a procedure. In fact the real job of PROC is to tell the assembler whether the procedure is called as near (NEAR) or far (FAR). An ordinary program is run directly from DEBUG, so it returns with INT 20. A procedure invoked with the CALL instruction uses the return instruction RET instead; RET transfers control to the address on the top of the stack, which was put there by the CALL that invoked the procedure.
All the modules are done; now we combine the subroutines into the finished program.
decihex segment         ; main program
        assume cs:decihex
        org 100h
        mov cx,4        ; load the loop count into CX; the subroutines use CX, so each of them must push CX onto the stack
repeat: call decibin    ; call the decimal to binary subroutine
        call crlf       ; call the carriage return and line feed subroutine
        call binihex    ; call the binary to hexadecimal display subroutine
        call crlf
        loop repeat     ; loop 4 times, so the conversion can be done 4 times in a row
        mov ah,4ch      ; call function 4Ch of DOS interrupt 21h to exit the program. It has the same effect as INT 20h
        int 21h         ; but is more widely applicable. If INT 20h cannot return, try it.
decibin proc near
        push cx         ; push CX onto the stack
        ┇
exit:   pop cx          ; restore CX
        ret
decibin endp
binihex proc near
        push cx
        ┇
        pop cx
        ret
binihex endp
crlf    proc near
        push cx
        ┇
        pop cx
        ret
crlf    endp
decihex ends
        end
The CALL instruction calls a subroutine: it transfers control to the subroutine's address and pushes the address of the instruction that follows the CALL onto the stack as the return address. There are two kinds of CALL, near (NEAR) and far (FAR): 1. NEAR: only the contents of IP are pushed onto the stack; used when the caller and the subroutine are in the same segment. 2. FAR: the contents of CS and IP are pushed onto the stack in turn; used when the caller and the subroutine are in different segments. PUSH and POP are a pair of instructions that push register contents onto the stack and pop them off again in order to protect register data; they are used a great deal when calling subroutines. The stack works on a "last in, first out" principle: after push ax, push bx ..., the data is restored with pop bx, pop ax.
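The "last in, first out" behaviour is easiest to believe once you have watched it happen. The sketch below models the stack with a Python list; the addresses 0106 and 0200 and the register values are made up for the illustration, and call_near and ret_near are names invented here, not 8086 instructions.

```python
stack = []   # stands in for the memory that SS:SP points into


def call_near(return_ip: int, target: int) -> int:
    """CALL (near): push the address of the next instruction, then jump to the target."""
    stack.append(return_ip)
    return target


def ret_near() -> int:
    """RET (near): pop the saved address and continue from there."""
    return stack.pop()


regs = {"ax": 0x1111, "bx": 0x2222}

ip = call_near(return_ip=0x0106, target=0x0200)   # hypothetical addresses
stack.append(regs["ax"])       # PUSH AX
stack.append(regs["bx"])       # PUSH BX
regs["ax"] = regs["bx"] = 0    # the subroutine is now free to overwrite both registers
regs["bx"] = stack.pop()       # POP BX: pushed last, so popped first
regs["ax"] = stack.pop()       # POP AX
ip = ret_near()                # only the return address is left on the stack

print(f"IP = {ip:04X}, AX = {regs['ax']:04X}, BX = {regs['bx']:04X}")
```

If the two POPs were written in the same order as the PUSHes, AX and BX would come back swapped; and if one POP were forgotten, RET would jump to a register value instead of the return address.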
This ultra-condensed assembly language tutorial is coming to an end. I hope it lays the foundation for you to design programs on your own. More and better techniques will come from your own accumulated experience. I wish you success!