To write an 8086 simulator, the first step is to learn the 8086 machine instruction format. I have worked out part of it, but many questions remain.
Please download this document, the Opcodes Manual:
Http://byhh.net/f/CS/1175690465/opcodes.rar
OO: Function
00: If MMM = 110, a 16-bit displacement follows and is used as a direct address; otherwise no displacement is used
01: An 8-bit signed displacement follows the ModR/M byte
10: A 16-bit signed displacement follows the ModR/M byte
11: MMM specifies a register instead of an addressing mode
MMM: Function (16-bit addressing)
000: DS:[BX+SI]
001: DS:[BX+DI]
010: SS:[BP+SI]
011: SS:[BP+DI]
100: DS:[SI]
101: DS:[DI]
110: SS:[BP] (with OO = 00 this code means a direct address instead; see the OO table)
111: DS:[BX]
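To check the OO and MMM tables mechanically, here is a small Python sketch (the function names are my own, not from the manual) that splits a ModR/M byte, laid out as oo rrr mmm, into its three fields and works out how many displacement bytes follow. It assumes 16-bit addressing with no segment override prefix.

```python
# Sketch: decode the fields of a ModR/M byte (oo rrr mmm) for 16-bit addressing.
MMM_16 = [
    "DS:[BX+SI]", "DS:[BX+DI]", "SS:[BP+SI]", "SS:[BP+DI]",
    "DS:[SI]", "DS:[DI]", "SS:[BP]", "DS:[BX]",
]


def split_modrm(byte):
    """Return (oo, rrr, mmm) from a ModR/M byte."""
    oo = (byte >> 6) & 0b11
    rrr = (byte >> 3) & 0b111
    mmm = byte & 0b111
    return oo, rrr, mmm


def displacement_size(oo, mmm):
    """Number of displacement bytes that follow the ModR/M byte."""
    if oo == 0b00:
        return 2 if mmm == 0b110 else 0  # 00/110 is a direct 16-bit address
    if oo == 0b01:
        return 1  # signed 8-bit displacement
    if oo == 0b10:
        return 2  # signed 16-bit displacement
    return 0  # oo == 11: mmm names a register, no displacement


def describe_memory(oo, mmm):
    """Readable memory operand; only valid when oo selects memory (00, 01, 10)."""
    if oo == 0b11:
        raise ValueError("oo = 11 selects a register, not memory")
    if oo == 0b00 and mmm == 0b110:
        return "DS:[disp16]"
    base = MMM_16[mmm]
    if oo == 0b01:
        return base[:-1] + "+disp8]"
    if oo == 0b10:
        return base[:-1] + "+disp16]"
    return base
```

For example, split_modrm(0xC3) gives (3, 0, 3), which are the fields of MOV AX, BX (8BC3) discussed below.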
RRR: W = 0 / W = 1 / reg32 (386+)
000: AL / AX / EAX
001: CL / CX / ECX
010: DL / DX / EDX
011: BL / BX / EBX
100: AH / SP / ESP
101: CH / BP / EBP
110: DH / SI / ESI
111: BH / DI / EDI
SSS: Segment register
000: ES
001: CS
010: SS
011: DS
100: FS (386+ only)
101: GS (386+ only)
RRR: Index register (32-bit SIB addressing, 386+)
000: EAX
001: ECX
010: EDX
011: EBX
100: No index
101: EBP
110: ESI
111: EDI
==================================================
From material found online:
---------------------------------------
Instruction 1 -> the machine code for MOV AX, 1234H is B83412
Analysis -> This is a 16-bit 8086 instruction that uses immediate addressing.
Look up the table -> open opcodes.html, find the heading "Main instructions", click the letter "M", and find the opcode for the form "MOV Reg, Imm": 1011 wrrr
Determine W -> the register is AX and the immediate is 1234H, so this is clearly a word operation: W = 1
From the RRR table above, AX corresponds to RRR = 000
Combine the fields -> W = 1
RRR = 000
1011 wrrr = 1011 1000b -> B8h
Following section 3 of the manual, "Instruction format introduction", the immediate follows the opcode with its low byte first: B8h + |34h 12h| = B83412h
Note: the "+" here means concatenation, not addition.
Done: the first question is solved, and the machine code is B83412h.
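The same steps as a Python sketch (the function name and register lists are my own; only the 8-bit and 16-bit general registers are handled):

```python
# Sketch of the "MOV reg, imm" form: opcode 1011 w rrr, then the immediate,
# stored low byte first (little-endian).
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def encode_mov_reg_imm(reg, imm):
    """Return the machine code bytes for MOV reg, imm."""
    if reg in REG16:
        w, rrr = 1, REG16.index(reg)
        imm_bytes = imm.to_bytes(2, "little")  # word: low byte first
    else:
        w, rrr = 0, REG8.index(reg)
        imm_bytes = imm.to_bytes(1, "little")  # byte immediate
    opcode = 0b10110000 | (w << 3) | rrr  # 1011 w rrr
    return bytes([opcode]) + imm_bytes
```

encode_mov_reg_imm("AX", 0x1234).hex().upper() returns 'B83412', matching the result above.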
---------------------------------------
Instruction 4 -> the machine code for MOV AX, BX is 8BC3
Analysis -> This is a 16-bit 8086 instruction that uses register addressing.
Look up the table -> open opcodes.html, find the heading "Main instructions", click the letter "M", and find the opcode for the form "MOV Reg, Reg": 1000101w oorrrmmm
Determine W -> AX and BX are both 16-bit registers, so this is a word operation: W = 1
From the OO table, OO = 11, because the second operand is a register rather than a memory location.
From the RRR table, AX gives RRR = 000. With this opcode the RRR field holds the destination operand.
Because OO = 11, MMM is read from the register (RRR) table rather than the MMM addressing table, so BX gives MMM = 011. The MMM field holds the source operand.
Combine the fields -> W = 1
OO = 11
RRR = 000
MMM = 011
1000101w oorrrmmm = 1000 1011 1100 0011b -> 8BC3h
Done: the second question is solved, and the machine code is 8BC3h.
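The manual's opcode 1000101w is one member of the pattern 100010dw, where the d bit decides whether the RRR field is the destination (d = 1) or the source (d = 0). That is why RRR followed the destination operand above. A Python sketch (function name assumed, 16-bit registers only) that produces both encodings of the same instruction:

```python
# Sketch of MOV between two 16-bit registers, opcode pattern 100010 d w.
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def encode_mov_r16_r16(dst, src, d=1):
    """d = 1: rrr = destination, mmm = source (8Bh).
    d = 0: rrr = source, mmm = destination (89h)."""
    w = 1  # word operation
    opcode = 0b10001000 | (d << 1) | w
    reg, rm = (dst, src) if d else (src, dst)
    modrm = (0b11 << 6) | (REG16.index(reg) << 3) | REG16.index(rm)  # oo = 11
    return bytes([opcode, modrm])
```

encode_mov_r16_r16("AX", "BX") gives 8B C3, while encode_mov_r16_r16("AX", "BX", d=0) gives 89 D8; both are valid encodings of MOV AX, BX.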
==================================================
Here is a worked example for practice:
[Image: Turbo Debugger screenshot of the program below]
-----------------------------------
MOV AX, 5354
Machine code format: MOV Reg, Imm ==> 1011 wrrr
Word operation, W = 1
The register code of AX is RRR = 000
10111000 => B8
So MOV AX, 5354 ==> B85453 (the immediate is stored low byte first)
-----------------------------------
MOV DS, AX
Machine code format: MOV Sreg, Reg16 ==> 10001110 oosssmmm
The source is a register, so MMM must be interpreted as a register: OO = 11
DS is segment register 011, so SSS = 011
AX = 000, so MMM = 000
oosssmmm = 11011000 => D8
MOV DS, AX => 8ED8
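A sketch of the MOV Sreg, Reg16 encoding (function name assumed; only the four 8086 segment registers are listed):

```python
# Sketch: MOV sreg, reg16 = 8Eh, then ModR/M with oo = 11, sss = segment, mmm = register.
SREG = ["ES", "CS", "SS", "DS"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def encode_mov_sreg_r16(sreg, reg):
    """Return the two bytes of MOV sreg, reg16."""
    modrm = (0b11 << 6) | (SREG.index(sreg) << 3) | REG16.index(reg)
    return bytes([0b10001110, modrm])  # 10001110 = 8Eh
```

encode_mov_sreg_r16("DS", "AX") gives 8E D8, as worked out above.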
-----------------------------------
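LEA DX, [0025]
Machine code format: LEA Reg16, Mem ==> 10001101 oorrrmmm
[0025] is a direct memory address: the operand is the offset 0025h itself, with no base or index register. The OO table reserves exactly this case: OO = 00 with MMM = 110 means a 16-bit displacement follows and is used as the address by itself. So OO = 00, MMM = 110. (Because of this special case, [BP] with no displacement has to be encoded with OO = 01 and an 8-bit displacement of 0.)
DX = 010, so RRR = 010
oorrrmmm = 00010110 => 16
10001101 oorrrmmm => 8D16
The displacement 0025h follows, low byte first, so LEA DX, [0025] => 8D162500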
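The direct-address case as a Python sketch (function name assumed): the ModR/M byte is built with OO = 00 and MMM = 110, and the 16-bit address follows low byte first.

```python
# Sketch: LEA reg16, [address] with a direct address.
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def encode_lea_direct(reg, address):
    """Return 8Dh, ModR/M (oo = 00, rrr = reg, mmm = 110), then disp16."""
    modrm = (0b00 << 6) | (REG16.index(reg) << 3) | 0b110
    return bytes([0b10001101, modrm]) + address.to_bytes(2, "little")
```

encode_lea_direct("DX", 0x0025) gives 8D 16 25 00.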
-----------------------------------
MOV AH, 09 (compare this with MOV AX, 5354)
Machine code format: MOV Reg, Imm ==> 1011 wrrr
Byte operation, W = 0
The register code of AH is RRR = 100
10110100 => B4
So MOV AH, 09 ==> B409
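Comparing the two MOV Reg, Imm opcode bytes bit by bit shows how W and RRR change the result (a small sketch; the helper name is my own):

```python
def mov_imm_opcode(w, rrr):
    """Opcode byte of MOV reg, imm: the bit pattern 1011 w rrr."""
    return 0b10110000 | (w << 3) | rrr


# MOV AX, ...: w = 1, AX = 000.  MOV AH, ...: w = 0, AH = 100.
for name, w, rrr in [("AX", 1, 0b000), ("AH", 0, 0b100)]:
    op = mov_imm_opcode(w, rrr)
    print(f"{name}: {op:08b} = {op:02X}h")
# AX: 10111000 = B8h
# AH: 10110100 = B4h
```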
-----------------------------------
INT 21
Machine code format: INT imm8 ==> 11001101 = CD, followed by the 8-bit interrupt number
So INT 21 => CD21
-----------------------------------
INC BX
Machine code format: INC Reg16 ==> 01000rrr
BX = 011, so RRR = 011
So INC BX => 01000011 = 43
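Neither INT nor this form of INC uses a ModR/M byte: INT is a fixed opcode followed by an 8-bit immediate, and the short INC form packs the register into the low three bits of the opcode. A sketch of both (function names assumed):

```python
# Sketch: two encodings without a ModR/M byte.
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def encode_int(number):
    """INT imm8: opcode CDh followed by the interrupt number."""
    return bytes([0xCD, number])


def encode_inc_r16(reg):
    """Short-form INC reg16: 01000rrr, one byte in total."""
    return bytes([0b01000000 | REG16.index(reg)])
```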
-----------------------------------
XOR AX, AX
Machine code format: XOR Reg, Reg ==> 0011001w oorrrmmm
Word operation, W = 1
Both operands are registers, so OO = 11
AX = 000, so RRR = MMM = 000
woorrrmmm = 111000000
0011001 111000000 = 0011 0011 1100 0000b = 33C0
Therefore, XOR AX, AX => 33C0
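XOR Reg, Reg and MOV Reg, Reg share the same two-byte shape: a 7-bit opcode pattern plus W, then OO RRR MMM. A generic sketch (names assumed; no displacement handling) that takes the 7-bit pattern as written in the manual:

```python
# Sketch: first byte = 7-bit opcode pattern + w, second byte = oo rrr mmm.
XOR_R_RM = 0b0011001  # XOR reg, r/m (manual: 0011001w)
MOV_R_RM = 0b1000101  # MOV reg, r/m (manual: 1000101w)


def encode_reg_rm(pattern7, w, oo, rrr, mmm):
    """Return the two bytes of a reg, r/m instruction without displacement."""
    first = (pattern7 << 1) | w
    modrm = (oo << 6) | (rrr << 3) | mmm
    return bytes([first, modrm])
```

With these, XOR AX, AX is encode_reg_rm(XOR_R_RM, 1, 0b11, 0b000, 0b000) = 33 C0, and MOV AX, BX is encode_reg_rm(MOV_R_RM, 1, 0b11, 0b000, 0b011) = 8B C3.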
-----------------------------------
MOV AL, [BX]
Machine code format: MOV Reg, Mem ==> 1000101w oorrrmmm
Byte operation, W = 0
OO = 00: the operand [BX] has no displacement, and per the OO table, OO = 00 means no displacement follows (MMM is not 110 here)
AL = 000, so RRR = 000
[BX] actually means DS:[BX], which the MMM table encodes as 111
w oorrrmmm = 0 00000111: the first byte is 10001010 = 8A and the ModR/M byte is 00000111 = 07
So 1000101w oorrrmmm = 8A07
MOV AL, [BX] => 8A07
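The choice of OO can be made mechanical. Here is a sketch (function name assumed, byte registers only) that picks OO from the displacement, including the [BP] exception that follows from the OO = 00 / MMM = 110 rule:

```python
# Sketch: MOV reg8, [base+disp] = 8Ah, ModR/M, then an optional displacement.
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
MMM_16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]


def encode_mov_r8_mem(reg, base, disp=0):
    """oo = 00 if there is no displacement, 01 if it fits in a signed byte,
    otherwise 10. [BP] alone cannot use oo = 00 (00/110 means a direct
    address), so it falls back to oo = 01 with a zero disp8."""
    mmm = MMM_16.index(base)
    if disp == 0 and base != "BP":
        oo, tail = 0b00, b""
    elif -128 <= disp <= 127:
        oo, tail = 0b01, (disp & 0xFF).to_bytes(1, "little")
    else:
        oo, tail = 0b10, (disp & 0xFFFF).to_bytes(2, "little")
    modrm = (oo << 6) | (REG8.index(reg) << 3) | mmm
    return bytes([0b10001010, modrm]) + tail  # 1000101w with w = 0
```

For example, encode_mov_r8_mem("AL", "BX") gives 8A 07, and encode_mov_r8_mem("AL", "BP") gives 8A 46 00, that is MOV AL, [BP+00].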
-----------------------------------
The instructions above were all converted from assembly to machine code. The encoding rules are not hard to follow as long as you have the lookup table at hand.
8086 instructions are variable-length, which makes it hard to tell where each instruction begins and ends. Going the other way, disassembling machine code back into assembly, is therefore much more troublesome.
Does anyone have a good algorithm for this, or can someone post a URL?
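One common approach is table-driven: the first byte selects an entry that says whether a ModR/M byte follows and how large the immediate is, and the ModR/M byte then fixes the displacement size, so every step knows exactly how many bytes to consume. A minimal sketch of that idea, covering only the opcodes in the example program above (names and output format are my own; prefix bytes such as segment overrides are not handled):

```python
# Table-driven disassembler sketch for the handful of opcodes used above.
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
SREG = ["ES", "CS", "SS", "DS", "FS", "GS"]
MMM_16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]

# Opcodes followed by a ModR/M byte:
# opcode -> (mnemonic, names for the rrr field, names for mmm when oo = 11)
MODRM_OPS = {
    0x8A: ("MOV", REG8, REG8),    # MOV reg8, r/m8
    0x8B: ("MOV", REG16, REG16),  # MOV reg16, r/m16
    0x33: ("XOR", REG16, REG16),  # XOR reg16, r/m16
    0x8D: ("LEA", REG16, REG16),  # LEA reg16, mem
    0x8E: ("MOV", SREG, REG16),   # MOV sreg, r/m16
}


def rm_operand(code, i, oo, mmm, regs):
    """Decode the r/m operand; i points just past the ModR/M byte.
    Returns (text, index of the next unread byte)."""
    if oo == 0b11:
        return regs[mmm], i
    if oo == 0b00 and mmm == 0b110:  # direct address, 16-bit displacement
        disp = int.from_bytes(code[i:i + 2], "little")
        return f"[{disp:04X}]", i + 2
    if oo == 0b00:
        return f"[{MMM_16[mmm]}]", i
    size = 1 if oo == 0b01 else 2
    disp = int.from_bytes(code[i:i + size], "little", signed=True)
    return f"[{MMM_16[mmm]}{disp:+X}]", i + size


def disassemble(code):
    """Decode a bytes object into a list of assembly strings."""
    i, out = 0, []
    while i < len(code):
        op = code[i]
        i += 1
        if 0xB0 <= op <= 0xBF:  # 1011 wrrr: MOV reg, imm
            w, rrr = (op >> 3) & 1, op & 0b111
            size = 2 if w else 1
            imm = int.from_bytes(code[i:i + size], "little")
            reg = (REG16 if w else REG8)[rrr]
            out.append(f"MOV {reg}, {imm:0{2 * size}X}")
            i += size
        elif 0x40 <= op <= 0x47:  # 01000rrr: INC reg16
            out.append(f"INC {REG16[op & 0b111]}")
        elif op == 0xCD:  # INT imm8
            out.append(f"INT {code[i]:02X}")
            i += 1
        elif op in MODRM_OPS:
            name, reg_names, rm_names = MODRM_OPS[op]
            modrm = code[i]
            i += 1
            oo, rrr, mmm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
            rm, i = rm_operand(code, i, oo, mmm, rm_names)
            out.append(f"{name} {reg_names[rrr]}, {rm}")
        else:
            raise ValueError(f"opcode {op:02X}h is not in this sketch's table")
    return out
```

Feeding it the bytes of the example program, bytes.fromhex("B854538ED88D162500B409CD214333C08A07"), recovers the eight instructions in order. A full disassembler follows the same loop but needs an entry for every first byte, plus prefix handling.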