© Conmajia & Icemanind 2012
This article is compiled from the series How to Create Your Own Virtual Machine and has been extensively modified (with the author's consent).
Download: Source code, English course (PDF)
Preface
by Conmajia
This series of articles starts from scratch and walks you step by step through designing and implementing a fully working virtual machine. We will use C# on the Microsoft .NET Framework 2.0 runtime to build the entire virtual machine (partly for compatibility, but mainly to keep the focus on design). You therefore need basic .NET development knowledge: at the very least, you should be able to use Visual Studio 2005 (or later) to write and run your own "hello world" program.
Before starting the design, let's review some background on virtual machines.
A virtual machine is a piece of middleware that simulates a hardware environment: a highly isolated software container that can run its own operating system and applications as if it were a physical computer. It behaves just like a physical computer and contains its own virtual (that is, software-based) CPU; some virtual machines also provide virtual hardware such as RAM, hard disks, and network interface cards (NICs).
Neither the operating system, nor the applications, nor other computers on the network can tell a virtual machine from a physical machine. Even the virtual machine itself "believes" it is a real computer. Yet a virtual machine consists entirely of software and contains no hardware components. As a result, virtual machines have a number of advantages that physical hardware lacks.
Advantages of virtual machines
Typically, virtual machines have the following four key features:
1. Compatibility: Virtual machines are compatible with all standard x86 computers
2. Isolation: Virtual machines are isolated from one another, as if physically separate
3. Encapsulation: Virtual machines encapsulate the entire computing environment
4. Hardware independence: Virtual machines run independently of the underlying hardware
Well, let's start designing our own virtual machine.
Designing the Virtual Machine
First we draw up a blueprint for the virtual machine. We name it SunnyApril (abbreviated SA). To simplify the design, SA is a 16-bit machine (that is, her CPU is 16 bits wide), so the address space SA can support is 0000H-FFFFH.
Next we add 5 registers to SA. Registers are an important concept and component of computer hardware: they are high-speed storage units of limited capacity (typically 1 or 2 bytes) used to hold instructions, data, or addresses. Almost every CPU and virtual machine has built-in registers. Simply put, registers are the memory inside the "CPU".
For simplicity, we design only 5 registers: A, B, D, X, and Y. A and B are 8-bit registers that can hold an unsigned number in the range 0-FFH or a signed number in the range 80H-7FH (-128 to 127). X, Y, and D are 16-bit registers that can hold an unsigned number in the range 0-FFFFH or a signed number in the range 8000H-7FFFH (-32768 to 32767). Also for simplicity, we currently consider only unsigned numbers; signed numbers will be covered later, together with floating-point numbers.
The D register is a special 16-bit register: its value is the combination of the A and B registers. A holds the high 8 bits of D and B holds the low 8 bits. For example, if A holds 3CH and B holds 10H, then D holds 3C10H. Conversely, if you set D to 07C0H, A becomes 07H and B becomes C0H.
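The A/B/D relationship described above can be sketched as a computed property. This is only an illustration: the RegisterFile class and its member layout are hypothetical names and are not taken from the original source code.

```csharp
using System;

// Hypothetical helper (not part of the original source): models how D is
// derived from A (high byte) and B (low byte), as described in the text.
public class RegisterFile
{
    public byte A;   // high 8 bits of D
    public byte B;   // low 8 bits of D

    // D is not stored separately; reading combines A and B,
    // writing splits the 16-bit value back into A and B.
    public ushort D
    {
        get { return (ushort)((A << 8) | B); }
        set
        {
            A = (byte)(value >> 8);
            B = (byte)(value & 0xFF);
        }
    }
}
```

With A = 3CH and B = 10H, reading D gives 3C10H; assigning 07C0H to D leaves 07H in A and C0H in B, matching the example in the text.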
The following figure illustrates the register specifications and the relationships between them.
So that our virtual machine can give visible feedback right away, we set aside 4000 bytes (A000H-AFA0H) of the 64 KB memory space as the "display" buffer. Imitating text mode in assembly language under DOS, 2000 bytes hold the displayed characters (giving an 80x25 character screen) and 2000 bytes hold the style of each character. In each style byte, the low 3 bits are the red, green, and blue components of the foreground color, the 4th bit is the brightness, and the 5th to 7th bits encode the background color in the same way. The highest bit would normally indicate whether the character blinks, but our design does not need it, so we simply ignore it.
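As a rough sketch of the style byte layout just described, the following helper splits a style byte into its fields. The StyleByte class and its method names are hypothetical; the text does not say which of the three color bits is red, green, or blue, so the sketch returns each color as a plain 3-bit value.

```csharp
using System;

// Hypothetical helper (not part of the original source): decodes one style
// byte according to the bit layout given in the text.
public static class StyleByte
{
    // 80 columns x 25 rows = 2000 character cells, one style byte per cell.
    public const int CellCount = 80 * 25;

    // Bits 0-2: foreground color (3-bit RGB value).
    public static int Foreground(byte style)
    {
        return style & 0x07;
    }

    // Bit 3 (the "4th bit"): brightness of the foreground.
    public static bool IsBright(byte style)
    {
        return (style & 0x08) != 0;
    }

    // Bits 4-6 (the "5th to 7th bits"): background color (3-bit RGB value).
    public static int Background(byte style)
    {
        return (style >> 4) & 0x07;
    }

    // Bit 7 (blink) is deliberately ignored, as in the design above.
}
```

For example, the style byte 1FH decodes to foreground 7, bright, background 1.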
The next task is to design the instruction set (that is, the bytecode) that the virtual machine will run. The instruction set and our home-made "assembly language" are designed together. To keep things simple, we start with just 4 instructions: LDA, LDX, STA, and END (bytecodes 01H to 04H, respectively).
Take the LDA instruction (bytecode 01H) as an example: LDA #41H loads the operand (#41H) into the A register, that is, "load A". Because operands can be addressed in many ways, for now an operand simply begins with the # sign, which means "immediate value" (imitating the assembly language of the 51 family of microcontrollers). A number ending in "H" is hexadecimal; similarly, "O" means octal, "B" means binary, and "D" means decimal (and can be omitted).
The END instruction (bytecode 04H) marks the end of the program. The label that follows it names the program's starting point, that is, where execution begins. A label is a string of letters on a line of its own, ending with a colon ":", such as the START label:
START:
The next step is to design the format of the compiled bytecode file. Most binary file formats begin with a "magic number". For example, DOS/Windows executables begin with "MZ", and Java class files begin with the 4-byte number 3405691582, which reads "CAFEBABE" (coffee baby) in hexadecimal. SunnyApril uses "CONMAJIA" as its magic number. The magic number is followed by the file body offset, which gives the starting position of the file body (that is, the program bytecode) within the file. Next comes the program length, that is, the length of the file body. The execution address is the address at which bytecode execution starts; it is fixed at 0. The offset segment (which may change later) is reserved for additional data or an interrupt vector table and takes up "offset - 13" bytes. After the file header comes the file body, which holds all the bytecode compiled from the program. The file structure is shown in the figure.
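To make the header layout concrete, here is a minimal sketch of a reader for the fields listed above. The SabHeader class and its member names are hypothetical, and the field widths (an 8-byte ASCII magic number followed by three 16-bit little-endian values) are an assumption based on how the assembler later in this article writes them.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of the original source): reads the header
// fields in the order described in the text. Field widths are assumed.
public class SabHeader
{
    public string Magic;
    public ushort BodyOffset;        // where the bytecode starts in the file
    public ushort BinaryLength;      // program length as written by the assembler
    public ushort ExecutionAddress;  // fixed at 0 in the current design

    public static SabHeader Read(Stream stream)
    {
        BinaryReader reader = new BinaryReader(stream);
        SabHeader header = new SabHeader();

        // The magic number is 8 ASCII characters.
        header.Magic = Encoding.ASCII.GetString(reader.ReadBytes(8));
        if (header.Magic != "CONMAJIA")
            throw new InvalidDataException("Not a SunnyApril binary.");

        // BinaryReader always reads multi-byte integers as little-endian.
        header.BodyOffset = reader.ReadUInt16();
        header.BinaryLength = reader.ReadUInt16();
        header.ExecutionAddress = reader.ReadUInt16();
        return header;
    }
}
```

A loader would call SabHeader.Read on the opened file stream and then seek to BodyOffset to reach the bytecode.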
The Assembler
Now we can start designing the assembler. The assembler compiles the assembly source program we have written into a binary bytecode file that the virtual machine can run. Each line of an assembly file has the following format:
<instruction><whitespace><operand>[whitespace]<newline>
The content in square brackets [] is optional.
Note: The following content and source code have been substantially reworked and optimized and differ considerably from the original text, so watch for the differences.
This is our assembly source program:
    START:
    LDA #65
    LDX #A000H
    STA X
    END START
This program simply outputs the character A to the upper-left corner of the screen. The first line defines the START label. The second line loads the immediate value 65 (the ASCII code of 'A') into the A register. The third line loads the immediate value A000H (the start address of the display buffer; see the design section) into the X register. The fourth line stores the value of the A register (65) at the memory address held in the X register (A000H). Finally, END ends the program.
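The effect of these four instructions can be traced by hand with a few lines of C#. This is only an illustration of the intended semantics, not the virtual machine itself; the TraceDemo class and the flat 64 KB memory array are assumptions made for the sketch.

```csharp
using System;

// Hypothetical trace (not part of the original source): performs by hand
// what the four-instruction demo program is meant to do.
public static class TraceDemo
{
    public static byte[] Run()
    {
        byte[] memory = new byte[0x10000]; // 64 KB address space, 0000H-FFFFH
        byte a;
        ushort x;

        a = 65;          // LDA #65    : load immediate 65 ('A') into A
        x = 0xA000;      // LDX #A000H : load the display buffer start into X
        memory[x] = a;   // STA X      : store A at the address held in X
        return memory;   // END START  : program ends
    }
}
```

After the trace, the byte at A000H holds 65, so the first character cell of the display buffer contains 'A'.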
Now start Visual Studio, create a new "Windows Forms Application" project, select .NET Framework 2.0, and lay out the form as in the following design.
Set the textBox1.ReadOnly property to true and the numericUpDown1.Hexadecimal property to true.
First, create the following variables in the form class.
    Dictionary<string, UInt16> labelDict;
    UInt16 binaryLength;
    UInt16 executionAddress;
Define a register enumeration.
    enum Registers
    {
        Unknown = 0,
        A = 4,
        B = 2,
        D = 1,
        X = 16,
        Y = 8
    }
Initialize the variables and controls in the form's constructor.
    public Form1()
    {
        InitializeComponent();
        labelDict = new Dictionary<string, ushort>();
        binaryLength = 0;
        executionAddress = 0;
        numericUpDown1.Value = 0x200;
    }
button1 opens a file browsing dialog to select the source file to assemble. Double-click button1 and enter the following code in the generated Click event handler:
    OpenFileDialog ofd = new OpenFileDialog();
    ofd.Filter = "SunnyApril Assembly Files (*.asm)|*.asm";
    ofd.DefaultExt = "asm";
    ofd.FileName = string.Empty;
    if (ofd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
        textBox1.Text = ofd.FileName;
    else
        textBox1.Clear();
button2 performs the assembly and generates the binary bytecode file. The main code is as follows:
    if (textBox1.Text == string.Empty) return;
    labelDict.Clear();
    binaryLength = (UInt16)numericUpDown1.Value;
    FileInfo fi = new FileInfo(textBox1.Text);
    BinaryWriter output;
    FileStream fs = new FileStream(Path.Combine(fi.DirectoryName, fi.Name + ".sab"), FileMode.Create);
    output = new BinaryWriter(fs);
    // Magic word
    output.Write('C');
    output.Write('O');
    output.Write('N');
    output.Write('M');
    output.Write('A');
    output.Write('J');
    output.Write('I');
    output.Write('A');
    // ORG
    output.Write((UInt16)numericUpDown1.Value);
    // Seek to ORG and start writing bytecode
    output.Seek((int)numericUpDown1.Value, SeekOrigin.Begin);
    // Parse source code line by line
    // (dealedSize, totalSize, and Invoker belong to the full source and are not shown here)
    TextReader input = File.OpenText(textBox1.Text);
    string line;
    while ((line = input.ReadLine()) != null)
    {
        Parse(line.ToUpper(), output);
        dealedSize += line.Length;
        Invoker.Set(progressBar1, "Value", (int)((float)dealedSize / (float)totalSize * 100));
    }
    input.Close();
    // Binary length and execution address (after the 8-byte magic word and the 2-byte ORG)
    output.Seek(10, SeekOrigin.Begin);
    output.Write(binaryLength);
    output.Write(executionAddress);
    output.Close();
    fs.Close();
    MessageBox.Show("Done!");
This method uses a while loop to parse the source code line by line (the original author parses the full text at once). The parsing method is as follows:
    private void Parse(string line, BinaryWriter output)
    {
        // Eat white spaces and comments
        line = CleanLine(line);
        if (line.EndsWith(":"))
            // Label: remember the address it refers to
            labelDict.Add(line.TrimEnd(new char[] { ':' }), binaryLength);
        else
        {
            // Code: split the line into opcode and operand
            Match m = Regex.Match(line, @"(\w+)\s(.+)");
            string opcode = m.Groups[1].Value;
            string operand = m.Groups[2].Value;
            switch (opcode)
            {
                case "LDA":
                    output.Write((byte)0x01);
                    output.Write(GetByteValue(operand));
                    binaryLength += 2;
                    break;
                case "LDX":
                    output.Write((byte)0x02);
                    output.Write(GetWordValue(operand));
                    binaryLength += 3;
                    break;
                case "STA":
                    output.Write((byte)0x03);
                    // Note: no error handling.
                    Registers r = (Registers)Enum.Parse(typeof(Registers), operand);
                    output.Write((byte)r);
                    binaryLength += 2;
                    break;
                case "END":
                    output.Write((byte)0x04);
                    if (labelDict.ContainsKey(operand))
                    {
                        output.Write(labelDict[operand]);
                        binaryLength += 2;
                    }
                    binaryLength += 1;
                    break;
                default:
                    break;
            }
        }
    }
Parsing relies on an internal method that reads a byte operand, shown below. With a small improvement, it readily supports several number bases. The method that reads a word operand is similar, so it is not explained further.
    private byte GetByteValue(string operand)
    {
        byte ret = 0;
        if (operand.StartsWith("#"))
        {
            // Strip the '#' and look at the trailing base suffix, if any
            operand = operand.Remove(0, 1);
            char last = operand[operand.Length - 1];
            if (char.IsLetter(last))
                switch (last)
                {
                    case 'H': // Hex
                        ret = Convert.ToByte(operand.Remove(operand.Length - 1, 1), 16);
                        break;
                    case 'O': // Oct
                        ret = Convert.ToByte(operand.Remove(operand.Length - 1, 1), 8);
                        break;
                    case 'B': // Bin
                        ret = Convert.ToByte(operand.Remove(operand.Length - 1, 1), 2);
                        break;
                    case 'D': // Dec
                        ret = Convert.ToByte(operand.Remove(operand.Length - 1, 1), 10);
                        break;
                }
            else
                ret = byte.Parse(operand);
        }
        return ret;
    }
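Since the word-reading method is not listed, here is one way it might look, mirroring the byte-reading method above with 16-bit conversions. Treat it as a sketch rather than the author's code: the OperandParser wrapper class and the empty-operand guard are additions made for illustration.

```csharp
using System;

// Hypothetical sketch (not the original source): a 16-bit counterpart to the
// byte-reading method, using the same '#' prefix and H/O/B/D suffix rules.
public static class OperandParser
{
    public static ushort GetWordValue(string operand)
    {
        ushort ret = 0;
        if (operand.StartsWith("#"))
        {
            operand = operand.Remove(0, 1);
            if (operand.Length == 0)
                return 0; // guard added for the sketch: "#" with no digits

            char last = operand[operand.Length - 1];
            if (char.IsLetter(last))
            {
                string digits = operand.Remove(operand.Length - 1, 1);
                switch (last)
                {
                    case 'H': ret = Convert.ToUInt16(digits, 16); break; // Hex
                    case 'O': ret = Convert.ToUInt16(digits, 8); break;  // Oct
                    case 'B': ret = Convert.ToUInt16(digits, 2); break;  // Bin
                    case 'D': ret = Convert.ToUInt16(digits, 10); break; // Dec
                }
            }
            else
                ret = ushort.Parse(operand); // no suffix: decimal
        }
        return ret;
    }
}
```

Under these rules, "#A000H" yields A000H and "#65" yields 65, which are the two operand forms used by the demo program.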
Run the assembler and assemble the previously saved demo1.asm file to get the demo1.sab binary bytecode file (SunnyApril Binaries), whose content is as follows:
As you can see, the assembler faithfully did the job we gave it: it calculated the file size correctly and, starting at location 0200H, wrote the compiled bytecode "01 41 02 00 A0 03 10 04 00 02". Below we check it against the source program. For easy reference, here is the source program again.
    START:
    LDA #65
    LDX #A000H
    STA X
    END START
The first line is the START label, whose address 0200H is recorded (it does not appear in the file at this point).
The second line is the LDA instruction, stored as bytecode 01H followed by the single-byte operand (A is an 8-bit register) 65, that is, 41H.
The third line is the LDX instruction, stored as bytecode 02H followed by the double-byte operand (X is a 16-bit register) A000H. Because the computer uses little-endian byte order (low byte first), it is stored in the file as "00 A0".
The fourth line is the STA instruction, stored as bytecode 03H followed by the Registers.X enumeration value (16, that is, 10H).
The fifth line is the END instruction, stored as bytecode 04H followed by the address of the START label, 0200H (2 bytes, again little-endian).
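The byte-by-byte analysis above can be reproduced with a short sketch that writes the same values through a BinaryWriter, which emits 16-bit values low byte first. The DemoBytecode class is a hypothetical name; the opcode values, the register value 16 for X, and the label address 0200H are taken from the text.

```csharp
using System;
using System.IO;

// Hypothetical check (not part of the original source): emits the bytecode
// of the demo program by hand to show the little-endian operand layout.
public static class DemoBytecode
{
    public static byte[] Build()
    {
        MemoryStream ms = new MemoryStream();
        BinaryWriter w = new BinaryWriter(ms);

        w.Write((byte)0x01); w.Write((byte)65);         // LDA #65     -> 01 41
        w.Write((byte)0x02); w.Write((ushort)0xA000);   // LDX #A000H  -> 02 00 A0
        w.Write((byte)0x03); w.Write((byte)16);         // STA X       -> 03 10
        w.Write((byte)0x04); w.Write((ushort)0x0200);   // END START   -> 04 00 02

        w.Flush();
        return ms.ToArray();
    }
}
```

Passing the result to BitConverter.ToString gives "01-41-02-00-A0-03-10-04-00-02", the same ten bytes the analysis walks through.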
The analysis above confirms that the assembler fully conforms to the design.
Next, we will start designing the virtual machine itself, so stay tuned.
Various suggestions are welcome.
(End of the first part)
Homemade Virtual Machine Series Part I: ideation and assembler