Homemade Virtual Machine Series Part I: Ideation and Assembler

© Conmajia & Icemanind 2012

This article is adapted from the article series "How to Create Your Own Virtual Machine" and has been extensively modified (with the consent of the original author).

Preface

by Conmajia

This series of articles starts from scratch and takes you step by step through designing and implementing a fully operational virtual machine. We will use the C# language on the Microsoft .NET Framework 2.0 runtime to build the entire virtual machine (for compatibility reasons, and also to keep the focus on design). You therefore need basic .NET development knowledge. In other words, you should at least be able to use Visual Studio 2005 (or later) and have successfully run your own "hello world" program.
Before starting the design, let's review some background on virtual machines.
A virtual machine is a piece of middleware that simulates a hardware environment. It is a highly isolated software container that can run its own operating system and applications as if it were a physical computer. A virtual machine behaves exactly like a physical computer: it contains its own virtual (that is, software-based) CPU, and some also provide virtual hardware such as RAM, hard disks, and network interface cards (NICs).
The operating system cannot tell a virtual machine from a physical machine, and neither can applications or other computers on the network. Even the virtual machine itself "thinks" it is a real computer. However, a virtual machine consists entirely of software and contains no hardware components. As a result, virtual machines have a number of unique advantages that physical hardware does not have.

Advantages of virtual machines

Typically, virtual machines have the following four key features:
1. Compatibility: Virtual machines are compatible with all standard x86 computers
2. Isolation: Virtual machines are isolated from one another, as if physically separate
3. Encapsulation: Virtual machines encapsulate the entire computing environment
4. Hardware independence: Virtual machines run independently of the underlying hardware

Now let's start designing our own virtual machine.

Designing the virtual machine

We begin by drawing up a blueprint for the virtual machine. We name it SunnyApril (abbreviated SA). To simplify the design, SA is a 16-bit machine (that is, its CPU word width is 16 bits). The address space that SA can support is therefore 0000H-FFFFH. Next we add 5 registers to SA. Registers are an important concept and component of computer hardware. They are high-speed storage units with limited capacity (typically 1 or 2 bytes) that hold instructions, data, or addresses. Almost all CPUs and virtual machines have built-in registers. Simply put, registers are the memory inside the "CPU".
For simplicity, we design only 5 registers: A, B, D, X, and Y. The A and B registers are 8-bit registers that can hold an unsigned number in the range 0-FFH or a signed number in the range 80H-7FH (that is, -128 to 127). The X, Y, and D registers are all 16-bit and can hold an unsigned number in the range 0-FFFFH or a signed number in the range 8000H-7FFFH (that is, -32768 to 32767). Also to keep the design simple, for now we consider only unsigned numbers; signed numbers will be covered later together with floating-point numbers.
The D register is a special 16-bit register. Its value is the combination of the values of the A and B registers: A holds the high 8 bits of D, and B holds the low 8 bits. For example, if the A register holds 3CH and the B register holds 10H, then the D register holds 3C10H. Conversely, if you change the D register to 07C0H, the A register becomes 07H and the B register becomes C0H.
The following figure illustrates the sizes of the registers and the relationships between them.
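The relationship between A, B, and D can be sketched in C# as follows. This is only an illustration under assumptions: the class name RegisterFile and its layout are hypothetical and are not taken from the original source code.

```csharp
using System;

// A minimal sketch of the A/B/D relationship (RegisterFile is a hypothetical name).
public class RegisterFile
{
    public byte A;    // high 8 bits of D
    public byte B;    // low 8 bits of D
    public ushort X;
    public ushort Y;

    // D is not stored separately; it is always derived from A and B.
    public ushort D
    {
        get { return (ushort)((A << 8) | B); }
        set
        {
            A = (byte)(value >> 8);     // high byte goes to A
            B = (byte)(value & 0xFF);   // low byte goes to B
        }
    }
}
```

With this layout, setting A to 3CH and B to 10H makes D read as 3C10H, and assigning 07C0H to D changes A to 07H and B to C0H, matching the example above.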

So that our virtual machine can show its results right away, we set aside 4000 bytes of the 64KB memory space (from A000H up to, but not including, AFA0H) as the "display" buffer. Imitating assembly language programming under DOS, we use 2000 bytes to hold the displayed characters (which gives an 80x25 character screen) and 2000 bytes to hold the style of each character. The low 3 bits of each style byte are the red, green, and blue components of the foreground color, the 4th bit is the intensity (brightness) flag, and bits 5-7 are the same color components for the background. The highest bit of the style byte would normally indicate whether the character blinks, but our design does not need it, so we simply ignore it.
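The display buffer layout and the style byte can be sketched as below. Several details are assumptions not stated in the text: that characters are stored row by row, that the 2000 style bytes directly follow the 2000 character bytes, and that the 4th bit acts as an intensity flag as in DOS text mode. The class name DisplayMemory is hypothetical.

```csharp
using System;

// A sketch of the display buffer layout (names and layout details are assumptions).
public static class DisplayMemory
{
    public const int Base = 0xA000;   // start of the display buffer
    public const int Columns = 80;
    public const int Rows = 25;

    // Assumption: characters are stored row by row starting at Base.
    public static int CharAddress(int row, int column)
    {
        return Base + row * Columns + column;
    }

    // Assumption: the 2000 style bytes follow the 2000 character bytes.
    public static int StyleAddress(int row, int column)
    {
        return Base + Rows * Columns + row * Columns + column;
    }

    // Low 3 bits: foreground color components.
    public static int Foreground(byte style) { return style & 0x07; }

    // 4th bit: treated here as an intensity flag (assumption).
    public static bool Intensity(byte style) { return (style & 0x08) != 0; }

    // Bits 5-7: background color components. The highest bit (blink) is ignored.
    public static int Background(byte style) { return (style >> 4) & 0x07; }
}
```

Under these assumptions the last style byte sits at A000H + 3999 = AF9FH, which is why the buffer ends just before AFA0H.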
The next task is to design the instruction set (that is, the bytecode) that the virtual machine will run. The instruction set is designed together with our homemade "assembly language". For simplicity, we start with just 4 instructions.

Take the LDA instruction (bytecode 01H) as an example. It loads the operand (#41H) into the A register, hence "load A". Because operands can be addressed in so many ways, for now an operand simply begins with the # sign, which means "immediate value" (imitating the assembly language of the 8051 microcontroller). Numbers ending with "H" are hexadecimal; similarly there are "O" (octal), "B" (binary), and "D" (decimal, can be omitted).
The END instruction (bytecode 04H) marks the end of the program. The "label" that follows it names the start of the program and marks where execution begins. A label is a string of letters on a line of its own, ending with a colon ":", such as the START label:

START:

The next step is to design the format of the assembled bytecode file. Most binary file formats start with a "magic number". For example, DOS/Windows executables begin with "MZ", and Java class files start with the 4-byte number 3405691582, which is "CAFEBABE" (coffee baby) in hexadecimal. SunnyApril uses "CONMAJIA" as its magic number. The magic number is followed by the file body offset, which gives the starting position of the file body (that is, the program bytecode) within the file. Next comes the program length, that is, the length of the file body. The execution address is the address at which bytecode execution starts, fixed to 0. The offset segment (which may change later) holds additional data or an interrupt vector table, for example "offset -13" bytes. The file header is followed by the file body, which holds all the bytecode of the assembled program. See the figure for the file structure.
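The header described above can be read back with a few lines of C#. This is a sketch under assumptions: each of the three fields is taken to be a 16-bit little-endian value directly following the 8-byte magic number, which makes the header 14 bytes long. The class name SabHeader and its members are hypothetical.

```csharp
using System;
using System.Text;

// A sketch of a header reader (SabHeader is a hypothetical name; field widths are assumed).
public class SabHeader
{
    public ushort BodyOffset;        // where the bytecode starts in the file
    public ushort BodyLength;        // program length
    public ushort ExecutionAddress;  // where execution starts

    // Parses the assumed 14-byte header; throws if the magic number is wrong.
    public static SabHeader Parse(byte[] file)
    {
        if (file.Length < 14)
            throw new ArgumentException("File too short for a header.");
        string magic = Encoding.ASCII.GetString(file, 0, 8);
        if (magic != "CONMAJIA")
            throw new ArgumentException("Bad magic number.");

        SabHeader h = new SabHeader();
        h.BodyOffset = ReadWord(file, 8);
        h.BodyLength = ReadWord(file, 10);
        h.ExecutionAddress = ReadWord(file, 12);
        return h;
    }

    // Reads a 16-bit little-endian value (low byte first).
    private static ushort ReadWord(byte[] data, int index)
    {
        return (ushort)(data[index] | (data[index + 1] << 8));
    }
}
```

Checking the magic number first lets a loader reject files that are not SunnyApril binaries before it trusts any of the offsets.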

Assembler

Now we can start designing the assembler. The assembler will assemble the source program we have written and write it to a binary bytecode file that the virtual machine can run. Each line of an assembly file has the following format:

<instruction><whitespace><operand>[whitespace]<newline>

The content in square brackets [] is optional.

Note: The following content and source code have been substantially reworked and optimized and differ considerably from the original text, so keep the differences in mind.

This is our assembler source program:

START:
LDA #65
LDX #A000H
STA X
END START

This program simply outputs the character A to the upper-left corner of the screen. The first line defines the START label. The second line loads the immediate value 65 (the ASCII code of 'A') into the A register. The third line loads the immediate value A000H (the start address of the display buffer, see the design section) into the X register. The fourth line stores the value of the A register (65) at the memory address held in the X register (A000H). Finally, END ends the program.
Now start Visual Studio, create a new "Windows Forms Application" project, select .NET Framework 2.0 as the target version, and lay out the form following the design below.

Set the textBox1.ReadOnly property to true and the numericUpDown1.Hexadecimal property to true.
First, declare the following variables in the form class.

Dictionary<string, UInt16> labelDict;
UInt16 binaryLength;
UInt16 executionAddress;

Define a register enumeration.

enum Registers
{
    Unknown = 0,   // no register (member name assumed)
    A = 4,
    B = 2,
    D = 1,
    X = 16,
    Y = 8
}

Initialize the variables and controls in the form's constructor.

public Form1()
{
    InitializeComponent();
    labelDict = new Dictionary<string, ushort>();
    binaryLength = 0;
    executionAddress = 0;
    numericUpDown1.Value = 0x200;
}

button1 opens the file browsing dialog box to select the source file to be assembled. Double-click button1 and enter the following code in the generated Click event handler:

OpenFileDialog ofd = new OpenFileDialog();
ofd.Filter = "SunnyApril Assembly Files (*.asm)|*.asm";
ofd.DefaultExt = "asm";
ofd.FileName = string.Empty;
if (ofd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
    textBox1.Text = ofd.FileName;
else
    textBox1.Clear();

button2 performs the assembly and generates the binary bytecode file. The main code is as follows:

if (textBox1.Text == string.Empty) return;
labelDict.Clear();
binaryLength = (UInt16)numericUpDown1.Value;
FileInfo fi = new FileInfo(textBox1.Text);
BinaryWriter output;
FileStream fs = new FileStream(Path.Combine(fi.DirectoryName, fi.Name + ".sab"), FileMode.Create);
output = new BinaryWriter(fs);
// Magic word
output.Write('C');
output.Write('O');
output.Write('N');
output.Write('M');
output.Write('A');
output.Write('J');
output.Write('I');
output.Write('A');
// ORG
output.Write((UInt16)numericUpDown1.Value);
// Seek to ORG and start writing bytecode
output.Seek((int)numericUpDown1.Value, SeekOrigin.Begin);
// Parse source code line by line
TextReader input = File.OpenText(textBox1.Text);
string line;
while ((line = input.ReadLine()) != null)
{
    Parse(line.ToUpper(), output);
    dealedSize += line.Length;
    Invoker.Set(progressBar1, "Value", (int)((float)dealedSize / (float)totalSize * 100));
}
input.Close();
// Binary length & execution address (8 magic-word bytes and 2 ORG bytes come before them)
output.Seek(10, SeekOrigin.Begin);
output.Write(binaryLength);
output.Write(executionAddress);
output.Close();
fs.Close();
MessageBox.Show("Done!");

This method uses a while loop to parse the source code line by line (the original author parses the full text at once). The parsing method is as follows:

private void Parse(string line, BinaryWriter output)
{
    // Eat white spaces and comments
    line = CleanLine(line);
    if (line.EndsWith(":"))
        // Label: remember the address it refers to
        labelDict.Add(line.TrimEnd(new char[] { ':' }), binaryLength);
    else
    {
        // Code: split the line into opcode and operand
        Match m = Regex.Match(line, @"(\w+)\s(.+)");
        string opcode = m.Groups[1].Value;
        string operand = m.Groups[2].Value;
        switch (opcode)
        {
            case "LDA":
                output.Write((byte)0x01);
                output.Write(GetByteValue(operand));
                binaryLength += 2;
                break;
            case "LDX":
                output.Write((byte)0x02);
                output.Write(GetWordValue(operand));
                binaryLength += 3;
                break;
            case "STA":
                output.Write((byte)0x03);
                // Note: no error handling.
                Registers r = (Registers)Enum.Parse(typeof(Registers), operand);
                output.Write((byte)r);
                binaryLength += 2;
                break;
            case "END":
                output.Write((byte)0x04);
                if (labelDict.ContainsKey(operand))
                {
                    output.Write(labelDict[operand]);
                    binaryLength += 2;
                }
                binaryLength += 1;
                break;
            default:
                break;
        }
    }
}
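The CleanLine helper called at the top of Parse is not shown in the article. A hypothetical version might look like the sketch below. It assumes that comments start with a semicolon, which the article does not specify, and the class name SourceCleaner is made up for this example.

```csharp
using System;

// A hypothetical stand-in for the CleanLine helper (comment syntax is an assumption).
public static class SourceCleaner
{
    // Strips a trailing ';' comment and the surrounding white space.
    public static string CleanLine(string line)
    {
        int comment = line.IndexOf(';');
        if (comment >= 0)
            line = line.Substring(0, comment);
        return line.Trim();
    }
}
```

A line that contains only a comment becomes an empty string, which the regular expression in Parse does not match, so it falls through to the default case and emits nothing.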

Parse uses an internal method to read a byte operand, shown below. With a small improvement it easily supports several number bases. The method for reading a word operand is similar, so it is not explained further.

private byte GetByteValue(string operand)
{
    byte ret = 0;
    if (operand.StartsWith("#"))
    {
        // Drop the '#' and look at the last character for a base suffix
        operand = operand.Remove(0, 1);
        char last = operand[operand.Length - 1];
        if (char.IsLetter(last))
            switch (last)
            {
                case 'H': // Hex
                    ret = Convert.ToByte(operand.Remove(operand.Length - 1, 1), 16);
                    break;
                case 'O': // Oct
                    ret = Convert.ToByte(operand.Remove(operand.Length - 1, 1), 8);
                    break;
                case 'B': // Bin
                    ret = Convert.ToByte(operand.Remove(operand.Length - 1, 1), 2);
                    break;
                case 'D': // Dec
                    ret = Convert.ToByte(operand.Remove(operand.Length - 1, 1), 10);
                    break;
            }
        else
            // No suffix: plain decimal
            ret = byte.Parse(operand);
    }
    return ret;
}
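Since the word-reading method is not listed, here is one way it might look, written as a direct 16-bit counterpart of the byte reader. This is a sketch, not the author's code: the wrapper class OperandParser is hypothetical, and the method is made static only so that it can be tried outside the form.

```csharp
using System;

// A hypothetical 16-bit counterpart of the byte reader (not the author's code).
public static class OperandParser
{
    public static ushort GetWordValue(string operand)
    {
        ushort ret = 0;
        if (operand.StartsWith("#"))
        {
            string digits = operand.Substring(1);      // drop the '#'
            char last = digits[digits.Length - 1];
            if (char.IsLetter(last))
            {
                // The last letter selects the number base
                string body = digits.Substring(0, digits.Length - 1);
                switch (last)
                {
                    case 'H': ret = Convert.ToUInt16(body, 16); break; // Hex
                    case 'O': ret = Convert.ToUInt16(body, 8); break;  // Oct
                    case 'B': ret = Convert.ToUInt16(body, 2); break;  // Bin
                    case 'D': ret = Convert.ToUInt16(body, 10); break; // Dec
                }
            }
            else
            {
                ret = ushort.Parse(digits);            // no suffix: decimal
            }
        }
        return ret;
    }
}
```

Because BinaryWriter writes a UInt16 low byte first, passing the returned value to output.Write stores A000H as the two bytes 00 A0.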

Run the assembler and assemble the previously saved demo1.asm file to get the binary bytecode file demo1.sab (SunnyApril Binary), whose content is as follows:

As you can see, the assembler faithfully completed the task we gave it: it calculated the file size correctly and, starting at position 0200H, wrote the assembled bytecode "01 41 02 00 A0 03 10 04 00 02". Let's check this against the source program. For easy reference, here is the source program again.

START:
LDA #65
LDX #A000H
STA X
END START

The first line is the START label, whose address 0200H is cached by the assembler (this is not reflected in the file).
The second line is the LDA instruction. The bytecode 01H is stored first, followed by the single-byte operand (the A register is an 8-bit register) 65, that is, 41H.
The third line is the LDX instruction. The bytecode 02H is stored first, followed by the double-byte operand (the X register is a 16-bit register) A000H. Because the computer uses little-endian byte order (low byte first), it is stored in the file as "00 A0".
The fourth line is the STA instruction. The bytecode 03H is stored first, followed by the Registers.X enumeration value (16, that is, 10H).
The fifth line is the END instruction. The bytecode 04H is stored first, followed by the address of the START label, 0200H (2 bytes, again in little-endian order).
From the above analysis, we can confirm that the assembler's output fully conforms to the design.
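The same check can be done mechanically by decoding the bytes from the walk-through back into instructions. The disassembler below is only a sketch and is not part of the original project. It assumes that END is always followed by a 2-byte address, and it knows only the register code for X (10H), since that is the only one confirmed by the analysis above.

```csharp
using System;
using System.Collections.Generic;

// A sketch of a disassembler for the four instructions (not part of the original project).
public static class Disassembler
{
    public static List<string> Disassemble(byte[] code)
    {
        List<string> lines = new List<string>();
        int i = 0;
        while (i < code.Length)
        {
            switch (code[i])
            {
                case 0x01: // LDA + 1-byte immediate
                    lines.Add("LDA #" + code[i + 1].ToString("X2") + "H");
                    i += 2;
                    break;
                case 0x02: // LDX + 2-byte little-endian immediate
                    int word = code[i + 1] | (code[i + 2] << 8);
                    lines.Add("LDX #" + word.ToString("X4") + "H");
                    i += 3;
                    break;
                case 0x03: // STA + register code
                    lines.Add("STA " + RegisterName(code[i + 1]));
                    i += 2;
                    break;
                case 0x04: // END + 2-byte little-endian address (assumed present)
                    int address = code[i + 1] | (code[i + 2] << 8);
                    lines.Add("END " + address.ToString("X4") + "H");
                    i += 3;
                    break;
                default:
                    throw new InvalidOperationException("Unknown opcode at offset " + i);
            }
        }
        return lines;
    }

    // Only the code for X is confirmed by the walk-through; others are shown as "?".
    private static string RegisterName(byte code)
    {
        return code == 0x10 ? "X" : "?";
    }
}
```

Feeding it the ten bytes 01 41 02 00 A0 03 10 04 00 02 gives back the four instructions of the demo program, with the START label replaced by its address 0200H.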
Next, we will start designing the virtual machine itself, so stay tuned.
Suggestions of all kinds are welcome.

(End of the first part)
