Using C language to implement a simple virtual machine


Last Update:2017-01-19 Source: Internet

Author: User


Prerequisites and preparation:

You need the following before you start:

A C compiler. I use clang 3.4, but any other compiler that supports C99/C11 will do.
A text editor. I recommend an IDE-style text editor; I use Emacs.
Basic programming knowledge: the fundamentals such as variables, control flow, functions, and data structures.
Make. A Makefile makes building the program a little faster.

Why would you want to write a virtual machine?

Here are some reasons:

You want to know more about how computers work. This article will help you understand how a computer works at a low level. A virtual machine provides a concise layer of abstraction, and what better way to learn the principles than to build one?
You want to learn more about how some programming languages work. Many languages in use today run on virtual machines, including the JVM, the Lua VM, and Facebook's HipHop VM (PHP/Hack).
You are simply interested in learning about virtual machines.

Instruction Set

We're going to implement a very simple custom instruction set. I won't cover more advanced topics such as bit shifting; I hope you will be able to pick those up on your own after reading this article.

Our virtual machine has a set of registers: A, B, C, D, E, and F. These are general-purpose registers, which means they can be used to store anything. A program is a read-only sequence of instructions. This virtual machine is stack-based, which means that it has a stack that we can push values onto and pop values from, as well as a small number of registers. This is much simpler than implementing a register-based virtual machine.

And here's the set of instructions we're going to implement:

PSH 5     ; pushes 5 to the stack
PSH n     ; pushes the value n to the stack
ADD       ; pops the two values on top of the stack, adds them, pushes the result to the stack
POP       ; pops the value on top of the stack, and also prints it for debugging
SET A 0   ; sets register A to 0
HLT       ; stops the program

This is our instruction set. Note that the POP instruction prints the value it pops, so that we can see the ADD instruction working. I also added a SET instruction, mainly so that you can see that registers can be read and written. You could also implement an instruction like MOV A B, which moves the value of A to B. The HLT instruction tells the machine that the program has finished.

How does a virtual machine work?

Now we come to the most important part of this article. Virtual machines are simpler than you might think, and they follow a simple pattern: fetch, decode, execute. First, we fetch the next instruction from the program, then we decode the instruction, and finally we execute the decoded instruction. For simplicity, we skip the encoding part of the virtual machine; a typical virtual machine packs an instruction (the opcode and its operands) into a single number and then decodes that number.
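
To make the idea of packing an instruction into a number concrete, here is a minimal sketch. The layout (opcode in the top 8 bits of a 32-bit word, operand in the low 24 bits) and the names encode, decode_opcode, and decode_operand are assumptions made for illustration; they are not part of the machine built in this article, which skips encoding entirely.

```c
#include <stdint.h>

/* Hypothetical layout, not taken from the article: the opcode lives in
   the top 8 bits of a 32-bit word and the operand in the low 24 bits. */
#define OPCODE_SHIFT 24
#define OPERAND_MASK 0x00FFFFFFu

/* Pack an opcode and an operand into one number. */
uint32_t encode(uint32_t opcode, uint32_t operand) {
    return (opcode << OPCODE_SHIFT) | (operand & OPERAND_MASK);
}

/* Recover the opcode from a packed instruction. */
uint32_t decode_opcode(uint32_t word) {
    return word >> OPCODE_SHIFT;
}

/* Recover the operand from a packed instruction. */
uint32_t decode_operand(uint32_t word) {
    return word & OPERAND_MASK;
}
```

With this layout, encode(3, 5) produces one number from which decode_opcode returns 3 and decode_operand returns 5.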

Project Structure

Before we start programming, we need to set up our project. First, you need a C compiler (I use clang 3.4). You also need a folder to hold the project; I like to keep my projects in ~/dev:

$ cd ~/dev/
$ mkdir mac
$ cd mac
$ mkdir src

As shown above, we go to the ~/dev directory, or wherever you want to put the project, and create a new directory (I call this virtual machine "mac"). Then we enter that directory and create a src directory, which holds the code.

Makefile

The Makefile is relatively straightforward. We don't need to split anything into multiple files, so we just need a few flags to compile the file:

SRC_FILES = main.c
CC_FLAGS = -Wall -Wextra -g -std=c11
CC = clang

all:
	${CC} ${SRC_FILES} ${CC_FLAGS} -o mac

This is enough for now. You can improve it later, but as long as it does the job, we should be satisfied.

Instruction Programming (code)

Now we start writing the code for the virtual machine. First, we need to define the instructions. For this we can use an enum, because our instructions are essentially numbers from 0 to X. In fact, an assembler does something similar: it takes an assembly file that uses words like MOV and translates them into the numbers declared for them.
We could write a program directly as numbers, so that PSH, 5 becomes 0, 5, but that is not easy to read, so we use enumerators!

typedef enum {
  PSH,
  ADD,
  POP,
  SET,
  HLT
} InstructionSet;

Now we can store a test program as an array. We write a simple program for testing: it adds 5 and 6 together and then prints the result (with the POP instruction). If you want, you can define a separate instruction that prints the value on top of the stack.

The instructions are stored in an array, which I define at the top of the file, though you might put it in a header file. Here is our test program:

const int program[] = {
  PSH, 5,
  PSH, 6,
  ADD,
  POP,
  HLT
};
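
Because enumerators are just integers, the test program is really a list of plain numbers. The sketch below prints the program the way the virtual machine sees it; dump_program and program_length are names assumed here for illustration.

```c
#include <stdio.h>

typedef enum { PSH, ADD, POP, SET, HLT } InstructionSet;

const int program[] = { PSH, 5, PSH, 6, ADD, POP, HLT };
const int program_length = sizeof(program) / sizeof(program[0]);

/* Print every cell of the program as the plain number the VM sees. */
void dump_program(void) {
    for (int i = 0; i < program_length; i++) {
        printf("%d ", program[i]);
    }
    printf("\n");
}
```

Calling dump_program() prints the numbers 0 5 0 6 1 2 4, since PSH is 0, ADD is 1, POP is 2, and HLT is 4.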

The program above pushes 5 and 6 onto the stack, then runs the ADD instruction, which pops the top two values, adds them, and pushes the result back onto the stack. We then pop the result, and because the POP instruction prints the value it pops, we can see the sum. You don't have to work it out yourself; I have run it and tested it. Finally, the HLT instruction ends the program.

Good, so we have our program. Now we implement the fetch, decode, and evaluate cycle of the virtual machine. But remember, we don't need to decode anything, because we are giving the machine raw instructions. That means we only need to worry about fetching and evaluating! We can reduce these to two functions, fetch and eval.

Fetching the current instruction

Because we have stored our program as an array, it is easy to fetch the current instruction. A virtual machine has a counter, usually called the program counter or the instruction pointer; the names mean the same thing, and which one you use is a matter of personal preference. In virtual machine code bases, the abbreviations IP and PC appear everywhere.

The program counter will eventually be stored in a register; we will get to that later. For now, we just create a variable called ip at the top of our code and set it to 0.

int ip = 0;

The ip variable is the instruction pointer. Because we have stored the program as an array, ip holds the current index into the program array. For example, a variable assigned program[ip] holds the first instruction of our program when ip is 0.

(Assuming ip is 0)

int ip = 0;

int main() {
  int instr = program[ip];
  return 0;
}

If we print the variable instr, which holds PSH, it shows 0, because PSH is the first value in our enumeration. We can also write a fetch function like this:

int fetch() {
  return program[ip];
}

This function returns the current instruction. Great, so what if we want the next instruction? It's easy: we just increment the instruction pointer:

int main() {
  int x = fetch(); // PSH
  ip++; // increment instruction pointer
  int y = fetch(); // 5
}

So how do we keep it moving? We know that a program does not stop until it executes the HLT instruction, so we loop until the current instruction is HLT.

#include <stdbool.h>

bool running = true;

int main() {
  while (running) {
    int x = fetch();
    if (x == HLT) running = false;
    ip++;
  }
}

It works, but it's a bit messy. We loop through every instruction, check whether it is HLT, and stop the loop if it is; otherwise we "eat" the instruction and loop again.
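
Put together, the loop so far can be sketched as a function that counts how many array cells it fetches before stopping. The name count_fetches is assumed for illustration. Note that at this stage the loop also fetches operands such as 5 and 6 as if they were instructions, because nothing handles PSH yet.

```c
#include <stdbool.h>

typedef enum { PSH, ADD, POP, SET, HLT } InstructionSet;

const int program[] = { PSH, 5, PSH, 6, ADD, POP, HLT };

int ip = 0;
bool running = true;

/* Return the cell the instruction pointer currently points at. */
int fetch(void) {
    return program[ip];
}

/* Step through the program until HLT and return how many cells were fetched. */
int count_fetches(void) {
    int fetched = 0;
    ip = 0;
    running = true;
    while (running) {
        int x = fetch();
        fetched++;
        if (x == HLT) running = false;
        ip++;
    }
    return fetched;
}
```

For the test program, count_fetches() returns 7: every cell of the array is visited once, ending with HLT.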

Evaluating an instruction

So this is the body of our virtual machine, but we want to actually evaluate each instruction and make things tidier. For a virtual machine this simple, you can write one "huge" switch statement, with each case corresponding to an instruction defined in the enumeration. The eval function takes a single parameter, the instruction to evaluate. We will not increment the instruction pointer inside this function unless an instruction needs to consume its operands.

void eval(int instr) {
  switch (instr) {
    case HLT:
      running = false;
      break;
  }
}

Back in the main function, we can use our eval function like this:

bool running = true;
int ip = 0;

// instruction enum here

// eval function here

// fetch function here

int main() {
  while (running) {
    eval(fetch());
    ip++; // increment the ip every iteration
  }
}

Stack!

Great, that does the job. Now, before we add the other instructions, we need a stack. Luckily, a stack is easy to implement: we just need an array. The array is given a fixed size; here it can hold 256 values. We also need a stack pointer (often abbreviated as SP). It holds an index into the stack array.

To make this more concrete, here is how a stack implemented with an array behaves:

[] // empty

PSH 5 // put 5 on **top** of the stack
[5]

PSH 6
[5, 6]

POP
[5]

POP
[] // empty

PSH 6
[6]

PSH 5
[6, 5]

So, what happens in our program?

PSH, 5,
PSH, 6,
ADD,
POP,
HLT

We first push 5 onto the stack:

[5]

Then we push 6:

[5, 6]

Then the ADD instruction pops both values, adds them together, and pushes the result onto the stack:

[5, 6]

// pop the top value, store it in a variable called a
a = pop; // a contains 6
[5] // stack contents

// pop the top value, store it in a variable called b
b = pop; // b contains 5
[] // stack contents

// now we add b and a. Note that we do it backwards; in addition
// this doesn't matter, but in other potential instructions,
// for instance divide, 5 / 6 is not the same as 6 / 5
result = b + a;
push result // push the result to the stack
[11] // stack contents

So where does our stack pointer come in? The stack pointer (or sp) is typically initialized to -1, which means that the stack is empty. Keep in mind that array indices start at 0, so an sp of 0 would mean that the stack already holds one value. Also, if sp is a local variable that is never initialized, its value is indeterminate, so always set it explicitly.

If we push 3 values onto the stack, sp becomes 2. The array then holds three values:

       sp points here (sp = 2)
       |
       V
[1, 5, 9]
 0  1  2 <- array subscript

Now we pop a value off the stack, which only requires decrementing the stack pointer. For example, if we pop the 9, the top of the stack becomes 5:

    sp points here (sp = 1)
    |
    V
[1, 5]
 0  1 <- array subscript

So, when we want to know what is on top of the stack, we just look at the element at index sp. Now that you know how a stack works, let's implement it in C. It is very simple: as with ip, we define an sp variable, remembering to set it to -1! Then we define an array called stack, with the following code:

int ip = 0;
int sp = -1;
int stack[256]; // use an array, or another structure that fits here

Now, to push a value, we increment the stack pointer and then store the value at the new sp. Note: the order of these two steps is very important!

// push 5

// sp = -1
sp++; // sp = 0
stack[sp] = 5; // top of stack is now 5
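
The two steps can be wrapped in small helper functions. This is only a sketch: the names push and pop are assumptions (the article manipulates sp directly inside eval), and no bounds checking is done.

```c
int stack[256];
int sp = -1;            /* -1 means the stack is empty */

/* Increment first, then store: the order matters. */
void push(int value) {
    sp++;
    stack[sp] = value;
}

/* Read the top first, then decrement. */
int pop(void) {
    int value = stack[sp];
    sp--;
    return value;
}
```

After push(1), push(5), and push(9), sp is 2; a single pop() returns 9 and leaves sp at 1 with 5 on top, matching the diagrams above.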

So, in our evaluation function eval(), the push instruction can be implemented like this:

void eval(int instr) {
  switch (instr) {
    case HLT: {
      running = false;
      break;
    }
    case PSH: {
      sp++;
      stack[sp] = program[++ip];
      break;
    }
  }
}

You can see that this differs somewhat from the eval() function we wrote earlier. First, each case body is now wrapped in curly braces. You may not be familiar with this usage: it gives each case its own scope, so you can declare variables inside it. We don't need to declare any variables yet, but we will later, and it keeps all the case statements in a consistent style.

The second is the curious expression program[++ip]. What does it do? Our program is stored in an array, and the PSH instruction needs an operand. An operand is essentially a parameter, like the argument you pass when calling a function; here it is the value 5 that we push. We get the operand by incrementing the instruction pointer ip (also known as the program counter). When ip is 0, we are executing the PSH instruction, and the next cell of the array holds the value to push. The pre-increment ++ip moves ip to that cell before reading it (note: the position of the increment matters; with the post-increment ip++ we would read the PSH instruction itself instead of its operand). The ip++ in the main loop then steps past the operand to the next instruction; without it, the operand would be evaluated as an instruction and cause strange errors. Of course, we can also shorten the sp++ and the assignment to stack[++sp].
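
One way to see why operands must be skipped is to walk the program opcode by opcode. In the sketch below, operand_count and count_instructions are hypothetical helpers, and the two-operand form of SET is an assumption based on the SET A 0 example from the instruction set.

```c
typedef enum { PSH, ADD, POP, SET, HLT } InstructionSet;

const int program[] = { PSH, 5, PSH, 6, ADD, POP, HLT };

/* Number of operand cells that follow each opcode in this sketch.
   SET is assumed to take a register and a value, as in "SET A 0". */
int operand_count(int instr) {
    switch (instr) {
        case PSH: return 1;
        case SET: return 2;
        default:  return 0;
    }
}

/* Walk the program opcode by opcode, skipping operands, and count the
   instructions up to and including HLT. */
int count_instructions(void) {
    int ip = 0;
    int count = 0;
    for (;;) {
        int instr = program[ip];
        count++;
        if (instr == HLT) break;
        ip += 1 + operand_count(instr);
    }
    return count;
}
```

For the test program this counts 5 instructions (PSH, PSH, ADD, POP, HLT), even though the array has 7 cells.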

For the POP instruction, the implementation is very simple: we only need to decrement the stack pointer. However, I usually want to print the value as it is popped.

I have omitted the switch statement and the code for the other instructions, listing only the implementation of POP:

// remember #include <stdio.h>!

case POP: {
  int val_popped = stack[sp--];
  printf("popped %d\n", val_popped);
  break;
}

Now the POP instruction works! What we just did was store the value on top of the stack in the variable val_popped and then decrement the stack pointer. If we decremented the stack pointer first, we would get invalid values: when sp is 0, decrementing first would assign stack[-1] to val_popped, which is not a good idea.
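
The same care applies at both ends of the stack. As a hedged sketch, the helpers below refuse to pop from an empty stack or push onto a full one instead of reading or writing outside the array. The names checked_push, checked_pop, and STACK_SIZE are assumptions for illustration and are not used elsewhere in the article.

```c
#include <stdbool.h>

#define STACK_SIZE 256

int stack[STACK_SIZE];
int sp = -1;

/* Push only if there is room; returns false on overflow. */
bool checked_push(int value) {
    if (sp >= STACK_SIZE - 1) return false;
    stack[++sp] = value;
    return true;
}

/* Pop only if the stack is not empty; returns false on underflow. */
bool checked_pop(int *out) {
    if (sp < 0) return false;
    *out = stack[sp--];
    return true;
}
```

An instruction such as POP could call checked_pop and stop the machine when it returns false, rather than printing a garbage value.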

Finally, the ADD instruction. This one may cost you a few brain cells, and it is the reason we need curly braces {} to create a scope inside each case.

case ADD: {
  // first we pop the stack and store the value in a
  int a = stack[sp--];

  // then we pop the next value and store it in b
  int b = stack[sp--];

  // next we add the two values and push the result onto the stack
  int result = b + a;
  sp++; // increment the stack pointer **before** the assignment
  stack[sp] = result; // set the top of the stack

  // all done!
  break;
}
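
Putting the pieces from the previous sections together gives a small but complete machine. The listing below is a sketch assembled from the snippets in this article; the run function and the last_popped variable are additions assumed here so that the result can be inspected.

```c
#include <stdbool.h>
#include <stdio.h>

typedef enum { PSH, ADD, POP, SET, HLT } InstructionSet;

const int program[] = { PSH, 5, PSH, 6, ADD, POP, HLT };

int ip = 0;
int sp = -1;
int stack[256];
bool running = true;
int last_popped = 0;   /* hypothetical helper so the result can be inspected */

int fetch(void) {
    return program[ip];
}

void eval(int instr) {
    switch (instr) {
        case HLT: {
            running = false;
            break;
        }
        case PSH: {
            stack[++sp] = program[++ip];   /* the operand follows the opcode */
            break;
        }
        case POP: {
            last_popped = stack[sp--];
            printf("popped %d\n", last_popped);
            break;
        }
        case ADD: {
            int a = stack[sp--];
            int b = stack[sp--];
            stack[++sp] = b + a;
            break;
        }
    }
}

/* Run the test program from the start and return the last popped value. */
int run(void) {
    ip = 0;
    sp = -1;
    running = true;
    while (running) {
        eval(fetch());
        ip++;
    }
    return last_popped;
}
```

Calling run() from main prints popped 11 and returns 11, leaving the stack empty.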

Registers

Registers are an optional part of a virtual machine and are easy to implement. As mentioned earlier, we might want six registers: A, B, C, D, E, and F. As with the instruction set, we use an enumeration for them.

typedef enum {
  A, B, C, D, E, F,
  NUM_OF_REGISTERS
} Registers;

Tip: the enumeration ends with NUM_OF_REGISTERS. Because it comes last, its value is the number of registers, and it stays correct even if you add more registers. Now we need an array to store the register values:

int registers[NUM_OF_REGISTERS];

Next you can read the value of a register:

printf("%d\n", registers[A]); // print the value of register A
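
The SET instruction from the instruction set can be built on top of this array. The sketch below assumes that SET A 0 is stored as three cells (the opcode, the register, and the value); the article does not specify the encoding, and run_sets is a hypothetical helper that only understands SET and HLT.

```c
typedef enum { PSH, ADD, POP, SET, HLT } InstructionSet;
typedef enum { A, B, C, D, E, F, NUM_OF_REGISTERS } Registers;

int registers[NUM_OF_REGISTERS];

/* Assumed encoding: SET, then the register, then the value. */
const int program[] = { SET, A, 42, SET, B, 7, HLT };

/* Execute the SET instructions in the program and stop at HLT. */
void run_sets(void) {
    int ip = 0;
    while (program[ip] != HLT) {
        if (program[ip] == SET) {
            int reg = program[ip + 1];
            int value = program[ip + 2];
            registers[reg] = value;
            ip += 3;            /* skip the opcode and both operands */
        } else {
            ip++;               /* other instructions are ignored here */
        }
    }
}
```

After run_sets(), registers[A] holds 42 and registers[B] holds 7, while the untouched registers stay at 0.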

Amendment

I won't spend much more time on registers, but you should now be able to write instructions that manipulate them. For example, if you want to implement any kind of branching, you can either store the instruction pointer (translator's note: also called the program counter) and/or the stack pointer in registers, or implement a dedicated branch instruction.

The former is relatively quick and simple to achieve. We can do this by adding registers that represent the IP and SP:

typedef enum {
  A, B, C, D, E, F, PC, SP,
  NUM_OF_REGISTERS
} Registers;

Now we need to make the code use these registers for the instruction pointer and the stack pointer. An easy way to do this is to delete the sp and ip variables defined above and replace them with macros:

#define sp (registers[SP])
#define ip (registers[PC])

Translator's note: the instruction pointer macro refers to PC so that it matches the Registers enumeration above.

This change is convenient: you don't need to rewrite much code, and it works well.
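
A minimal sketch of how such macros behave is shown below, using lowercase ip and sp as the macro names and PC as the register name. The reset_vm and push helpers are assumptions for illustration. One detail worth noting: a global array is zero-initialized, so sp now starts at 0 rather than -1 and has to be reset explicitly.

```c
typedef enum { A, B, C, D, E, F, PC, SP, NUM_OF_REGISTERS } Registers;

int registers[NUM_OF_REGISTERS];
int stack[256];

/* ip and sp now read and write slots of the registers array. */
#define ip (registers[PC])
#define sp (registers[SP])

/* The registers array is zero-initialized, so sp has to be set to -1
   explicitly before the stack is used. */
void reset_vm(void) {
    ip = 0;
    sp = -1;
}

/* Existing code such as stack[++sp] keeps working unchanged. */
void push(int value) {
    stack[++sp] = value;
}
```

After reset_vm() and push(9), registers[SP] is 0 and stack[0] holds 9, so the rest of the machine can keep using ip and sp as before.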

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
