A C Compiler for Programming Practice
Writing a compiler and learning low-level programming is a very effective way to learn how computers work.
Compilers are often seen as very complex projects. Writing a production-quality compiler is indeed a huge task, but writing a small, working compiler is not that difficult.
The secret is to first find a minimal working project and then add the features you want. This approach is described in Abdulaziz Ghuloum's famous paper "An Incremental Approach to Compiler Construction", and it works well. You only need to follow the first step of that paper to get a real, working compiler! Of course, it can only compile a very small subset of the language, but it is a real compiler all the same. You can extend it as you like and learn more, and more deeply, along the way.
Inspired by this paper, I wrote a C compiler. In a sense, this is harder than writing a Scheme compiler (because you have to parse C's complex syntax), but in some ways it is easier (you do not need to deal with runtime types). To write such a compiler, you only need to start with the smallest working compiler.
I called my compiler babyc. I chose this code as the first program it needed to run:
int main() { return 2;}
No variables, no function calls, no external dependencies, not even if statements or loops. Everything looks very simple.
We need to parse this code first. We will use Flex and Bison to do this. There are examples available for reference that show how to use them. Fortunately, our grammar is very simple. Below is the lexical analyzer:
"{" { return '{'; }
"}" { return '}'; }
"(" { return '('; }
")" { return ')'; }
";" { return ';'; }
[0-9]+ { return NUMBER; }
"return" { return RETURN; }
"int" { return TYPE; }
"main" { return IDENTIFIER; }
Here is the syntax analyzer:
function:
    TYPE IDENTIFIER '(' ')' '{' expression '}'
    ;

expression:
    RETURN NUMBER ';'
    ;
Finally, we need to generate some assembly code. We will use 32-bit x86 assembly because it is very common and runs easily on your machine. There are reference websites for x86 assembly.
The following is the assembly code we need to generate:
.text
    .global _start # Tell the loader we want to start at _start.

_start:
    movl $2,%ebx # The argument to our system call.
    movl $1,%eax # The system call number of sys_exit is 1.
    int $0x80 # Send an interrupt
Combine the lexer and parser code above and write this assembly code to a file. Congratulations! You are now a compiler writer!
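As a rough sketch of that last step, the assembly could be written out from C like this. The function names emit_return and write_assembly are hypothetical and are not taken from babyc's source. The only input is the number that the parser found after return.

```c
#include <assert.h>
#include <stdio.h>

/* Write the assembly for `int main() { return value; }` to `out`.
 * The value goes in %ebx as the argument to sys_exit. */
void emit_return(FILE *out, int value) {
    assert(out != NULL);
    fprintf(out, ".text\n");
    fprintf(out, "    .global _start\n\n");
    fprintf(out, "_start:\n");
    fprintf(out, "    movl $%d, %%ebx\n", value);
    fprintf(out, "    movl $1, %%eax\n");
    fprintf(out, "    int $0x80\n");
}

/* Hypothetical helper: write the program to the file at `path`.
 * Returns 0 on success and -1 if the file cannot be written. */
int write_assembly(const char *path, int value) {
    FILE *out = fopen(path, "w");
    if (out == NULL) return -1;
    emit_return(out, value);
    return fclose(out) == 0 ? 0 : -1;
}
```

A call such as write_assembly("out.s", 2) would then produce a file with the same instructions as the listing above.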
This is how babyc was born. You can see its initial version here.
Of course, the assembly code is of little use if we cannot run it. Let's check that the compiler really generates the assembly we expect.
# Here's the file we want to compile.
$ cat return_two.c
#include
int main() { return 2;}

# Run the compiler with this file.
$ ./babyc return_two.c
Written out.s.

# Check the output looks sensible.
$ cat out.s
.text
    .global _start

_start:
    movl $2, %ebx
    movl $1, %eax
    int $0x80
Great! Next, let's actually run the compiled code to make sure it does what we expect.
# Assemble the file. We explicitly assemble as 32-bit
# to avoid confusion on x86_64 machines.
$ as out.s -o out.o --32

# Link the file, again specifying 32-bit.
$ ld -m elf_i386 -s -o out out.o

# Run it!
$ ./out

# What was the return code?
$ echo $?
2 # Woohoo!
We have taken the first step; where you go next is up to you. You can follow the remaining steps in the paper to build a more complex compiler. You will need to build a more sophisticated syntax tree and generate the assembly code from it. The next steps are: (1) allow arbitrary return values (for example, return 3;); (2) add support for bitwise NOT (for example, return ~1;). Every extra feature teaches you more about the C language, how compilers work, and how other people think about these problems.
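One possible shape for such a syntax tree, covering just the two next steps mentioned above, is sketched here in C. This is an illustration under stated assumptions, not babyc's actual design: every type and function name is hypothetical, and leaving each expression's result in %eax is a convention chosen for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node kinds: an integer literal and bitwise NOT (~). */
typedef enum { NODE_NUMBER, NODE_BITWISE_NOT } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;            /* used by NODE_NUMBER */
    struct Node *operand; /* used by NODE_BITWISE_NOT */
} Node;

static Node *new_node(NodeKind kind, int value, Node *operand) {
    Node *node = malloc(sizeof *node);
    assert(node != NULL); /* a sketch: real code should report the error */
    node->kind = kind;
    node->value = value;
    node->operand = operand;
    return node;
}

Node *new_number(int value) { return new_node(NODE_NUMBER, value, NULL); }
Node *new_not(Node *operand) { return new_node(NODE_BITWISE_NOT, 0, operand); }

void free_node(Node *node) {
    if (node == NULL) return;
    free_node(node->operand);
    free(node);
}

/* Emit code that leaves the value of the expression in %eax
 * (an assumed convention for this sketch). */
void emit_expr(FILE *out, const Node *node) {
    switch (node->kind) {
    case NODE_NUMBER:
        fprintf(out, "    movl $%d, %%eax\n", node->value);
        break;
    case NODE_BITWISE_NOT:
        emit_expr(out, node->operand); /* operand first, result in %eax */
        fprintf(out, "    notl %%eax\n");
        break;
    }
}

/* Emit a whole program that exits with the value of `node`. */
void emit_return_expr(FILE *out, const Node *node) {
    fprintf(out, ".text\n    .global _start\n\n_start:\n");
    emit_expr(out, node);
    fprintf(out, "    movl %%eax, %%ebx\n"); /* exit status for sys_exit */
    fprintf(out, "    movl $1, %%eax\n");    /* sys_exit is call number 1 */
    fprintf(out, "    int $0x80\n");
}
```

For return ~1; the parser would build new_not(new_number(1)), and emit_return_expr(stdout, tree) would print the corresponding assembly. Adding another unary operator then only requires a new node kind and one more case in emit_expr.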
This is how babyc was built. It now has if statements, loops, variables, and basic data structures. You are welcome to check out its code, but I hope that after reading this article you will write your own.
Do not be afraid of low-level things. It is a wonderful world down there.