Writing C code has felt uncomfortable to me lately: many places feel redundant, and I have been hoping to improve that. That is where this idea came from: use the front-end knowledge from compiler theory to build a compiler for a new language, and have it translate that new language into standard C89.
This keeps the result usable everywhere while greatly reducing the back-end workload, so why not?
Let me explain the reasons for choosing C89. First, there are not many extensions of the C language, mainly C++ and Objective-C. As a compilation target, C++ is too heavy and makes operating-system-level programming difficult; if you want the compiler design to stay lightweight, you should avoid a heavy target language.
Java would also be a good choice, but its platform already has many well-designed languages such as Scala, so developing yet another language for it is less urgent. For the first phase, I have therefore chosen to compile to C89.
Goal
Create a customizable compiler that translates a user-defined language into C89 according to rules.
Why make it customizable? I want users to be able to customize the programming language itself. A language is just a way of describing logic; if users understand compilation techniques and know how to define a language, then we should support their syntax changes, treating the grammar definition file as a script that is saved in the project.
Once compiled to C, the generated code can also interact with existing libraries, which increases reusability.
Composition of the compilation system
A general compilation system is composed of several parts.
A minimal compiler should consist of at least a lexical analyzer, a parser, and a code generator.
If we do not want to build a symbol table, we have to give up semantic checking. I would still like to build a stack-based symbol table if possible, because symbol handling is very important.
The error handling part can also be kept short. With no intermediate code and the code optimization part omitted, we can build a usable compiler as quickly as possible.
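As a rough illustration of what a stack-based symbol table might look like, here is a minimal sketch in C89. All names (SymbolTable, symtab_declare, and so on) and the fixed capacities are assumptions made for this example, not the project's actual design. Symbols are pushed onto one array, and each scope only remembers where it started, so leaving a scope simply pops everything declared inside it.

```c
#include <string.h>

#define MAX_SYMBOLS 256   /* hypothetical fixed capacities */
#define MAX_SCOPES  32

typedef struct {
    const char *name;
    const char *type;
} Symbol;

typedef struct {
    Symbol symbols[MAX_SYMBOLS];   /* the symbol stack */
    int count;                     /* number of symbols on the stack */
    int scope_start[MAX_SCOPES];   /* index of the first symbol of each scope */
    int depth;                     /* number of open scopes */
} SymbolTable;

/* Start with a single global scope. */
void symtab_init(SymbolTable *t)
{
    t->count = 0;
    t->scope_start[0] = 0;
    t->depth = 1;
}

/* Open a nested scope; returns 0 if nesting is too deep. */
int symtab_enter_scope(SymbolTable *t)
{
    if (t->depth >= MAX_SCOPES) return 0;
    t->scope_start[t->depth++] = t->count;
    return 1;
}

/* Close the innermost scope and drop its symbols; the global scope stays. */
int symtab_leave_scope(SymbolTable *t)
{
    if (t->depth <= 1) return 0;
    t->count = t->scope_start[--t->depth];
    return 1;
}

/* Declare a name in the innermost scope; returns 0 on redeclaration or overflow. */
int symtab_declare(SymbolTable *t, const char *name, const char *type)
{
    int i;
    for (i = t->scope_start[t->depth - 1]; i < t->count; i++)
        if (strcmp(t->symbols[i].name, name) == 0) return 0;
    if (t->count >= MAX_SYMBOLS) return 0;
    t->symbols[t->count].name = name;
    t->symbols[t->count].type = type;
    t->count++;
    return 1;
}

/* Search from the top of the stack so inner declarations shadow outer ones. */
const Symbol *symtab_lookup(const SymbolTable *t, const char *name)
{
    int i;
    for (i = t->count - 1; i >= 0; i--)
        if (strcmp(t->symbols[i].name, name) == 0) return &t->symbols[i];
    return NULL;
}
```

With this layout, a semantic check such as "is this identifier declared?" becomes a single symtab_lookup call, and block entry and exit map onto symtab_enter_scope and symtab_leave_scope.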
Introduction to the implementation of the lexical analyzer
First, we need to write a lexical analyzer. Since we want different users to be able to define different lexical rules, we cannot use lex-style code generation to produce a lexical analyzer in C; we need to write the analyzer ourselves.
To support regular expressions, we build a DFA from each regular expression, then combine the DFAs to form our lexical analyzer, using a greedy strategy to obtain the longest possible match.
In the following chapters, we will look in detail at how a regular expression engine is built and how it is applied in the compilation system.
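The greedy longest-match idea can be sketched in a few lines of C89. This is only an illustration under simplifying assumptions: the DFAs are written by hand as small transition functions instead of being built from regular expressions, each has a single accepting state, and they are run one after another rather than merged into one combined automaton. The rule set and all names (Rule, longest_match, RULE_IF, and so on) are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

#define DEAD (-1)

enum { RULE_IF, RULE_IDENT, RULE_INT, RULE_EQ, RULE_ASSIGN, RULE_COUNT };

typedef struct {
    const char *name;
    int (*step)(int state, int ch);  /* next state, or DEAD if no transition */
    int accept_state;                /* the single accepting state */
} Rule;

/* keyword "if" */
static int step_if(int s, int c)
{
    if (s == 0 && c == 'i') return 1;
    if (s == 1 && c == 'f') return 2;
    return DEAD;
}

/* [A-Za-z_][A-Za-z0-9_]* */
static int step_ident(int s, int c)
{
    if (s == 0) return (isalpha(c) || c == '_') ? 1 : DEAD;
    return (isalnum(c) || c == '_') ? 1 : DEAD;
}

/* [0-9]+ */
static int step_int(int s, int c)
{
    (void)s;
    return isdigit(c) ? 1 : DEAD;
}

/* "==" */
static int step_eq(int s, int c)
{
    if (s == 0 && c == '=') return 1;
    if (s == 1 && c == '=') return 2;
    return DEAD;
}

/* "=" */
static int step_assign(int s, int c)
{
    return (s == 0 && c == '=') ? 1 : DEAD;
}

static const Rule rules[RULE_COUNT] = {
    { "IF",     step_if,     2 },
    { "IDENT",  step_ident,  1 },
    { "INT",    step_int,    1 },
    { "EQ",     step_eq,     2 },
    { "ASSIGN", step_assign, 1 }
};

/* Run every DFA from the start of text. Return the index of the rule that
   accepts the longest prefix (ties go to the earlier rule) and store the
   match length in *len_out; return -1 with length 0 if nothing matches. */
int longest_match(const char *text, size_t *len_out)
{
    int best_rule = -1;
    size_t best_len = 0;
    int r;
    for (r = 0; r < RULE_COUNT; r++) {
        int state = 0;
        size_t i = 0, accepted = 0;
        while (text[i] != '\0') {
            state = rules[r].step(state, (unsigned char)text[i]);
            if (state == DEAD) break;
            i++;
            if (state == rules[r].accept_state) accepted = i;
        }
        if (accepted > best_len) {
            best_len = accepted;
            best_rule = r;
        }
    }
    *len_out = best_len;
    return best_rule;
}
```

Because only a strictly longer match replaces the current best, the input "iffy" is recognized as an identifier of length 4 rather than the keyword "if", while the exact input "if" goes to the keyword rule because it is listed first.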
A basic explanation of BNF
Next we will learn how to describe a language. We use extended BNF (EBNF) to describe it; if you do not yet know what BNF is, you can refer to my previous article: "Compiling techniques, from BNF paradigm to grammar recognition"
We then pass our description of the language to the parser, which analyzes the BNF rules. Each BNF rule is followed by a semantic action, and the entire compilation is carried out by executing these semantic actions.
Introduction to the implementation of the parser
We will use the very popular LALR method for parsing. Our parser is a bottom-up parser: using the concept of reduction, it repeatedly recognizes grammatical components and builds the syntax tree through a sequence of leftmost reductions.
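To make the shift and reduce steps concrete, here is a small table-driven bottom-up parser in C89. It is a sketch under stated assumptions: the grammar is a toy one (E -> E + num | num), the five-state table was worked out by hand instead of being generated by an LALR construction, and the semantic actions simply add numbers rather than emit code. The names (lr_parse_sum, action_table, and so on) are hypothetical.

```c
#include <ctype.h>

/* Grammar:  rule 0: S -> E    rule 1: E -> E + num    rule 2: E -> num */
enum { T_NUM, T_PLUS, T_END, T_COUNT };
enum { ERR = 0, SHIFT, REDUCE, ACCEPT };

typedef struct { int kind; int arg; } Action;

/* Hand-built parse table: rows are states, columns are lookahead tokens. */
static const Action action_table[5][T_COUNT] = {
    /* 0 */ { {SHIFT, 2}, {ERR, 0},    {ERR, 0}    },
    /* 1 */ { {ERR, 0},   {SHIFT, 3},  {ACCEPT, 0} },
    /* 2 */ { {ERR, 0},   {REDUCE, 2}, {REDUCE, 2} },
    /* 3 */ { {SHIFT, 4}, {ERR, 0},    {ERR, 0}    },
    /* 4 */ { {ERR, 0},   {REDUCE, 1}, {REDUCE, 1} }
};
static const int goto_E[5] = { 1, -1, -1, -1, -1 };  /* goto on nonterminal E */
static const int rule_len[3] = { 1, 3, 1 };          /* right-hand side lengths */

/* Minimal scanner: returns the token kind, or -1 on an unknown character. */
static int next_token(const char **p, int *val)
{
    while (**p == ' ') (*p)++;
    if (**p == '\0') return T_END;
    if (**p == '+') { (*p)++; return T_PLUS; }
    if (isdigit((unsigned char)**p)) {
        *val = 0;
        while (isdigit((unsigned char)**p)) {
            *val = *val * 10 + (**p - '0');
            (*p)++;
        }
        return T_NUM;
    }
    return -1;
}

/* Parse s and store the sum in *result; returns 1 on success, 0 on error. */
int lr_parse_sum(const char *s, int *result)
{
    int states[64], values[64];
    int top = 0, val = 0, tok;
    states[0] = 0;
    values[0] = 0;
    tok = next_token(&s, &val);
    for (;;) {
        Action a;
        if (tok < 0) return 0;
        a = action_table[states[top]][tok];
        if (a.kind == SHIFT) {
            if (top + 1 >= 64) return 0;
            top++;
            states[top] = a.arg;
            values[top] = val;
            tok = next_token(&s, &val);
        } else if (a.kind == REDUCE) {
            /* semantic action: rule 1 adds, rule 2 passes the number up */
            int v = (a.arg == 1) ? values[top - 2] + values[top] : values[top];
            top -= rule_len[a.arg];          /* pop the right-hand side */
            top++;                           /* push E */
            states[top] = goto_E[states[top - 1]];
            values[top] = v;
        } else if (a.kind == ACCEPT) {
            *result = values[top];
            return 1;
        } else {
            return 0;
        }
    }
}
```

For the input "1+2+3", the parser shifts 1, reduces it to E, shifts + and 2, reduces E + num to E with value 3, and repeats once more, so the call succeeds with a result of 6. In a real compiler the value stack would carry syntax tree nodes or generated code instead of integers.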
LALR parsing first requires analyzing the grammar definition file. Here we will use the ready-made tool bison to build a basic parser for that file, which will also give you a preliminary understanding of how grammars are described. The parsed BNF rules then serve as the base data for building the compiler's push-down automaton.
In this section, we will introduce the LR parsing concepts of items, item sets, automata, and so on, and complete an LALR parser.
For the sake of brevity, the semantic action part of our language embeds Lua scripts. The interaction between C and Lua carries out most of the code generation and translation work, which essentially completes the whole compiler.
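As a preview of the item and item set concepts, here is a minimal C89 sketch of LR(0) items and the closure operation. It is simplified on purpose: it uses a fixed toy grammar and omits the lookahead sets that a full LALR construction tracks. All names (Item, ItemSet, closure, and so on) are assumptions for this example. An item is a production plus a dot position marking how much of the right-hand side has been seen.

```c
#define MAX_RHS   3
#define MAX_ITEMS 16

enum { SYM_S, SYM_E, SYM_PLUS, SYM_NUM };

typedef struct {
    int lhs;
    int len;
    int rhs[MAX_RHS];
} Production;

/* Toy grammar:  0: S -> E    1: E -> E + num    2: E -> num */
static const Production grammar[] = {
    { SYM_S, 1, { SYM_E } },
    { SYM_E, 3, { SYM_E, SYM_PLUS, SYM_NUM } },
    { SYM_E, 1, { SYM_NUM } }
};
#define PROD_COUNT 3

/* An LR(0) item: production index plus the position of the dot. */
typedef struct { int prod; int dot; } Item;

typedef struct {
    Item items[MAX_ITEMS];
    int count;
} ItemSet;

static int is_nonterminal(int sym)
{
    return sym == SYM_S || sym == SYM_E;
}

/* Add an item unless it is already present; returns 1 if it was added. */
int itemset_add(ItemSet *set, int prod, int dot)
{
    int i;
    for (i = 0; i < set->count; i++)
        if (set->items[i].prod == prod && set->items[i].dot == dot) return 0;
    if (set->count >= MAX_ITEMS) return 0;
    set->items[set->count].prod = prod;
    set->items[set->count].dot = dot;
    set->count++;
    return 1;
}

/* Closure: whenever the dot stands before a nonterminal, add every
   production of that nonterminal with the dot at position 0. */
void closure(ItemSet *set)
{
    int i, p;
    for (i = 0; i < set->count; i++) {   /* the set may grow while we scan */
        Item it = set->items[i];
        const Production *pr = &grammar[it.prod];
        int next;
        if (it.dot >= pr->len) continue;         /* dot at the end */
        next = pr->rhs[it.dot];
        if (!is_nonterminal(next)) continue;     /* dot before a terminal */
        for (p = 0; p < PROD_COUNT; p++)
            if (grammar[p].lhs == next) itemset_add(set, p, 0);
    }
}
```

Starting from the single item S -> . E, the closure adds E -> . E + num and E -> . num, giving the three-item start state of the automaton. Each state of the parser's push-down automaton is one such item set.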
GitHub project repositories
I will write the lexical analyzer and the parser as two separate projects: the lexical analyzer, Lex, and the parser, lr_parser. I will keep revising and improving both projects over the coming period, so stay tuned.
Create a new language (1)--Identify the architecture