Developing Your Own Compiler (1): Modular Compiler Engineering

Http://www.cnblogs.com/Ninputer/archive/2011/06/07/2074632.html

In this first article of the series, I would like to give an overview of compiler construction and help you understand what each component of a compiler is for. If you have read other books on compiler principles, you will have noticed that most of them, in the first chapter or the preface, divide the compiler into many modules, each responsible for a specific phase of compilation, and then chain the modules together to form a complete compiler. One example is the figure of compiler phases in chapter 1 of Modern Compiler Implementation by Andrew W. Appel.

So why split the compiler into phases and modules? The answer is to make it easier to design and to understand. A compiler is, by any measure, a large project; if it is not broken down, it becomes very hard to write and maintain. The clearer the module boundaries, the simpler the work. For example, the lexical analysis phase converts the input character stream into a token stream, which greatly reduces the number of input cases the syntax analysis phase has to distinguish. A simpler design also helps performance. Modularization also isolates the work of each phase as much as possible. For example, the compiler can perform optimizations that are independent of any particular CPU as well as optimizations tailored to a particular CPU, and each can be developed independently without redesigning the whole system.
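
To make the character-stream-to-token-stream step concrete, here is a minimal sketch in Python. The token kinds and rules (`TOKEN_RULES`) are hypothetical and describe a toy expression language, not the lexer this series will build. The point is only that a later parser sees a handful of token kinds instead of arbitrary characters.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str   # e.g. "NUMBER", "IDENT", "OP"
    text: str   # the characters that make up the token


# Hypothetical token rules for a toy expression language.
TOKEN_RULES = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()=]"),
    ("SKIP", r"\s+"),          # whitespace is dropped, never seen by the parser
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_RULES))


def lex(source: str) -> Iterator[Token]:
    """Turn a string of characters into a stream of tokens."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())


tokens = list(lex("total = price * 12 + tax"))
print([t.kind for t in tokens])
# ['IDENT', 'OP', 'IDENT', 'OP', 'NUMBER', 'OP', 'IDENT']
```

A parser built on top of `lex` only has to branch on three token kinds, and the lexer can be tested and replaced without touching it.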

It may come as a surprise, but how the phases and modules of a compiler are designed depends even on the syntax of the programming language being compiled. Take the early programming language Fortran: when it was designed, people had not yet worked out much of the theory of compilation, so its syntax cannot be cleanly divided into lexical analysis and syntax analysis the way today's languages can. Fortran's syntax has no lexical structure that can be processed independently by an automaton, and so syntax analysis in a Fortran compiler is complicated. Other languages with a long history may have similarly awkward syntax. Visual Basic, for example, cannot be scanned by an independent lexical analyzer, so a VB parser is much harder to write than one for newer languages such as C. Another example: early Pascal and some C dialects allowed special syntax to mark a variable as a register variable (this may still exist in recent versions of Delphi; I have not verified it). The reason is that no very effective register allocation algorithm existed at the time, so programmers had to decide based on their own experience. Today, a language that lets you explicitly mark a variable as a register variable would interfere with the design of the register allocation module. So here is a piece of advice for future compiler designers: designing your syntax carefully can greatly simplify the design of the compiler!
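
The Fortran point can be illustrated with a classic case that is not taken from this article: in fixed-form Fortran blanks are insignificant, so `DO 10 I = 1, 5` starts a loop while `DO 10 I = 1.5` assigns 1.5 to a variable named `DO10I`. The sketch below, in Python, uses a hypothetical `classify` helper and is deliberately simplified (it would be fooled by a comma inside parentheses). It shows that the meaning of the leading `DO` cannot be decided until the rest of the statement has been examined.

```python
import re


def classify(stmt: str) -> str:
    """Classify a fixed-form Fortran statement as a DO loop or an assignment.

    Simplified sketch: blanks are dropped first because they carry no
    meaning, then we look past the '=' for a comma to tell the two apart.
    """
    compact = stmt.replace(" ", "").upper()
    match = re.fullmatch(r"DO(\d+)([A-Z][A-Z0-9]*)=(.+)", compact)
    if match and "," in match.group(3):
        label, var, bounds = match.groups()
        return f"DO loop: label {label}, variable {var}, bounds {bounds}"
    target, _, value = compact.partition("=")
    return f"assignment: {target} = {value}"


print(classify("DO 10 I = 1, 5"))  # DO loop: label 10, variable I, bounds 1,5
print(classify("DO 10 I = 1.5"))   # assignment: DO10I = 1.5
```

A lexer that works character by character from left to right would have to commit to "keyword DO" or "identifier DO10I" before seeing the comma, which is why the lexical and syntactic phases cannot be separated cleanly here.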

Beyond simplifying the design, modularizing the phases of a compiler has an even greater value. We used to think that a compiler only had to compile source files into the final target code, and that was all. But with the emergence of all kinds of development tools (editors, auto-completion, debuggers, refactoring tools, test coverage analysis, performance profiling, and so on), people have found that the results produced by each phase of compilation can be very valuable. Exposing the compiler's internal structure and intermediate results to users is an inevitable trend. For example, the Compiler as a Service feature planned for the next generation of Visual Studio exposes the compiler's internal modules to users as services. Here are a few examples of what the output of each compiler module can be used for:

Compiler stage     | Result                                                      | Use
-------------------|-------------------------------------------------------------|-------------------------------------------------------------------------
Lexical analysis   | Token stream                                                | Syntax highlighting
Syntax analysis    | Abstract syntax tree                                        | Syntax highlighting, code formatting, code folding
Semantic analysis  | Abstract syntax tree with type information and symbol table | Rename, refactoring, automatic code generation, automatic code rewriting
Data-flow analysis | Control flow graph and interference graph                   | Edit and continue

These are just a few simple examples; the uses of these results are certainly not limited to the ones listed. I believe that exposing the compiler's internal modules to users can lead to countless interesting and valuable applications.
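
As a rough illustration of the first two rows of the table, the sketch below uses Python's standard `tokenize` and `ast` modules as a stand-in for a compiler that exposes its lexer and parser. This is an assumption made for the sake of a runnable example; the article itself is about Visual Studio and the VBF project. The helper names `highlight_classes` and `fold_ranges` are hypothetical.

```python
import ast
import io
import keyword
import tokenize

SOURCE = '''\
def area(w, h):
    # width times height
    return w * h
'''


def highlight_classes(source: str):
    """Lexer output -> (text, style class) pairs for syntax highlighting."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            style = "keyword" if keyword.iskeyword(tok.string) else "identifier"
        elif tok.type == tokenize.COMMENT:
            style = "comment"
        elif tok.type == tokenize.OP:
            style = "operator"
        else:
            continue  # layout tokens and literals are ignored in this sketch
        result.append((tok.string, style))
    return result


def fold_ranges(source: str):
    """Parser output -> (name, first line, last line) for code folding."""
    tree = ast.parse(source)
    return [(node.name, node.lineno, node.end_lineno)
            for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)]


print(highlight_classes(SOURCE)[:3])
# [('def', 'keyword'), ('area', 'identifier'), ('(', 'operator')]
print(fold_ranges(SOURCE))
# [('area', 1, 3)]
```

Neither helper re-implements any part of the language: both simply consume what an existing compiler stage already produces, which is the kind of reuse the table describes.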

The phases of a compiler can also be grouped into two major parts according to their purpose. Lexical analysis, syntax analysis, and semantic analysis deal with the symbolic system of the programming language and are collectively called the front end of the compiler. Intermediate code generation, canonicalization, instruction selection, control-flow analysis, data-flow analysis, register allocation, instruction emission, assembly, and linking are collectively called the back end. It is fair to say that modern compiler research focuses on the back end, because front-end technology is relatively mature. Front-end techniques, however, are probably more useful and more interesting for everyday development, so I will spend a good deal of time on them. Once you have finished the front end of a compiler, you have several options for the back end:

    1. Use the CLR or the Java virtual machine as the back end. This is the easiest option, because these large virtual machines offer a very high level of abstraction. It is also well suited to dynamic languages and scripting languages.
    2. Use a reliable open-source or commercial back-end framework, such as the well-known LLVM (http://llvm.org/). This lets you benefit directly from LLVM's optimizations and cross-platform support.
    3. Implement the back end yourself. There is a lot of work involved, but it teaches you the most about translating and optimizing code.
    4. Interpret and execute. No explanation needed...

The example language for this series, MiniSharp, is a subset of C# syntax, but it is not limited to running on the CLR. I will make it a retargetable language, that is, one that can target multiple back ends, so that this example can demonstrate as many techniques as possible. I will also adjust the content of this series as I go, depending on my abilities and the progress of the work. I hope you will keep following the VBF.Compilers project (https://github.com/Ninputer/VBF) and my Weibo (http://weibo.com/ninputer). Stay tuned for the next article!

