Build Your Own Compiler (Part 1): The Modular Engineering of Compilers


In this first part of the series, I want to summarize how a compiler is put together and help you understand the purpose of each of its components. If you have read other books on compiler principles, you have probably noticed that the first chapter or the preface usually divides the compiler into a number of modules. Each module is responsible for one specific phase of compilation, and the modules are finally chained together to form a complete compiler. For example, here are the compiler phases shown in the first chapter of the Tiger Book (Modern Compiler Implementation by Andrew W. Appel):

[Figure: the phases of a compiler, from Chapter 1 of the Tiger Book]
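Here is a minimal Python sketch of phases "chained together", not taken from the Tiger Book. Each phase is a plain function whose output becomes the next phase's input, and `run_pipeline` keeps every intermediate result. The toy grammar (sums of integers only), the phase names `lex`, `parse` and `codegen`, and the `PUSH`/`ADD` instructions of an imaginary stack machine are all hypothetical choices made for illustration.

```python
from typing import Any, Callable

# A phase is just a function from the previous phase's output to its own output.
Phase = Callable[[Any], Any]


def lex(source: str) -> list[str]:
    """Toy lexer: splits a whitespace-separated sum such as '1 + 2 + 3'."""
    return source.split()


def parse(tokens: list[str]) -> Any:
    """Toy parser: builds a left-nested AST ('add', left, right) for NUM ('+' NUM)*."""
    if not tokens or len(tokens) % 2 == 0:
        raise SyntaxError("expected: NUM ('+' NUM)*")
    tree: Any = ("num", int(tokens[0]))
    for i in range(1, len(tokens), 2):
        if tokens[i] != "+":
            raise SyntaxError(f"expected '+', got {tokens[i]!r}")
        tree = ("add", tree, ("num", int(tokens[i + 1])))
    return tree


def codegen(tree: Any) -> list[str]:
    """Toy back end: emits instructions for a hypothetical stack machine."""
    if tree[0] == "num":
        return [f"PUSH {tree[1]}"]
    return codegen(tree[1]) + codegen(tree[2]) + ["ADD"]


def run_pipeline(source: str, phases: list[tuple[str, Phase]]) -> dict[str, Any]:
    """Feeds each phase's output into the next one and records every result."""
    results: dict[str, Any] = {"source": source}
    data: Any = source
    for name, phase in phases:
        data = phase(data)
        results[name] = data
    return results


results = run_pipeline("1 + 2 + 3", [("lex", lex), ("parse", parse), ("codegen", codegen)])
print(results["codegen"])  # ['PUSH 1', 'PUSH 2', 'ADD', 'PUSH 3', 'ADD']
```

Because each phase only depends on the shape of its input, any one of them can be rewritten or replaced without touching the others, and the recorded intermediate results remain available to other tools.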

So why split a compiler into phases and modules? The answer is that it makes the compiler easier to design and understand. A complete compiler is a large project. Without decomposition, it would be very difficult to write and maintain, and the more cleanly its modules are divided, the easier it is to work on. For example, the lexical analysis phase turns the input character stream into a stream of words (tokens). This greatly reduces the work the syntax analysis phase must do to determine what kind of input it is looking at, which simplifies the design and also helps performance. Modularity also keeps the work of each phase as independent as possible. For example, a compiler can perform CPU-independent optimizations as well as optimizations specific to a particular CPU, and each kind can be developed independently without redesigning the entire system.
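The lexing step described above can be sketched with Python's `re` module. A master regular expression with one named group per token kind turns the character stream into a list of `Token`s and drops whitespace, so the parser never has to deal with it. The token kinds, the keyword set and the small C-like syntax are hypothetical and chosen only for illustration. A production lexer would also track line and column positions for error messages.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str  # token category, e.g. "NUMBER", "IDENT", "KEYWORD", "OP"
    text: str  # the exact characters that were matched


# Hypothetical token set for a tiny C-like language; alternatives are tried in order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=();{}]"),
    ("SKIP", r"\s+"),  # whitespace: matched but never handed to the parser
]
KEYWORDS = {"int", "if", "else", "while", "return"}
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def tokenize(source: str) -> list[Token]:
    """Turns a character stream into a token stream."""
    tokens: list[Token] = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind = match.lastgroup  # name of the alternative that matched
        if kind == "IDENT" and match.group() in KEYWORDS:
            kind = "KEYWORD"  # keywords are identifiers reserved by the language
        if kind != "SKIP":
            tokens.append(Token(kind, match.group()))
        pos = match.end()
    return tokens


print(tokenize("int x = 42;"))
```

The parser downstream only ever compares token kinds such as `KEYWORD` or `NUMBER`, rather than re-examining raw characters.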

It may be surprising that the way a compiler's stages and modules are designed depends on the syntax of the programming language itself. Take the early language FORTRAN. When it was designed, people had not yet mastered as much compiler theory as we have today, and its grammar cannot be split as cleanly as today's languages into separate stages such as lexical analysis and syntax analysis, because FORTRAN's syntax does not contain lexical constructs that an automaton can handle independently. As a result, syntax analysis in FORTRAN compilers is more complicated. Other languages with a long history may have similarly complicated syntax. Visual Basic, for example, also cannot be scanned by a separate, automaton-based lexer, so a VB parser is much harder to write than one for a newer language such as C#. Another example: early Pascal and some C dialects provide syntax for declaring a variable as a register variable (this may still exist in recent versions of Delphi; I have not verified it). The reason is that no sufficiently effective register allocation algorithm existed at the time, so programmers had to rely on their own experience to decide. Today, if a language still lets you explicitly declare a variable as a register variable, that interferes with the design of the register allocation module. In short, my advice to future compiler designers is this: design your grammar well, and you can greatly simplify the design of the compiler!
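A standard textbook illustration of the FORTRAN problem, not given in the original article, is the DO statement. Classic FORTRAN ignores blanks, so `DO 10 I = 1,10` starts a loop, while `DO 10 I = 1.10` assigns 1.10 to a variable named `DO10I`. A scanner therefore cannot decide whether `DO` is a keyword until it has looked far ahead for a comma. The simplified Python sketch below, built around the hypothetical helper `classify_do_statement`, shows that look-ahead. Real FORTRAN statement classification involves many more cases.

```python
def classify_do_statement(line: str) -> str:
    """Simplified sketch: decides whether a FORTRAN line is a DO loop.

    Blanks are insignificant in classic FORTRAN, so the deciding character
    (a top-level comma after '=') can appear far to the right of 'DO'.
    """
    text = line.replace(" ", "")  # FORTRAN ignores blanks
    if not text.startswith("DO") or "=" not in text:
        return "other"
    rhs = text.split("=", 1)[1]
    depth = 0  # commas inside parentheses, as in A(1,2), do not count
    for ch in rhs:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return "do-loop"  # DO <label> <var> = <start>, <end>
    return "assignment"  # e.g. the variable DO10I receives 1.10


for stmt in ["DO 10 I = 1,10", "DO 10 I = 1.10", "DO 10 I = A(1,2)"]:
    print(stmt, "->", classify_do_statement(stmt))
```

A lexer for a language like C# never needs this kind of look-ahead, because keywords, identifiers and numbers can all be recognized from the characters at hand.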

Beyond simplifying the design, dividing the compiler into modular phases has even greater value. We used to think a compiler only needed to turn source files into the final target code. But as development tools multiplied (editors, auto-completion, debuggers, refactoring tools, test coverage tools, profilers, and so on), it turned out that the results produced by each stage of compilation can be very valuable. Exposing the compiler's internal structure and intermediate results to users has become inevitable. For example, the next generation of Visual Studio will provide a "compiler as a service" feature that exposes the compiler's internal modules to users. Here are a few examples of how the output of the compiler's modules might be used:

| Compiler stage     | Result produced                                              | Uses                                                               |
|--------------------|--------------------------------------------------------------|--------------------------------------------------------------------|
| Lexical analysis   | Token stream                                                 | Syntax highlighting                                                |
| Parsing            | Abstract syntax tree                                         | Syntax highlighting; code formatting; code folding                 |
| Semantic analysis  | Abstract syntax tree with type information, and symbol table | Renaming; refactoring; automatic code generation; automatic code rewriting |
| Data flow analysis | Control flow graph, interference graph                       | Edit and Continue                                                  |

These are just a few simple examples, and the uses of these results are certainly not limited to them. I believe that exposing the compiler's internal modules to users could enable countless other interesting and valuable applications.
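The first row of the table is easy to try out, because Python happens to expose its own lexer through the standard `tokenize` module. The sketch below maps token kinds to highlighting styles and returns the starting position of each span, which is roughly what an editor needs to color the text. The `highlight` function and the style names are hypothetical. Only the token stream is used, with no parsing at all.

```python
import io
import keyword
import tokenize

# Highlighting style for each kind of token; anything not listed stays plain.
STYLES = {tokenize.NUMBER: "number", tokenize.STRING: "string", tokenize.COMMENT: "comment"}


def highlight(source: str) -> list[tuple[str, tuple[int, int], str]]:
    """Returns (style, (row, col), text) spans that an editor could color."""
    spans = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            style = "keyword" if keyword.iskeyword(tok.string) else "name"
        else:
            style = STYLES.get(tok.type)
        if style is not None:
            spans.append((style, tok.start, tok.string))
    return spans


for span in highlight("while x: x -= 1  # count down\n"):
    print(span)
```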

By purpose, the stages of a compiler fall into two groups. Lexical analysis, syntax analysis and semantic analysis focus on the notation of the programming language and are collectively called the compiler's front end. Intermediate code generation, canonicalization, instruction selection, control flow analysis, data flow analysis, register allocation, code emission, assembly and linking focus on the program's computational logic and are collectively called the compiler's back end. Modern compiler research arguably concentrates on the back end, because front-end techniques are relatively mature. But front-end techniques are more likely to be useful in everyday development, and they are often more interesting, so I will spend more time on them. Once you have finished the front end of a compiler, there are several options for implementing the back end:

    1. Use the CLR or the Java Virtual Machine as the back end. Because these large virtual machines provide a very high level of abstraction, this approach is the easiest. It is ideal for dynamic languages and scripting languages.
    2. Adopt a reliable open-source or commercial back-end framework, such as the well-known LLVM (http://llvm.org/). This lets you benefit directly from LLVM's optimization work and its cross-platform support.
    3. Implement the back end yourself. There is more work to do, but it does more to help you understand the techniques of translating and optimizing code.
    4. Interpret the program. No explanation needed...
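To show how one front end can feed different back ends, the sketch below borrows Python's own parser (`ast.parse`) as the front end for arithmetic expressions and attaches two toy back ends to the same tree. `interpret` walks the tree directly (option 4), and `emit_stack_code` translates it into instructions for an imaginary stack machine (a tiny version of option 3), which `run_stack_code` then executes. These functions and the instruction set are hypothetical, and only integer constants combined with `+`, `-` and `*` are handled.

```python
import ast
import operator

# Supported binary operators: AST node type -> (stack instruction, Python function).
OPS = {
    ast.Add: ("ADD", operator.add),
    ast.Sub: ("SUB", operator.sub),
    ast.Mult: ("MUL", operator.mul),
}


def front_end(source: str) -> ast.expr:
    """Front end: reuse Python's parser to get an AST for an expression."""
    return ast.parse(source, mode="eval").body


def interpret(node: ast.expr) -> int:
    """Back end option 4: evaluate the tree directly."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        _, fn = OPS[type(node.op)]
        return fn(interpret(node.left), interpret(node.right))
    raise NotImplementedError(ast.dump(node))  # e.g. unary minus, division


def emit_stack_code(node: ast.expr) -> list[str]:
    """Back end option 3 (toy): translate the same tree into stack-machine code."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return [f"PUSH {node.value}"]
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        name, _ = OPS[type(node.op)]
        return emit_stack_code(node.left) + emit_stack_code(node.right) + [name]
    raise NotImplementedError(ast.dump(node))


def run_stack_code(code: list[str]) -> int:
    """Executes the hypothetical stack-machine instructions."""
    by_name = {name: fn for name, fn in OPS.values()}
    stack: list[int] = []
    for instr in code:
        if instr.startswith("PUSH "):
            stack.append(int(instr[len("PUSH "):]))
        else:
            right, left = stack.pop(), stack.pop()  # right operand is on top
            stack.append(by_name[instr](left, right))
    return stack.pop()


tree = front_end("(1 + 2) * 3 - 4")
print(interpret(tree), emit_stack_code(tree), run_stack_code(emit_stack_code(tree)))
```

Because the tree is the only interface between the two halves, adding another target means writing one more function over the same tree. This is one way a retargetable design can work.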

The example language I will present, MiniSharp, uses a subset of C# syntax, but it is not restricted to running on the CLR. I will design it as a retargetable language that can target a variety of back ends, so that a single example can demonstrate as many techniques as possible. I will also adjust the content of this series according to my own abilities and work schedule. I hope you will keep following the VBF.Compilers project (https://github.com/Ninputer/VBF) and my Weibo (http://weibo.com/ninputer). Stay tuned for the next article!
