(Java-based) Writing compilers and interpreters - Chapter 1: Introduction (series)

Source: Internet
Author: User

This chapter describes the goals and approach of this book and gives an overview of compilers and interpreters.

Objectives and methods

This book teaches you how to design and develop compilers and interpreters:

  • A compiler, written in Java, for a major subset of Pascal (a high-level procedural programming language). The subset keeps the main language features and drops some less relevant ones, which makes the compiler easier to write.
  • An interpreter, written in Java, for the same subset of Pascal. It includes an interactive symbolic debugger (a symbolic debugger works from the symbol table rather than from the machine instruction set and hardware debugging facilities).
  • An integrated development environment (IDE) with a graphical user interface. This IDE is a simplified version of full-featured IDEs such as Eclipse and Borland JBuilder. Even so, it includes a source editor and an interactive interface for setting breakpoints, single-stepping, viewing and modifying variable values, and so on.

Achieving these ambitious goals is a big challenge. Good skills will help you compile programs into machine language or interpret them. Modern software engineering principles and good object-oriented design will show you how to implement a compiler or an interpreter in code so that all the components eventually work well together. Compilers and interpreters are large, complex programs. Developing a small program requires only some skill, but large programs such as compilers and interpreters also require software engineering principles and object-oriented design. Therefore, this book emphasizes essential skills, software engineering principles, and object-oriented thinking.

What are compilers and interpreters?

The main purpose of compilers and interpreters is to "translate" source programs written in a high-level source language. What it means to translate a source program is the subject of the next few paragraphs.

In this book, the source language is a large subset of Pascal. In other words, you will be able to compile or interpret real Pascal programs. Because the compiler and the interpreter are written in Java, the implementation language is Java.

The Pascal compiler translates a Pascal source program into the low-level machine language of a specific machine (more precisely, the machine language of its CPU). The source program is generally a text file. If the compiler works correctly, the resulting machine language program is equivalent to the original Pascal source program: it has the same behavior in a different form (by analogy, a car can be started with its key or by hot-wiring it, and either way the engine starts). The machine language is the target language, and the compiler generates target code in that language. Once the code is generated, the compiler's job is done. The target code is generally written to a file, usually a binary one.

A program can consist of several source files, and the compiler generates an object file for each of them. A utility program called a linker combines the contents of these object files with runtime library routines into a target program that the computer can load and execute (such as a Windows PE executable). The library routines generally come from precompiled object files.

Because machine language is hard to read and remember, a compiler can instead generate assembly language as the target language. Assembly language is only one step away from machine language: generally, each assembly instruction corresponds to one machine language instruction, and short mnemonics (such as ADD and MOV) make the instructions much easier to remember. An assembler (another kind of translator) translates assembly language into machine language.

Figure 1-1 summarizes the process of compiling one or more source programs into a target program.

On the left side of the figure, a Pascal program made up of three source files, sort1.pas, sort2.pas, and sort3.pas, is translated into three machine language object files: sort1.obj, sort2.obj, and sort3.obj. The linker combines the three object files with the relevant runtime library routines into the executable target program sort.exe. On the right side of the figure, the compiler translates the Pascal source files into the assembly language files sort1.asm, sort2.asm, and sort3.asm, and the assembler then converts them into machine language object files. Finally, the linker produces the target program sort.exe.

Figure 1-1

So how does an interpreter differ from a compiler?

An interpreter does not generate a target program. Instead, it executes the source program as it reads it. It is like you reading a Pascal program and carrying out its statements in the order they are written: you write down the values of the program's variables on scratch paper and note the output of each statement until the program ends. Essentially, that is exactly what a Pascal interpreter does. It reads the program and executes it, and no target program needs to be generated and loaded. Rather than translating the program into machine language, the interpreter translates it into the series of actions that execute it.

Compare compilers and interpreters

How can we determine when to use the compiler and when to use the interpreter?

When you hand a source program to an interpreter, the interpreter takes over, checking the program and executing it. A compiler also checks the program, but it generates target code instead of executing it. After running the compiler you must run the linker to produce the target program, and the target program must then be loaded into memory before it can execute. If the compiler generates assembly language, you also have to run the assembler. Clearly, an interpreter requires fewer steps.

Interpreters are also more versatile than compilers. If you write a Pascal interpreter in Java, it can run on a Microsoft Windows PC, an Apple Mac, or a Linux host, and it can execute Pascal programs on any of those platforms. A compiler, on the other hand, must generate code for one specific machine (whether directly or indirectly through an assembler). So even if you manage to run a Pascal compiler written for the PC on a Mac, the code it generates is still PC code. To make it generate code for the Mac, you would have to rewrite parts of the compiler.

(The compiler developed later in this book generates code for the Java Virtual Machine, because the virtual machine runs on many platforms. If you are interested in generating code for a specific real machine, you can replace the virtual machine instructions with, for example, x86 instructions for a real PC.)

What happens if the source program contains a logic error, such as dividing by a variable whose value is zero, that does not show up until run time?

Because the interpreter is in control while it executes the program, it can stop and tell you the line number of the offending statement and the name of the variable. It can even prompt you for what to do before it resumes execution, such as changing the variable's value to something other than zero. An interpreter can include an interactive source-level debugger, also known as a symbolic debugger. Symbolic means that the debugger lets you use the symbols of your program, such as variable names.
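The behavior described above can be sketched in a few lines of Java. This is only an illustration: the class names RuntimeErrorReport and DivisionExecutor, and the idea of passing the line number explicitly, are assumptions made for this example and are not taken from the book's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical runtime error that still carries source-level information.
class RuntimeErrorReport extends RuntimeException {
    final int lineNumber;
    final String variableName;

    RuntimeErrorReport(int lineNumber, String variableName) {
        super("line " + lineNumber + ": division by zero, variable '"
              + variableName + "' is 0");
        this.lineNumber = lineNumber;
        this.variableName = variableName;
    }
}

// A toy executor for statements of the form  target := left / right.
class DivisionExecutor {
    private final Map<String, Integer> variables = new HashMap<>();

    void set(String name, int value) { variables.put(name, value); }

    int get(String name) { return variables.get(name); }

    // Executes "target := left / right" found on the given source line.
    void executeDivide(int lineNumber, String target, String left, String right) {
        int divisor = variables.get(right);
        if (divisor == 0) {
            // The interpreter knows both the line and the variable name,
            // so it can report them instead of a raw instruction address.
            throw new RuntimeErrorReport(lineNumber, right);
        }
        variables.put(target, variables.get(left) / divisor);
    }
}
```

A debugger built on top of such an executor could catch the RuntimeErrorReport, let the user call set to give the variable a non-zero value, and then execute the same statement again.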

On the other hand, the target program produced by a compiler and linker generally runs on its own (the machine executes it directly, with no third party involved). Information from the source program, such as line numbers and variable names, is not available in the target program. When a runtime error occurs, the program simply aborts and may print a message containing the address of the offending instruction. It is then up to you to work out which source statement and which variable caused the division by zero.

So when it comes to debugging, an interpreter is the way to go. Some compilers add extra information to the target code so that, when an error occurs, the target program can print the offending line number and variable name. You then fix the error, recompile, and run the program again. The extra information makes the program run slower than it otherwise would (which is why Visual C++ has separate release and debug build modes). This means you should turn off the debugging features and recompile once the program is ready to become the final "production" version.

Suppose you have successfully debugged your program and the focus shifts to making it run faster. Because a machine executes a program in its own native machine language as fast as it can, a compiled program can run several times faster than the same program under an interpreter. Here the compiler is the clear winner, especially an optimizing compiler that knows how to generate optimized code for specific situations. So whether to use a compiler or an interpreter depends on which aspects of program development and execution matter most. Ideally, you use an interpreter with a symbolic source-level debugger during development, and a compiler that generates machine code for faster execution once the program has been debugged. Both are goals of this book, since it teaches how to write compilers and interpreters.

The picture gets a bit fuzzy

The difference between compilers and interpreters is easy to explain, but with the growing popularity of virtual machines the picture becomes a bit fuzzy.

A virtual machine is a program that simulates a machine (a computer). This program can run on different real computer platforms. For example, the Java Virtual Machine (JVM) can run on a Microsoft Windows PC, an Apple Mac, Linux, and many other platforms (IBM minicomputers, for example).

A virtual machine has its own virtual machine language, whose instructions are interpreted by the real host machine. So if you write a translator that turns Pascal source programs into a virtual machine language that the host machine interprets, is that translator a compiler or an interpreter?

We don't need to agonize over this. This book adopts the following convention: if a translator converts the source program into machine language, whether for a real machine or a virtual machine, the translator is a compiler. If a translator executes the source program without first generating machine language, it is an interpreter.
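To make the idea of a virtual machine language concrete, here is a minimal sketch of a stack-based virtual machine written in Java. The instruction names (LOAD, PUSH, ADD, STORE) and the class TinyVm are invented for this example; they are not the real JVM instruction set.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// A made-up stack machine, used only to illustrate the idea.
class TinyVm {
    // Variable storage of the virtual machine, addressed by name.
    final Map<String, Integer> memory = new HashMap<>();

    // Each instruction is an opcode, optionally followed by one operand.
    void run(String[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instruction : program) {
            String[] parts = instruction.split(" ");
            switch (parts[0]) {
                case "LOAD":   // push the value of a variable
                    stack.push(memory.get(parts[1]));
                    break;
                case "PUSH":   // push an integer constant
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "ADD":    // replace the top two values with their sum
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "STORE":  // pop the top value into a variable
                    memory.put(parts[1], stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("unknown opcode " + parts[0]);
            }
        }
    }
}
```

Under the convention above, a translator that turns the Pascal statement i := j + k into the four instructions LOAD j, LOAD k, ADD, STORE i would count as a compiler, even though the host ends up interpreting those instructions through run.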

Why study compiler writing?

We all tend to take compilers and interpreters for granted. During development you need to concentrate on writing and debugging your programs, and you don't even have to think about how the compiler works. You may notice the compiler only when it reports an error message for a syntax error. If there are no syntax errors, you expect the compiler to generate correct code. If your program misbehaves, you may be tempted to blame the compiler, but most of the time you will find that the error is in your own program.

That is usually the situation when a compiler, an interpreter, and an IDE for a popular programming language (such as Java or C++) are already available to you. So much for that case.

However, many new programming languages have been developed recently. One driving force is the World Wide Web (HTML5, for example) and the new languages suited to web-based applications (such as PHP and other web-only languages). Higher demands on programmer productivity have also given rise to new languages that are closely tied to specific application domains. There are many examples: the various shell languages for system administrators, the SQL and NoSQL languages developed for databases, VHDL and similar languages developed for circuit boards and DSPs, and the BPM languages developed for workflows. You may well find yourself wanting to invent a new scripting language to express algorithms or control processes in your own domain. If you develop a new language, a compiler or an interpreter for it is essential.

Writing compilers and interpreters is a lot of fun. But as you have seen, neither is a small program, and developing one successfully requires the right skills, modern software engineering principles, and good object-oriented design. Besides the satisfaction of learning how compilers and interpreters work, you can also enjoy the challenge of writing them yourself.

Conceptual design

To prepare for the next few chapters, let's look at the conceptual design of a compiler and an interpreter.

Design Notes

The conceptual design of a program is a high-level view of its software architecture. It covers the program's main components, how they are organized, and how they interact with each other. It does not have to say how the components are implemented. Rather, it lets you identify and understand the components without worrying about how you will eventually develop them.

You can classify both compilers and interpreters as programming language translators. As explained above, a compiler translates a source program into machine language, and an interpreter translates it into a series of actions. A translator has a front end and a back end. In keeping with the principle of software reuse, you will see that the Pascal compiler and the Pascal interpreter share the same front end but have different back ends.

The front end of a translator reads the source program and performs the initial stage of translation. Its main components are the parser, the scanner, the token (the smallest unit of the language), and the source (the source code).

The parser controls the front end's translation process. It repeatedly asks the scanner for the next token and, based on the sequence of tokens (that is, the token pattern), determines which high-level language element it is currently translating, such as an arithmetic expression, an assignment statement, or a procedure declaration. The parser also checks that the source program is syntactically correct. What the parser does is called parsing: it analyzes the source program and translates it (into an intermediate form such as an abstract syntax tree).
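As a rough illustration of parsing, the following Java sketch recognizes one tiny language element: an assignment statement whose right-hand side is a chain of + and - operations. The class TinyParser, its grammar, and the text form of the resulting tree are assumptions made for this example. It also receives tokens that are already split, which a real front end would obtain from its scanner one at a time.

```java
import java.util.Arrays;
import java.util.List;

// Parses:  identifier := operand { (+|-) operand }
class TinyParser {
    private static final String IDENTIFIER = "[A-Za-z][A-Za-z0-9]*";
    private static final String OPERAND = IDENTIFIER + "|[0-9]+";

    private final List<String> tokens;
    private int pos = 0;

    TinyParser(String... tokens) {
        this.tokens = Arrays.asList(tokens);
    }

    // Returns the next token, or reports a syntax error at end of input.
    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("syntax error: unexpected end of input");
        }
        return tokens.get(pos++);
    }

    // Consumes one token and checks it against the expected pattern.
    private String expect(String pattern, String what) {
        String token = next();
        if (!token.matches(pattern)) {
            throw new IllegalStateException(
                "syntax error: expected " + what + " but found '" + token + "'");
        }
        return token;
    }

    // Returns the statement as a tree written in prefix notation.
    String parseAssignment() {
        String target = expect(IDENTIFIER, "an identifier");
        expect(":=", ":=");
        String tree = expect(OPERAND, "an operand");
        while (pos < tokens.size()) {
            String operator = expect("[+-]", "+ or -");
            String operand = expect(OPERAND, "an operand");
            tree = "(" + operator + " " + tree + " " + operand + ")";
        }
        return "(:= " + target + " " + tree + ")";
    }
}
```

For the tokens i, :=, j, +, k the method returns the string (:= i (+ j k)), while a statement that uses = instead of := is rejected with a syntax error.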

Tokens are the low-level elements of the source language. For example, Pascal tokens include reserved words such as begin, end, if, then, and else; identifiers, which are the names of variables, procedures, and functions; and special symbols such as =, :=, +, -, *, and /. What the scanner does is called scanning: it scans the source program and breaks it into tokens.
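The scanning step can be sketched in Java as follows. The class TinyScanner and the TYPE(text) output format are made up for this example, and the keyword list contains only the five words mentioned above, so this is far from a complete Pascal scanner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TinyScanner {
    // Only the reserved words named in the text, not the full Pascal set.
    private static final Set<String> KEYWORDS =
        new HashSet<>(Arrays.asList("begin", "end", "if", "then", "else"));

    // Returns tokens as "TYPE(text)" strings, for example "IDENTIFIER(i)".
    static List<String> scan(String source) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < source.length()) {
            char ch = source.charAt(pos);
            if (Character.isWhitespace(ch)) {
                pos++;  // white space separates tokens but is not a token
            } else if (Character.isLetter(ch)) {
                int start = pos;
                while (pos < source.length()
                       && Character.isLetterOrDigit(source.charAt(pos))) {
                    pos++;
                }
                String word = source.substring(start, pos);
                // Pascal is case-insensitive, so compare in lower case.
                String type = KEYWORDS.contains(word.toLowerCase())
                    ? "KEYWORD" : "IDENTIFIER";
                tokens.add(type + "(" + word + ")");
            } else if (Character.isDigit(ch)) {
                int start = pos;
                while (pos < source.length()
                       && Character.isDigit(source.charAt(pos))) {
                    pos++;
                }
                tokens.add("NUMBER(" + source.substring(start, pos) + ")");
            } else if (source.startsWith(":=", pos)) {
                tokens.add("SYMBOL(:=)");  // two-character special symbol
                pos += 2;
            } else {
                tokens.add("SYMBOL(" + ch + ")");  // one-character symbol
                pos++;
            }
        }
        return tokens;
    }
}
```

Calling TinyScanner.scan("begin i := j + 10 end") yields a keyword, an identifier, the := symbol, another identifier, the + symbol, a number, and a final keyword, in that order.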

Figure 1-2 shows the conceptual design of the front end of a compiler or an interpreter.

Figure 1-2

In this figure, an arrow indicates that one component sends a command to another. The parser tells the scanner to get the next token. The scanner reads characters from the source and constructs a new token. The token also reads characters from the source. (Chapter 3 explains why both the scanner and the token components need to read characters from the source.)

A compiler translates the source program into machine language target code, so the main component of its back end is the code generator. An interpreter executes the program, so the main component of its back end is the executor.

If you want the compiler and the interpreter to share a front end, their different back ends must present a common interface to the front end. Recall that the front end performs the initial stage of translation. The front end produces intermediate code (for example, a parse tree or an abstract syntax tree) and a symbol table, and these two structures serve as the common interface between the front end and the back ends.

Intermediate code is a predigested form of the source program, somewhere between the source program and machine language, that the back end can process more efficiently. It is generally a parse tree or a syntax tree. (By analogy, if the translator turns plastic into bottles, the source program is the raw plastic and the intermediate code is the caps, bottle bodies, and labels, from which the back end can assemble bottles more quickly.) In this book, the intermediate code is an in-memory tree data structure that represents the statements of the source program (that is, a syntax tree). The symbol table contains information about the symbols of the source program, such as its identifiers. The compiler's back end processes the intermediate code and the symbol table to generate the machine language version of the source program. The interpreter's back end processes the intermediate code and the symbol table to execute the program directly, usually by walking the tree.
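The following Java sketch shows how one tree and one symbol table can feed two different back ends. All names here (Node, Backend, Executor, CodeGenerator) and the LOAD/ADD/SUB/STORE instructions are assumptions made for illustration. As a simplification, the symbol table is reduced to a map from variable names to current values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tree node: an operator with two children, or a leaf name.
class Node {
    final String text;       // operator ("+", "-", ":=") or variable name
    final Node left, right;  // both null for a leaf

    Node(String text) { this(text, null, null); }

    Node(String text, Node left, Node right) {
        this.text = text;
        this.left = left;
        this.right = right;
    }

    boolean isLeaf() { return left == null; }
}

// The common interface that both back ends implement.
interface Backend {
    void process(Node tree, Map<String, Integer> symbolTable);
}

// Interpreter back end: walks the tree and performs the actions.
class Executor implements Backend {
    public void process(Node tree, Map<String, Integer> symbolTable) {
        // The tree has the shape (:= target expression).
        symbolTable.put(tree.left.text, evaluate(tree.right, symbolTable));
    }

    private int evaluate(Node node, Map<String, Integer> symbolTable) {
        if (node.isLeaf()) return symbolTable.get(node.text);
        int l = evaluate(node.left, symbolTable);
        int r = evaluate(node.right, symbolTable);
        return node.text.equals("+") ? l + r : l - r;
    }
}

// Compiler back end: walks the same tree but emits instructions instead.
class CodeGenerator implements Backend {
    final List<String> code = new ArrayList<>();

    public void process(Node tree, Map<String, Integer> symbolTable) {
        emit(tree.right);
        code.add("STORE " + tree.left.text);
    }

    private void emit(Node node) {
        if (node.isLeaf()) {
            code.add("LOAD " + node.text);
            return;
        }
        emit(node.left);
        emit(node.right);
        code.add(node.text.equals("+") ? "ADD" : "SUB");
    }
}
```

Given the tree for i := j + k, the Executor stores the sum of j and k into i, while the CodeGenerator collects the instruction list LOAD j, LOAD k, ADD, STORE i. Neither back end needs to know which source language produced the tree.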

For software reuse, you can design the intermediate code and symbol table into a language-independent structure. In other words, you can apply the same structure to different source languages. Therefore, the backend can also be independent of the language. When it processes these structures (intermediate codes and symbol tables), it does not need to know the specific source language.

Figure 1-3 shows a more complete conceptual design of the compiler and the interpreter. If everything is designed well, only the front end needs to know the source language, and only the back end needs to know whether the translator is a compiler or an interpreter.

Figure 1-3: A more complete conceptual design

Chapter 2 starts to flesh out this conceptual design by building a framework for the compiler and the interpreter. Chapter 3 covers scanning. Chapter 4 builds the first symbol table, and Chapter 5 generates the first intermediate code. Chapter 6 begins the executor, which is developed incrementally through Chapter 14, including the symbolic debugger and the IDE. Code generation begins in Chapter 16, after Chapter 15 introduces the JVM architecture.

Syntax and semantics

The syntax of a programming language is the set of rules that determine whether a statement or expression written in the language is well formed. The semantics of a language convey the meaning of its statements and expressions (for example, what is assigned to what, or what the termination condition of a loop is). For example, Pascal's syntax tells us that i := j + k is a valid assignment statement. Its semantics say that the current values of the variables j and k are added and the sum is assigned to i.

The parser performs actions based on both the syntax and the semantics of the source language. Scanning the source program to extract tokens is a syntactic action. Looking for := after the target variable of an assignment statement is a syntactic action. Entering the identifiers i, j, and k into the symbol table as variables, or looking them up there later, is a semantic action, because the parser must understand the meaning of the expression and the assignment to know that it needs to use the symbol table. Generating the intermediate code that represents the assignment statement is also a semantic action.

Syntactic actions occur only in the front end, while semantic actions occur in both the front end and the back end. Executing a program or generating target code in the back end requires knowing what the statements mean, so that work consists of semantic actions. The intermediate code and the symbol table store the semantic information.

Lexical, syntax, and Semantic Analysis

Lexical analysis is the formal term for scanning, so a scanner is also called a lexical analyzer. Syntax analysis is the formal term for parsing (the parser's main task), so a parser is also called a syntax analyzer. Semantic analysis mainly checks that the semantic rules of the language are obeyed. Type checking is one example: it ensures that the operand types of an operator are consistent. Other semantic analysis operations include building the symbol table and generating the intermediate code.
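A type check of this kind can be sketched in Java as follows. The class TinyTypeChecker and its rule that both operands must have exactly the same type are assumptions made for this example. Real Pascal is more permissive (it allows mixing integer and real operands, for instance), so this is a deliberately strict simplification.

```java
import java.util.HashMap;
import java.util.Map;

class TinyTypeChecker {
    // A toy symbol table that maps each declared variable to its type name.
    private final Map<String, String> symbolTable = new HashMap<>();

    void declare(String name, String type) {
        symbolTable.put(name, type);
    }

    // Checks "left op right" and returns the result type,
    // or throws if the operand types are inconsistent.
    String checkBinary(String left, String op, String right) {
        String leftType = lookup(left);
        String rightType = lookup(right);
        if (!leftType.equals(rightType)) {
            throw new IllegalStateException(
                "type mismatch: " + leftType + " " + op + " " + rightType);
        }
        return leftType;
    }

    // Looks up a variable's type and rejects undeclared identifiers.
    private String lookup(String name) {
        String type = symbolTable.get(name);
        if (type == null) {
            throw new IllegalStateException("undeclared identifier: " + name);
        }
        return type;
    }
}
```

With j and k declared as integer, checking j + k succeeds and yields integer. If one operand were declared as boolean instead, the same call would report a type mismatch before any code is executed or generated.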
