Introduction to compilation principles

Source: Internet
Author: User


 

Why do universities offer a course on compiler principles? The course deals with the principles and techniques of compilers, which can seem unrelated to the foundations of computer science. Yet it has always been a required course for undergraduates, and it is also a mandatory part of the postgraduate entrance exam. Compiler principles and techniques are, in essence, an algorithm problem; because the problem is complex, the algorithms that solve it are complex too. The data structures and algorithm analysis course is also about algorithms, but about basic, general-purpose ones. The compiler principles course, by contrast, concentrates on the algorithms that solve one particular problem. In the 1950s, writing a compiler was considered extremely difficult: the first Fortran compiler is said to have taken 18 person-years to complete. While people struggled to write compilers, they created a body of compilation theory and techniques that is worth more than any single compiler. It is much like the mathematicians working on the famous Goldbach conjecture: the conjecture remains unproven, but a good deal of well-known number theory was born along the way.

 

Reference books

Although compiler theory is quite mature today, writing a compiler on the scale of Turbo C or Java is still too hard for a college student. Compilers are hard to write, and the principles behind them are hard to learn.

Precisely because the subject is hard to learn, good teachers and good textbooks matter. We cannot choose our teachers, but we can choose what we read. Below I recommend a few textbooks on compiler principles. All of them are classic foreign textbooks, because I have not found a Chinese textbook that I am satisfied with.

 

The first book is Compilers: Principles, Techniques, and Tools, also known as the Dragon Book. The nickname comes from the red dragon on its cover, and the book is so well known in the field that many scholars abroad simply call it the Dragon Book. China Machine Press has recently published a Chinese edition under a title that translates as Compilation Principles. The book is fairly old, dating from around 1985 or 1986, and one of its authors was a scientist at the famous Bell Laboratories. The core principles it describes have not changed since, so it remains valuable today. Its most distinctive feature is that it opens with a small practical example that walks through the whole subject, so beginners quickly get an overall picture and understand why each theory exists and how it is applied. This is what I find missing from Chinese textbooks, which are not written for readers who want to teach themselves: you can read them for a long time without knowing what the material is for.

The second book is Modern Compiler Design; the Chinese edition is published by Posts & Telecom Press. This book emphasizes the practice of compiler construction, with plenty of real program code and many practical technical issues. Its other feature is that it lives up to the word "modern". Traditional compiler textbooks do not cover algorithms such as the garbage collection used in Java, because interpreted languages like Java only became popular in recent years. If you want to go deeper into the theory, read the Dragon Book; if you want to build an advanced compiler yourself, read Modern Compiler Design.

The third book is Compiler Construction: Principles and Practice (the Chinese edition is titled Compilation Principles and Practice), which many compiler scholars in China recommend, perhaps because it was introduced to China earlier. I remember buying it in high school, though I only finished reading the whole book a short while ago. It is a good choice for beginners, and its treatment of the principles is quite detailed. It does not go as deep as the Dragon Book and often stops at the essentials, but that is already deep enough for undergraduate teaching. The book emphasizes practice, though it feels less hands-on than Modern Compiler Design: its practice is about the principles, not about engineering technique as in the previous book. As it explains each part of the theory, the book builds up a compiler for a small language, Tiny C, step by step, so that by the end you can write a Tiny C compiler yourself. The author also describes two commonly used compiler tools, Lex and Yacc, in detail, which is rarely seen in Chinese textbooks.

 

These three textbooks are all available in both English and Chinese. Many students with good English only like to read the originals. I find the translations of all three quite good, so there is no particular need to buy the English editions. Understanding the essence of the theory matters more than understanding the surface of the text.

 

Essence of compilation principles

As mentioned above, learning compiler principles is really just learning algorithms; there is nothing mysterious about it. But in producing these algorithms, a body of theory took shape. Let us look at some of the main ideas in that theory.

 

Almost every compiler textbook is divided into lexical analysis, syntax analysis (LL algorithms, the recursive descent algorithm, LR algorithms), semantic analysis, the runtime environment, intermediate code, code generation, and code optimization. In fact, many compiler textbooks are modeled on the Dragon Book, so its table of contents has almost become the template for the genre, domestic textbooks included. An undergraduate course generally cannot cover all of these parts in detail and concentrates on the earlier ones. A topic like code optimization is a bottomless pit: taken seriously, it could not be covered clearly even in a dedicated semester-long course. For undergraduates, therefore, the expectations are higher for lexical analysis and syntax analysis.

 

Lexical analysis is relatively simple, perhaps because a lexical analyzer is easy to implement: many people who have never studied compiler principles can write all sorts of lexical analyzers. When a compiler course covers lexical analysis, however, it brings in regular expressions and automata theory and then explains, in a systematic way, how a lexical analyzer is generated. The purpose is obvious: to raise lexical analysis from a programming exercise to the level of theory.
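To make the link between regular expressions and a lexical analyzer concrete, here is a minimal sketch in Python. The token classes (NUMBER, IDENT, OP) and the tiny expression language they describe are assumptions made for illustration; they do not come from any of the textbooks mentioned here.

```python
import re

# Hypothetical token classes for a tiny expression language.
# Each class is described by one regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),
]

# Combine the classes into one pattern with a named group per class.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (kind, lexeme) pairs, or raise SyntaxError on bad input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":      # whitespace is recognized but dropped
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("rate = 3.5 * (x + 42)"))
```

The regular expression engine plays the role of the automaton here; a generated lexer does the same job with a table built ahead of time.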

 

Syntax analysis is a little more troublesome. There are generally two families of parsing algorithms: top-down LL algorithms and bottom-up LR algorithms. LL algorithms are the easier ones and LR algorithms the harder ones; many people who study compiler principles on their own get stuck on LR. In fact, it is enough to understand these algorithms; unlike a lexical analyzer, you are not expected to write them by hand. Parsers based on LR algorithms are generally generated with the Yacc tool, and in practice nobody writes them by hand. Recursive descent, a special case of LL parsing, is so simple to implement that every student should be asked to write one. There are also good parser generators for LL algorithms, but if you move to a platform other than C, such as Java or Delphi, where Yacc cannot be used, you have no choice but to write the parser yourself.
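Since recursive descent is the one parser students are expected to write by hand, a minimal sketch may help. It assumes a small arithmetic grammar chosen for illustration (expr → term (('+' | '-') term)*, term → factor (('*' | '/') factor)*, factor → NUMBER | '(' expr ')') and returns the syntax tree as nested tuples; the class and method names are hypothetical.

```python
import re

def tokenize(text):
    # Integers, or any other single non-space character (operators, parentheses).
    return re.findall(r"\d+|\S", text)

class Parser:
    """One method per grammar rule; each method consumes the tokens of its rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):                        # expr -> term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):                        # term -> factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):                      # factor -> NUMBER | '(' expr ')'
        token = self.peek()
        if token is not None and token.isdigit():
            self.advance()
            return int(token)
        if token == "(":
            self.advance()
            node = self.expr()
            if self.peek() != ")":
                raise SyntaxError(f"expected ')', got {self.peek()!r}")
            self.advance()
            return node
        raise SyntaxError(f"unexpected token {token!r}")

def parse(text):
    return Parser(tokenize(text)).parse()

print(parse("1 + 2 * 3"))      # prints ('+', 1, ('*', 2, 3))
print(parse("(1 + 2) * 3"))    # prints ('*', ('+', 1, 2), 3)
```

Operator precedence falls out of the grammar itself: `term` is called from `expr`, so multiplication binds tighter than addition without any extra bookkeeping.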

 

When studying lexical and syntax analysis, you may ask: what are lexical analysis and syntax analysis actually for? From the compiler's point of view, the compiler has to convert the source program the programmer wrote into a data structure that is convenient to process (an abstract syntax tree, or syntax tree), and that conversion is carried out by lexical analysis and syntax analysis. Strictly speaking, lexical analysis is not an essential part of a compiler: to simplify syntax analysis, the tedious character-level work was traditionally split out into a separate step, which became the lexical analysis phase we know today. Lexical and syntax analysis are also useful outside compilers. For example, when we type a command in DOS, UNIX, or Linux, how does the program analyze the command line we entered? That is a simple application of the same ideas. In short, the job of these two phases is to convert unstructured text into a data structure that is easier to analyze and process.

So why do compiler textbooks always end up converting the source program into a tree? Data structures include stacks, linear lists, and so on, each with its own characteristics, but a tree is highly recursive: take any node of a tree and what hangs below it is still a complete tree. This matches the formal languages that compiler theory analyzes. Functions within functions, loops within loops, and conditions within conditions can all be represented directly in a tree. The same holds when a program is executed. Compiler textbooks typically introduce a stack-based intermediate code that is easy to produce from the abstract syntax tree: it can be generated by a recursive traversal of the tree. This kind of code is also widely used by interpreted languages. The bytecode underlying the popular Java and .NET platforms, for instance, can be regarded as stack-based instruction code of this kind.
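The recursive traversal that turns a syntax tree into stack code can be sketched in a few lines of Python. The tree is written by hand as nested tuples, and the instruction names (PUSH, ADD, SUB, MUL, DIV) are hypothetical; they are not the instruction set of any textbook or of the Java or .NET virtual machines.

```python
import operator

# Hypothetical instruction names for a tiny stack machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
ACTIONS = {"ADD": operator.add, "SUB": operator.sub,
           "MUL": operator.mul, "DIV": operator.truediv}

def generate(node, code=None):
    """Post-order traversal: emit code for both operands, then the operator."""
    if code is None:
        code = []
    if isinstance(node, (int, float)):     # leaf: a constant
        code.append(("PUSH", node))
    else:                                  # interior node: (op, left, right)
        op, left, right = node
        generate(left, code)
        generate(right, code)
        code.append((OPCODES[op],))
    return code

def run(code):
    """Execute the stack code and return the value left on the stack."""
    stack = []
    for instruction in code:
        if instruction[0] == "PUSH":
            stack.append(instruction[1])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(ACTIONS[instruction[0]](left, right))
    return stack.pop()

tree = ("+", 1, ("*", 2, 3))               # syntax tree for 1 + 2 * 3
program = generate(tree)
print(program)
print(run(program))                        # prints 7
```

Because every subtree is itself a complete tree, `generate` needs no special cases for nesting: the same function handles the whole expression and each of its parts.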

 

Semantic analysis, syntax-directed translation, and type checking are all processes that refine the abstract syntax tree. For example, when writing C we know that assigning a floating-point value directly to an integer variable is a type mismatch (C converts the value implicitly, and many compilers can warn about it). How does the C compiler know? Through this type-checking step. Languages like C++ that support polymorphic functions make this step considerably more complicated. Most compiler textbooks describe some well-tested strategies for this part, but new problems keep appearing that the old methods do not fully solve.
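A type check of this kind can be sketched as a walk over the syntax tree. The example below assumes a hypothetical mini language with only `int` and `float`, expression trees written as nested tuples, and a policy of reporting a float-to-int assignment as a diagnostic; a real C compiler follows the conversion rules of the language standard instead.

```python
def type_of(node, env):
    """Infer the type of an expression node; env maps variable names to types."""
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    if isinstance(node, str):              # a variable reference
        if node not in env:
            raise NameError(f"undeclared variable {node!r}")
        return env[node]
    _op, left, right = node                # a binary operation
    operand_types = (type_of(left, env), type_of(right, env))
    # Mixed arithmetic is promoted to float, as in C-like languages.
    return "float" if "float" in operand_types else "int"

def check_assign(name, expr, env):
    """Return a list of diagnostics for the assignment `name = expr`."""
    target = env[name]
    source = type_of(expr, env)
    if target == "int" and source == "float":
        return [f"type mismatch: assigning float to int variable '{name}'"]
    return []

env = {"n": "int", "x": "float"}           # hypothetical declarations
print(check_assign("n", ("+", "x", 1), env))
print(check_assign("x", ("+", "n", 1), env))
```

The checker never looks at the source text; it works entirely on the tree and the declared types, which is why this phase is described as refining the syntax tree.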

 

Strictly speaking, the job of a compiler is to translate the source code the user writes into target code. But to explain target code generation, a textbook has to explain the machine's runtime environment as well, because if you do not know how the machine executes the target code, you cannot know what target code to generate. I find this part even more meaningful than compilation itself, because it lays the running of a computer program out in front of you. You may never work on compilers, but any software development involves program execution, and the runtime environment helps you understand how a program is stored, loaded, and executed. For this material I strongly recommend the explanation in the Dragon Book. It covers, in detail, everything from basic storage organization, storage allocation strategies, access to non-local names, parameter passing, and symbol tables through to dynamic storage allocation (malloc, new). These are things we rely on all the time when writing ordinary programs, yet we rarely stop to examine how they work internally.
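Two of the runtime topics listed above, parameter passing and access to non-local names, can be illustrated with a toy model of activation records. The `Frame` class, the `call` and `ret` helpers, and the use of a static link are assumptions made for this sketch; a real runtime lays frames out in memory and finds variables by offset, not by name.

```python
class Frame:
    """One activation record: local variables plus a link to the enclosing scope."""

    def __init__(self, name, static_link=None):
        self.name = name
        self.locals = {}
        self.static_link = static_link     # frame of the lexically enclosing function

    def lookup(self, var):
        """Search this frame, then follow static links outward (non-local access)."""
        frame = self
        while frame is not None:
            if var in frame.locals:
                return frame.locals[var]
            frame = frame.static_link
        raise NameError(f"undefined name {var!r}")

call_stack = []

def call(name, args, static_link=None):
    """Push a new frame and copy the arguments into it (pass by value)."""
    frame = Frame(name, static_link)
    frame.locals.update(args)
    call_stack.append(frame)
    return frame

def ret():
    """Pop the current frame when the function returns."""
    return call_stack.pop()

outer = call("outer", {"x": 1})
inner = call("inner", {"y": 2}, static_link=outer)
print(inner.lookup("y"), inner.lookup("x"))   # local name, then non-local name
ret()
ret()
```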

 

Intermediate code generation, code generation, and code optimization are hard to summarize. Many Chinese textbooks treat them very briefly, so students follow the explanation but do not know how to apply it. Yet this material is a bottomless pit: taken seriously, it cannot be finished in a single semester. Compilation Principles and Practice strikes the right balance here. The author mainly presents a stack-based instruction code that is very easy to imitate; after reading it you can write your own code generator. Other code generation techniques and code optimization are covered only briefly. If you want to study these topics in depth, there is a book called Advanced Compiler Design and Implementation, brought to China by China Machine Press as a very thick reprint of the English original. I have not included it in my recommendations, though: anyone who fully understands the Dragon Book is already an expert by domestic standards, and that is soon enough to start on Advanced Compiler Design and Implementation. Code optimization matters little in undergraduate teaching, and even in practice I doubt most of you will need it. For a home-made compiler, generating correct code that runs is already an achievement, so why worry about optimization?

 

Practices

After all, a compiler principles course explains principles; it is not a course in compiler technology. The two are quite different: a compiler technology course would focus on the specific techniques used when building a compiler, while a principles course focuses on the underlying theory. Computer science, however, is a highly practical discipline, and only what we can apply has really been learned. Li Yang says of his Crazy English that you have only learned a word or phrase once you can actually use it, not merely when you know its spelling and meaning. All learning is like that: what you cannot use does not count as learned.

 

A compiler principles course mainly explains the theory and principles behind building a compiler, so the best practice is simply to write a compiler yourself. Be warned, though, that a compiler may be one of the most complex of all software systems; why else would universities devote a whole course to it? I admire those who start writing their own operating system after studying operating system principles, and those who start writing their own compiler after studying compiler principles. In China, far too few students dare to try. Whether or not you succeed, the attempt will improve your programming and system design skills. Below I list some of the difficulties you may run into in practice, in the hope of helping you before you get stuck.

 

1. Lex and Yacc. These two tools generate lexical analyzers and parsers, respectively. If you are writing your own compiler, I do not recommend writing even the lexical analyzer by hand. Lex and Yacc ought to be covered in every compiler course, but they rarely appear in Chinese textbooks. Both are small utilities on UNIX systems; to use them on Windows, your best option is Cygwin, a UNIX emulation environment for Windows that includes flex.exe and bison.exe (the counterparts of Lex and Yacc). The tools are rather awkward to use (as are many very useful UNIX tools), but Compilation Principles and Practice describes them in great detail and gives many practical examples.

2. An interpreter is simpler to write than a compiler that generates machine code. An interpreted implementation, like Java's, does require you to write the interpreter yourself, but you do not have to study the machine's instruction set. If you write a compiler that generates real machine code, you will run into problems such as register-based code generation. As mentioned above, generating stack-based code is very simple and leaves little to think about, whereas generating machine code forces you to consider how to allocate machine registers and similar issues.

3. Consider using grammar files written by others instead of writing the lexical and grammar files yourself. A friend once said that writing a good grammar definition for a programming language is almost half of writing the compiler, and writing a grammar file really is hard. Lexical and grammar files for languages such as C, C++, Java, Tiny C, and Minus C can now be found on the Internet and used directly.
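Point 2 above says that register-based code generation raises problems that stack code avoids. The following sketch shows the simplest version of the issue, assuming a hypothetical three-address instruction format and a fixed pool of register names; running out of registers simply raises an error here, where a real compiler would spill values to memory.

```python
# Hypothetical mnemonics for a register machine.
OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def gen_registers(node, code, free):
    """Emit register code for an expression tree.

    Returns the name of the register holding the result. `free` is the list
    of registers still available; it is updated in place.
    """
    if isinstance(node, int):
        if not free:
            raise RuntimeError("out of registers (a real compiler would spill)")
        reg = free.pop()
        code.append(f"LOAD {reg}, {node}")
        return reg
    op, left, right = node
    left_reg = gen_registers(left, code, free)
    right_reg = gen_registers(right, code, free)
    code.append(f"{OPS[op]} {left_reg}, {left_reg}, {right_reg}")
    free.append(right_reg)                 # the right operand's register is reusable
    return left_reg

tree = ("+", 1, ("*", 2, 3))               # syntax tree for 1 + 2 * 3
code = []
result = gen_registers(tree, code, ["r2", "r1", "r0"])
print("\n".join(code))
print("result in", result)                 # prints: result in r0
```

With three registers the expression compiles; with only two, the same call raises `RuntimeError`. A stack machine never faces that choice, which is why stack code is the easier target for a first compiler.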

 

Compilation Principles and Practice provides the complete source code of Tiny C. I think the author's compiler is very well done: compared with the source code of languages such as PHP and Perl, it is much simpler and easier to understand, and it clearly shows how a complete compilation system is implemented. The source code can be downloaded from the author's website.

 

Remarks

Learning compiler principles can be hard going, especially for those with no interest in compilers. But the fact that it has long been a required undergraduate course shows that the theory it has given rise to holds an important place in computer science.

Looking back at history, many of the people regarded as master programmers were experts in compilers: Bill Gates, who wrote the first BASIC to run on a microcomputer; the Borland designer behind Delphi, known as "the world's most amazing programmer"; the father of Java at Sun; the father of C++ at Bell Labs; and so on.

