Why do universities offer a course on compiler principles? The course deals with the principles and techniques of compilers, and it may seem unrelated to the fundamentals of computer science. Yet compiler principles has long been a required course for undergraduates, and it is also a mandatory part of the postgraduate entrance exam. At bottom, compiler principles and techniques are an algorithmic problem; because the problem is complex, the algorithms are correspondingly complicated. Data structures and algorithm analysis also teach algorithms, but general, foundational ones. The compiler principles course instead concentrates on solving one particular problem. In the 1950s, writing a compiler was considered extremely difficult; the first Fortran compiler is said to have taken 18 person-years of effort. While people were trying to write compilers, they created many compilation-related theories and techniques, and these turned out to be more valuable than any single compiler. It is much like mathematicians working on the famous Goldbach conjecture: although the conjecture is still unproven, many important results in number theory were born along the way.
Reference books
Although compiler theory has matured considerably by now, writing a compiler on the scale of Turbo C or Java is still too difficult for a college student. And writing a compiler is not the only hard part: learning the principles is hard as well.
Precisely because compiler principles are hard to learn, good teachers and good textbooks are essential. We cannot choose our teachers, but we can choose our textbooks. I would like to recommend a few textbooks on compiler principles. All of them are classic books from abroad, because I have not found a satisfactory one among Chinese textbooks.
The first book is Compilers: Principles, Techniques, and Tools. Its cover shows a red dragon, so many scholars abroad simply call it the Dragon Book. China Machine Press recently published a Chinese edition under the title Compiler Principles. The book was written fairly early; it was first published in 1986, and one of its authors was a scientist at the famous Bell Labs. The core principles it describes have not changed since, so it remains remarkably valuable today. Its most distinctive feature is how it opens: a small, practical example walks through the main topics of compiler principles. Beginners therefore quickly get a sense of the whole picture. They learn why these theories exist and how they are applied. This is what I find missing in Chinese textbooks. As a result, Chinese textbooks are poorly suited to readers who want to study on their own: you can read them for a long time without understanding what the material is for.
The second book is Modern Compiler Design, whose Chinese edition (Modern Compiler Program Design) is published by Posts & Telecom Press. This book focuses on the practice of compilation. It provides a lot of real program code and covers many practical engineering issues. Another feature is that it really is "modern": traditional compiler textbooks do not cover algorithms such as the garbage collection used by Java. That is because languages like Java, which run on an interpreter or virtual machine, have only become popular in recent years. If you want to study compiler theory in depth, you must read the Dragon Book. If you want to build an advanced compiler yourself, you should read Modern Compiler Design.
The third book is Compiler Construction: Principles and Practice, which many Chinese compiler scholars recommend. It was probably introduced to China earlier than the others. I remember buying it in high school, though I only read it cover to cover recently. It is indeed a good choice for beginners, and its treatment of compiler principles is quite detailed. It does not go as deep as the Dragon Book, and in many places it stops once the key point has been made. Still, for an undergraduate course it is already thorough. The book emphasizes practice, but it feels less hands-on than Modern Compiler Design: its practice serves the principles rather than engineering technique. As it explains each part of compiler theory, it gradually builds a small compiler for the TINY language, so after finishing the book you can write such a compiler yourself. The author also describes the two widely used compiler tools Lex and YACC in detail, which is rare in Chinese textbooks.
All three textbooks are available in both English and Chinese. Many students with good English prefer to read only the originals. I think the translations of these three books are good enough that you do not need to buy the English versions. Grasping the essence of the theory matters more than understanding the surface of the text.
The essence of compiler principles
As mentioned above, learning compiler principles really amounts to learning algorithms; there is nothing mysterious about it. However, the development of these algorithms has produced a body of theory. Let's now look at some of the more advanced theory in compiler principles.
Almost every compiler textbook is divided into the same parts:
- lexical analysis
- syntax analysis (LL, recursive descent, and LR algorithms)
- semantic analysis
- the runtime environment
- intermediate code
- code generation
- code optimization

In fact, many compiler textbooks follow the organization of the Dragon Book. Its table of contents has become almost the standard template for compiler textbooks, and Chinese textbooks are no exception. In general, an undergraduate course cannot cover all of these parts, so it concentrates on the earlier ones. Code optimization is a bottomless pit: treated seriously, it could not be covered fully even in a dedicated semester. Undergraduates are therefore held to a higher standard on lexical analysis and syntax analysis.
Lexical analysis is relatively simple, perhaps because a lexical analyzer is easy to implement. Many people who have never studied compiler principles can write all sorts of lexical analyzers. When covering lexical analysis, however, compiler courses add the theory of regular expressions and finite automata. They then show how a lexical analyzer can be generated systematically from that theory. In other words, lexical analysis is raised from a programming exercise to the level of theory.
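To make the automaton idea concrete, here is a minimal sketch of a lexical analyzer driven by a hand-written deterministic finite automaton (DFA). The two token classes are assumptions chosen for illustration: NUMBER for [0-9]+ and IDENT for [A-Za-z_][A-Za-z0-9_]*. The state and function names are hypothetical too. A real lexer would have many more token classes, and a tool would normally build the transition table from regular expressions instead of by hand.

```python
import string

# States of a hypothetical DFA recognizing NUMBER = [0-9]+ and IDENT = [A-Za-z_][A-Za-z0-9_]*
START, IN_NUM, IN_ID = "start", "num", "id"
ACCEPTING = {IN_NUM: "NUMBER", IN_ID: "IDENT"}

# Transition table: (state, input class) -> next state
DELTA = {
    (START, "digit"): IN_NUM,
    (START, "letter"): IN_ID,
    (IN_NUM, "digit"): IN_NUM,
    (IN_ID, "letter"): IN_ID,
    (IN_ID, "digit"): IN_ID,
}


def char_class(c):
    """Map one character to the input class used by the transition table."""
    if c in string.digits:
        return "digit"
    if c in string.ascii_letters or c == "_":
        return "letter"
    return "other"


def tokenize(text):
    """Split text into (token class, lexeme) pairs; whitespace separates tokens."""
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        state, j = START, i
        # Follow transitions for as long as the DFA allows (longest match).
        while j < len(text) and (state, char_class(text[j])) in DELTA:
            state = DELTA[(state, char_class(text[j]))]
            j += 1
        if state not in ACCEPTING:
            raise SyntaxError(f"unexpected character {text[i]!r} at position {i}")
        tokens.append((ACCEPTING[state], text[i:j]))
        i = j
    return tokens
```

For example, tokenize("rate 60") returns [('IDENT', 'rate'), ('NUMBER', '60')]. In this tiny DFA every non-start state is accepting, so the scanner never needs to back up. A general scanner instead remembers the last accepting position it passed.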
Syntax analysis is somewhat more involved. There are two main families of parsing algorithms: top-down LL algorithms and bottom-up LR algorithms. LL algorithms are generally considered easy and LR algorithms hard, yet many home-made compilers rely on LR parsing. For LR parsing, understanding how it works is enough. Unlike a lexical analyzer, you do not have to write one by hand. LR parsers are usually generated with the YACC tool, and in practice generated parsers look completely different from hand-written code. Recursive descent, a special case of LL parsing, is simple enough that every student should be required to write one. There are also many good LL parser generators. YACC, however, produces C code, so on another platform such as Java or Delphi you cannot use it and may have to write the parser yourself.
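As an illustration of how simple recursive descent is, the following sketch parses integer arithmetic expressions into an abstract syntax tree, with one method per nonterminal. The grammar, the tuple node shapes such as ("num", 2) and ("+", left, right), and all names are assumptions for this example. None of them come from the books above.

```python
import re

# Assumed grammar (EBNF):
#   expr   -> term   (("+" | "-") term)*
#   term   -> factor (("*" | "/") factor)*
#   factor -> NUMBER | "(" expr ")"


class Parser:
    def __init__(self, text):
        # Simplified tokenizer: integers, operators and parentheses only;
        # whitespace and any other characters are skipped in this sketch.
        self.tokens = re.findall(r"[0-9]+|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):  # loop gives left associativity
            op = self.peek()
            self.pos += 1
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.peek()
            self.pos += 1
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            node = self.expr()
            self.expect(")")
            return node
        if tok is not None and tok.isdigit():
            self.pos += 1
            return ("num", int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    """Parse a whole expression and reject leftover tokens."""
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return tree
```

For example, parse("2 * (3 + 4)") yields ('*', ('num', 2), ('+', ('num', 3), ('num', 4))). The while loops implement the repetition in the grammar. That gives left associativity without the left recursion a top-down parser cannot handle directly.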
When studying lexical and syntax analysis, you may ask what they are actually for. From the compiler's point of view, the compiler must convert the programmer's source program into a data structure that is convenient to work with, an abstract syntax tree (or syntax tree). Lexical analysis and syntax analysis carry out that conversion. In fact, lexical analysis was not originally a separate part of the compiler. The tedious work of recognizing tokens was split out to simplify syntax analysis, and that became today's lexical analysis phase. Lexical and syntax analysis are also useful outside compilers. For example, when you type a command in DOS, UNIX, or Linux, the program has to analyze the form of what you entered, which is a simple application of the same ideas. In short, these two phases turn loosely structured text into a data structure that is easier to analyze and process.

Why, then, do compiler textbooks always convert the source program into a tree? Stacks, linear lists, linked lists and other data structures each have their own characteristics. The tree, however, is highly recursive: any node of a tree, taken together with its descendants, is itself a complete tree. This matches the formal languages that compiler courses analyze, where functions are nested in functions, loops in loops, and conditions in conditions. Such nesting is expressed naturally as a tree, and the same holds when programs written in a formal language are executed. Later, the code generation part of a compiler course introduces a stack-based intermediate code. That code can be generated mechanically by recursively traversing the abstract syntax tree.
Stack-based code is also widely used by interpreted platforms: the bytecode underlying popular platforms such as Java and .NET is essentially stack-based instruction code.
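Both ideas above fit in a few lines: generating stack code by recursively traversing the tree, and executing that code on a stack machine. The following is a minimal sketch under assumed conventions. The AST is nested tuples, and the instruction set (PUSH, LOAD, ADD, SUB, MUL, DIV) is hypothetical, only loosely in the spirit of JVM instructions such as iload and iadd.

```python
# Hypothetical opcodes for a toy stack machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def gen(node, code=None):
    """Emit stack code by post-order traversal: operands first, then the operator."""
    if code is None:
        code = []
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    elif kind == "var":
        code.append(("LOAD", node[1]))
    else:
        gen(node[1], code)  # left operand is pushed first
        gen(node[2], code)  # then the right operand
        code.append((OPCODES[kind],))
    return code


def run(code, env=None):
    """Execute stack code and return the value left on top of the stack."""
    env = env or {}
    stack = []
    for instr in code:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op == "LOAD":
            stack.append(env[instr[1]])
        else:
            right = stack.pop()  # the right operand was pushed last
            left = stack.pop()
            if op == "ADD":
                stack.append(left + right)
            elif op == "SUB":
                stack.append(left - right)
            elif op == "MUL":
                stack.append(left * right)
            elif op == "DIV":
                stack.append(left // right)  # floor division: an assumption of this sketch
            else:
                raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()
```

For example, the tree for 2 * (x + 4) compiles to [('PUSH', 2), ('LOAD', 'x'), ('PUSH', 4), ('ADD',), ('MUL',)], and running that code with x = 3 returns 14. The generator never has to decide where intermediate values live, because the stack holds them implicitly.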
Semantic analysis, syntax-directed translation, and type checking are all processes that refine the abstract syntax tree. For example, in a C program, assigning a floating-point value directly to an integer variable involves a type mismatch. The compiler must detect it and insert a conversion, and many compilers also issue a warning. How does the C compiler know? That is the job of type checking. Languages such as C++ that support polymorphic (for example, overloaded) functions are much more complicated. Most compiler textbooks present some good strategies for this part. Still, new problems keep appearing that the old methods cannot fully solve.
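A type checker can be sketched as a recursive function that computes the type of each AST node and reports mismatches. The sketch below assumes a hypothetical mini-language with only int and float. It also assumes a rule stricter than C's: assigning a float to an int variable is rejected rather than silently converted. Java, for instance, requires an explicit cast for this narrowing conversion. The node shapes and names are illustrative assumptions.

```python
class TypeMismatch(Exception):
    """Raised when an expression's type is incompatible with its use."""


def type_of(node, symtab):
    """Return "int" or "float" for an AST node; symtab maps variable names to types."""
    kind = node[0]
    if kind == "int_lit":
        return "int"
    if kind == "float_lit":
        return "float"
    if kind == "var":
        return symtab[node[1]]
    if kind in ("+", "-", "*"):
        left = type_of(node[1], symtab)
        right = type_of(node[2], symtab)
        # Arithmetic promotion: any float operand makes the result float.
        return "float" if "float" in (left, right) else "int"
    if kind == "assign":
        target = symtab[node[1]]
        value = type_of(node[2], symtab)
        # Assumed rule of this mini-language: no implicit float -> int narrowing.
        if target == "int" and value == "float":
            raise TypeMismatch(f"cannot assign float to int variable {node[1]!r}")
        return target
    raise ValueError(f"unknown node kind {kind!r}")
```

With symtab = {"i": "int", "f": "float"}, the node ("assign", "f", ("var", "i")) checks as float, because widening is allowed. The node ("assign", "i", ("float_lit", 2.5)) raises TypeMismatch. A C compiler would instead insert a conversion at this point.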
Strictly speaking, a compiler's job is to turn the user's source code into target code. To explain target code generation, however, textbooks must also describe the machine's runtime environment. If you do not know how the machine executes the target code, you cannot know how to generate appropriate code. I find this material more meaningful than compiler principles proper, because it lays out the entire execution process of a computer program. You may never develop compilers, but any software development work involves how programs execute. The runtime environment material helps you understand how a program is stored, loaded, and executed. For this I strongly recommend the treatment in the Dragon Book. It covers in detail basic storage organization, storage allocation strategies, access to non-local names, parameter passing, symbol tables, and dynamic storage allocation (malloc, new). We rely on these mechanisms constantly when writing ordinary programs, without needing to think about how they work internally.
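At compile time, lexically scoped names are commonly tracked with a symbol table organized as a stack of scopes. An inner declaration hides an outer one, and a name not found locally is looked up in the enclosing scopes. The sketch below is a minimal version of that idea, with hypothetical class and method names. How the generated code actually reaches non-local variables at run time, for example through access links or displays, is a separate topic covered by the runtime-environment chapter.

```python
class SymbolTable:
    """A stack of scopes; lookups search from the innermost scope outward."""

    def __init__(self):
        self.scopes = [{}]  # the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, info):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} is already declared in this scope")
        scope[name] = info

    def lookup(self, name):
        # Innermost declaration wins; otherwise fall back to enclosing scopes.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not declared")
```

For example, declare x as int globally, enter a block, and declare x as float inside it. Inside the block, lookup("x") returns the float entry; after exit_scope() it returns the int entry again.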
Intermediate code generation, code generation, and code optimization are harder to pin down. Many Chinese textbooks treat them briefly and superficially, so students understand the ideas but do not know how to apply them. Yet covering this material seriously would take more than a semester. Compiler Construction: Principles and Practice gets the balance right here. The author mainly explains a stack-based instruction code that is easy to understand and easy to imitate. After working through it, you can write your own code generator. Its coverage of other code generation techniques and of code optimization, however, is brief. If you want to study code generation in depth, there is also Advanced Compiler Design and Implementation. China Machine Press has imported it; it is a very thick book, and only the original English edition is available. I have not included it in my recommendations, though. Anyone in China who truly understands the Dragon Book is already an expert, and it is not too late to read Advanced Compiler Design and Implementation after that. Code optimization is not very important in undergraduate teaching; it is largely a matter of practice, and I believe most students will get little use from it. After all, for a home-made compiler it is already quite an achievement to generate code that runs correctly, so optimization can wait.
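Optimization is a deep subject, but its simplest forms are easy to try once you have an AST. Below is a minimal sketch of constant folding, which evaluates operator nodes whose operands are both constants at compile time. The tuple node shapes are assumed for illustration. Division is deliberately left out, so the sketch never has to decide what to do about division by zero at compile time.

```python
import operator

# Operators this sketch is willing to evaluate at compile time.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Return a tree in which constant sub-expressions have been evaluated."""
    if node[0] in FOLDABLE:
        left, right = fold(node[1]), fold(node[2])
        if left[0] == "num" and right[0] == "num":
            return ("num", FOLDABLE[node[0]](left[1], right[1]))
        return (node[0], left, right)
    return node  # leaves and unsupported nodes are left unchanged
```

For example, the tree for x + 2 * 3 folds to ('+', ('var', 'x'), ('num', 6)). The variable blocks further folding, but the constant subtree is computed once, at compile time.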
Practice
After all, compiler principles is a course on principles, not a specialized course on compiler technology, and the two are quite different. A compiler technology course is concerned with the techniques used in compilation, while a principles course focuses on the underlying theory. Computer science, however, is a very practical discipline, and you have truly learned something only when you can apply it. In his Crazy English method, Li Yang says you have learned a word or phrase only once you actually use it, not when you merely know its spelling and meaning. All learning is like this: without practice, you cannot truly learn.
A compiler principles course explains the theory behind compilers, so the best practice is simply to write a compiler yourself. Be aware, though, that a compiler may be one of the most complex kinds of software system. Otherwise, why would universities devote a whole course to compiler principles? I admire people who write their own operating system after studying operating system principles, or their own compiler after studying compiler principles. Indeed, too few students in China dare to attempt this. Whether or not you succeed, the attempt will at least improve your programming and system design skills. Below are some difficulties you may run into in practice; I hope these notes help before you get into trouble.
1. Lex and YACC. These two tools generate lexical analyzers and syntax analyzers, respectively. If you write a compiler yourself, I do not recommend hand-coding these analyzers. Lex and YACC should be covered in every compiler course, but they are rarely seen in Chinese textbooks. Both are small Unix utilities. To use them on Windows, your best option is Cygwin, the Unix emulation environment for Windows, which includes flex.exe and bison.exe (free counterparts of Lex and YACC). These two tools are fairly cumbersome to use, as many very useful Unix tools are. Compiler Construction: Principles and Practice, however, describes them in great detail and gives many practical examples.
2. An interpreter is simpler to write than a compiler that generates machine code. For an interpreted language like Java you do have to write the interpreter yourself, but you never need to look up machine instruction encodings. A compiler that generates final machine code faces extra problems, such as register-based code generation. As mentioned above, generating stack-based code is very simple and involves few decisions. When generating machine code, you must also deal with the trouble of allocating machine registers.
3. Consider using grammar files written by others instead of writing the lexical and grammar specifications yourself. A friend once said that writing a good grammar definition for a programming language is almost half the work of a compiler. Indeed, writing a grammar file is very difficult. You can now find lexical and grammar files for languages such as C, C++, Java, TINY, and C-Minus on the Internet and use them directly.
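Lex itself generates C code, but its basic model, an ordered list of pattern rules, can be imitated in a few lines of Python with the standard re module. This is a sketch under stated assumptions, not Lex. The token names and patterns are hypothetical. Lex picks the longest match and breaks ties in favor of the earlier rule, whereas Python's alternation simply takes the first alternative that matches. The keyword rule below therefore needs a word boundary to avoid matching the start of identifiers such as iffy.

```python
import re

# Ordered (token name, pattern) rules; None means "match and discard".
RULES = [
    ("NUMBER", r"[0-9]+"),
    ("IF", r"if\b"),  # \b stops "if" from matching the start of "iffy"
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[-+*/=<>()]"),
    (None, r"\s+"),
]

# One master pattern with a named group per rule: G0, G1, ...
MASTER = re.compile("|".join(f"(?P<G{i}>{pat})" for i, (_, pat) in enumerate(RULES)))


def lex(text):
    """Return (token name, lexeme) pairs, trying the rules in order at each position."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad character {text[pos]!r} at position {pos}")
        name = RULES[int(m.lastgroup[1:])][0]  # which rule's group matched
        if name is not None:
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens
```

For example, lex("if x1 < 10") returns [('IF', 'if'), ('IDENT', 'x1'), ('OP', '<'), ('NUMBER', '10')], while lex("iffy") returns a single IDENT token.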
In Compiler Construction: Principles and Practice, the author provides the complete source code of the TINY compiler. I think it is very well done. Compared with the source code of other language implementations such as PHP and Perl, it is much simpler and easier to understand. It also clearly shows how a complete compilation system is implemented. The source code can be downloaded from the author's website.
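Point 2 above says register allocation is what makes machine-code generation harder than stack code. One way to see the problem is to count how many registers an expression tree needs. A classic answer is the Ershov number. A leaf needs one register. An operator node needs the larger of its two children's numbers, or one more than that when the two are equal. The sketch assumes a load/store machine where both operands must be in registers, no spilling to memory, and a hypothetical tuple-shaped AST.

```python
def registers_needed(node):
    """Ershov number: registers needed to evaluate the tree without spilling."""
    if node[0] in ("num", "var"):
        return 1  # a leaf is loaded into one register
    left = registers_needed(node[1])
    right = registers_needed(node[2])
    # Evaluate the more demanding subtree first; a tie costs one extra register.
    return left + 1 if left == right else max(left, right)
```

For example, (a + b) + (c + d) needs 3 registers, while the left-deep ((a + b) + c) + d needs only 2. A real code generator must also decide what to do when the machine has fewer registers than that, which is exactly the trouble a stack-based target avoids.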