Why do universities offer a course on compiler principles? The course deals with the principles and technical problems of building compilers, which seems somewhat remote from the basic fields of computing, yet it has long been a compulsory undergraduate course and is also a required topic in the postgraduate entrance examination. Compiler theory and technology are, in essence, algorithmic problems; because the problem is very complex, the algorithms that solve it are relatively complex too. The data structures and algorithm analysis courses also teach algorithms, but those are basic, introductory algorithms, whereas the compiler principles course concentrates on the algorithms for solving one specific problem. In the 1950s, writing a compiler was considered extremely difficult; the first Fortran compiler is said to have taken 18 person-years to complete. As people tried to write compilers, many theories and techniques related to compilation were born, and these are more valuable than the actual compilers themselves. It is like mathematicians working on the famous Goldbach conjecture: although the problem has not been finally solved, many famous works on number theory were born along the way.
Recommended Reference Books
Although compiler theory has developed to the point where parts of it are quite mature, writing a compiler for something like Turbo C or Java is still too hard for a university student. Not only is writing a compiler difficult; learning compiler principles is difficult too.
Because compiler principles are relatively hard to learn, good teachers and good textbooks are needed. We cannot change our teachers, but we can choose which textbooks to read. Below I recommend some good compiler textbooks. All of them are classic foreign textbooks, because I have not found a domestic textbook that satisfies me.
The first book is "Compilers: Principles, Techniques, and Tools", better known by another famous name, the Dragon Book. The name comes from the red dragon on the book's cover, and many scholars abroad simply call it the Dragon Book. China Machine Press has recently published a Chinese edition under the title "Compiler Principles". The book is an early one, written around 1985 or 1986, and one of its authors was a famous Bell Labs scientist. The core of its explanation of compiler principles has not changed since, so its value remains extraordinary today. Its most important feature is that it opens with a small practical example that walks through the general content of compiler principles, so that beginners quickly get a feel for the subject and understand why these theories exist and how they are used. This is exactly what I find lacking in domestic textbooks: they are not written for readers who want to teach themselves, and after reading for a long time you still do not know what the material is good for.
The second book is "Modern Compiler Design", published in Chinese under the equivalent title by Posts & Telecom Press. The book pays more attention to the practice of compiler principles: it gives a lot of real program code and covers many practical problems of compiler technology. Its other feature is the word "modern". Traditional compiler textbooks do not cover algorithms such as garbage collection in Java, because interpreted language implementations like Java have only become popular in recent years. If you want to learn more compiler theory, read the Dragon Book above; if you want to build an advanced compiler yourself, read "Modern Compiler Design".
The third book, recommended by many domestic scholars of compiler principles, is "Compiler Principles and Practice". Perhaps it was introduced to China relatively early; I remember buying it in high school, and I read the whole book some time ago. It is also a good choice as an introductory text. Its treatment of compiler principles is quite careful. It does not go as deep as the Dragon Book and in many places only touches on the key points, but for undergraduate teaching it is already very thorough. The book emphasizes practice, though its practical flavor is not as strong as that of "Modern Compiler Design": its focus is on putting the principles into practice rather than on engineering technique. While explaining each part of compiler principles, it gradually implements a modern compiler, Tiny C, so that by the end of the book you can write Tiny C yourself. The author also explains in detail the two commonly used compiler tools Lex and YACC, which is rarely seen in domestic textbooks.
All three textbooks are available in both English and Chinese. Many students with good English like to read only the originals, but my feeling is that the translations of these three books are very good, so there is no need to buy the English editions specially. Understanding the essence of the theory matters more than understanding the surface text.
The Essence of Compiler Principles
As I said before, learning compiler principles is really just learning algorithms, nothing special. But these algorithms have grown into a body of theory. Let me look at what that theory actually consists of.
Almost every compiler textbook is divided into lexical analysis, syntax analysis (the LL algorithm, the recursive descent algorithm, the LR algorithm), semantic analysis, the runtime environment, intermediate code, code generation, and code optimization. In fact, many compiler textbooks arrange their content after the Dragon Book published in 1985 or 1986, so its table of contents has become almost the standard format for compiler textbooks, domestic ones included. Undergraduate teaching generally cannot cover all of these parts seriously and puts more emphasis on the earlier ones. Code optimization is a bottomless pit: taken seriously, even a full semester of its own would not be enough to cover it. So undergraduates are generally held to a higher standard on lexical analysis and syntax analysis.
Lexical analysis is relatively simple. Perhaps because a lexical analyzer is easy to implement, many people who have never studied compiler principles can still write all kinds of lexical analyzers. When compiler courses explain lexical analysis, however, they add an emphasis on regular expressions and automata theory, and then explain how to produce a lexical analyzer in a very standard way. This approach clearly aims to raise lexical analysis from a procedure to the level of theory.
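To make the link between regular expressions and a lexical analyzer concrete, here is a minimal sketch in Python. The token names and patterns are my own assumptions for a toy expression language, not taken from any of the books; Python's `re` module does the automaton work that the textbooks derive by hand.

```python
import re

# Hypothetical token set for a toy expression language (an assumption,
# not a definition from any of the textbooks discussed here).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]

# One master pattern with a named group per token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace is matched but not emitted
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("x = 3.5 * (y + 2)")` yields nine tokens, starting with `("IDENT", "x")`, `("OP", "=")`, `("NUMBER", "3.5")`. A generator such as Lex does the same job from a similar table of patterns, which is why the textbooks treat the table, not the loop, as the interesting part.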
The syntax analysis part is a little more troublesome. There are generally two kinds of syntax analysis algorithms today: the top-down LL algorithm and the bottom-up LR algorithm. The LL algorithm is manageable; with the LR algorithm, the difficulty begins. Many people who teach themselves compiler principles give up when they cannot get past the LR algorithm. In fact it is enough to understand it; unlike a lexical analyzer, you do not have to write one yourself to count as having learned it. An LR parser is usually generated with the tool YACC, and in practice few people implement one by hand. The LL algorithm is different, especially recursive descent: because it is very simple to implement, every student should be required to write one. Of course, there are also many good LL parser generators, but if you are not on a C platform, for example Java or Delphi, you cannot use YACC and have to write the parser yourself.
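As an illustration of how little code recursive descent needs, here is a sketch for arithmetic expressions. The grammar (expr, term, factor) and the nested-tuple tree format are assumptions chosen for brevity; each grammar rule becomes one function, and the functions call each other just as the rules refer to each other.

```python
import re


def parse(text):
    """Parse an arithmetic expression into a nested-tuple syntax tree.

    Assumed toy grammar:
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUMBER | '(' expr ')'
    """
    # Crude tokenizer: characters that match nothing are silently dropped.
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())  # left-associative
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is not None and token.isdigit():
            return ("num", int(token))
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

For example, `parse("1 + 2 * 3")` returns `("+", ("num", 1), ("*", ("num", 2), ("num", 3)))`: operator precedence falls out of the way `expr` calls `term` and `term` calls `factor`, with no table needed.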
When you learn lexical analysis and syntax analysis, you may ask: "What are lexical analysis and syntax analysis for?" From the compiler's point of view, the compiler needs to translate the source program written by the programmer into a data structure that is convenient to process (an abstract syntax tree or syntax tree), and this transformation is carried out by lexical analysis and syntax analysis. Lexical analysis was not originally a necessary part of a compiler; it was split off as a separate stage to simplify syntax analysis by taking over the tedious work. Outside compilers, lexical analysis and parsing are useful elsewhere as well. For example, when we enter a command under DOS, Unix, or Linux, the program's analysis of the command we typed is a simple application of them. In short, the job of these two parts is to turn irregular text into a data structure that is easy to analyze and process. Why, then, do compiler textbooks always turn the source program into a tree? Data structures include stacks, linear lists, linked lists, and many more, each with its own characteristics. But the tree is strongly recursive: we can take any node of a tree, and the part below it is still a complete tree. This matches the formal languages we compile, where functions are used inside functions, loops inside loops, and conditions inside conditions, all of which can be represented intuitively in a tree. Likewise, programs written in a formal language are also executed recursively.
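The recursive nature of the tree can be shown in a few lines. In this sketch a node is a hypothetical `(label, children)` pair; the labels are only illustrative, and both functions handle a subtree exactly as they handle the whole tree.

```python
def depth(node):
    """Nesting depth of a tree whose nodes are (label, children) pairs."""
    label, children = node
    # Each child is itself a complete tree, so the same function applies.
    return 1 + max((depth(child) for child in children), default=0)


def render(node, indent=0):
    """Return the tree as indented lines, one per node."""
    label, children = node
    lines = ["  " * indent + label]
    for child in children:
        lines.extend(render(child, indent + 1))
    return lines


# A function body containing a loop that contains a condition.
program = (
    "function",
    [
        ("while", [("if", [("assign", [])]), ("assign", [])]),
        ("return", []),
    ],
)
```

Here `depth(program)` is 4, and `depth(program[1][0])`, which looks only at the `while` subtree, is 3. Nothing in either function knows whether it was handed the whole program or a fragment of it.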
In the code generation part that follows, compiler courses introduce a stack-based intermediate code. Given the abstract syntax tree obtained from analysis, this kind of instruction code can be generated very easily and mechanically by a recursive traversal of the tree. Such code is also widely used in interpreted languages. The bytecode underlying the currently popular Java and .NET can be described as stack-based instruction code.
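The mechanical traversal can be sketched as follows. The instruction names (`PUSH`, `ADD`, and so on) and the tuple tree format are assumptions made for this example; they are not Java or .NET bytecode, only the same idea in miniature.

```python
import operator

# Hypothetical instruction set for a toy stack machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
APPLY = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


def generate(node, code=None):
    """Post-order walk: emit code for both operands, then the operator."""
    if code is None:
        code = []
    if node[0] == "num":
        code.append(("PUSH", node[1]))
    else:
        op, left, right = node
        generate(left, code)
        generate(right, code)
        code.append((OPCODES[op],))
    return code


def run(code):
    """Execute the instructions on an operand stack and return the result."""
    stack = []
    for instruction in code:
        if instruction[0] == "PUSH":
            stack.append(instruction[1])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[instruction[0]](left, right))
    return stack.pop()


# Tree for 1 + 2 * 3
tree = ("+", ("num", 1), ("*", ("num", 2), ("num", 3)))
```

`generate(tree)` produces `PUSH 1, PUSH 2, PUSH 3, MUL, ADD`, and `run(generate(tree))` returns 7. The generator never has to decide where a value lives, because every intermediate result sits on the stack.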
In fact, semantic analysis, syntax-directed translation, type checking, and so on are all processes that refine the abstract syntax tree we obtained earlier. For example, when writing C programs we all know that assigning a floating-point number directly to an integer produces a type mismatch. How does the C compiler know? Through the type-checking step. In C++, which supports polymorphic functions, this part becomes more complicated. Here the larger compiler textbooks explain some of the better processing strategies, because new problems keep arising that the old methods cannot solve.
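A type check is just one more recursive walk over the tree. The sketch below uses a deliberately simplified, hypothetical rule set (only `int` and `float`, and a float assigned to an int variable is reported); it is not the actual set of rules a C compiler applies, only an illustration of where such a check lives.

```python
def type_of(node, env):
    """Return "int" or "float" for an expression node.

    Assumed node shapes: ("int", value), ("float", value), ("var", name),
    or (operator, left, right). `env` maps variable names to declared types.
    """
    kind = node[0]
    if kind in ("int", "float"):
        return kind
    if kind == "var":
        return env[node[1]]
    # Binary operator: the result is float if either operand is float.
    _, left, right = node
    operand_types = {type_of(left, env), type_of(right, env)}
    return "float" if "float" in operand_types else "int"


def check_assign(name, expr, env):
    """Return a list of diagnostics for the statement `name = expr`."""
    target = env[name]
    source = type_of(expr, env)
    if target == "int" and source == "float":
        return [f"type mismatch: assigning float to int variable '{name}'"]
    return []


env = {"i": "int", "x": "float"}
```

With this environment, `check_assign("i", ("+", ("var", "i"), ("float", 1.5)), env)` reports one mismatch, while `check_assign("x", ("int", 2), env)` reports none. The checker finds the problem because the type of the whole expression is computed bottom-up from the types of its leaves.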
Strictly speaking, a compiler's work runs from the source program the user enters to final code generation. But to explain final code generation we must also explain the machine's runtime environment, because if you do not know how the machine executes the final code, you cannot know how to generate the right code. I feel this part is even more meaningful than compiler principles themselves, because it lays out in front of you how a computer program runs. You may never develop a compiler, but any work related to software development involves the execution of programs. The runtime environment gives you a clearer idea of how a program is stored, loaded, and executed. For this part I strongly recommend the explanation in the Dragon Book, where the authors describe in great detail everything from basic storage organization, storage allocation strategies, access to non-local names, parameter passing, and symbol tables to dynamic storage allocation (malloc, new). We use these things all the time when writing programs, yet we seldom look into how they work inside.
Intermediate code generation, code generation, and code optimization are really hard to discuss. Many domestic textbooks pass over this part very briefly, so students take it as background knowledge and do not know how to use it. Yet taken seriously, this material could not be finished even in a dedicated semester-long course. In "Compiler Principles and Practice", the explanation of this part is just right. The author mainly explains a stack-based instruction code that is very easy to understand, so readers can easily imitate it and write their own code generator. Other code generation and code optimization techniques are explained only briefly. If you want to study code generation in depth, there is also a book called "Advanced Compiler Design and Implementation", now available from China Machine Press; it is very thick and in the original English. I have not put it on my list of recommendations, though. After all, anyone in China who has understood the Dragon Book clearly already counts as very good, and reading "Advanced Compiler Design and Implementation" at that point is not too late. Code optimization is still a less important part of undergraduate teaching, and I believe most of us will not get to use it in practice. After all, getting your own compiler to generate correct code is already good; why talk about optimization?
About Practice
After all, the compiler principles course explains principles; it is not a specialized course on compiler technology. There is a big difference between the two: a compiler technology course is concerned with the actual techniques used in writing a compiler, while the principles course focuses on the underlying theory. But computer science is itself a very practical discipline, and only what you can apply counts as truly learned. Li Yang, when teaching Crazy English, said that you have only learned a word or phrase once you have actually used it, not when you merely know its spelling and meaning. Any kind of study is the same: without practice, you cannot be said to have learned it.
Since the compiler principles course explains the theory and principles behind building compilers, the best practice is very simple: write a compiler yourself. But be careful. A compiler may be one of the most complex of all software systems; why else would universities devote a whole course, compiler principles, to writing one? I admire those who start writing their own operating system after studying operating system principles, and their own compiler after studying compiler principles; in China, few students dare to do so. Whether or not you succeed, the attempt alone will greatly improve your foundations in programming and in system planning and organization. Below I list some of the difficulties you may run into in practice, hoping to help you before you get into trouble.
1. Lex and YACC. These two are very good tools for lexical analysis and syntax analysis. If you write a compiler yourself, I strongly advise against writing even things like lexical analysis by hand. Lex and YACC ought to be an essential part of every compiler textbook, but they are rarely seen in domestic ones. Both are small tools from Unix systems; if you want to use them on Windows, you had better get Cygwin. It emulates Unix under Windows and includes the two tools Flex.exe and Bison.exe (YACC). They are cumbersome to use (as are many of the most useful Unix tools), but "Compiler Principles and Practice" explains both in detail and gives a number of practical examples.
2. Building an interpreted language is simpler than building a compiler that generates machine code. With an interpreted language such as Java you do have to write your own interpreter, but you do not have to dig up information about machine code. If you build a compiler that generates final machine code, you may run into the problem of register-based code generation. As I said earlier, generating stack-based code is very simple and leaves little to think about, whereas generating final machine code forces you to consider how the machine's registers are allocated.
3. Consider using grammar files that others have already written, and try not to write the lexical and grammar files yourself. A friend once said that writing a good grammar definition for a programming language is almost half of a compiler. That is true: writing a grammar file is hard. Lexical and grammar files for languages such as C, C++, Java, Tiny C, and Minus C can now be found all over the internet, and you can simply download and use them.
In "Compiler Principles and Practice", the author gives the full source code of Tiny C. I think this compiler is very well done: compared with the source code of languages such as PHP and Perl, it is much simpler and easier to read, and it clearly shows the implementation of a complete compiler system. Its source code can be downloaded from the author's website.