How was the first C language compiler written?

Source: Internet
Author: User

First, a salute to Dennis MacAlistair Ritchie, the father of the C language!

  Almost all of today's practical compilers and interpreters (collectively called compilers here) are implemented in C. Some languages, such as Clojure and Jython, are built on the JVM or implemented in Java, and IronPython is implemented on .NET, but Java and C# themselves rely on C++, which amounts to an indirect dependence on C. So measuring the portability of a high-level language really comes down to the portability of ANSI/ISO C.

  C is a very low-level language and in many respects resembles assembly language. The book "Intel 32-bit Assembly Language Programming" even presents a method for translating simple C into assembly by hand. Writing system software such as a compiler in C is therefore very natural. Even a high-level language like Python still relies on C underneath (Python is the example here because hackers have tried to make Python run without an operating system, which in practice only removes the one-off C code that sits on top of the BIOS). Today, a student who has studied compiler principles and has a little programming ability can implement a simple compiler for a C-like language.

But this raises a question you may not have thought about: if everyone writes compilers in C or in languages built on C, how was the world's first C compiler written? Isn't this a "chicken and egg" problem?

Let's look back at the history of C. Around 1970, Thompson and Ritchie developed the B language on the basis of BCPL (an interpretive language), and in 1973 they went on to develop C on the basis of B. Before C came into use as a systems programming language, Thompson had already used B for operating system work, so B was in practical use before C was implemented. The prototype of the first C compiler could therefore well have been written in B, or in a mix of B and PDP assembly language. We now know that B was relatively inefficient, but writing everything in assembly language would have meant a long development cycle and difficult maintenance, and worse still, it would have given up the portability that a high-level language needs. So the early C compiler took a shortcut: first write a compiler for a subset of C in assembly language, then use that subset to build up the complete C compiler step by step. The detailed procedure is as follows:

First, define a subset of C that contains only the basic features and call it C0. C0 is simple enough that a C0 compiler can be written directly in assembly language. Next, using the features C0 already provides, design C1, another subset of C that is more complex than C0 but still incomplete, so that C0 is contained in C1 and C1 is contained in C, and write the C1 compiler in C0. On top of C1, design C2, a subset that is more complex than C1 but still not the full C language, and write the C2 compiler in C1. Continue in this way up to CN, at which point CN is powerful enough to implement a compiler for the complete C language. How large N is depends on the complexity of the target language (here, C) and on the skill of the programmer. Simply put, once you reach a subset whose existing features let you implement C comfortably, you have found N. The following illustration shows this abstract process:

  

C language
CN language
......
C0 language
Assembly
Machine language

Does this picture look familiar? Yes, it resembles the layer diagram of a virtual machine, except that here it is a CVM (C language virtual machine). Each language can be compiled independently on its own virtual layer, and except for the C language itself, the output of each layer becomes the input of the next layer (the output of the last layer is the application). It works like a snowball: pack a small handful of snow together by hand (assembly language), roll it along little by little, and it grows into a large snowball. Perhaps this is what is meant by "zero gives birth to C, and C gives birth to all things"?

So how is this bold subset-simplification approach carried out, and what is its theoretical basis? First, a concept: "self-compiling" (self-compile). For certain strongly typed programming languages with clear bootstrapping properties, a finite subset of the language can express the language itself through a finite number of recursive steps. (Strongly typed here means that every variable in a program must be declared with its type before use, as in C; by contrast, some scripting languages have no type declarations at all.) Examples are C, Pascal, and Ada. As for why self-compilation is possible, see "Compiling Principles" from Tsinghua University Press, a book that implements a compiler for a subset of Pascal. In short, computer scientists have shown that, in theory, a complete C compiler can be built with the CVM method described above. So how is the simplification actually done?

Here are the keywords for C99:

auto        enum        restrict    unsigned
break       extern      return      void
case        float       short       volatile
char        for         signed      while
const       goto        sizeof      _Bool
continue    if          static      _Complex
default     inline      struct      _Imaginary
do          int         switch
double      long        typedef
else        register    union
// a total of 37

Look closely: many of these keywords exist to help the compiler optimize, and others restrict the scope, linkage, or lifetime of variables and functions (functions have no lifetime). None of them has to be supported in an early compiler implementation, so we can remove auto, restrict, extern, volatile, const, sizeof, static, inline, register, and typedef. This forms a subset of C, the C3 language, whose keywords are as follows:

enum        unsigned    break       return
void        case        float       short
char        for         signed      while
goto        _Bool       continue    if
_Complex    default     struct      _Imaginary
do          int         switch      double
long        else        union
// a total of 27

Thinking further, many of the types and type modifiers in C3 do not all need to be supported. For example, of the three integer types it is enough to implement int. So we remove these keywords as well: unsigned, float, short, char (a char is an int), signed, _Bool, _Complex, _Imaginary, and long. This forms our C2 language, whose keywords are as follows:

 enum  break  return  void  case  for  while  goto  continue  if  default  struct  do  int  switch  double  else  union  // a total of 18

Thinking further still, even with only 18 keywords C2 keeps many advanced features, such as composite data structures built on the basic data types. In addition, our keyword tables do not list operators: C's compound assignment operators and overly flexible operators such as ++ and -- can also be dropped completely at this point. The keywords that can be removed are enum, struct, and union, which gives the C1 keywords:

 break  return  void  case  for  while  goto  continue  if  default  do  int  switch  double  else  // a total of 15

This is close to perfect, but the last step is naturally a bigger one. This time arrays and pointers are removed as well. C1 also still carries a good deal of complexity: loops and branches each come in several forms that can be reduced to one. Specifically, the loop statements are the while loop, the do...while loop, and the for loop, and keeping only the while loop is enough. The branch statements come in four forms: if ... {}, if ... {} ... else, if ... {} ... else if ..., and switch. All of them can be expressed with two or more if ... {} statements, so keeping only if ... {} is enough. But then again, branches and loops are just conditional jump statements, and a function call is just a stack push followed by a jump, so goto (unrestricted goto) is all that is needed. So we boldly remove all the structured keywords, even functions, and get the C0 keywords:

 break  void  goto  int  double  // a total of 5

This is the ultimate in simplicity.

With only 5 keywords, C0 can be implemented in assembly language quickly. Through this reverse analysis we have reconstructed how the first C compiler could have been written, and we have also come to appreciate the wisdom and diligence of the scientists who came before us. We are all just dust on the shoulders of giants! Zero gives birth to C, and C gives birth to all things: truly ingenious!
