Syntax analysis in the MYC compiler source code

Source: Internet
Author: User
Tags emit

The MYC compiler parses syntax top-down. Starting from the leftmost token, the parser works from the top of the grammar downward, looking for a grammar rule that can begin with that token. Once it finds such a rule, it matches the following tokens from left to right against the rest of the rule. I will explain top-down parsing in more detail in other articles. The grammar rules for MYC were listed in the previous article:

program ::= (outer_decl | func_decl);
outer_decl ::= [class] type ident {"," ident} ";";
func_decl ::= [class] type ident "(" params ")" outer_block;
outer_block ::= "{" {inner_decl} {stmt} "}";
inner_decl ::= [class] type ident {"," ident} ";";
inner_block ::= "{" {stmt} "}";
ident ::= name | function_call;
function_call ::= name "(" [expr {"," expr}] ")";
params ::= type ident {"," type ident};
stmt ::= (if_stmt | while_stmt | for_stmt | break_stmt | cont_stmt | ret_stmt | assign_stmt);
if_stmt ::= "if" "(" expr ")" stmt_block ["else" inner_block];
while_stmt ::= "while" "(" expr ")" inner_block;
for_stmt ::= "for" "(" assign ";" expr "," assign ")" inner_block
break_stmt ::= "break" ";";
cont_stmt ::= "continue" ";";
ret_stmt ::= "return" expr ";";
assign_stmt ::= assign ";";
assign ::= ident "=" expr;
class ::= "extern" | "static" | "auto";
type ::= "int" | "void";
factor ::= (ident | integer | "(" expr ")");
unary_factor ::= ["+" | "-"] factor;
term1 ::= ["*" | "/"] factor;
term0 ::= factor {term1};
first_term ::= unary_factor term1;
math_expr ::= first_term {["+" | "-"] term0}
rel_expr ::= math_expr ("==" | "!=" | "<" | ">" | ">=" | "<=") math_expr;
not_factor ::= ["!"] rel_expr;
term_bool ::= not_factor {("&" | "&&") not_factor};
bool_expr ::= term_bool {("|" | "^") term_bool};
expr ::= bool_expr;
name ::= letter {letter | digit};
integer ::= digit {digit};
letter ::= "A-Za-z";
digit ::= "0-9";

A few examples illustrate the top-down parsing process. Consider the following global variable definition:

int a;

The top-down process is as follows:

  1. First, the compiler starts from the topmost grammar rule, program, and parses downward.
  2. program contains two alternatives, outer_decl and func_decl. Because outer_decl comes first, the compiler begins by testing whether the current token matches the outer_decl rule:
    outer_decl ::= [class] type ident {"," ident} ";";

  3. The class, type, and ident items that make up outer_decl are themselves grammar rules, not lexical rules (that is, tokens), because MYC's grammar contains definitions for class, type, and ident. class is enclosed in square brackets, which marks it as optional.

  4. Since class is optional and int is not a storage-class keyword, the compiler moves right to see whether the current token matches type, and finds that the first word, int, matches the type rule:
    type ::= "int" | "void";

  5. At this point the compiler has consumed the first token, int, and continues to the right in outer_decl to match the next token. The next item outer_decl requires is ident, so the compiler checks whether the next token matches ident:
    ident ::= name | function_call;
    name ::= letter {letter | digit};
    letter ::= "A-Za-z";
    digit ::= "0-9";

  6. The next token supplied by the lexical analyzer is a, which matches the name rule within ident (the path down is ident -> name -> letter).
  7. The compiler keeps consuming tokens to the right. At this point the semicolon ';' in the source has not yet been matched, and the remaining part of outer_decl is:

    {"," ident} ";";

  8. The item in curly braces may occur zero or more times, so it can be skipped here, and the semicolon in the source matches the final semicolon of outer_decl. The compiler has now finished analyzing one C statement and knows that it is a global variable definition.
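The eight steps above can be sketched as a tiny recursive-descent recognizer. This is an illustrative sketch in Python, not MYC's own code: the names tokenize and OuterDeclParser are hypothetical, and ident is simplified to the name alternative only (function_call is left out).

```python
import re

CLASS_WORDS = {"extern", "static", "auto"}  # class ::= "extern" | "static" | "auto"
TYPE_WORDS = {"int", "void"}                # type  ::= "int" | "void"
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")  # name ::= letter {letter | digit}


def tokenize(src):
    """Split source text into names, integers, and single punctuation marks."""
    return re.findall(r"[A-Za-z][A-Za-z0-9]*|[0-9]+|\S", src)


class OuterDeclParser:
    """Recognizes outer_decl ::= [class] type ident {"," ident} ";"."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0  # index of the next unconsumed token

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, literal):
        if self.peek() != literal:
            raise SyntaxError(f"expected {literal!r}, got {self.peek()!r}")
        return self.take()

    def ident(self):
        # Simplification: only the "name" alternative of ident is handled.
        tok = self.peek()
        if tok is None or not NAME_RE.fullmatch(tok) or tok in CLASS_WORDS | TYPE_WORDS:
            raise SyntaxError(f"expected identifier, got {tok!r}")
        return self.take()

    def outer_decl(self):
        # [class]: optional, so consume it only when present (step 3)
        storage = self.take() if self.peek() in CLASS_WORDS else None
        # type: required (step 4)
        if self.peek() not in TYPE_WORDS:
            raise SyntaxError(f"expected type, got {self.peek()!r}")
        type_name = self.take()
        # ident (steps 5 and 6)
        names = [self.ident()]
        # {"," ident}: zero or more repetitions (steps 7 and 8)
        while self.peek() == ",":
            self.take()
            names.append(self.ident())
        self.expect(";")
        return storage, type_name, names


result = OuterDeclParser(tokenize("int a;")).outer_decl()
```

Each grammar item maps to one step of outer_decl: an optional item becomes an if, a repeated item becomes a while, and a quoted literal becomes an expect call.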

Let's take another example to demonstrate how the compiler handles function definitions:

void main()
{
  int c;
  int d;
}

  

    1. As with the global variable above, the compiler parses from the top down. It first tries the outer_decl rule and, working from left to right through [class] type ident, consumes the two tokens void and main. When it reaches the opening parenthesis '(' it gets stuck, because outer_decl has no item that accepts a parenthesis.
    2. The compiler has to backtrack to the left: it puts the already processed tokens void and main back into the token stream and tries the second alternative of program, func_decl. Note: hand-written parsers rarely backtrack like this because it is too inefficient. The walk through the MYC source below shows how the MYC compiler actually handles this situation.
      func_decl ::= [class] type ident "(" params ")" outer_block;

    3. As with outer_decl, the compiler processes the tokens void and main from left to right and continues to the right. This time the source token '(' matches the next item of func_decl, "(", so the compiler can carry on, top-down and left to right, consuming the remaining source tokens according to func_decl.

    4. Interestingly, func_decl leads (through outer_block) to the inner_decl rule, whose definition looks just like that of outer_decl. It is this distinction that lets the compiler decide correctly whether a statement such as int a; is a global variable definition or a local variable definition inside a function, because inner_decl can only be reached by descending from func_decl.
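The backtracking described in steps 1 and 2 can be sketched as follows. This is a simplified Python illustration with hypothetical names (TokenStream, ParseError, program_item), not MYC's implementation: the [class] prefix and params are omitted, and the function body is skipped rather than parsed.

```python
TYPES = {"int", "void"}


class ParseError(Exception):
    pass


class TokenStream:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0  # index of the next unconsumed token

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, allowed=None):
        """Consume one token; if `allowed` is given, the token must be in it."""
        tok = self.peek()
        if tok is None:
            raise ParseError("unexpected end of input")
        if allowed is not None and tok not in allowed:
            raise ParseError(f"unexpected token {tok!r}")
        self.pos += 1
        return tok


def ident(ts):
    tok = ts.peek()
    if tok is None or not tok.isidentifier() or tok in TYPES:
        raise ParseError(f"expected identifier, got {tok!r}")
    return ts.take()


def outer_decl(ts):
    """type ident {"," ident} ";"   ([class] omitted for brevity)"""
    ts.take(TYPES)
    names = [ident(ts)]
    while ts.peek() == ",":
        ts.take()
        names.append(ident(ts))
    ts.take({";"})
    return ("outer_decl", names)


def func_decl(ts):
    """type ident "(" ")" "{" ... "}"   (params omitted, body skipped)"""
    ts.take(TYPES)
    name = ident(ts)
    ts.take({"("})
    ts.take({")"})
    ts.take({"{"})
    while ts.take() != "}":  # skip the body; nested braces are not handled
        pass
    return ("func_decl", name)


def program_item(ts):
    """Try each alternative in order, rewinding the stream on failure."""
    log = []
    for rule in (outer_decl, func_decl):
        start = ts.pos
        try:
            return rule(ts), log
        except ParseError:
            log.append(f"{rule.__name__} failed, backtracking {ts.pos - start} tokens")
            ts.pos = start  # put the consumed tokens back
    raise ParseError("no rule matched")


tokens = ["void", "main", "(", ")", "{", "int", "c", ";", "int", "d", ";", "}"]
result, log = program_item(TokenStream(tokens))
```

For the void main example, outer_decl consumes void and main, fails at '(', and the stream is rewound by two tokens before func_decl is tried. That rewinding is the cost hand-written parsers usually try to avoid.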

I won't go through the remaining statement rules here. Interested readers can take a few C statements and trace them through the grammar above, top-down, on their own.

Now let's look at the MYC source code for syntax parsing. This work is done by the Parse class. Its constructor takes two parameters, an Io object and a Tok object. The Io object has two main uses: recording the current source position during parsing, and printing error messages. It is simple enough that we won't spend space describing it here.

public Parse(Io i, Tok t)
{
  io = i;
  tok = t;
  // Initialize the static variable list
  staticvar = new VarList();
}

  

After the Parse object is created, the actual parsing is driven by the program function, which is called from the main function in Myc.cs. As you can see, the function names match the grammar rule names given earlier, such as program and outerDecl. Because the parsing functions all work in a similar way, I describe only a few key ones here:

public void program()
{
  // Prepare the module information to be generated
  prolog();
  // Loop, consuming the token stream from the source code
  while (tok.NotEOF())
  {
    // Although it is named outerDecl, it actually covers both alternatives
    // of the program rule, namely outer_decl and func_decl
    outerDecl();
  }
  // Error handling
  if (Io.genexe && !mainseen)
    io.Abort("Generating executable with no main entrypoint");
  // End code generation
  epilog();
}

  

MYC's grammar is simple, so both syntax parsing and code generation are driven from the program function. C has no concept of classes, whereas .NET IL is an object-oriented intermediate language, so program begins by calling prolog. prolog does two things: it generates a default class for the C program being compiled and, because a .NET executable assembly can consist of multiple modules, it also generates a default module. Here is the source code of prolog:

void prolog()
{
  // Create the code generation object
  emit = new Emit(io);
  // Prepare the .NET module that will eventually contain the C program
  emit.BeginModule();  // need assembly module
  // Generate a default class
  emit.BeginClass();
}

  

The outerDecl function handles both alternatives of the program rule, outer_decl and func_decl:

void outerDecl()
{
  // Holds information about the variable in the C statement currently being
  // parsed, such as the variable or function name and the variable type
  Var e = new Var();
#if DEBUG
  Console.WriteLine("outerDecl token=[" + tok + "]\n");
#endif

  // Record the current source position so that it can be saved as a comment
  // in the resulting IL file (if IL output is requested)
  CommentHolder();  /* mark the position in insn stream */

  // Handle the [class] rule shared by outer_decl and func_decl
  dataClass(e);
  // Handle the type rule shared by outer_decl and func_decl
  dataType(e);

  // If the next character is an opening parenthesis, process the statement
  // with the func_decl rule
  if (io.getNextChar() == '(')
    declFunc(e);
  // Otherwise process it with the outer_decl rule
  else
    declOuter(e);
}

// Parse the remainder of the outer_decl grammar rule
void declOuter(Var e)
{
#if DEBUG
  Console.WriteLine("declOuter1 token=[" + tok + "]\n");
#endif
  // outerDecl has already handled the [class] and type rules, so the current
  // token in the stream is the ident, that is, the variable name.
  // Assign that name to e, the variable object created by outerDecl
  e.setName(tok.getValue());  /* use value as the variable name */
  // Save the variable in the global variable list for later semantic analysis
  addOuter(e);  /* add this variable */
  // Create a variable declaration node in the resulting syntax tree
  emit.FieldDef(e);  /* issue the declaration */
  // The storage class from the [class] rule (static, extern, ...) is kept on
  // the variable for later code generation; if the statement gave none
  // (T_DEFCLASS), default to static
  if (e.getClassId() == Tok.T_DEFCLASS)
    e.setClassId(Tok.T_STATIC);  /* make sure it knows its storage class */

  // Handle the {"," ident} part of outer_decl, that is, multi-variable
  // declarations such as: int a, b, c;
  // This is done by checking whether the token that follows is ','
  /* loop while there are additional variable names */
  while (io.getNextChar() == ',')
  {
    // A ',' follows, so consume it first
    tok.scan();
    if (tok.getFirstChar() != ',')
      io.Abort("Expected ','");
    // Scan to the right
    tok.scan();
    // Try to find a token that matches the ident rule
    if (tok.getId() != Tok.T_IDENT)
      io.Abort("expected identifier");
    // Found a variable name, that is, a token matching the ident rule
    e.setName(tok.getValue());  /* use value as the variable name */
    // Add the new global variable to the global variable table
    addOuter(e);  /* add this variable */
    // Create its variable declaration node
    emit.FieldDef(e);  /* issue the declaration */
    if (e.getClassId() == Tok.T_DEFCLASS)
      e.setClassId(Tok.T_STATIC);  /* make sure it knows its storage class */
  }

  // The variable definitions have been consumed; check that the character
  // that follows is a semicolon ';'
  /*
   * move beyond end of statement indicator
   */
  tok.scan();
  if (tok.getFirstChar() != ';')
    io.Abort("expected ';'");
  // The statement parsed cleanly; finish processing
  CommentFill();
  tok.scan();
#if DEBUG
  Console.WriteLine("declOuter2 token=[" + tok + "]\n");
#endif
}
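The key idea in outerDecl is that it never backtracks. It consumes the [class] and type prefix that both rules share, then peeks one step ahead to choose between func_decl and outer_decl. The Python sketch below illustrates that dispatch on a list of tokens. The function name classify_declaration and the token-list input are assumptions made for illustration, since MYC peeks at the next character through its Io object.

```python
CLASSES = {"extern", "static", "auto"}
TYPES = {"int", "void"}


def classify_declaration(tokens):
    """Pick func_decl or outer_decl using one token of lookahead.

    Returns (rule, storage_class, type_name, first_identifier).
    """

    def at(i):
        # Safe indexing: None stands for "past the end of input".
        return tokens[i] if i < len(tokens) else None

    pos = 0
    # [class]: optional prefix shared by both rules
    storage = None
    if at(pos) in CLASSES:
        storage = at(pos)
        pos += 1
    # type: required prefix shared by both rules
    if at(pos) not in TYPES:
        raise SyntaxError(f"expected type, got {at(pos)!r}")
    type_name = at(pos)
    pos += 1
    # The current token is now the identifier
    name = at(pos)
    if name is None or not name.isidentifier():
        raise SyntaxError(f"expected identifier, got {name!r}")
    # Peek past the identifier without consuming anything
    if at(pos + 1) == "(":
        rule = "func_decl"
    else:
        rule = "outer_decl"
        # As in declOuter: a global with no explicit storage class
        # is treated as static
        if storage is None:
            storage = "static"
    return rule, storage, type_name, name


func_case = classify_declaration(["void", "main", "(", ")", "{", "}"])
var_case = classify_declaration(["int", "a", ";"])
```

Because the two rules share the prefix [class] type ident and differ only in what comes next, a single peek is enough to decide, and no tokens ever need to be put back into the stream.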

  
