Writing the TINY compiler yourself: lexical analysis
TINY is a small teaching language introduced in Kenneth C. Louden's book Compiler Construction: Principles and Practice. It lacks many features of a real programming language, but it is enough to illustrate the main components of a compiler. This article walks through the implementation of the compiler. The complete implementation code is in loucomp_linux, for beginners in compiler principles to refer to.
A quick trial run:
After downloading the source code, change to the loucomp_linux directory and run
$ make
This builds the tiny program. Then run
$ ./tiny sample.tny
tiny compiles the TINY source code in sample.tny into TM code, which it writes to sample.tm. TM code is the assembly language of the TM virtual machine, whose source code is in tm.c. Compile the virtual machine with:
$ gcc tm.c -o tm
With tm built, you can execute the sample.tm file generated above:
$ ./tm sample.tm
tm loads the assembly instructions, after which you can run the TM simulator interactively.
sample.tny is a factorial program written in TINY. The following session computes the factorial of 7:
$ ./tm sample.tm
TM simulation (enter h for help)...
Enter command: go
Enter value for IN instruction: 7
OUT instruction prints: 5040
HALT: 0, 0
Halted
Enter command: quit
Simulation done.
The rest of this article and the articles that follow describe the implementation of the compiler and the virtual machine step by step.
TINY Language
1. TINY has no procedures and no declarations; all variables are integers.
2. It has only two control statements: the if statement and the repeat statement. The if statement has an optional else part and must end with the keyword end.
3. The read and write statements perform input and output.
4. Text enclosed in "{" and "}" is a comment; comments cannot be nested.
Program list 1 is a factorial program written in this language.
Program list 1
{ Sample program
  in TINY language -
  computes factorial
}
read x; { input an integer }
if 0 < x then { don't compute if x <= 0 }
  fact := 1;
  repeat
    fact := fact * x;
    x := x - 1
  until x = 0;
  write fact  { output factorial of x }
end
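For readers coming from C, the following sketch shows roughly what Program list 1 does. The function name tiny_sample is hypothetical. read is modeled as the parameter x, and write is modeled as storing into *out. TINY's repeat ... until cond executes the body at least once and stops as soon as cond is true, so it corresponds to C's do ... while (!cond).

```c
#include <stdbool.h>

/* Hypothetical C rendering of the TINY factorial program.
 * x stands for the value obtained by "read x".
 * Returns true and stores the result in *out when the TINY program
 * would execute "write fact"; returns false when x <= 0, because
 * the if statement has no else part and nothing is written. */
bool tiny_sample(int x, int *out)
{
    if (0 < x) {                 /* if 0 < x then */
        int fact = 1;            /*   fact := 1 */
        do {                     /*   repeat */
            fact = fact * x;     /*     fact := fact * x */
            x = x - 1;           /*     x := x - 1 */
        } while (!(x == 0));     /*   until x = 0 */
        *out = fact;             /*   write fact */
        return true;
    }
    return false;                /* end: nothing written */
}
```

For example, calling tiny_sample(7, &r) stores 5040 in r, which matches the TM session shown earlier.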
Development Environment and tools:
This article uses Ubuntu as the development environment, lex for lexical analysis, yacc for syntax analysis, and gcc as the C compiler.
Lexical Analysis
1. Keywords: if, then, else, end, repeat, until, read, write.
All keywords are reserved words and are written in lowercase.
2. Special symbols: + - * / = < ( ) ; :=
The only ordering comparison is <; there is no >. := is the assignment symbol.
3. The other tokens are ID and NUM, defined by the following regular expressions:
ID = letter+
NUM = digit+
letter = [a-zA-Z]
digit = [0-9]
Uppercase and lowercase letters are distinct.
4. White space consists of blanks, tabs, and newlines. It is ignored, except that it must separate IDs, NUMs, and keywords.
5. Comments are enclosed in {...} and cannot be nested.
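Keywords match the same pattern as identifiers (letter+), so a scanner can first match a letter+ lexeme and then check it against a small table of reserved words. The sketch below takes that approach. The helper name reservedLookup is hypothetical, and the enum is a trimmed copy of the TokenType names from globals.h so that the block compiles on its own. The lex specification in this article does something different: it lists each keyword as a separate rule ahead of {identifier}.

```c
#include <string.h>

/* Subset of TokenType from globals.h, copied so this sketch is self-contained. */
typedef enum {
    IF, THEN, ELSE, END, REPEAT, UNTIL, READ, WRITE, ID
} TokenType;

/* The eight TINY reserved words, all lowercase. */
static const struct {
    const char *str;
    TokenType tok;
} reservedWords[] = {
    {"if", IF}, {"then", THEN}, {"else", ELSE}, {"end", END},
    {"repeat", REPEAT}, {"until", UNTIL}, {"read", READ}, {"write", WRITE}
};

/* Classify a lexeme that already matched letter+.
 * strcmp is case-sensitive, so "IF" or "Then" are ordinary IDs. */
TokenType reservedLookup(const char *s)
{
    size_t n = sizeof reservedWords / sizeof reservedWords[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(s, reservedWords[i].str) == 0)
            return reservedWords[i].tok;
    return ID;
}
```

A linear search is adequate for eight words. A larger language would more likely use a hash table or a perfect hash.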
DFA
The following figure shows the DFA of the TINY scanner:
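lex generates the automaton automatically. The same kind of DFA can also be written by hand as a loop over an explicit state variable, and the sketch below does that. Several parts are assumptions made for this sketch. The state names START, INASSIGN, INCOMMENT, INNUM, INID, and DONE and the function name nextToken are chosen here. The scanner reads from a string instead of a file. It returns only token kinds, without lexemes or line numbers. It reports every letter+ lexeme as ID and does not check for reserved words.

```c
#include <ctype.h>

/* Token kinds used by this sketch (subset of TokenType in globals.h). */
typedef enum {
    ENDFILE, ERROR, ID, NUM,
    ASSIGN, EQ, LT, PLUS, MINUS, TIMES, OVER, LPAREN, RPAREN, SEMI
} TokenType;

/* DFA states of the hand-written scanner. */
typedef enum { START, INASSIGN, INCOMMENT, INNUM, INID, DONE } StateType;

/* Scan one token starting at *pp and advance *pp past it. */
TokenType nextToken(const char **pp)
{
    const char *p = *pp;
    StateType state = START;
    TokenType tok = ERROR;

    while (state != DONE) {
        unsigned char c = (unsigned char)*p;
        switch (state) {
        case START:
            if (isdigit(c))      { state = INNUM; p++; }
            else if (isalpha(c)) { state = INID; p++; }
            else if (c == ':')   { state = INASSIGN; p++; }
            else if (c == '{')   { state = INCOMMENT; p++; }
            else if (c == ' ' || c == '\t' || c == '\n') p++;  /* skip white space */
            else {
                state = DONE;  /* single-character tokens */
                switch (c) {
                case '\0': tok = ENDFILE; break;  /* do not move past the end */
                case '=':  tok = EQ;     p++; break;
                case '<':  tok = LT;     p++; break;
                case '+':  tok = PLUS;   p++; break;
                case '-':  tok = MINUS;  p++; break;
                case '*':  tok = TIMES;  p++; break;
                case '/':  tok = OVER;   p++; break;
                case '(':  tok = LPAREN; p++; break;
                case ')':  tok = RPAREN; p++; break;
                case ';':  tok = SEMI;   p++; break;
                default:   tok = ERROR;  p++; break;
                }
            }
            break;
        case INCOMMENT:  /* no nesting: the first '}' ends the comment */
            if (c == '\0') { state = DONE; tok = ENDFILE; }
            else { if (c == '}') state = START; p++; }
            break;
        case INASSIGN:   /* seen ':', expect '=' */
            state = DONE;
            if (c == '=') { tok = ASSIGN; p++; }
            else tok = ERROR;  /* leave c for the next token */
            break;
        case INNUM:      /* the non-digit lookahead is not consumed */
            if (isdigit(c)) p++;
            else { state = DONE; tok = NUM; }
            break;
        case INID:       /* the non-letter lookahead is not consumed */
            if (isalpha(c)) p++;
            else { state = DONE; tok = ID; }
            break;
        case DONE:
            break;
        }
    }
    *pp = p;
    return tok;
}
```

The INNUM and INID states show why the DFA needs one character of lookahead. The scanner only knows that a number or identifier has ended when it sees the next character, and it leaves that character unconsumed. For example, "ab12" scans as ID followed by NUM, because identifiers contain letters only.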
Implementation of the lexical scanner:
1. Define the tokens in globals.h:
typedef enum {
    /* book-keeping tokens */
    ENDFILE, ERROR,
    /* reserved words */
    IF, THEN, ELSE, END, REPEAT, UNTIL, READ, WRITE,
    /* multicharacter tokens */
    ID, NUM,
    /* special symbols */
    ASSIGN, EQ, LT, PLUS, MINUS, TIMES, OVER, LPAREN, RPAREN, SEMI
} TokenType;
2. The lex specification, tiny.l:
%{
#include "globals.h"
#include "util.h"
#include "scan.h"
/* lexeme of identifier or reserved word */
char tokenString[MAXTOKENLEN+1];
%}

%option noyywrap

digit       [0-9]
number      {digit}+
letter      [a-zA-Z]
identifier  {letter}+
newline     \n
whitespace  [ \t]+

%%

"if"            {return IF;}
"then"          {return THEN;}
"else"          {return ELSE;}
"end"           {return END;}
"repeat"        {return REPEAT;}
"until"         {return UNTIL;}
"read"          {return READ;}
"write"         {return WRITE;}
":="            {return ASSIGN;}
"="             {return EQ;}
"<"             {return LT;}
"+"             {return PLUS;}
"-"             {return MINUS;}
"*"             {return TIMES;}
"/"             {return OVER;}
"("             {return LPAREN;}
")"             {return RPAREN;}
";"             {return SEMI;}
{number}        {return NUM;}
{identifier}    {return ID;}
{newline}       {lineno++;}
{whitespace}    {/* skip whitespace */}
"{"             { int c;  /* int, not char: input() returns EOF as an int */
                  do
                  { c = input();
                    if (c == EOF) break;      /* unterminated comment */
                    if (c == '\n') lineno++;
                  } while (c != '}');
                }
.               {return ERROR;}

%%

TokenType getToken(void)
{ static int firstTime = TRUE;
  TokenType currentToken;
  if (firstTime)
  { firstTime = FALSE;
    lineno++;           /* so that the first line is numbered 1 */
    yyin = source;
    yyout = listing;
  }
  currentToken = yylex();
  strncpy(tokenString,yytext,MAXTOKENLEN);
  if (TraceScan) {
    fprintf(listing,"\t%d: ",lineno);
    printToken(currentToken,tokenString);
  }
  return currentToken;
}
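The rules section defines the DFA transitions: each pattern is paired with the action to run when it matches. Because the keyword rules come before {identifier}, a keyword such as "if" is returned as IF rather than ID. Longer lexemes such as "iffy" still match {identifier}, since lex always prefers the longest match. The user-code section at the end defines getToken. On the first call, getToken points yyin and yyout at the source and listing files. It then calls yylex() to obtain the next token and copies the matched lexeme from yytext into tokenString. Finally, when TraceScan is set, it prints the line number, the token, and the lexeme.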
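The "{" rule is the one place where the specification reads characters by hand with input() instead of letting a pattern match them. The same loop can be tried outside lex by reading from a string. In the sketch below, skipComment and its parameters are hypothetical. The function starts just after the opening "{" and stops after the first "}" or at the end of the text. It increments the line counter for every newline inside the comment, which keeps the line numbers in the trace output correct after multi-line comments.

```c
#include <stddef.h>

/* Mirror of the "{" action in tiny.l, reading from a string instead
 * of lex's input(). 'pos' indexes the character just after the '{'.
 * Returns the index just past the closing '}', or the string length
 * if the comment is never closed. Each newline inside the comment
 * increments *lineno. TINY comments do not nest, so an inner '{' is
 * ordinary comment text and the first '}' ends the comment. */
size_t skipComment(const char *text, size_t pos, int *lineno)
{
    int c;
    do {
        c = (unsigned char)text[pos];
        if (c == '\0')          /* plays the role of EOF */
            break;
        pos++;
        if (c == '\n')
            (*lineno)++;
    } while (c != '}');
    return pos;
}
```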
Compile and run the lexical analysis program
The scanner consists of the following files, with header files on the left and source files on the right:
globals.h    main.c
util.h       util.c
scan.h       tiny.l
sample.tny is the factorial program written in TINY. This article and the following ones use it as the test file.
The globals.h header file contains the data type definitions and the global variables used by the compiler. main.c is the compiler's main program; it allocates and initializes the global variables.
Run the following commands:
$ make
$ ./tiny.out sample.tny
Output:
TINY COMPILATION: sample.tny
	5: reserved word: read
	5: ID, name= x
	5: ;
	6: reserved word: if
	6: NUM, val= 0
	6: <
	6: ID, name= x
	6: reserved word: then
	7: ID, name= fact
	7: :=
	7: NUM, val= 1
	7: ;
	8: reserved word: repeat
	9: ID, name= fact
	9: :=
	9: ID, name= fact
	9: *
	9: ID, name= x
	9: ;
	10: ID, name= x
	10: :=
	10: ID, name= x
	10: -
	10: NUM, val= 1
	11: reserved word: until
	11: ID, name= x
	11: =
	11: NUM, val= 0
	11: ;
	12: reserved word: write
	12: ID, name= fact
	13: reserved word: end
	14: EOF
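Every token is recognized and printed with its source line number. Identifiers and numbers are printed together with their lexemes. The comments in sample.tny produce no tokens, which is why the first token appears on line 5.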
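The text after each line number comes from printToken in util.c, whose source is not listed in this article. The sketch below reproduces the formats that appear in the output above. It writes into a buffer with snprintf rather than printing to the listing file, so the name formatToken and its signature are hypothetical. The ERROR format is also an assumption, because no error appears in the sample output.

```c
#include <stdio.h>

/* TokenType as defined in globals.h. */
typedef enum {
    ENDFILE, ERROR,
    IF, THEN, ELSE, END, REPEAT, UNTIL, READ, WRITE,
    ID, NUM,
    ASSIGN, EQ, LT, PLUS, MINUS, TIMES, OVER, LPAREN, RPAREN, SEMI
} TokenType;

/* Format one token the way the trace output shows it.
 * 'lexeme' is the matched text (tokenString in tiny.l). */
void formatToken(char *buf, size_t n, TokenType t, const char *lexeme)
{
    switch (t) {
    case IF: case THEN: case ELSE: case END:
    case REPEAT: case UNTIL: case READ: case WRITE:
        snprintf(buf, n, "reserved word: %s", lexeme);
        break;
    case ID:      snprintf(buf, n, "ID, name= %s", lexeme); break;
    case NUM:     snprintf(buf, n, "NUM, val= %s", lexeme); break;
    case ENDFILE: snprintf(buf, n, "EOF"); break;
    case ERROR:   snprintf(buf, n, "ERROR: %s", lexeme); break;  /* assumed format */
    default:      snprintf(buf, n, "%s", lexeme); break;         /* special symbols print as themselves */
    }
}
```

Combined with the "\t%d: " prefix that getToken prints, this formatting yields lines such as "5: reserved word: read" and "7: :=".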
With the scanner in place, the next article will cover the syntax analysis of TINY. Syntax analysis is more complex than lexical analysis, so I will first review the relevant theory and then update the blog.