This post collects my lab work from a Compiler Principles course. The course has already finished, so I have tidied the material up and published it.
I. Experimental task
Read the classic lexical analysis source code of an existing compiler, then write a lexical analyzer for a language in C or Java.
II. Contents of the experiment
Read the classic lexical analysis source code of an existing compiler.
Choose a compiler such as Tiny or PL/0 (for any other compiler you need to bring your own source code). Read its lexical analysis source and understand how a lexical analyzer is built, that is, how a state diagram becomes code. In particular, describe in some detail what the relevant functions and important variables do. Adding notes on what you learned is even better.
Based on the language's keywords, the lexical units to be recognized, comments, and so on, determine the keyword table and draw the DFA diagrams for all lexical units and comments.
Write a lexical analyzer for your chosen language, modeled on the lexical analyzer you studied.
Prepare two or three test cases, including both valid and invalid inputs, and check the analyzer's results.
III. Lexical analysis
Tiny lexical analysis
Tiny's tokens
Tiny has very few tokens; it is, after all, a simple language. At first, though, I was thoroughly confused about how these tokens are defined and how to tell them apart. I remember spending a whole afternoon reading Tiny's lexical analysis source and the lab handout before I understood how to write an analyzer, and then going back to the dorm and finishing my rewrite, based on Tiny, in a single evening. < ( ̄︶ ̄) >
Simply put, tokens are a classification of the strings in a language. When you read a string, you have to identify what it is and what type it has. So as you read characters according to certain rules, you must decide what each character is, whether it is an error, and how to classify it. The token types are the standard you classify by.
State transitions
Given the token types, how do you decide which type a string has? With a DFA transition diagram.
The diagram is simple: a START state, an INNUM state for numbers, an INID state for identifiers, the special symbols + - * / = < ( ) ;, and a DONE state. From START, the first character read determines which state to enter. Characters keep being consumed as long as they fit the current state. When the next character does not fit, it is not consumed, the current string ends, and the state tells you its type. After classifying that string, start again from START and repeat until all characters have been read.
Adding the remaining states that need to be distinguished gives Tiny's complete lexical analysis transition diagram.
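To make the loop described above concrete, here is a minimal C++ sketch of just the START / INNUM / INID / DONE part. It is not the code from my experiment: the state names follow the post, but `Kind`, `scanLine`, and the `consume` flag are names I made up for this illustration, and it assumes identifiers contain letters only.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// State names follow the post; Kind and scanLine are hypothetical.
enum class State { START, INNUM, INID, DONE };
enum class Kind { NUM, ID, SYMBOL, ERROR };

// Scan one line and return (kind, lexeme) pairs.
std::vector<std::pair<Kind, std::string>> scanLine(const std::string &line) {
    const std::string symbols = "+-*/=<();";
    std::vector<std::pair<Kind, std::string>> tokens;
    std::size_t pos = 0;
    while (true) {
        // Skip whitespace between tokens.
        while (pos < line.size() &&
               std::isspace(static_cast<unsigned char>(line[pos])))
            ++pos;
        if (pos >= line.size())
            break;

        State state = State::START;
        Kind kind = Kind::ERROR;
        std::string lexeme;
        while (state != State::DONE) {
            // '\0' stands in for "end of line"; it fits no state.
            const bool atEnd = pos >= line.size();
            const unsigned char c =
                atEnd ? '\0' : static_cast<unsigned char>(line[pos]);
            bool consume = !atEnd;
            switch (state) {
            case State::START:
                if (std::isdigit(c)) {
                    state = State::INNUM;
                    kind = Kind::NUM;
                } else if (std::isalpha(c)) {
                    state = State::INID;
                    kind = Kind::ID;
                } else {
                    // Single-character token: a special symbol or an error.
                    state = State::DONE;
                    kind = symbols.find(static_cast<char>(c)) != std::string::npos
                               ? Kind::SYMBOL
                               : Kind::ERROR;
                }
                break;
            case State::INNUM:
                // A non-digit ends the number and is left for the next token.
                if (!std::isdigit(c)) { consume = false; state = State::DONE; }
                break;
            case State::INID:
                // A non-letter ends the identifier and is left unread.
                if (!std::isalpha(c)) { consume = false; state = State::DONE; }
                break;
            case State::DONE:
                break;
            }
            if (consume) {
                lexeme += static_cast<char>(c);
                ++pos;
            }
        }
        tokens.emplace_back(kind, lexeme);
    }
    return tokens;
}
```

Calling `scanLine("x = 12;")` yields four tokens: the identifier `x`, the symbol `=`, the number `12`, and the symbol `;`. The `consume` flag is what implements "do not read the character": the character that ended a number or identifier stays in place and becomes the first character of the next token.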
IV. Programming
The lexical analyzer I wrote is really just Tiny's with a few modifications, so the lexical design is largely the same.
Tokens
Reserved words:
cin while then cout end
Special characters:
= + - * / ( ) ; >> <<
Comment:
{this is a comment}
Transition table
See above.
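Compared with Tiny, the new pieces in this token list are the two-character symbols >> and << next to the brace comments. Here is a rough C++ sketch of how the INCOMMENT, ININ, and INOUT states could handle them. It is an illustration under my own assumptions, not the experiment's code: `scanSymbols` is a hypothetical name, a lone `>` or `<` is reported as "ERROR", and letters, digits, and whitespace are simply skipped to keep the example short.

```cpp
#include <string>
#include <vector>

// State names follow the post; scanSymbols is a hypothetical helper.
enum class State { START, INCOMMENT, ININ, INOUT };

// Returns the special symbols found in src, in order. Brace comments are
// dropped. A lone '>' or '<' is reported as "ERROR" (an assumption of this
// sketch). Any other character is skipped for brevity.
std::vector<std::string> scanSymbols(const std::string &src) {
    const std::string single = "=+-*/();";
    std::vector<std::string> out;
    State state = State::START;
    std::size_t pos = 0;
    while (pos <= src.size()) {
        // '\0' stands in for "end of input".
        const char c = pos < src.size() ? src[pos] : '\0';
        bool consume = true;
        switch (state) {
        case State::START:
            if (c == '{')
                state = State::INCOMMENT;
            else if (c == '>')
                state = State::ININ;   // first half of ">>"
            else if (c == '<')
                state = State::INOUT;  // first half of "<<"
            else if (c != '\0' && single.find(c) != std::string::npos)
                out.push_back(std::string(1, c));
            break;
        case State::INCOMMENT:
            // Everything up to the closing brace is ignored, newlines included.
            if (c == '}')
                state = State::START;
            break;
        case State::ININ:
            if (c == '>') {
                out.push_back(">>");
            } else {
                out.push_back("ERROR");
                consume = false;  // re-examine c from START
            }
            state = State::START;
            break;
        case State::INOUT:
            if (c == '<') {
                out.push_back("<<");
            } else {
                out.push_back("ERROR");
                consume = false;  // re-examine c from START
            }
            state = State::START;
            break;
        }
        if (consume)
            ++pos;
    }
    return out;
}
```

For example, `scanSymbols("cin >>{note} x; cout << x;")` returns `>>`, `;`, `<<`, `;`. Because INCOMMENT only leaves on `}`, a comment that spans several lines is handled without any extra work.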
Code walkthrough
Macro definitions: the maximum token length is 40, and there are 5 reserved words.
#define MAXTOKENLEN 40
#define MAXRESERVED 5
Define an enumeration type that represents the set of DFA states.
typedef enum { // DFA state set
    START,
    INCOMMENT,
    INNUM,
    INID,
    ININ,
    INOUT,
    DONE
} StateType;
Define an enumeration type that represents the token types the input string is matched against.
typedef enum { // token types used to classify the input
    /* Exception states */
    ENDFILE,
    ERROR,
    /* Reserved words */
    CIN,
    COUT,
    WHILE,
    THEN,
    END,
    /* DFA states */
    ID,
    NUM,
    /* Special symbols */
    IN,
    OUT,
    EQ,
    PLUS,
    MINUS,
    TIMES,
    OVER,
    LPAREN,
    RPAREN,
    SEMI
} TokenType;
Define a struct that maps each reserved word to its token type, so that a matched reserved word can be reported as such.
static struct // reserved-word table, used for lookup and output
{
    const char *str;
    TokenType tok;
} reservedWords[MAXRESERVED] = {
    {"cin", CIN},
    {"while", WHILE},
    {"then", THEN},
    {"cout", COUT},
    {"end", END}};
Define a function returning bool that determines whether the input character is a letter.
bool isLetter(char c) // is the character a letter?
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
    {
        return true;
    }
    else
    {
        return false;
    }
}
Determines whether the matched string is a reserved word.
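// strcmp is declared in <string.h>
static TokenType reservedLookup(char *s)
{
    // Linear search of the reserved-word table; anything else is an ID.
    for (int i = 0; i < MAXRESERVED; i++)
        if (!strcmp(s, reservedWords[i].str))
            return reservedWords[i].tok;
    return ID;
}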
Print the matched string according to its token type.
void printToken(TokenType token, const char tokenString[]); // output function
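Only the declaration is shown here, so the following is a guess at what the body could look like rather than my actual code. The strings for reserved words, identifiers, and symbols are modeled on the test output in section V. The NUM, EOF, and error formats do not appear in that output and are assumptions borrowed from Tiny's style. `describeToken` is a hypothetical helper added so the text can be checked without capturing stdout, and the enum is repeated so the block compiles on its own.

```cpp
#include <cstdio>
#include <string>

// Repeated from the post so that this sketch is self-contained.
typedef enum {
    ENDFILE, ERROR,
    CIN, COUT, WHILE, THEN, END,
    ID, NUM,
    IN, OUT, EQ, PLUS, MINUS, TIMES, OVER, LPAREN, RPAREN, SEMI
} TokenType;

// Hypothetical helper: builds the text that printToken writes.
std::string describeToken(TokenType token, const char tokenString[]) {
    switch (token) {
    case CIN: case COUT: case WHILE: case THEN: case END:
        return std::string("reserved word:") + tokenString;
    case ID:      return std::string("id, name= ") + tokenString;
    case NUM:     return std::string("num, val= ") + tokenString; // assumed format
    case IN:      return ">>";
    case OUT:     return "<<";
    case EQ:      return "=";
    case PLUS:    return "+";
    case MINUS:   return "-";
    case TIMES:   return "*";
    case OVER:    return "/";
    case LPAREN:  return "(";
    case RPAREN:  return ")";
    case SEMI:    return ";";
    case ENDFILE: return "EOF";                                   // assumed format
    case ERROR:   return std::string("error: ") + tokenString;    // assumed format
    }
    return "unknown token";
}

// Same signature as the declaration in the post; prints one token per line.
void printToken(TokenType token, const char tokenString[]) {
    std::printf("%s\n", describeToken(token, tokenString).c_str());
}
```

With this sketch, `printToken(CIN, "cin")` prints `reserved word:cin` and `printToken(ID, "x")` prints `id, name= x`. The line-number prefix seen in the test output would be printed by the caller before each token.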
The function that does the actual matching.
void getToken(string ss); // lexical analysis
V. Experimental test
Input
{SDFs
ADF}
cin >>{SDFSADF} x;
{SDFSADF}
cin>>y;
while (cin>>z) then{SDFSADF}
x=x z y;
cout << x;
end;
Output
1:{sdfs
2:ADF}
3:cin >>{SDFSADF} x;
3:reserved word:cin
3: >>
3:id, name= x
3:;
4:{SDFSADF}
5:cin>>y;
5:reserved word:cin
5: >>
5:id, name= y
5:;
6:while (cin>>z) then{SDFSADF}
6:reserved word:while
6: (
6:reserved word:cin
6: >>
6:id, name= z
6:)
6:reserved word:then
7:x=x z y;
7:id, name= x
7: =
7:id, name= x
7:id, name= z
7:id, name= y
7:;
8:cout << x;
8:reserved word:cout
8: <<
8:id, name= x
8:;
9:end;
9:reserved word:end
9:;
VI. Summary of the experiment
To be honest, when I sat down to write this post my mind was a bit blank, because I no longer remembered how Tiny's analysis worked. I could only grit my teeth, reread the lab handout, and slowly piece it back together. The result is the post you are reading.
Back then I really did spend two or three hours of an afternoon going through the lab handout and the Tiny source, slowly working out what everything was and then planning what to change. I have edited the "contents of the experiment" part above a little: the original requirement was that we choose our own language, and we were encouraged to define one. I set out to write a language for math, and that is the input shown above.
In practice that turned out to be hard, especially with an entire front end to write, and I felt more and more out of my depth. I realized this during the later experiments and stopped insisting on my own language. In fact, none of my classmates wrote a complete front end for a language of their own; everyone modified either Tiny or PL/0.
This is also where I fell into the enum and struct trap. In some later experiments I went overboard nesting structs inside structs, and the code became very complicated. In short, if you go on to read the code from my later experiments, you will see what insanity looks like...
VII. Downloads
For the full code, see Mathlex.