Compiler Principles: A Lexical Analysis Program


This is the course experiment from my study of compiler principles. The course has already finished, so I have now tidied the material up and published it.

I. Experimental task

Read the classic lexical analysis source program of an existing compiler, then write a lexical analyzer for a language in C or Java.

II. Contents of the experiment

Read the lexical analysis source program of an existing, classic compiler.
Select a compiler such as TINY or PL/0 (you need to bring your own source code). Read its lexical analysis source and understand how a lexical analyzer is constructed: the state diagram and the code that implements it. In particular, describe in some detail the purpose of the relevant functions and important variables. It is better to also write down what you learned.

Based on the language's keywords, the lexical units to be recognized, comments, and so on, determine the keyword table and draw the DFA diagrams for all lexical units and comments.

Write the lexical analyzer for the chosen language, modeled on the lexical analyzer studied above.

Prepare two or three test cases, with both positive and negative examples, and test the results.

III. Lexical analysis

Lexical analysis in TINY

TINY's tokens

TINY has very few tokens; it is, after all, a simple language. At the beginning, though, I was very confused about where these tokens come from and how to recognize them. I remember spending a whole afternoon reading TINY's lexical analysis source and the experiment handout before I understood how to write one, and then I went back to the dorm and finished my rewrite on top of TINY in one evening. < ( ̄︶ ̄) >

Simply put, token types are a classification for a language. When you read a string, you have to identify what it is and which type it belongs to. So as you read characters according to certain rules, you have to decide what each piece is, whether it is an error, and how to classify it. The token types are the standard you classify by.

State transitions

Then, given the token types, how do you decide which type a string is? With a DFA transition diagram.

The diagram is simple: a START state, INNUM for numbers, INID for identifiers, the special symbols + - * / = < ( ) ; and a DONE state. From the START state, the first character read selects the next state, and each further character drives another transition. When the next character does not fit the current state's type, it is not consumed; the string read so far ends there, and its type is the one given by the current state. After one string has been recognized, the same process repeats until all characters have been read.

Adding the remaining states that need to be distinguished gives TINY's full lexical analysis transition diagram.
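
The loop described above can be sketched roughly as follows. This is only a minimal illustration under my own assumptions, not the TINY source: it covers just the START, INNUM, INID and DONE states, treats every other character as a one-character symbol, and the names SimpleToken, Kind and nextToken are made up for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical names for this sketch; they are not taken from the TINY source.
enum class State { START, INNUM, INID, DONE };
enum class Kind { NUM, ID, SYMBOL, ENDFILE };

struct SimpleToken {
    Kind kind;
    std::string text;
};

// Reads one token from src starting at pos and advances pos past it.
SimpleToken nextToken(const std::string &src, std::size_t &pos) {
    assert(pos <= src.size());
    // Skip white space in front of the token.
    while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos])))
        ++pos;
    if (pos >= src.size())
        return SimpleToken{Kind::ENDFILE, ""};

    State state = State::START;
    SimpleToken tok{Kind::SYMBOL, ""};
    while (state != State::DONE) {
        // '\0' stands for "no more input"; it is neither a digit nor a letter.
        char c = pos < src.size() ? src[pos] : '\0';
        unsigned char uc = static_cast<unsigned char>(c);
        bool consume = true; // false means: leave c for the next token
        switch (state) {
        case State::START:
            if (std::isdigit(uc)) { state = State::INNUM; tok.kind = Kind::NUM; }
            else if (std::isalpha(uc)) { state = State::INID; tok.kind = Kind::ID; }
            else { state = State::DONE; } // single-character symbol
            break;
        case State::INNUM:
            if (!std::isdigit(uc)) { consume = false; state = State::DONE; }
            break;
        case State::INID:
            if (!std::isalpha(uc)) { consume = false; state = State::DONE; }
            break;
        case State::DONE:
            break;
        }
        if (consume) { tok.text += c; ++pos; }
    }
    return tok;
}
```

Calling nextToken repeatedly on "abc = 42;" with pos starting at 0 yields the identifier abc, the symbol =, the number 42, the symbol ; and then an ENDFILE token. The consume flag is the "do not read the character" step: the character that ends a number or identifier stays in the input and starts the next token.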

IV. Programming

The lexical analyzer I wrote is really TINY's with a few modifications, so the lexical design is roughly the same.

Tokens

Reserved words:
cin while then cout end
Special characters:
= + - * / ( ) ; >> <<
Comment:
{this is a comment}

Transition diagram
See above.

Code walkthrough

Macro definitions: the maximum length of a matched token is 40 characters, and there are 5 reserved words.

#include <cstring> // strcmp
#include <string>  // std::string
#define MAXTOKENLEN 40
#define MAXRESERVED 5
Defines an enumeration type that represents the set of states of the DFA.
typedef enum { // DFA state set
    START,
    INCOMMENT,
    INNUM,
    INID,
    ININ,
    INOUT,
    DONE
} StateType;
Defines an enumeration type for the token types that an input string can be matched to.
typedef enum { // token types used to classify the input
    /* exceptional states */
    ENDFILE,
    ERROR,
    /* reserved words */
    CIN,
    COUT,
    WHILE,
    THEN,
    END,
    /* tokens recognized by DFA states */
    ID,
    NUM,
    /* special symbols */
    IN,
    OUT,
    EQ,
    PLUS,
    MINUS,
    TIMES,
    OVER,
    LPAREN,
    RPAREN,
    SEMI
} TokenType;
Defines a struct that pairs each reserved word with its token type; it is used both to recognize reserved words and to print them.
static struct // reserved-word table, also used for output
{
    const char *str;
    TokenType tok;
} reservedWords[MAXRESERVED] = {
    {"cin", CIN},
    {"while", WHILE},
    {"then", THEN},
    {"cout", COUT},
    {"end", END}};
Defines a function returning bool that determines whether the input character is a letter.
bool isLetter(char c) // is the character a letter?
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
Determines whether a matched string is a reserved word; anything not found in the table is an ordinary identifier.
static TokenType reservedLookup(const char *s)
{
    for (int i = 0; i < MAXRESERVED; i++)
        if (!strcmp(s, reservedWords[i].str))
            return reservedWords[i].tok;
    return ID;
}
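
To make the lookup behavior concrete, here is a small self-contained sketch of the same table-driven idea. The enum is trimmed to what the example needs, and the names Tok, Entry, kTable and lookup are made up for illustration; they are not the names used in my code.

```cpp
#include <cassert>
#include <cstring>

// Trimmed, hypothetical token set for this sketch only.
enum Tok { TOK_ID, TOK_CIN, TOK_WHILE, TOK_THEN, TOK_COUT, TOK_END };

struct Entry {
    const char *str;
    Tok tok;
};

static const Entry kTable[] = {
    {"cin", TOK_CIN}, {"while", TOK_WHILE}, {"then", TOK_THEN},
    {"cout", TOK_COUT}, {"end", TOK_END}};

// Linear search over the reserved-word table; falls back to TOK_ID.
Tok lookup(const char *s) {
    assert(s != nullptr);
    for (const Entry &e : kTable)
        if (std::strcmp(s, e.str) == 0)
            return e.tok;
    return TOK_ID;
}
```

Because strcmp compares case-sensitively, lookup("while") returns TOK_WHILE while lookup("While") falls through to TOK_ID. With only five reserved words a linear search is fine; a larger table would usually call for a sorted table or a hash map.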
Prints a token according to the type it was matched to.
void printToken(TokenType token, const char tokenString[]); // output function
The function that matches strings, that is, the scanner itself.
void getToken(std::string ss); // lexical analysis
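
The body of getToken is not listed in this post. As a rough sketch of how the INCOMMENT, ININ and INOUT states could drive the scan, assuming ININ recognizes >> and INOUT recognizes <<, something like the following would work. It is an assumption for illustration, not the actual Mathlex code: scanLine and St are made-up names, it returns plain lexeme strings instead of TokenType values, and a lone > or < is simply emitted as-is where a full scanner would report ERROR.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class St { START, INCOMMENT, INNUM, INID, ININ, INOUT, DONE };

// Splits one line into lexemes and skips {...} comments.
// inComment carries an unterminated comment over to the next line.
std::vector<std::string> scanLine(const std::string &line, bool &inComment) {
    std::vector<std::string> out;
    std::string text;
    St st = inComment ? St::INCOMMENT : St::START;
    std::size_t i = 0;
    while (i <= line.size()) {
        char c = i < line.size() ? line[i] : '\0'; // '\0' marks end of line
        unsigned char uc = static_cast<unsigned char>(c);
        bool consume = true; // false means: leave c for the next lexeme
        switch (st) {
        case St::START:
            if (c == '\0' || std::isspace(uc)) { /* skip */ }
            else if (c == '{') st = St::INCOMMENT;
            else if (std::isdigit(uc)) { st = St::INNUM; text += c; }
            else if (std::isalpha(uc)) { st = St::INID; text += c; }
            else if (c == '>') { st = St::ININ; text += c; }
            else if (c == '<') { st = St::INOUT; text += c; }
            else { st = St::DONE; text += c; } // one-character symbol
            break;
        case St::INCOMMENT:
            if (c == '}') st = St::START;
            break;
        case St::INNUM:
            if (std::isdigit(uc)) text += c;
            else { consume = false; st = St::DONE; }
            break;
        case St::INID:
            if (std::isalpha(uc)) text += c;
            else { consume = false; st = St::DONE; }
            break;
        case St::ININ: // second character of ">>"
            if (c == '>') text += c; else consume = false;
            st = St::DONE;
            break;
        case St::INOUT: // second character of "<<"
            if (c == '<') text += c; else consume = false;
            st = St::DONE;
            break;
        case St::DONE:
            break;
        }
        if (consume) ++i;
        if (st == St::DONE) { out.push_back(text); text.clear(); st = St::START; }
    }
    assert(text.empty()); // every started lexeme has been emitted
    inComment = (st == St::INCOMMENT);
    return out;
}
```

For the line cin  >>{sdfsadf} x; this returns the lexemes cin, >>, x and ;. Scanning {sdfs and then adf} with the same inComment flag returns nothing for either line, which matches how a comment spanning two lines produces no tokens.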
V. Experimental test

Input
{sdfs
adf}
cin  >>{sdfsadf} x;
{sdfsadf}
cin>>y;
while (cin>>z) then{sdfsadf}
x=x z y;
cout << x;
end;
Output
1:{sdfs
2:adf}
3:cin  >>{sdfsadf} x;
        3:reserved word:cin
        3: >>
        3:id, name= x
        3:;
4:{sdfsadf}
5:cin>>y;
        5:reserved word:cin
        5: >>
        5:id, name= y
        5:;
6:while (cin>>z) then{sdfsadf}
        6:reserved word:while
        6: (
        6:reserved word:cin
        6: >>
        6:id, name= z
        6:)
        6:reserved word:then
7:x=x z y;
        7:id, name= x
        7: =
        7:id, name= x
        7:id, name= z
        7:id, name= y
        7:;
8:cout << x;
        8:reserved word:cout
        8: <<
        8:id, name= x
        8:;
9:end;
        9:reserved word:end
        9:;
VI. Summary of the experiment

Honestly, when I started writing this post I felt a bit blank, because I no longer remembered how TINY's lexical analysis worked. I could only grit my teeth, reread the experiment handout, and slowly recall it. The result is the post you are reading.

I really did spend two or three hours that afternoon reading the experiment handout and the TINY source, slowly working out what everything was before planning what to change. I have revised part of the experiment description above: the original requirement was that we choose our own language, and we were encouraged to define one ourselves. I had wanted to write a language for math, and the test input above is what came of it.

In practice that is not easy to write, especially once you try to cover the whole front end, and I felt more and more out of my depth. I only understood this in the later experiments and then gave up on insisting on my own language. In fact, none of my classmates wrote a complete front end for a language of their own; everyone adapted either TINY or PL/0.

Also, this is where I fell into the pit of enums and structs. In the later experiments my code used structs so heavily that the data structures became very complex. In short, if you go on to read the code from my later experiments, you will know what insanity looks like ...

VII. Download of information

For the full code, see Mathlex.
