My first compiler parser


Writing compilers and JVMs in Java

Why Java? Java's structure is the easiest to understand, and its rich set of design patterns makes the structure of the compiler very clear.

A compiler's front-end model

Source code → lexical analyzer → (tokens) → parser → (parse tree) → intermediate code generator → three-address code

A symbol table is shared by all of these stages.
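To make the last stage of this pipeline concrete, here is a minimal sketch of an intermediate code generator that turns an expression tree into three-address code. The expression-tree types (Expr, Var, BinOp), the class ThreeAddressGen, and the temporary names t1, t2, ... are hypothetical and not part of this article's code; the symbol table is omitted. Records and instanceof patterns need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical expression-tree types, used only for this sketch.
interface Expr {}
record Var(String name) implements Expr {}
record BinOp(char op, Expr left, Expr right) implements Expr {}

// Walks an expression tree and emits one three-address instruction
// (at most one operator on the right-hand side) per operator node.
class ThreeAddressGen {
    private final List<String> code = new ArrayList<>();
    private int temps = 0;

    // Returns the name that holds the value of e, emitting code for subexpressions first.
    private String gen(Expr e) {
        if (e instanceof Var v) {
            return v.name();
        }
        BinOp b = (BinOp) e;
        String l = gen(b.left());
        String r = gen(b.right());
        String t = "t" + (++temps);   // fresh temporary
        code.add(t + " = " + l + " " + b.op() + " " + r);
        return t;
    }

    // Translates the assignment "target = e" and returns all emitted instructions.
    List<String> assign(String target, Expr e) {
        String value = gen(e);
        code.add(target + " = " + value);
        return code;
    }
}
```

For a = b + c * d this emits t1 = c * d, then t2 = b + t1, then a = t2: each instruction has at most one operator, which is what "three-address" refers to.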

Grammar definitions

A context-free grammar consists of four elements:
1. A set of terminal symbols, also called tokens. The terminals are the basic symbols of the language defined by the grammar.
2. A set of nonterminal symbols, also called syntactic variables. Each nonterminal represents a set of strings of terminals.
3. A set of productions. Each production consists of a nonterminal, called the head or left side of the production, an arrow, and a sequence of terminals and/or nonterminals, called the body or right side of the production.
4. A designation of one of the nonterminals as the start symbol.
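The four elements map directly onto a small data structure. The sketch below is an illustration under assumptions, not code from this article: Production, Grammar, and digitLists are hypothetical names, and the sample grammar (list → list + digit | list - digit | digit, digit → 0 | 1 | ... | 9) is a common textbook example rather than something defined in this post. An ε-production would be a Production with an empty body. Records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// One production: head -> body. An empty body stands for an epsilon-production.
record Production(String head, List<String> body) {}

// The four elements of a context-free grammar.
record Grammar(Set<String> terminals, Set<String> nonterminals,
               List<Production> productions, String start) {

    // Checks that the start symbol and every production head are nonterminals
    // and that every body symbol is a declared terminal or nonterminal.
    boolean isWellFormed() {
        if (!nonterminals.contains(start)) return false;
        for (Production p : productions) {
            if (!nonterminals.contains(p.head())) return false;
            for (String sym : p.body()) {
                if (!terminals.contains(sym) && !nonterminals.contains(sym)) return false;
            }
        }
        return true;
    }

    // list -> list + digit | list - digit | digit ;  digit -> 0 | 1 | ... | 9
    static Grammar digitLists() {
        List<Production> ps = new ArrayList<>();
        ps.add(new Production("list", List.of("list", "+", "digit")));
        ps.add(new Production("list", List.of("list", "-", "digit")));
        ps.add(new Production("list", List.of("digit")));
        Set<String> terms = new HashSet<>(Set.of("+", "-"));
        for (char c = '0'; c <= '9'; c++) {
            String d = String.valueOf(c);
            terms.add(d);
            ps.add(new Production("digit", List.of(d)));
        }
        return new Grammar(terms, Set.of("list", "digit"), ps, "list");
    }
}
```

Grammar.digitLists() has 12 terminals (+, - and the ten digits), 2 nonterminals, 13 productions, and start symbol list.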

A token consists of two parts: a token name and an attribute value.

Parse trees

1. The root is labeled by the start symbol of the grammar.
2. Each leaf is labeled by a terminal or by ε.
3. Each interior node is labeled by a nonterminal.
4. If A is the nonterminal labeling an interior node and X1, X2, ..., Xn are the labels of its children from left to right, then there must be a production A → X1 X2 ... Xn. Each Xi may be either a terminal or a nonterminal. As a special case, if A → ε is a production, then a node labeled A may have a single child labeled ε.
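A parse tree can be represented with a single node type. In this hypothetical sketch (Node, leaf, of, yieldString, and conformsTo are names chosen here), each production is encoded as a string "A -> X1 X2 ... Xn", and rule 4 is checked at every interior node; leaves are assumed to be terminals or ε, as rules 2 and 3 require. The example tree is for 9-5+2 under the grammar list → list + digit | list - digit | digit, digit → 0 | ... | 9. Records need Java 16 or later.

```java
import java.util.List;
import java.util.Set;

// A parse-tree node. Leaves have no children and are labeled with a terminal
// or epsilon; interior nodes are labeled with a nonterminal.
record Node(String label, List<Node> children) {
    static final String EPSILON = "\u03b5";   // the empty string symbol

    static Node leaf(String terminal) {
        return new Node(terminal, List.of());
    }

    static Node of(String nonterminal, Node... kids) {
        return new Node(nonterminal, List.of(kids));
    }

    // The yield: leaf labels read from left to right (epsilon contributes nothing).
    String yieldString() {
        if (children.isEmpty()) {
            return label.equals(EPSILON) ? "" : label;
        }
        StringBuilder sb = new StringBuilder();
        for (Node c : children) sb.append(c.yieldString());
        return sb.toString();
    }

    // Rule 4: an interior node labeled A with children X1 ... Xn
    // requires the production "A -> X1 ... Xn"; checked recursively.
    boolean conformsTo(Set<String> productions) {
        if (children.isEmpty()) return true;   // leaf: nothing to check
        StringBuilder body = new StringBuilder();
        for (Node c : children) {
            if (body.length() > 0) body.append(' ');
            body.append(c.label());
        }
        if (!productions.contains(label + " -> " + body)) return false;
        for (Node c : children) {
            if (!c.conformsTo(productions)) return false;
        }
        return true;
    }
}
```

For the tree of 9-5+2, the root's children are list, +, digit, so the production list -> list + digit must exist, and the leaves read left to right give back 9-5+2.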

That covers the basic concepts. Now on to the code, which is more intuitive.

Lexical analysis

First, the Code class. Why have it? To make testing easier: you can assign the source text to it directly. All the lexer needs is a way to read one character at a time, so you are free to use another input source instead.

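package com.bigbear.main;

public class Code {
    // The source text to scan; assign any test input here.
    public static String content = "int a=9;";
    private static int index = 0;

    // Returns the next character of content and advances the position.
    // Reading past the end throws StringIndexOutOfBoundsException.
    public static char read() {
        return content.charAt(index++);
    }
}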
Skipping blanks and comments

This version skips spaces and tabs and counts newlines; it does not handle comments yet.
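for (;; peek = (char) Code.read()) {
    if (peek == ' ' || peek == '\t')
        continue;
    else if (peek == '\n')
        line++;       // track line numbers for error messages
    else
        break;        // peek now holds the first non-blank character
}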
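For experimenting outside the lexer, the same loop can be written against a plain String. This is a standalone sketch, not the article's code: the class name BlankSkipper and the '\0' end-of-input marker are assumptions made so the example runs on its own.

```java
// Standalone sketch of the blank-skipping loop over a String.
class BlankSkipper {
    private final String src;
    private int index = 0;
    int line = 1;

    BlankSkipper(String src) {
        this.src = src;
    }

    // Returns the next character that is not a space, tab, or newline,
    // counting newlines on the way; returns '\0' (assumed end marker) at end of input.
    char nextNonBlank() {
        while (index < src.length()) {
            char c = src.charAt(index++);
            if (c == ' ' || c == '\t') continue;
            if (c == '\n') {
                line++;
                continue;
            }
            return c;
        }
        return '\0';
    }
}
```

On the input "  a\n\t b" the first call returns 'a' on line 1 and the second returns 'b' on line 2.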
Lookahead

For example, after reading 1 the lexer must read one more character to tell 1 apart from 10, and after reading t it must read ahead to tell the identifier t apart from the keyword true. A single-character token such as * needs no lookahead.
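One common way to implement lookahead is a reader that can inspect the next character without consuming it. The sketch below is hypothetical (PeekReader, readWhile, LookaheadDemo, and split are invented names, '\0' is an assumed end marker, and whitespace handling is omitted); the article's lexer gets the same effect by keeping the lookahead character in its peek field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical reader with one character of lookahead.
class PeekReader {
    static final char END = '\0';   // assumed end-of-input marker
    private final String src;
    private int pos = 0;

    PeekReader(String src) {
        this.src = src;
    }

    // Returns the next character without consuming it.
    char peek() {
        return pos < src.length() ? src.charAt(pos) : END;
    }

    // Consumes and returns the next character.
    char next() {
        char c = peek();
        if (pos < src.length()) pos++;
        return c;
    }

    // Consumes the longest run of characters accepted by ok; the first
    // rejected character is left unread for the next token.
    String readWhile(IntPredicate ok) {
        StringBuilder sb = new StringBuilder();
        while (peek() != END && ok.test(peek())) sb.append(next());
        return sb.toString();
    }
}

class LookaheadDemo {
    // Splits input into numbers, words, and single-character tokens.
    static List<String> split(String input) {
        PeekReader r = new PeekReader(input);
        List<String> out = new ArrayList<>();
        while (r.peek() != PeekReader.END) {
            char c = r.peek();
            if (Character.isDigit(c)) {
                out.add(r.readWhile(Character::isDigit));          // 1 vs 10
            } else if (Character.isLetter(c)) {
                out.add(r.readWhile(Character::isLetterOrDigit));  // t vs true
            } else {
                out.add(String.valueOf(r.next()));                 // e.g. '*': no lookahead needed
            }
        }
        return out;
    }
}
```

split("10+t*true") yields 10, +, t, *, true: the + that ends 10 and the * that ends t are seen by peek() but left unread for the next token.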

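Integer constants
if (Character.isDigit(peek)) {
    int v = 0;
    do {
        // Shift the value one decimal place and add the new digit.
        v = 10 * v + Character.digit(peek, 10);
        peek = (char) Code.read();
    } while (Character.isDigit(peek));
    return new Num(v);
}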
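The loop builds the value with the rule v = 10 * v + digit, so for the input 123 the successive values of v are 0, 1, 12, 123. A standalone version of just that arithmetic follows; DigitValue is a hypothetical name, and like the original it does not check for int overflow.

```java
// Computes the value of a run of decimal digits the same way the lexer does.
class DigitValue {
    static int valueOf(String digits) {
        int v = 0;
        for (int i = 0; i < digits.length(); i++) {
            // Character.digit('7', 10) returns 7.
            v = 10 * v + Character.digit(digits.charAt(i), 10);
        }
        return v;
    }
}
```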

For the input 1+2+3, the lexer produces the token sequence

<num,1><+><num,2><+><num,3>

+ is a terminal with no attribute value, so its token is written simply as <+>.
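A toy scanner that prints tokens in this notation makes the sequence easy to check. It is a sketch, not the article's lexer: TokenPrinter is a hypothetical name, and only integer constants, blanks, and single-character operators are handled.

```java
// Renders the token stream of the input in <name,attribute> notation.
class TokenPrinter {
    static String tokens(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                int v = 0;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    v = 10 * v + Character.digit(input.charAt(i), 10);
                    i++;
                }
                out.append("<num,").append(v).append('>');   // constant: name plus value
            } else if (c == ' ' || c == '\t' || c == '\n') {
                i++;                                          // blanks produce no token
            } else {
                out.append('<').append(c).append('>');        // operator: no attribute
                i++;
            }
        }
        return out.toString();
    }
}
```

TokenPrinter.tokens("1+2+3") returns <num,1><+><num,2><+><num,3>.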

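Recognizing keywords and identifiers

Keywords such as for, do, and if are spelled like identifiers.
If the input is
a=a+b
then the corresponding string of terminals is id = id + id, and the token stream is

  <id,"a"><=><id,"a"><+><id,"b">

Keywords cannot be used as identifiers, so the lexer has to tell the two apart.
It does this with a symbol table: the reserved keywords are stored in a Hashtable before scanning starts.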

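private Hashtable<String, Word> words = new Hashtable<>();

private void reserve(Word t) {
    words.put(t.lexeme, t);
}

public Lexer() {
    // Reserve the keywords so they are never returned as identifiers.
    reserve(new Word(Tag.TRUE, "true"));
    reserve(new Word(Tag.FALSE, "false"));
}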
if (Character.isLetter(peek)) {
    StringBuilder b = new StringBuilder();
    do {
        b.append(peek);
        peek = (char) Code.read();
    } while (Character.isLetterOrDigit(peek));
    String s = b.toString();
    Word w = words.get(s);
    if (w != null)
        return w;            // a reserved keyword, or an identifier seen before
    w = new Word(Tag.ID, s);
    words.put(s, w);         // remember the new identifier
    return w;
}
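The same reserved-word idea in standalone form, using a HashMap instead of Hashtable. This is a sketch under assumptions: WordTable, lookup, and the nested Word record are hypothetical names, and the tag values 257 to 259 are copied from the article's Tag class rather than imported. computeIfAbsent performs the same "get, and put a new ID word if missing" step as the code above. Records need Java 16 or later.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the keyword/identifier table.
class WordTable {
    static final int ID = 257, TRUE = 258, FALSE = 259;   // same values as Tag

    // A word token: tag plus lexeme.
    record Word(int tag, String lexeme) {}

    private final Map<String, Word> words = new HashMap<>();

    WordTable() {
        reserve(new Word(TRUE, "true"));
        reserve(new Word(FALSE, "false"));
    }

    private void reserve(Word w) {
        words.put(w.lexeme(), w);
    }

    // Returns the keyword token if s is reserved; otherwise returns the
    // identifier token for s, creating and storing it on first sight.
    Word lookup(String s) {
        return words.computeIfAbsent(s, k -> new Word(ID, k));
    }
}
```

Looking up "true" returns the reserved keyword token, while "a" becomes an ID token; a second lookup of "a" returns the same object, so repeated identifiers share one table entry.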
Lexical analyzer
package com.bigbear.lexer;

public class Tag {
    // Tags above 255 so they never collide with single-character tokens.
    public static final int NUM = 256, ID = 257, TRUE = 258, FALSE = 259;
}

package com.bigbear.lexer;

public class Token {
    public final int tag;

    public Token(int tag) {
        this.tag = tag;
    }
}

package com.bigbear.lexer;

public class Num extends Token {
    public final int value;

    public Num(int v) {
        super(Tag.NUM);
        this.value = v;
    }
}

package com.bigbear.lexer;

public class Word extends Token {
    public final String lexeme;

    public Word(int t, String s) {
        super(t);
        this.lexeme = s;
    }
}
package com.bigbear.lexer;

import java.io.IOException;
import java.util.Hashtable;

import com.bigbear.main.Code;

/**
 * Lexical analyzer.
 *
 * @author Winney
 */
public class Lexer {
    public int line = 1;
    private char peek = ' ';
    private Hashtable<String, Word> words = new Hashtable<>();

    private void reserve(Word t) {
        words.put(t.lexeme, t);
    }

    public Lexer() {
        reserve(new Word(Tag.TRUE, "true"));
        reserve(new Word(Tag.FALSE, "false"));
    }

    public Token scan() throws IOException {
        // Skip blanks and tabs; count newlines.
        for (;; peek = (char) Code.read()) {
            if (peek == ' ' || peek == '\t')
                continue;
            else if (peek == '\n')
                line++;
            else
                break;
        }
        // Integer constant.
        if (Character.isDigit(peek)) {
            int v = 0;
            do {
                v = 10 * v + Character.digit(peek, 10);
                peek = (char) Code.read();
            } while (Character.isDigit(peek));
            return new Num(v);
        }
        // Keyword or identifier.
        if (Character.isLetter(peek)) {
            StringBuilder b = new StringBuilder();
            do {
                b.append(peek);
                peek = (char) Code.read();
            } while (Character.isLetterOrDigit(peek));
            String s = b.toString();
            Word w = words.get(s);
            if (w != null)
                return w;
            w = new Word(Tag.ID, s);
            words.put(s, w);
            return w;
        }
        // Any other character is a single-character token whose tag is its character code.
        Token t = new Token(peek);
        peek = ' ';
        return t;
    }

    public static void main(String[] args) {
        try {
            Lexer lx = new Lexer();
            // For "int a=9;" this prints 257, 257, 61, 256, 59
            // (int is not reserved here, so it is scanned as an ID).
            for (int i = 0; i < 5; i++) {
                System.out.println(lx.scan().tag);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
