My first compiler parser

Last Update:2015-04-11 Source: Internet

Author: User

Tags lexer


Compilers and JVMs written in Java

Why Java? Its structure is the easiest to understand, and its rich set of design patterns makes the structure of a compiler very clear.

A compiler's front-end model

Source code → lexical analyzer → (tokens) → parser → (parse tree) → intermediate code generator → three-address code

A symbol table connects all of these phases.
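To make the last stage of the pipeline concrete, here is a minimal sketch of what three-address code looks like for the assignment a = b + c * d. The Expr class, the temporary names t1, t2 and the translate method are assumptions made for this illustration; they are not part of the lexer developed in this article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turning a small expression tree into three-address code.
class ThreeAddressSketch {
    // A tiny expression tree: either a leaf name or a binary operation.
    static final class Expr {
        final String op;        // null for a leaf
        final String name;      // leaf name, null for an operation
        final Expr left, right;

        Expr(String name) {
            this.op = null; this.name = name; this.left = null; this.right = null;
        }

        Expr(String op, Expr left, Expr right) {
            this.op = op; this.name = null; this.left = left; this.right = right;
        }
    }

    private final List<String> code = new ArrayList<String>();
    private int temps = 0;

    // Returns the name that holds the value of e, emitting one instruction per operator.
    private String gen(Expr e) {
        if (e.op == null) return e.name;
        String l = gen(e.left);
        String r = gen(e.right);
        String t = "t" + (++temps);
        code.add(t + " = " + l + " " + e.op + " " + r);
        return t;
    }

    // Translates "target = e" into a list of three-address instructions.
    static List<String> translate(String target, Expr e) {
        ThreeAddressSketch g = new ThreeAddressSketch();
        String result = g.gen(e);
        g.code.add(target + " = " + result);
        return g.code;
    }

    public static void main(String[] args) {
        // a = b + c * d
        Expr e = new Expr("+", new Expr("b"), new Expr("*", new Expr("c"), new Expr("d")));
        for (String line : translate("a", e)) {
            System.out.println(line);
        }
    }
}
```

Each emitted instruction has at most one operator on its right-hand side, which is what "three-address" refers to.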

Grammar definitions

A context-free grammar consists of four elements:
1. A set of terminal symbols, also called tokens ("lexical units"). The terminals are the basic symbols of the language defined by the grammar.
2. A set of nonterminal symbols, also called syntactic variables. Each nonterminal represents a set of strings of terminals.
3. A set of productions. Each production consists of a nonterminal, called the head or left side of the production, an arrow, and a sequence of terminals and/or nonterminals, called the body or right side.
4. A designation of one nonterminal as the start symbol.
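The four elements can be written down as plain data. The following is a minimal sketch, assuming a small example grammar for expressions such as 9-5+2 (list → list + digit | list - digit | digit, digit → 0 | 1 | ... | 9). The class, field and method names are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a context-free grammar stored as its four elements.
class GrammarSketch {
    // One production: a head nonterminal and a body of terminals and/or nonterminals.
    static final class Production {
        final String head;
        final List<String> body;

        Production(String head, String... body) {
            this.head = head;
            this.body = Arrays.asList(body);
        }

        @Override
        public String toString() {
            return head + " -> " + String.join(" ", body);
        }
    }

    final Set<String> terminals = new LinkedHashSet<String>();        // element 1
    final Set<String> nonterminals = new LinkedHashSet<String>();     // element 2
    final List<Production> productions = new ArrayList<Production>(); // element 3
    final String start;                                               // element 4

    GrammarSketch(String start) {
        this.start = start;
        nonterminals.add(start);
    }

    void rule(String head, String... body) {
        productions.add(new Production(head, body));
    }

    // Every head must be a nonterminal and every body symbol must be declared.
    boolean isWellFormed() {
        for (Production p : productions) {
            if (!nonterminals.contains(p.head)) return false;
            for (String s : p.body) {
                if (!terminals.contains(s) && !nonterminals.contains(s)) return false;
            }
        }
        return true;
    }

    // Assumed example grammar: list -> list + digit | list - digit | digit
    static GrammarSketch example() {
        GrammarSketch g = new GrammarSketch("list");
        g.nonterminals.add("digit");
        g.terminals.add("+");
        g.terminals.add("-");
        g.rule("list", "list", "+", "digit");
        g.rule("list", "list", "-", "digit");
        g.rule("list", "digit");
        for (int d = 0; d <= 9; d++) {
            String t = String.valueOf(d);
            g.terminals.add(t);
            g.rule("digit", t);
        }
        return g;
    }

    public static void main(String[] args) {
        GrammarSketch g = example();
        for (Production p : g.productions) {
            System.out.println(p);
        }
        System.out.println("well-formed: " + g.isWellFormed());
    }
}
```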

A token consists of two parts: a name and an attribute value.

Parse trees

1. The root is labeled by the start symbol of the grammar.
2. Each leaf is labeled by a terminal or by ε.
3. Each interior node is labeled by a nonterminal.
4. If a nonterminal A labels an interior node and its children are labeled X1, X2, ..., Xn from left to right, then there must be a production A → X1 X2 ... Xn, where each of X1, X2, ..., Xn is either a terminal or a nonterminal. As a special case, if A → ε is a production, then a node labeled A may have a single child labeled ε.
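The four properties can be seen on a small tree. The sketch below assumes an example grammar list → list + digit | list - digit | digit, digit → 0 | ... | 9, and builds the parse tree for 9-5+2 by hand. Node, frontier and the other names are hypothetical helpers for this illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a hand-built parse tree for 9-5+2.
class ParseTreeSketch {
    static final String EPSILON = "\u03b5"; // label used for an empty leaf

    static final class Node {
        final String label;        // a nonterminal for interior nodes, a terminal or epsilon for leaves
        final List<Node> children; // ordered left to right

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Concatenates the leaf labels from left to right; an epsilon leaf contributes nothing.
    static String frontier(Node n) {
        if (n.isLeaf()) {
            return n.label.equals(EPSILON) ? "" : n.label;
        }
        StringBuilder sb = new StringBuilder();
        for (Node c : n.children) {
            sb.append(frontier(c));
        }
        return sb.toString();
    }

    // digit -> d
    static Node digit(String d) {
        return new Node("digit", new Node(d));
    }

    static Node example() {
        Node nine = new Node("list", digit("9"));                               // list -> digit
        Node nineMinusFive = new Node("list", nine, new Node("-"), digit("5")); // list -> list - digit
        return new Node("list", nineMinusFive, new Node("+"), digit("2"));      // list -> list + digit
    }

    public static void main(String[] args) {
        Node root = example();
        System.out.println("root: " + root.label);
        System.out.println("leaves: " + frontier(root));
    }
}
```

Reading the leaves from left to right gives back the input string, and every interior node together with its children corresponds to one production of the assumed grammar.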

That covers the basic concepts. Let's go straight to the code, because code is more intuitive.

Lexical analysis

First, a Code class. Why have it? I wrote it to make testing easier: you can assign the source text to it directly. All the lexer needs is a way to read one character at a time, so readers are free to supply the input some other way.

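    package com.bigbear.main;

    public class Code {
        // The source text to be scanned.
        public static String content = "int a=9;";
        private static int index = 0;

        // Returns the next character; throws an IndexOutOfBoundsException past the end of content.
        public static char read() {
            return content.charAt(index++);
        }
    }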

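Skipping whitespace and comments

    // Skip blanks and tabs, and count newlines. Comments are not handled in this version.
    for (;; peek = (char) Code.read()) {
        if (peek == ' ' || peek == '\t')
            continue;
        else if (peek == '\n')
            line++;
        else
            break;
    }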

Lookahead

For example, after reading 1 we need to look at one more character to distinguish 1 from 10, and after reading t we need to look ahead to distinguish t from true. A symbol like * does not need any lookahead.
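A minimal sketch of this idea, assuming the input is held in a String and that '\0' marks the end of input. The class LookaheadSketch and its methods are hypothetical and independent of the Lexer class shown later.

```java
// Hypothetical sketch: one character of lookahead decides where a lexeme ends.
class LookaheadSketch {
    private final String input;
    private int pos = 0;

    LookaheadSketch(String input) {
        this.input = input;
    }

    // Looks at the next character without consuming it; '\0' stands for end of input.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // Returns the next lexeme: a run of digits, a run of letters and digits,
    // or a single other character. Returns null at end of input.
    String next() {
        if (pos >= input.length()) return null;
        int start = pos;
        char c = input.charAt(pos++);
        if (Character.isDigit(c)) {
            // "1" or "10"? Keep going while the lookahead character is a digit.
            while (Character.isDigit(peek())) pos++;
        } else if (Character.isLetter(c)) {
            // "t" or "true"? Keep going while the lookahead is a letter or digit.
            while (Character.isLetterOrDigit(peek())) pos++;
        }
        // Any other character, such as '*', is a lexeme on its own: no lookahead needed.
        return input.substring(start, pos);
    }

    public static void main(String[] args) {
        LookaheadSketch s = new LookaheadSketch("10*true");
        for (String lexeme = s.next(); lexeme != null; lexeme = s.next()) {
            System.out.println(lexeme);
        }
    }
}
```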

Constants

    // Accumulate the decimal value of a run of digits.
    if (Character.isDigit(peek)) {
        int v = 0;
        do {
            v = 10 * v + Character.digit(peek, 10);
            peek = (char) Code.read();
        } while (Character.isDigit(peek));
        return new Num(v);
    }

When we enter 1+2+3, the lexer produces the following token sequence:

<num,1><+><num,2><+><num,3>

+ is a terminal with no attribute, so its tuple is simply <+>.
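The sequence above can be reproduced with a small stand-alone sketch. The tokenize method below is a hypothetical helper that formats tokens as text; it is not the Lexer class of this article and handles only integers and single-character symbols.

```java
// Hypothetical sketch: printing the token sequence for an input such as 1+2+3.
class TokenStreamSketch {
    static String tokenize(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n') {
                i++; // skip whitespace
                continue;
            }
            if (Character.isDigit(c)) {
                // A number token carries its value as the attribute.
                int v = 0;
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    v = 10 * v + Character.digit(src.charAt(i), 10);
                    i++;
                }
                out.append("<num,").append(v).append('>');
            } else {
                // Any other character is a token with no attribute.
                out.append('<').append(c).append('>');
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1+2+3")); // prints <num,1><+><num,2><+><num,3>
    }
}
```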

Identifying keywords and identifiers

Keywords include for, do, and if.
If you enter
a=a+b
the sequence of terminals is id=id+id:

  <id,"a"><=><id,"a"><+><id,"b">

Keywords cannot be used as identifiers, so the lexer has to tell keywords and identifiers apart.
It does this with a symbol table; here a Hashtable stores the keywords.

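    private Hashtable words = new Hashtable();

    @SuppressWarnings({ "unchecked" })
    private void reserve(Word t) {
        words.put(t.lexeme, t);
    }

    // Reserve the keywords before any identifier is scanned.
    public Lexer() {
        reserve(new Word(Tag.TRUE, "true"));
        reserve(new Word(Tag.FALSE, "false"));
    }

    if (Character.isLetter(peek)) {
        StringBuffer b = new StringBuffer();
        do {
            b.append(peek);
            peek = (char) Code.read();
        } while (Character.isLetterOrDigit(peek));
        String s = b.toString();
        // A lexeme already in the table is a keyword or a known identifier.
        Word w = (Word) words.get(s);
        if (w != null)
            return w;
        // Otherwise it is a new identifier: record it and return it.
        w = new Word(Tag.ID, s);
        words.put(s, w);
        return w;
    }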
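The same lookup can be sketched in isolation. This is a simplified variant, not the article's code: it assumes a typed HashMap from lexeme to tag number instead of a raw Hashtable of Word objects, and reuses the tag numbers 257, 258 and 259 from the Tag class. The class name KeywordTableSketch and the method tagFor are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one table that holds both reserved words and identifiers.
class KeywordTableSketch {
    static final int ID = 257, TRUE = 258, FALSE = 259;

    private final Map<String, Integer> words = new HashMap<String, Integer>();

    KeywordTableSketch() {
        // Keywords are entered first, so they can never be mistaken for identifiers.
        words.put("true", TRUE);
        words.put("false", FALSE);
    }

    // Returns the tag for a lexeme, registering it as an identifier on first sight.
    int tagFor(String lexeme) {
        Integer tag = words.get(lexeme);
        if (tag != null) return tag;
        words.put(lexeme, ID);
        return ID;
    }

    int size() {
        return words.size();
    }

    public static void main(String[] args) {
        KeywordTableSketch table = new KeywordTableSketch();
        System.out.println(table.tagFor("true")); // keyword
        System.out.println(table.tagFor("a"));    // new identifier
        System.out.println(table.tagFor("a"));    // same identifier, found in the table
    }
}
```

Using generic type parameters also removes the need for the casts and @SuppressWarnings annotations that the raw Hashtable requires.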

Lexical analyzer

    package com.bigbear.lexer;

    // Integer tags for tokens. Values start at 256 so they do not clash with character codes.
    public class Tag {
        public final static int NUM = 256, ID = 257, TRUE = 258, FALSE = 259;
    }

    package com.bigbear.lexer;

    // Base class of all tokens; single-character tokens use the character code as the tag.
    public class Token {
        public final int tag;

        public Token(int tag) {
            this.tag = tag;
        }
    }

    package com.bigbear.lexer;

    // Integer constant; the attribute is its value.
    public class Num extends Token {
        public final int value;

        public Num(int v) {
            super(Tag.NUM);
            this.value = v;
        }
    }

    package com.bigbear.lexer;

    // Keyword or identifier; the attribute is its lexeme.
    public class Word extends Token {
        public final String lexeme;

        public Word(int t, String s) {
            super(t);
            this.lexeme = new String(s);
        }
    }

    package com.bigbear.lexer;

    import java.io.IOException;
    import java.util.Hashtable;

    import com.bigbear.main.Code;

    /**
     * Lexical analyzer.
     *
     * @author Winney
     */
    public class Lexer {
        public int line = 1;
        private char peek = ' ';
        @SuppressWarnings("rawtypes")
        private Hashtable words = new Hashtable();

        @SuppressWarnings({ "unchecked" })
        private void reserve(Word t) {
            words.put(t.lexeme, t);
        }

        public Lexer() {
            reserve(new Word(Tag.TRUE, "true"));
            reserve(new Word(Tag.FALSE, "false"));
        }

        @SuppressWarnings("unchecked")
        public Token scan() throws IOException {
            // Skip whitespace and count lines.
            for (;; peek = (char) Code.read()) {
                if (peek == ' ' || peek == '\t')
                    continue;
                else if (peek == '\n')
                    line++;
                else
                    break;
            }
            // Integer constants.
            if (Character.isDigit(peek)) {
                int v = 0;
                do {
                    v = 10 * v + Character.digit(peek, 10);
                    peek = (char) Code.read();
                } while (Character.isDigit(peek));
                return new Num(v);
            }
            // Keywords and identifiers.
            if (Character.isLetter(peek)) {
                StringBuffer b = new StringBuffer();
                do {
                    b.append(peek);
                    peek = (char) Code.read();
                } while (Character.isLetterOrDigit(peek));
                String s = b.toString();
                Word w = (Word) words.get(s);
                if (w != null)
                    return w;
                w = new Word(Tag.ID, s);
                words.put(s, w);
                return w;
            }
            // Any other character is a token by itself.
            Token t = new Token(peek);
            peek = ' ';
            return t;
        }

        public static void main(String[] args) {
            try {
                Lexer lx = new Lexer();
                for (int i = 0; i < 5; i++) {
                    System.out.println(lx.scan().tag);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }


This article is an English version of an article originally published in Chinese on aliyun.com.
