Compilers and JVMs written in Java
Why Java? Java is the easiest language to follow, and its rich set of design patterns makes the structure of a compiler very clear.
A compiler's front-end model
Source code → lexical analyzer → (tokens) → parser → (parse tree) → intermediate code generator → three-address code
In addition, a symbol table is shared by all of these phases.
Grammar definition
A context-free grammar consists of four elements:
1. A set of terminal symbols, also called "tokens". The terminals are the basic symbols of the language defined by the grammar.
2. A set of nonterminals, also called "syntactic variables". Each nonterminal represents a set of strings of terminals.
3. A set of productions. Each production consists of a nonterminal, called the head or left side of the production, an arrow, and a sequence of terminals and/or nonterminals, called the body or right side.
4. A designation of one of the nonterminals as the start symbol.
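The four elements can be written down directly as a data structure. The following is only a minimal sketch: the class names Production and Grammar, the "->" rendering, and the example grammar for digit lists (list -> list + digit | list - digit | digit) are assumptions made for illustration, and List.of/Set.of assume Java 9 or later.

```java
import java.util.List;
import java.util.Set;

// Hypothetical classes for illustration; they are not part of the lexer in this article.
final class Production {
    final String head;       // the nonterminal on the left side
    final List<String> body; // terminals and nonterminals on the right side; empty means epsilon

    Production(String head, List<String> body) {
        this.head = head;
        this.body = body;
    }

    @Override
    public String toString() {
        return head + " -> " + (body.isEmpty() ? "epsilon" : String.join(" ", body));
    }
}

final class Grammar {
    final Set<String> terminals;
    final Set<String> nonterminals;
    final List<Production> productions;
    final String start;

    Grammar(Set<String> terminals, Set<String> nonterminals,
            List<Production> productions, String start) {
        // Element 4: the start symbol must be one of the nonterminals.
        if (!nonterminals.contains(start)) {
            throw new IllegalArgumentException("start symbol must be a nonterminal: " + start);
        }
        for (Production p : productions) {
            // Element 3: the head is a nonterminal, the body mixes terminals and nonterminals.
            if (!nonterminals.contains(p.head)) {
                throw new IllegalArgumentException("head is not a nonterminal: " + p.head);
            }
            for (String symbol : p.body) {
                if (!terminals.contains(symbol) && !nonterminals.contains(symbol)) {
                    throw new IllegalArgumentException("unknown symbol: " + symbol);
                }
            }
        }
        this.terminals = terminals;
        this.nonterminals = nonterminals;
        this.productions = productions;
        this.start = start;
    }

    // Example grammar: list -> list + digit | list - digit | digit
    static Grammar example() {
        return new Grammar(
                Set.of("+", "-", "digit"),
                Set.of("list"),
                List.of(
                        new Production("list", List.of("list", "+", "digit")),
                        new Production("list", List.of("list", "-", "digit")),
                        new Production("list", List.of("digit"))),
                "list");
    }
}
```

Validating in the constructor means a grammar whose start symbol or production heads are not nonterminals cannot be built at all.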
A token consists of two parts: a name and an attribute value.
Parse tree
1. The root is labeled by the start symbol of the grammar.
2. Each leaf is labeled by a terminal or by ε.
3. Each interior node is labeled by a nonterminal.
4. If nonterminal A labels an interior node and its children are labeled X1, X2, ..., Xn from left to right, then there must be a production A → X1 X2 ... Xn, where each of X1, X2, ..., Xn is either a terminal or a nonterminal. As a special case, if A → ε is a production, then a node labeled A may have a single child labeled ε.
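To make these properties concrete, here is a small sketch of a parse tree for the string 9-5+2. It assumes the illustrative grammar list -> list + digit | list - digit | digit with digit -> 0 | 1 | ... | 9; the ParseNode class and its method names are hypothetical. Reading the leaves from left to right (the yield of the tree) gives back the input string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parse-tree node, for illustration only.
final class ParseNode {
    final String label;                                   // grammar symbol, or "epsilon"
    final List<ParseNode> children = new ArrayList<>();

    ParseNode(String label, ParseNode... kids) {
        this.label = label;
        for (ParseNode k : kids) {
            children.add(k);
        }
    }

    boolean isLeaf() {
        return children.isEmpty();
    }

    // The yield: leaf labels read left to right, with epsilon leaves contributing nothing.
    String yield() {
        if (isLeaf()) {
            return label.equals("epsilon") ? "" : label;
        }
        StringBuilder sb = new StringBuilder();
        for (ParseNode c : children) {
            sb.append(c.yield());
        }
        return sb.toString();
    }

    // Parse tree for 9-5+2: the root is labeled with the start symbol "list".
    static ParseNode example() {
        ParseNode nine = new ParseNode("list",
                new ParseNode("digit", new ParseNode("9")));
        ParseNode nineMinusFive = new ParseNode("list",
                nine, new ParseNode("-"), new ParseNode("digit", new ParseNode("5")));
        return new ParseNode("list",
                nineMinusFive, new ParseNode("+"), new ParseNode("digit", new ParseNode("2")));
    }
}
```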
That covers the basic concepts. Let's go straight to the code, because code is more intuitive.
Lexical analysis
First, the Code class. Why have it? I wrote it to make testing easier: you can assign the source text to it directly. All that matters is that characters can be read one at a time, so readers are free to supply the input some other way.
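public class Code {
    // Source text to scan; assign a different string here to test other input.
    public static String content = "int a=9;";
    private static int index = 0;

    // Returns the next character. Note: charAt throws an IndexOutOfBoundsException
    // once the input is exhausted.
    public static char read() {
        return content.charAt(index++);
    }
}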
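As one possible "other way" of supplying characters, the sketch below wraps a java.io.Reader and returns a sentinel at end of input instead of throwing. The class name ReaderSource and the EOF sentinel are assumptions made for illustration; the article's lexer itself uses the Code class above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical alternative character source, for illustration only.
final class ReaderSource {
    static final char EOF = (char) -1; // assumed end-of-input sentinel

    private final Reader in;

    ReaderSource(Reader in) {
        this.in = in;
    }

    // Returns the next character, or EOF once the underlying reader is exhausted.
    char read() throws IOException {
        int c = in.read(); // Reader.read() returns -1 at end of stream
        return c < 0 ? EOF : (char) c;
    }

    // Convenience factory for tests: read from an in-memory string.
    static ReaderSource of(String text) {
        return new ReaderSource(new StringReader(text));
    }
}
```

With a source like this, a scanning loop can stop when it sees ReaderSource.EOF rather than relying on the caller to know how many tokens the input contains.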
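Skipping whitespace and comments
for (;; peek = (char) Code.read()) {
    if (peek == ' ' || peek == '\t') continue; // skip blanks and tabs
    else if (peek == '\n') line++;             // count lines for error messages
    else break;                                // first significant character
}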
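The loop above skips only blanks, tabs, and newlines. A minimal sketch of how line comments could be skipped as well is shown here; it assumes comments start with // and run to the end of the line, and the CommentSkipper class, its string-based input, and the '\0' end-of-input sentinel are all hypothetical.

```java
// Hypothetical helper, for illustration only; it is not part of the article's Lexer.
final class CommentSkipper {
    private final String src;
    private int pos = 0;
    int line = 1;

    CommentSkipper(String src) {
        this.src = src;
    }

    // Character at index i, or '\0' past the end of the input (assumed sentinel).
    private char at(int i) {
        return i < src.length() ? src.charAt(i) : '\0';
    }

    // Skips whitespace and // comments, then returns the first significant
    // character, or '\0' at end of input.
    char nextSignificant() {
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == ' ' || c == '\t') {
                pos++;
            } else if (c == '\n') {
                line++;
                pos++;
            } else if (c == '/' && at(pos + 1) == '/') {
                // Skip to the end of the line; the newline itself is handled
                // by the next loop iteration so that line is still counted.
                while (pos < src.length() && src.charAt(pos) != '\n') {
                    pos++;
                }
            } else {
                return src.charAt(pos++);
            }
        }
        return '\0';
    }
}
```

Deciding whether / starts a comment requires looking at the character after it, which is the same one-character lookahead idea used elsewhere in the lexer.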
Lookahead
For example, after reading 1 we must read one more character to distinguish 1 from 10, and after reading t we must read one more to distinguish t from true. Of course, a character such as * needs no lookahead.
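The same idea applies to operators. The sketch below is a hedged illustration, not the article's lexer: it assumes a hypothetical two-character operator <= and returns token names as plain strings. After reading <, it reads one more character; if that character is not =, it is kept in peek so the next call to scan() starts from it.

```java
// Hypothetical class and token names, for illustration only.
final class LookaheadDemo {
    private final String src;
    private int pos = 0;
    private char peek = ' ';

    LookaheadDemo(String src) {
        this.src = src;
    }

    // Reads the next character into peek; '\0' marks end of input (assumed sentinel).
    private void readch() {
        peek = pos < src.length() ? src.charAt(pos++) : '\0';
    }

    // Reads one character ahead. If it equals c it is consumed; otherwise it
    // stays in peek for the next call to scan().
    private boolean readch(char c) {
        readch();
        if (peek != c) {
            return false;
        }
        peek = ' ';
        return true;
    }

    String scan() {
        while (peek == ' ') {
            readch();
        }
        switch (peek) {
            case '<':
                return readch('=') ? "LE" : "LT"; // needs one character of lookahead
            case '*':
                peek = ' ';
                return "TIMES";                   // no lookahead needed
            case '\0':
                return "EOF";
            default: {
                char c = peek;
                peek = ' ';
                return String.valueOf(c);
            }
        }
    }
}
```

For the input <1 this yields LT followed by 1: the character read ahead is not lost, it simply becomes the start of the next token.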
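Constants
if (Character.isDigit(peek)) {
    int v = 0;
    do {
        // Accumulate the decimal value one digit at a time.
        v = 10 * v + Character.digit(peek, 10);
        peek = (char) Code.read();
    } while (Character.isDigit(peek));
    return new Num(v);
}
When the input is 1+2+3, the lexer produces the token sequence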
<num,1><+><num,2><+><num,3>
+ is a terminal with no attribute, so its tuple is simply <+>.
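As a quick way to check such token sequences, here is a small self-contained sketch that formats the tokens of a digits-and-operators string in the same <num,value> and <op> notation. The TokenStreamDemo class is hypothetical and works on a plain string rather than on the Code class.

```java
// Hypothetical demo class, for illustration only.
final class TokenStreamDemo {
    // Returns the tokens of src written as <num,value> for numbers and <c> for
    // any other non-whitespace character.
    static String tokenize(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            if (Character.isDigit(c)) {
                int v = 0;
                // Same accumulation as in the lexer: v = 10 * v + digit.
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    v = 10 * v + Character.digit(src.charAt(i), 10);
                    i++;
                }
                out.append("<num,").append(v).append('>');
            } else {
                out.append('<').append(c).append('>');
                i++;
            }
        }
        return out.toString();
    }
}
```

Calling TokenStreamDemo.tokenize("1+2+3") returns the string <num,1><+><num,2><+><num,3>.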
Recognizing keywords and identifiers
Keywords are fixed character strings such as for, do, and if.
If the input is
a=a+b
the terminals are id = id + id, and the token sequence is
<id,"a"><=><id,"a"><+><id,"b">
Keywords cannot be used as identifiers, so the lexer has to tell keywords and identifiers apart.
This is done with a symbol table; here a Hashtable stores the keywords.
// Maps each lexeme to its Word; keywords are entered up front.
private Hashtable<String, Word> words = new Hashtable<>();

private void reserve(Word t) {
    words.put(t.lexeme, t);
}

public Lexer() {
    reserve(new Word(Tag.TRUE, "true"));
    reserve(new Word(Tag.FALSE, "false"));
}

if (Character.isLetter(peek)) {
    StringBuffer b = new StringBuffer();
    do {
        b.append(peek);
        peek = (char) Code.read();
    } while (Character.isLetterOrDigit(peek));
    String s = b.toString();
    Word w = words.get(s);
    if (w != null) return w;  // a keyword, or an identifier seen before
    w = new Word(Tag.ID, s);  // first occurrence of a new identifier
    words.put(s, w);
    return w;
}
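The effect of reserving keywords first can be seen in isolation with the sketch below. It is a simplified stand-in, not the article's Lexer: the WordTableDemo class and its nested Word are hypothetical, a HashMap replaces the Hashtable, and the tag values are assumed to match the Tag class (ID = 257, TRUE = 258, FALSE = 259).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the lexer's word table, for illustration only.
final class WordTableDemo {
    static final int ID = 257, TRUE = 258, FALSE = 259; // assumed to match Tag

    static final class Word {
        final int tag;
        final String lexeme;

        Word(int tag, String lexeme) {
            this.tag = tag;
            this.lexeme = lexeme;
        }
    }

    private final Map<String, Word> words = new HashMap<>();

    WordTableDemo() {
        // Keywords go into the table before any input is scanned.
        reserve(new Word(TRUE, "true"));
        reserve(new Word(FALSE, "false"));
    }

    private void reserve(Word w) {
        words.put(w.lexeme, w);
    }

    // Returns the reserved Word for a keyword; otherwise returns the identifier's
    // Word, creating and storing it on first use.
    Word lookup(String lexeme) {
        Word w = words.get(lexeme);
        if (w != null) {
            return w;
        }
        w = new Word(ID, lexeme);
        words.put(lexeme, w);
        return w;
    }
}
```

Because keywords are already in the table, lookup("true") never creates an identifier, while a name such as a gets a single shared Word object no matter how often it appears.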
Lexical analyzer
// Tag.java
package com.bigbear.lexer;

public class Tag {
    public final static int NUM = 256, ID = 257, TRUE = 258, FALSE = 259;
}

// Token.java
package com.bigbear.lexer;

public class Token {
    public final int tag;

    public Token(int tag) {
        this.tag = tag;
    }
}

// Num.java
package com.bigbear.lexer;

public class Num extends Token {
    public final int value;

    public Num(int v) {
        super(Tag.NUM);
        this.value = v;
    }
}

// Word.java
package com.bigbear.lexer;

public class Word extends Token {
    public final String lexeme;

    public Word(int t, String s) {
        super(t);
        this.lexeme = s;
    }
}
// Lexer.java
package com.bigbear.lexer;

import java.io.IOException;
import java.util.Hashtable;

import com.bigbear.main.Code;

/**
 * Lexical analyzer
 *
 * @author Winney
 */
public class Lexer {
    public int line = 1;
    private char peek = ' ';
    private Hashtable<String, Word> words = new Hashtable<>();

    private void reserve(Word t) {
        words.put(t.lexeme, t);
    }

    public Lexer() {
        reserve(new Word(Tag.TRUE, "true"));
        reserve(new Word(Tag.FALSE, "false"));
    }

    public Token scan() throws IOException {
        // Skip whitespace and count lines.
        for (;; peek = (char) Code.read()) {
            if (peek == ' ' || peek == '\t') continue;
            else if (peek == '\n') line++;
            else break;
        }
        // Numeric constant.
        if (Character.isDigit(peek)) {
            int v = 0;
            do {
                v = 10 * v + Character.digit(peek, 10);
                peek = (char) Code.read();
            } while (Character.isDigit(peek));
            return new Num(v);
        }
        // Keyword or identifier.
        if (Character.isLetter(peek)) {
            StringBuffer b = new StringBuffer();
            do {
                b.append(peek);
                peek = (char) Code.read();
            } while (Character.isLetterOrDigit(peek));
            String s = b.toString();
            Word w = words.get(s);
            if (w != null) return w;
            w = new Word(Tag.ID, s);
            words.put(s, w);
            return w;
        }
        // Any other character is a token by itself; its tag is the character code.
        Token t = new Token(peek);
        peek = ' ';
        return t;
    }

    public static void main(String[] args) {
        try {
            Lexer lx = new Lexer();
            // The test input "int a=9;" contains exactly five tokens.
            for (int i = 0; i < 5; i++) {
                System.out.println(lx.scan().tag);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}