Overview
Lexical analysis is the first phase of compilation. Its task is to read the source program from left to right, one character at a time, scanning the character stream and recognizing tokens (also called word symbols) according to the language's token-formation rules. This task is carried out by a lexical analyzer, which can also be generated automatically by tools such as Lex.
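The scanning loop described above can be sketched in a few lines. This is a generic illustration of character-by-character tokenization, not the project's actual Analyzer code:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a scanner loop: read the input left to right, one
// character at a time, and group characters into tokens. Illustrative
// only; the project's real analyzer distinguishes many more token types.
public class TinyScanner {
    public static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                  // skip whitespace
            } else if (Character.isDigit(c)) {
                int start = i;                        // number literal
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;                        // identifier or keyword
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) i++;
                tokens.add(src.substring(start, i));
            } else {
                tokens.add(String.valueOf(c));        // single-character operator/separator
                i++;
            }
        }
        return tokens;
    }
}
```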
This project implements a simple lexical analyzer for the C language. It is hosted at http://git.oschina.net/kinegratii/Lexer
Project Features
Recognizes decimal numbers, octal numbers, identifiers, keywords, operators, separators, and other token types
Supports two input methods: importing a source file and typing code directly
Keeps the analysis algorithm and the UI loosely coupled through a dedicated interface
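For example, C distinguishes octal from decimal integer literals by a leading 0 (so 017 denotes decimal 15). A hypothetical classifier for this rule might look as follows; the class and method names are illustrative, not the project's API:

```java
// Hypothetical helper (not the project's actual API) showing how a lexer
// can classify an integer literal as octal or decimal, following C's rule
// that a leading '0' introduces an octal constant.
public class IntLiteral {
    public static String classify(String lexeme) {
        if (lexeme.length() > 1 && lexeme.charAt(0) == '0') {
            return "octal";       // e.g. 017 is 15 in decimal
        }
        return "decimal";
    }

    public static int valueOf(String lexeme) {
        int base = classify(lexeme).equals("octal") ? 8 : 10;
        return Integer.parseInt(lexeme, base);
    }
}
```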
Project structure
lexer
|-- com.kinegratii.lexer            main package
|   |-- Analyzer.java               analyzer and its callback interface
|   |-- Lexer.java                  project startup class
|   |-- MainFrame.java              UI class
|   |-- SoftwareInfo.java           software information constants
|-- com.kinegratii.token            token classes
|   |-- DoubleToken.java            floating-point numbers
|   |-- DotToken.java               separators
|   |-- IdentifiterToken.java       identifiers
|   |-- IntegerToken.java           integers
|   |-- ReservedToken.java          keywords
|-- com.kinegatii.utils             utilities
    |-- BareBonesBrowserLaunch.java launches the browser
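The callback interface inside Analyzer.java is what keeps the algorithm independent of the UI: the UI implements a listener and the analyzer only reports results through it. A plausible shape for such an interface is sketched below; all names here are assumptions, not the project's actual code:

```java
// Sketch of decoupling an analyzer from its UI via a callback interface.
// The analyzer side knows nothing about Swing or any concrete UI class;
// it only pushes events through the listener. Names are hypothetical.
interface TokenListener {
    void onToken(String type, String lexeme);       // one recognized token
    void onError(int position, String message);     // a lexical error
}

class CallbackDemo {
    // The analyzer side: reports results through the interface only.
    static void report(TokenListener listener) {
        listener.onToken("keyword", "int");
        listener.onToken("identifier", "x");
    }

    // A non-UI consumer, standing in for MainFrame, collecting the events.
    static String collect() {
        StringBuilder sb = new StringBuilder();
        report(new TokenListener() {
            public void onToken(String type, String lexeme) {
                sb.append(type).append(':').append(lexeme).append(' ');
            }
            public void onError(int position, String message) { }
        });
        return sb.toString().trim();
    }
}
```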
Project output
Lexical unit (token) sequence
Symbol table
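A symbol table typically interns each identifier once and assigns it an entry number, so the token sequence can refer to entries by index. A minimal sketch, not the project's actual implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of a simple symbol table: identifiers are interned
// in insertion order and given sequential entry numbers. Not the
// project's actual implementation.
public class SymbolTable {
    private final Map<String, Integer> entries = new LinkedHashMap<>();

    // Returns the entry index for the identifier, adding it if new.
    public int intern(String name) {
        Integer index = entries.get(name);
        if (index == null) {
            index = entries.size();
            entries.put(name, index);
        }
        return index;
    }

    public int size() {
        return entries.size();
    }
}
```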
Project development
This project was a practical assignment for a compilers course. The initial code was completed in April 2012 and went through several minor revisions up to version 1.2.4 (versioning was fairly casual at the time). Version v1.3.0 was the first overall refactoring.
v1.3.0 2014-09-24
Refactored the entire project, partitioning packages and classes by responsibility and decoupling the algorithm from the UI
Removed the language transformation section
The so-called language transformation part of this analyzer was a set of custom rules, for example collapsing consecutive underscores in an identifier into a single one, so that "a__b" becomes "a_b". These rules are not part of a standard lexical analyzer; they existed to discourage wholesale copying of the code, since the transformation rules changed every year, and adapting the previous year's code still required familiarity with the entire project.
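The underscore rule described above can be expressed as a one-line transformation. This is a standalone sketch, not the removed code:

```java
// The underscore rule from the text: collapse runs of consecutive
// underscores in an identifier into a single underscore, so "a__b"
// becomes "a_b". Illustrative sketch, not the project's removed code.
public class UnderscoreRule {
    public static String collapse(String identifier) {
        return identifier.replaceAll("_+", "_");
    }
}
```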
Follow-up plan
Support configuring the analyzer with a custom language transformation section, primarily at the code level. The basic requirement is a configurable, general-purpose interface.
The current callback interface is also fairly minimal; exposing more of the analyzer's internal data through it could be considered.
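One way to realize such a configurable interface is a rule abstraction plus a pipeline that composes rules, so the yearly rule changes become configuration rather than code edits. All names below are hypothetical, a sketch of the planned design rather than the project's implementation:

```java
import java.util.List;

// Hypothetical sketch of a configurable transformation interface: each
// rule rewrites a lexeme, and rules are composed in a pipeline. Names
// are assumptions, not the project's actual API.
interface TransformRule {
    String apply(String lexeme);
}

class TransformPipeline {
    private final List<TransformRule> rules;

    TransformPipeline(List<TransformRule> rules) {
        this.rules = rules;
    }

    // Applies every configured rule to the lexeme, in order.
    String run(String lexeme) {
        for (TransformRule rule : rules) {
            lexeme = rule.apply(lexeme);
        }
        return lexeme;
    }
}
```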
C Language Lexical Analyzer