ML-Lex is a variant of Lex for Standard ML: it generates lexical analyzers written in SML.
I. ML-Lex format
An ML-Lex specification has the following format:
user declarations
%%
ML-Lex definitions
%%
rules
The three sections are separated by %%.
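To show how the three sections fit together, here is a minimal sketch of a complete specification that splits its input into words. It assumes the default setup with no extra directives; the choice of string as lexresult, the empty string as the end-of-input marker, and the names alpha and ws are illustrative only.

```sml
(* User declarations: ordinary SML code. *)
type lexresult = string
(* Hypothetical convention: the empty string marks end of input. *)
fun eof () = ""
%%
alpha=[A-Za-z];
ws=[\ \t\n];
%%
{ws}+    => (continue ());
{alpha}+ => (yytext);
.        => (continue ());
```

Each call of the generated lexer returns the next word. Whitespace and any other character are skipped by calling continue (), which resumes lexing instead of returning a value.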
Rules define what the lexer does. Each rule has two parts: a regular expression and an action. The regular expression matches input text; when a match succeeds, the corresponding action is executed. All actions must return the same type, lexresult, which is defined in the user declarations.
The user declarations define the values used by the rules. At least two things must be defined there: the type lexresult and the function eof. lexresult is the type returned by the rule actions, and eof is called when the lexer reaches the end of the input. It typically returns an end-of-file token or raises an exception; its return type is also lexresult.
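A user declarations section that satisfies both requirements might look like the sketch below. The token datatype and its constructors are hypothetical; ML-Lex itself only requires that lexresult and eof be defined.

```sml
(* Hypothetical token type for a small expression language. *)
datatype token = INT of int | ID of string | PLUS | EOF

(* Every rule action must return a value of this type. *)
type lexresult = token

(* Called by the generated lexer at end of input; returns a sentinel token. *)
fun eof () = EOF
```

To signal end of input with an exception instead, eof could be written as fun eof () : lexresult = raise Fail "end of input", using the Basis Library exception Fail.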
In the definitions section, you can define named regular expressions and start states. A rule can be restricted to particular start states, so that its regular expression is matched only when the lexer is in one of them. A rule's action can change the lexer's current start state.
Declarations in the ML-Lex definitions section and rules in the rules section must end with a semicolon (;).
II. Syntax summary
1. User declarations
This section contains ordinary SML declarations: the types, variables, and functions the rules need. It must not contain %%, since the first %% ends the section.
2. ML-Lex definitions
A start state is declared as %s identifier list;
For example, to declare the start state start: %s start;
State identifiers must begin with a letter, followed by letters, digits, underscores, or primes (').
A named regular expression is defined as identifier = regular expression;
This section can also contain directives with special meanings, for example:
%reject makes the REJECT() function available in rule actions.
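A definitions section (the part between the two %% lines) that declares one start state and a few named regular expressions might read as follows. The names COMMENT, digit, alpha, ws and id are chosen for illustration; note that a named expression is referenced inside another one by writing its name in braces.

```sml
%s COMMENT;
digit=[0-9];
alpha=[A-Za-z];
ws=[\ \t\n];
id={alpha}[A-Za-z0-9_]*;
```

The space in ws is written as "\ " because unescaped whitespace is not allowed inside an ML-Lex regular expression.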
3. Rules
Each rule has the same format:
<start state list> regular expression => ( code );
The start state list is optional. It is enclosed in < > and its entries are separated by commas; each state must be declared with %s. If the list is omitted, the rule can match in any start state; otherwise it is tried only when the lexer is in one of the listed states. The lexer begins execution in the predefined start state INITIAL. If the input matches more than one rule, the lexer chooses the rule with the longest match; if several rules match the same longest string, it chooses the rule listed first.
If no rule matches the input, the lexer raises the exception LexError.
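The two disambiguation rules explain the usual way keywords and identifiers coexist. The sketch below assumes a token datatype with constructors IF, ID of string, INT of int and PLUS in the user declarations, and named expressions alpha, digit and ws in the definitions; all of these names are hypothetical.

```sml
"if"      => (IF);
{alpha}+  => (ID yytext);
{digit}+  => (INT (valOf (Int.fromString yytext)));
"+"       => (PLUS);
{ws}+     => (continue ());
```

On the input iffy, "if" matches two characters while {alpha}+ matches four, so the longest match wins and the result is ID "iffy". On the input if followed by a space, both rules match exactly two characters, so the rule listed first wins and the result is IF. If the "if" rule were placed after {alpha}+, the keyword would never be produced.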
4. Built-in variables and functions
Several built-in variables and functions are commonly used in rule actions:
yypos : int, the position of the first character of the matched string (yytext), relative to the start of the input.
yytext : string, the string matched by the rule whose action is running.
YYBEGIN state, which switches the lexer to the given start state. The state must be declared with %s in the definitions section; the lexer starts in INITIAL.
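A common use of YYBEGIN is skipping SML-style comments. This fragment assumes %s COMMENT; and a named expression alpha in the definitions section, plus other rules (for whitespace and so on) elsewhere in the rules section; it does not handle nested comments.

```sml
<INITIAL>"(*"     => (YYBEGIN COMMENT; continue ());
<COMMENT>"*)"     => (YYBEGIN INITIAL; continue ());
<COMMENT>\n       => (continue ());
<COMMENT>.        => (continue ());
<INITIAL>{alpha}+ => (print (yytext ^ " at " ^ Int.toString yypos ^ "\n");
                      continue ());
```

Inside a comment, "*)" beats . because it is the longer match. The identifier rule is tagged <INITIAL> so that words inside comments are consumed by the COMMENT rules instead of being printed.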
(To be continued ...)
Reference: http://www.smlnj.org/doc/ML-Lex/manual.html