The previous article built an NFA equivalent to a regular expression. This article explains how to convert that NFA into a DFA, and how to simplify both the DFA and the character classes.
I. The representation of the DFA
The DFA is represented much like the NFA, but more simply: it needs only a single method, one that adds a new state. The Dfa class looks like this:
namespace Cyjb.Compilers.Lexers {
	class Dfa : IList<DfaState> {
		// Creates a new state in the current DFA.
		DfaState NewState() { ... }
		// The IList<DfaState> members are omitted.
	}
}
A DFA state is also fairly simple, with only two essential properties: the symbol index and the state transitions.
The symbol index indicates which regular expression an accepting state corresponds to. However, one DFA state may correspond to several NFA states (see the subset construction below), so the symbol index of a DFA state is an array. For ordinary (non-accepting) states, the symbol index is an empty array.
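Because a DFA state stands for a set of NFA states, its symbol index can be gathered from the accepting NFA states in that set. The following is a minimal sketch under assumptions made only for illustration: the hypothetical NfaStateSketch class stores a single symbol index, with -1 meaning "not accepting", and the hypothetical SymbolIndexSketch.Compute sorts its result purely to make it deterministic. The article does not say how Cyjb orders or builds this array.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal NFA state: SymbolIndex is the index of the regular
// expression this state accepts, or -1 if it is not an accepting state.
class NfaStateSketch {
	public int SymbolIndex = -1;
}

static class SymbolIndexSketch {
	// Collects the symbol indices of all accepting NFA states in the set.
	// The result is sorted and duplicate-free; it is empty when no NFA state
	// in the set accepts, i.e. when the DFA state is an ordinary state.
	public static int[] Compute(IEnumerable<NfaStateSketch> nfaStates) {
		return nfaStates
			.Where(s => s.SymbolIndex >= 0)
			.Select(s => s.SymbolIndex)
			.Distinct()
			.OrderBy(i => i)
			.ToArray();
	}
}
```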
The state transitions describe how to move from the current state to the next one. Because the character classes were already partitioned when the NFA was constructed, the DFA simply uses an array indexed by character class to record the transition for each class (a DFA has no ε-transitions, and each character class has exactly one transition).
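With transitions stored in per-state arrays, running the DFA costs one array lookup per input symbol. The sketch below assumes the input has already been mapped to character class numbers, and uses a hypothetical integer table (next[state][charClass]) instead of the article's DfaState references; -1 stands for "no transition", which is treated here as a dead state.

```csharp
using System.Collections.Generic;

static class TransitionSketch {
	// next[state][charClass] is the target state index, or -1 for a move to
	// the dead state. Returns the state reached after reading every character
	// class, or -1 if the DFA gets stuck. Each step is a single lookup because
	// a DFA has no ε-transitions and one move per character class.
	public static int Run(int[][] next, int start, IEnumerable<int> charClasses) {
		int state = start;
		foreach (int c in charClasses) {
			state = next[state][c];
			if (state < 0) return -1;
		}
		return state;
	}
}
```

For example, with class 0 for digits and class 1 for everything else, the table `{ {1, -1}, {1, -1} }` recognizes one or more digits: reading classes 0, 0, 0 from state 0 ends in state 1, while reading 0, 1 gets stuck.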
The NFA state definition also has a state type property, but the DFA state does not need one: states of type Trailing are handled when the DFA matches a string (explained in the next article), and states of type TrailingHead are merged with states of type Normal while the DFA is being constructed (see section 2.4).
The DfaState class is defined as follows:
namespace Cyjb.Compilers.Lexers {
	class DfaState {
		// Member implementations are omitted; only the signatures are shown.
		// Gets the DFA that contains the current state.
		Dfa Dfa { get; private set; }
		// Gets or sets the index of the current state.
		int Index { get; set; }
		// Gets or sets the symbol index of the current state.
		int[] SymbolIndex { get; set; }
		// Gets or sets the state reached from this state on the given character class.
		DfaState this[int charClass] { get; set; }
	}
}
The two additional properties defined on the DFA state, Dfa and Index, are likewise there to make states more convenient to use.
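To see how these pieces fit together, here is a minimal, self-contained sketch of the two classes. It lives in a hypothetical DfaSketch namespace rather than Cyjb.Compilers.Lexers, does not implement IList<DfaState>, and assumes the number of character classes is known when the DFA is created. It illustrates the representation described above and is not the library's actual implementation.

```csharp
using System.Collections.Generic;

namespace DfaSketch {
	// A minimal DFA: a list of states plus a factory method for new states.
	// Assumption: the number of character classes is fixed up front.
	class Dfa {
		private readonly List<DfaState> states = new List<DfaState>();
		public Dfa(int charClassCount) { CharClassCount = charClassCount; }
		public int CharClassCount { get; }
		public int Count => states.Count;
		public DfaState this[int index] => states[index];

		// Creates a new state in this DFA; its Index is its position in the list.
		public DfaState NewState() {
			var state = new DfaState(this, states.Count, CharClassCount);
			states.Add(state);
			return state;
		}
	}

	class DfaState {
		// One slot per character class; null means no transition has been set.
		private readonly DfaState[] transitions;
		internal DfaState(Dfa dfa, int index, int charClassCount) {
			Dfa = dfa;
			Index = index;
			transitions = new DfaState[charClassCount];
		}
		// The DFA that contains this state.
		public Dfa Dfa { get; private set; }
		// The index of this state within its DFA.
		public int Index { get; set; }
		// Symbol indices of the accepted regular expressions; empty for ordinary states.
		public int[] SymbolIndex { get; set; } = new int[0];
		// The state reached on the given character class (null if none is set).
		public DfaState this[int charClass] {
			get => transitions[charClass];
			set => transitions[charClass] = value;
		}
	}
}
```

Keeping the back-reference to the owning DFA and the state's own index means a state can be looked up or numbered without searching the state list.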
II. Converting the NFA to a DFA
2.1 The subset construction
The NFA is converted to a DFA with the subset construction algorithm. The algorithm proceeds much like the NFA matching process described in section 3.1 of "C# Lexical Analyzer (III): Regular Expressions". NFA matching always works with a set of NFA states; the subset construction therefore makes each DFA state correspond to one set of NFA states. That is, the state the DFA reaches after reading an input string $a_1 a_2 \cdots a_n$ corresponds to the set of states the NFA reaches after reading the same string $a_1 a_2 \cdots a_n$.
The subset construction algorithm uses the following operations:
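The idea that each DFA state stands for a set of NFA states can be sketched as a worklist algorithm built on the standard ε-closure and move operations. This is a minimal sketch under stated assumptions, not the article's implementation: the NFA uses a hypothetical integer encoding (NfaSketch, with states 0..n-1, ε-edges in Epsilon and (charClass, target) edges in Edges), DFA states are plain integers, and an empty target set is recorded as -1 instead of creating a state for it.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical NFA encoding: states are 0..StateCount-1, Epsilon[s] lists the
// ε-successors of s, and Edges[s] lists (charClass, target) pairs.
class NfaSketch {
	public int CharClassCount;
	public List<int>[] Epsilon;
	public List<(int CharClass, int Target)>[] Edges;
	public NfaSketch(int stateCount, int charClassCount) {
		CharClassCount = charClassCount;
		Epsilon = new List<int>[stateCount];
		Edges = new List<(int CharClass, int Target)>[stateCount];
		for (int i = 0; i < stateCount; i++) {
			Epsilon[i] = new List<int>();
			Edges[i] = new List<(int CharClass, int Target)>();
		}
	}
}

static class SubsetConstruction {
	// ε-closure: every NFA state reachable from `states` using only ε-edges.
	static SortedSet<int> Closure(NfaSketch nfa, IEnumerable<int> states) {
		var result = new SortedSet<int>(states);
		var stack = new Stack<int>(result);
		while (stack.Count > 0)
			foreach (int t in nfa.Epsilon[stack.Pop()])
				if (result.Add(t)) stack.Push(t);
		return result;
	}

	// move: NFA states reachable from `states` by one edge on character class `cls`.
	static IEnumerable<int> Move(NfaSketch nfa, SortedSet<int> states, int cls) =>
		states.SelectMany(s => nfa.Edges[s]).Where(e => e.CharClass == cls).Select(e => e.Target);

	// dfaStates[i] is the NFA state set behind DFA state i;
	// table[i][c] is the DFA state reached on class c, or -1 if that set is empty.
	public static (List<SortedSet<int>> dfaStates, List<int[]> table) Build(NfaSketch nfa, int start) {
		var dfaStates = new List<SortedSet<int>>();
		var table = new List<int[]>();
		var index = new Dictionary<string, int>();  // key: the sorted NFA set as text

		// Returns the DFA state for `set`, creating it the first time the set is seen.
		int Intern(SortedSet<int> set) {
			string key = string.Join(",", set);
			if (index.TryGetValue(key, out int i)) return i;
			index[key] = dfaStates.Count;
			dfaStates.Add(set);
			table.Add(null);  // row is filled in when this state is processed
			return dfaStates.Count - 1;
		}

		Intern(Closure(nfa, new[] { start }));
		for (int d = 0; d < dfaStates.Count; d++) {  // the list grows as new sets appear
			var row = new int[nfa.CharClassCount];
			for (int c = 0; c < nfa.CharClassCount; c++) {
				var target = Closure(nfa, Move(nfa, dfaStates[d], c));
				row[c] = target.Count == 0 ? -1 : Intern(target);
			}
			table[d] = row;
		}
		return (dfaStates, table);
	}
}
```

For an NFA of `a*b` (class 0 for `a`, class 1 for `b`), the construction yields three DFA states: the start state, the state after one or more `a`s, and the accepting state after `b`.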