Source: http://www.cnblogs.com/JeffreyZhao/archive/2009/10/12/code-for-fun-tokenizer.html
Today I came across this puzzle again by chance, along with the following two answers:
http://www.cnblogs.com/jeffreyzhao/archive/2009/10/21/code-for-fun-tokenizer-answer-1.html
http://www.cnblogs.com/JeffreyZhao/archive/2009/10/22/code-for-fun-tokenizer-answer-2-fsharp.html
I wanted to implement it myself as well. It took a few hours, and my solution turned out somewhat different from the answers in the original articles; the idea may also differ in places.
Similarly, I broke the URL parsing rules down into five states:
1. Normal state
2. Inside single quotes
3. Single quote encountered in the normal state
4. Single quote encountered inside single quotes
5. Separator encountered in the normal state
// Normal state
var StateParser_1 = function (ch) {
    log(ch + "normal state ");
    if (ch === "-") {
        return StateParser_5;
    }
    if (ch === "'") {
        return StateParser_3;
    }
    else {
        // Ordinary character: append it to the current token
        token.push(ch);
        return StateParser_1;
    }
};
// Inside single quotes
var StateParser_2 = function (ch) {
    log(ch + "single quotes ");
    if (ch === "'") {
        return StateParser_4;
    } else {
        // Everything except a quote, including "-", belongs to the token
        token.push(ch);
        return StateParser_2;
    }
};
// Single quote encountered in the normal state
var StateParser_3 = function (ch) {
    log(ch + "single quotes in normal state ");
    token.push(ch);
    if (ch === "'") {
        // A second quote right away: back to the normal state
        return StateParser_1;
    }
    return StateParser_2;
};
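// Single quote encountered inside single quotes
var StateParser_4 = function (ch) {
    log(ch + "single quotes in single quotes ");
    if (ch === "'") {
        // Doubled quote: a literal quote character, stay inside the quotes
        token.push(ch);
        return StateParser_2;
    }
    if (ch === "-") {
        return StateParser_5;
    }
    // Any other character is not handled here, so the function returns undefined
};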
// Separator encountered in the normal state
var StateParser_5 = function (ch) {
    log(ch + "normal state separator status ");
    // The separator closes the current token
    tokenG.push(token);
    token = [];
    if (ch === "-") {
        // A second "-" closes the current token group as well
        text.push(tokenG);
        tokenG = [];
        return StateParser_1;
    }
    else if (ch === "'") {
        return StateParser_3;
    }
    else {
        token.push(ch);
        return StateParser_1;
    }
};
StateParser_1 and StateParser_2 can be considered long-lived states; they are mostly used to record the characters of a token. The other three are short-lived states, which can also be called transition points. It is at these transitions that a token or token group is opened and closed.
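The state functions above need a loop that feeds them one character at a time, which is not shown here. The following is a minimal sketch of how that might look, not the original driver. The `tokenize` wrapper, the descriptive state names, joining each token's characters into a string, the flush at end of input, and the thrown errors are all assumptions added for illustration.

```javascript
// Hypothetical driver around the five states described above.
function tokenize(input) {
  var token = [];   // characters of the token being read
  var tokenG = [];  // tokens of the group being read
  var text = [];    // all finished groups

  function normal(ch) {                 // corresponds to StateParser_1
    if (ch === "-") return separator;
    if (ch === "'") return quoteInNormal;
    token.push(ch);
    return normal;
  }
  function inQuotes(ch) {               // corresponds to StateParser_2
    if (ch === "'") return quoteInQuotes;
    token.push(ch);
    return inQuotes;
  }
  function quoteInNormal(ch) {          // corresponds to StateParser_3
    token.push(ch);
    return ch === "'" ? normal : inQuotes;
  }
  function quoteInQuotes(ch) {          // corresponds to StateParser_4
    if (ch === "'") { token.push(ch); return inQuotes; }
    if (ch === "-") return separator;
    // Assumption: anything else after a closing quote is invalid input
    throw new Error("unexpected character after closing quote: " + ch);
  }
  function separator(ch) {              // corresponds to StateParser_5
    tokenG.push(token.join(""));        // assumption: store tokens as strings
    token = [];
    if (ch === "-") {                   // "--" closes the group
      text.push(tokenG);
      tokenG = [];
      return normal;
    }
    if (ch === "'") return quoteInNormal;
    token.push(ch);
    return normal;
  }

  // Each state function returns the state that handles the next character
  var state = normal;
  for (var i = 0; i < input.length; i++) {
    state = state(input.charAt(i));
  }

  // Assumption: at end of input, reject an open quote and flush what is left
  if (state === inQuotes || state === quoteInNormal) {
    throw new Error("unterminated quote");
  }
  tokenG.push(token.join(""));
  text.push(tokenG);
  return text;
}

console.log(JSON.stringify(tokenize("cpu-3.0g--color-red")));
// → [["cpu","3.0g"],["color","red"]]
```

Because every state returns the next state, the loop itself stays a single assignment; the long-lived states only collect characters, and all opening and closing of tokens and groups happens in the short-lived ones.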