This question and its answers are excerpted from Zhihu: http://www.zhihu.com/people/chaos-xie http://www.zhihu.com/question/29922657
Thanks to the users who answered! The question and the suggested answers are recorded below:
Is there a language for which Flex/lex is not suitable as a lexical analyzer? This is question 5 on page 24 of O'Reilly's Flex and Bison (Chinese edition). Any advice is welcome! My own view is that lexical analysis just divides the input stream into meaningful tokens (see the Dragon Book), and any programming language must be divisible into meaningful tokens (otherwise humans could not read it), so its lexer could be generated with a lexical analyzer generator such as Flex. But since the book raises the question, there must be a reason behind it. I would appreciate any pointers!
Zete's answer: JavaScript
In JavaScript, / can either begin a regular expression literal or be the division operator, and this ambiguity is hard to resolve with Lex. Typically Lex is used for only a small part of the work, and deciding the real meaning is deferred to the parsing phase.
There is a scheme that lets Lex decide the meaning of /, mainly by remembering the previous token. Get a feel for how complex it is:
'//' is always recognized as the start of a line comment.
If the symbol is '/' or '/=' and the previous token is one of the following, it is the division operator:
] Identifier Number RegularExpression String class false null private protected public super this true get include set
If the previous token is one of the following, it is the beginning of a regular expression:
! != !== # % %= & && &&= &= ( * *= + += , - -= -> . .. ... / /= : :: ; < << <<= <= = == === > >= >> >>= >>> >>>= ? @ [ ^ ^= ^^ ^^= { | |= || ||= ~ abstract break case catch const continue debugger default delete do else enum export extends final finally for function goto if implements import in instanceof interface is namespace native new package return static switch synchronized throw throws transient try typeof use var volatile while with
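The previous-token rule above can be sketched as a simple lookup. This is only an illustration: the function name and the "undecided" result are hypothetical, the regex-side set is abridged from the list above, and treating the start of input as a regex position is an assumption the original does not state.

```python
# Tokens after which '/' or '/=' is division (taken from the list above).
DIVISION_AFTER = {
    "]", "Identifier", "Number", "RegularExpression", "String",
    "class", "false", "null", "private", "protected", "public",
    "super", "this", "true", "get", "include", "set",
}

# Tokens after which '/' starts a regular expression (abridged from the list above).
REGEX_AFTER = {
    "!", "!=", "(", "*", "+", ",", "-", "/", ":", ";", "<", "=", "==",
    ">", "?", "[", "{", "|", "||", "~",
    "case", "delete", "else", "in", "instanceof", "new", "return",
    "throw", "typeof",
}


def classify_slash(prev_token):
    """Guess the meaning of '/' or '/=' from the previous token alone.

    prev_token is a token kind or keyword/punctuator spelling, or None
    at the start of input.
    """
    if prev_token is None:
        # Assumption: at the start of input an expression is expected.
        return "regex"
    if prev_token in DIVISION_AFTER:
        return "division"
    if prev_token in REGEX_AFTER:
        return "regex"
    # Tokens in neither list (for example ')' and '}') cannot be
    # decided by looking at one previous token.
    return "undecided"
```

With this sketch, classify_slash("Identifier") gives "division" and classify_slash("return") gives "regex", while ")" and "}" fall through to "undecided".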
But this still cannot decide the cases where the previous token is ) or }:
if (true) /a/g ---> regular expression
(x+y)/2 ---> division
{}/a/g ---> regular expression
+{}/a/g ---> division
So you have to add a state stack to determine whether a closing parenthesis ends the header of an if/for/while statement or closes a parenthesized expression.
You also need a state stack to determine whether a closing brace ends a block or an object literal.
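The parenthesis half of that state stack might look like the sketch below. It is a simplified illustration and not part of the original answer: the function name is hypothetical, the input is assumed to be an already split, balanced token list ending in ")", and the brace case (block versus object literal) is not covered.

```python
CONTROL_KEYWORDS = {"if", "for", "while"}


def slash_after_close_paren(tokens):
    """Classify a '/' that directly follows the ')' ending `tokens`.

    tokens: the tokens that precede the slash; the last one is ')'.
    Assumes the parentheses are balanced.
    """
    stack = []               # one flag per currently open '('
    closed_control = False   # did the most recent ')' close an if/for/while header?
    prev = None
    for tok in tokens:
        if tok == "(":
            # Record whether this '(' opens a control-statement header.
            stack.append(prev in CONTROL_KEYWORDS)
        elif tok == ")":
            closed_control = stack.pop()
        prev = tok
    # After a statement header a new statement starts, so '/' begins a
    # regex; after a parenthesized expression '/' is division.
    return "regex" if closed_control else "division"
```

For the examples above, the tokens of `if (true)` yield "regex" and the tokens of `(x+y)` yield "division".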
But this still cannot decide the cases where the previous token is ++ or --:
a++/a/g ---> division
RegExp.prototype.foo = 3
++/a/g.foo ---> regular expression
So we also have to decide whether the preceding ++ is a postfix operator or a prefix operator ...
At this point your lexer is full of very complex states ... You start to question the meaning of life and wonder what the point of Lex really is. Why not just use a scannerless parser for this perverse language?
See JavaScript 2.0 Syntax Rationale
Vczh's answer:
The problem in ETA's answer is relatively simple: if you let Bison do the work of Lex, it is easily solved, because all of those states are embedded in your grammar. The main rule is that when the parser needs an expression and sees a /, then unless it starts a comment, it must start a regular expression.
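A rough sketch of that rule follows, written as a hand-written recursive-descent parser in Python rather than in Bison. Everything in it is hypothetical: the toy grammar (operands joined by + and /, with no precedence) is far smaller than JavaScript, and the class and method names are made up. The point is only that the grammar position, not the previous token, decides what / means.

```python
import re

# Toy grammar (an assumption for illustration, not real JavaScript):
#   expr    := operand (('+' | '/') operand)*
#   operand := NUMBER | IDENT | REGEX | '(' expr ')'
ATOM = re.compile(r"\d+|[A-Za-z_]\w*")
REGEX = re.compile(r"/(?:[^/\\\n]|\\.)+/[a-z]*")


class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def skip_blank(self):
        # Skip whitespace and '//' line comments.
        while self.pos < len(self.text):
            if self.text[self.pos].isspace():
                self.pos += 1
            elif self.text.startswith("//", self.pos):
                end = self.text.find("\n", self.pos)
                self.pos = len(self.text) if end < 0 else end
            else:
                break

    def operand(self):
        self.skip_blank()
        if self.text.startswith("(", self.pos):
            self.pos += 1
            inner = self.expr()
            self.skip_blank()
            if not self.text.startswith(")", self.pos):
                raise SyntaxError("expected ')'")
            self.pos += 1
            return inner
        # An operand is expected here, so a '/' that is not a comment
        # can only start a regular expression.
        m = REGEX.match(self.text, self.pos) or ATOM.match(self.text, self.pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {self.pos}")
        self.pos = m.end()
        kind = "regex" if m.group().startswith("/") else "atom"
        return (kind, m.group())

    def expr(self):
        left = self.operand()
        while True:
            self.skip_blank()
            # An operator is expected here, so '/' can only be division.
            op = self.text[self.pos:self.pos + 1]
            if op not in ("+", "/"):
                return left
            self.pos += 1
            left = (op, left, self.operand())
```

For example, Parser("(x+y)/2").expr() treats the slash as division, while Parser("x / /a/g").expr() treats the first slash as division and the second as the start of a regex literal, with no previous-token tables or state stacks in the lexer.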
The really complex problems are on the Bison side. C++, for example, needs semantic analysis and syntax analysis to run at the same time, so that the results of semantic analysis can tell the parser which grammar rule to choose when there is a conflict.