Introduction
Mixer wants to provide custom routing and an SQL blacklist in the proxy layer to prevent SQL injection attacks, and the cornerstone of both is parsing the SQL statements sent by the user. This is my first serious attempt at lexical and syntax analysis.
So far I have only implemented a fairly simple lexical analyzer that decomposes a SQL statement into tokens. As for the syntax analysis that builds a SQL AST from those tokens, I really have no experience yet (my compiler theory is weak), so I need help from people who know this area well.
So here I will just give a brief introduction to mixer's lexical analysis.
Tokenize
Lexical analysis is needed in many places, and it is usually done in one of several ways:
Using a powerful tool such as Lex; mysql-proxy takes this approach
Using regular expressions
Using a state machine
For tools, one drawback I see is the learning cost: to use Lex, for example, I have to learn its syntax. The generated code is also not very readable and tends to be large, and, more seriously, it may be slow. This is one reason MySQL implements its own lexical analysis module.
For regular expressions, performance is a very important concern, and the complexity is no lower than using a tool like Lex.
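As a rough illustration of the regular-expression approach, here is a minimal sketch of my own; it is not mixer's or mysql-proxy's code, and the token classes are deliberately simplistic:

package main

import (
    "fmt"
    "regexp"
)

// One alternation per token class: identifiers, integers, multi-character
// operators, then single-character operators. Whitespace never matches any
// alternative, so FindAllString simply skips over it.
var tokenRe = regexp.MustCompile(`[A-Za-z_][A-Za-z0-9_]*|[0-9]+|<=|>=|<>|[=<>*,()]`)

// tokenize splits a SQL fragment into lexemes by repeated regexp matching.
func tokenize(sql string) []string {
    return tokenRe.FindAllString(sql, -1)
}

func main() {
    fmt.Println(tokenize("SELECT id, name FROM t WHERE id >= 10"))
}

Even this toy version has to order its alternatives carefully (<= before =) and does nothing for string literals, comments, or quoting, which is where the cost starts to rival a hand-written lexer.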
A state machine is, I think, a very good way to implement lexical analysis. For SQL, writing the lexer as a state machine is not very difficult, so mixer implements its own.
State machine
In general, a state machine is implemented as states plus actions plus a switch, possibly like this:
switch state {
case state1:
    state = action1()
case state2:
    state = action2()
case state3:
    state = action3()
}
For each state, the switch tells us which action will handle it, and each action knows which state comes next once it finishes executing.
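To make the pattern concrete, here is a tiny self-contained example in the same style; the states, actions, and input are hypothetical and not taken from mixer:

package main

import "fmt"

type state int

const (
    stateStart state = iota
    stateIdent
    stateNumber
    stateEnd
)

func main() {
    input := "select 10"
    pos := 0
    st := stateStart

    // The switch dispatches on the current state; each action consumes
    // some input and returns the next state.
    for st != stateEnd {
        switch st {
        case stateStart:
            st = actionStart(input, &pos)
        case stateIdent:
            st = actionIdent(input, &pos)
        case stateNumber:
            st = actionNumber(input, &pos)
        }
    }
}

// actionStart looks at the next character and decides which state handles it.
func actionStart(input string, pos *int) state {
    if *pos >= len(input) {
        return stateEnd
    }
    c := input[*pos]
    switch {
    case c >= 'a' && c <= 'z':
        return stateIdent
    case c >= '0' && c <= '9':
        return stateNumber
    default:
        *pos++ // skip whitespace and anything unrecognized
        return stateStart
    }
}

// actionIdent consumes a run of letters and reports it as an identifier.
func actionIdent(input string, pos *int) state {
    start := *pos
    for *pos < len(input) && input[*pos] >= 'a' && input[*pos] <= 'z' {
        *pos++
    }
    fmt.Println("ident:", input[start:*pos])
    return stateStart
}

// actionNumber consumes a run of digits and reports it as a number.
func actionNumber(input string, pos *int) state {
    start := *pos
    for *pos < len(input) && input[*pos] >= '0' && input[*pos] <= '9' {
        *pos++
    }
    fmt.Println("number:", input[start:*pos])
    return stateStart
}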
With this implementation, too many states lead to too many case statements. We can simplify this with state functions.
A state function executes the action for the current state and directly returns the next state function.
We can do this:
type stateFn func(*lexer) stateFn

for state := startState; state != nil; {
    state = state(lexer)
}
All we need to write, then, is each state function and the next state function it returns.
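Here is the toy lexer from the earlier switch example rewritten with state functions, as a minimal self-contained sketch; the names and states are mine, not mixer's:

package main

import "fmt"

// lexer holds the input and the current scanning position.
type lexer struct {
    input string
    pos   int
}

// stateFn executes the action for the current state and returns the next
// state function, or nil when the input is exhausted.
type stateFn func(*lexer) stateFn

// lexStart dispatches to the right state based on the next character.
func lexStart(l *lexer) stateFn {
    if l.pos >= len(l.input) {
        return nil
    }
    c := l.input[l.pos]
    switch {
    case c >= 'a' && c <= 'z':
        return lexIdent
    case c >= '0' && c <= '9':
        return lexNumber
    default:
        l.pos++ // skip whitespace and anything unrecognized
        return lexStart
    }
}

// lexIdent consumes a run of letters and reports it as an identifier.
func lexIdent(l *lexer) stateFn {
    start := l.pos
    for l.pos < len(l.input) && l.input[l.pos] >= 'a' && l.input[l.pos] <= 'z' {
        l.pos++
    }
    fmt.Println("ident:", l.input[start:l.pos])
    return lexStart
}

// lexNumber consumes a run of digits and reports it as a number.
func lexNumber(l *lexer) stateFn {
    start := l.pos
    for l.pos < len(l.input) && l.input[l.pos] >= '0' && l.input[l.pos] <= '9' {
        l.pos++
    }
    fmt.Println("number:", l.input[start:l.pos])
    return lexStart
}

func main() {
    l := &lexer{input: "select 10"}
    // The driver loop from above: keep running the returned state function
    // until it is nil.
    for state := stateFn(lexStart); state != nil; {
        state = state(l)
    }
}

Compared with the switch version, the transition logic now lives inside each state function, so adding a new kind of token means adding one function rather than growing a central switch.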
Mixer Lexer
Mixer's lexical analysis mainly follows this approach and is implemented in the parser module.
The lexer provides a NextToken function so that callers can obtain the next token for further processing (such as syntax analysis).
The lexer's NextToken looks like this:
func (l *lexer) NextToken() (Token, error) {
    for {
        select {
        case t := <-l.tokens:
            return t, nil
        default:
            if l.state == nil {
                return Token{TK_EOF, ""}, l.err
            }
            l.state = l.state(l)
            if l.err != nil {
                return Token{TK_UNKNOWN, ""}, l.err
            }
        }
    }
}
tokens is a channel: each time a state function parses out a token, it emits it to this channel for NextToken to pick up. If the channel is empty, the state function is called again.
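A hedged sketch of what the emitting side might look like; the Token type, the TK_INTEGER constant, the emit helper, and the buffered channel size are my assumptions, not mixer's actual code:

package main

import "fmt"

// Assumed token kind for this sketch; mixer's real constants may differ.
const TK_INTEGER = 1

// Token pairs a token type with its text; the real Token type may carry more.
type Token struct {
    Type  int
    Value string
}

type stateFn func(*lexer) stateFn

type lexer struct {
    input  string
    pos    int
    tokens chan Token // buffered so a state function can emit without blocking
    state  stateFn
    err    error // set by state functions on a lexing error (unused here)
}

// emit pushes a finished token onto the channel that NextToken drains.
func (l *lexer) emit(typ int, value string) {
    l.tokens <- Token{Type: typ, Value: value}
}

// lexNumber consumes a run of digits, emits it as one token, and stops (nil).
func lexNumber(l *lexer) stateFn {
    start := l.pos
    for l.pos < len(l.input) && l.input[l.pos] >= '0' && l.input[l.pos] <= '9' {
        l.pos++
    }
    l.emit(TK_INTEGER, l.input[start:l.pos])
    return nil
}

func main() {
    l := &lexer{input: "12345", tokens: make(chan Token, 16), state: lexNumber}
    // One round of the drain-or-run loop that NextToken implements: the
    // channel is empty, so run the state function, then read what it emitted.
    l.state = l.state(l)
    fmt.Println(<-l.tokens) // prints {1 12345}
}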
As you can see, it is very easy to implement lexical analysis in Go; the rest is just writing the corresponding state functions to parse the SQL.
Todo
Mixer's lexical analysis still has many rough edges; for example, the handling of numbers in scientific notation is incomplete. I plan to improve it by referring to MySQL's official lexical analysis module.
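For reference, scanning the exponent part of a number might look roughly like the sketch below; this is my own simplification, not MySQL's or mixer's actual rules, and it ignores hex literals and other edge cases:

package main

import "fmt"

// scanNumber returns the length of a numeric literal at the start of s,
// accepting an optional fraction and an optional exponent such as 1.5e-3.
func scanNumber(s string) int {
    i := 0
    digits := func() {
        for i < len(s) && s[i] >= '0' && s[i] <= '9' {
            i++
        }
    }
    digits()
    if i < len(s) && s[i] == '.' { // fractional part
        i++
        digits()
    }
    if i < len(s) && (s[i] == 'e' || s[i] == 'E') { // exponent part
        j := i + 1
        if j < len(s) && (s[j] == '+' || s[j] == '-') {
            j++
        }
        if j < len(s) && s[j] >= '0' && s[j] <= '9' {
            i = j
            digits()
        }
        // otherwise the 'e' is not part of the number (e.g. "1easy")
    }
    return i
}

func main() {
    for _, s := range []string{"123", "1.5e-3", "2E10", "1easy"} {
        n := scanNumber(s)
        fmt.Printf("%q -> number %q\n", s, s[:n])
    }
}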