Mixer: SQL lexical analyzer Design
Source: Sanlian tutorial
Introduction
Mixer aims to provide custom routing and a SQL blacklist at the proxy layer to prevent SQL injection attacks. The cornerstone of these features is parsing the SQL statements that users send, which means lexical analysis and syntax analysis.
So far, I have implemented a simple lexical analyzer that breaks a SQL statement into tokens. I have no experience yet with the next steps, syntax analysis over the tokens and construction of a SQL AST (my grasp of compiler theory is weak), and I urgently need help there.
So here is a brief introduction to the lexical analysis of mixer.
Tokenize
In many places, we need to perform lexical analysis. There are usually several methods:
Use a powerful tool such as lex (mysql-proxy takes this approach)
Use regular expressions
Write a state machine by hand
The downside of tools is the learning cost. To use lex, for example, I would need to learn its syntax, and the generated code is not very readable; the large amount of generated code may also be slow. This is why MySQL implements its own lexical analysis module.
Regular expressions may have serious performance problems, and they are no less complex than using a tool like lex.
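For comparison, the regular-expression approach can be sketched in a few lines of Go. This is only an illustration, not code from mixer: the function name regexTokens and the pattern are assumptions, and the pattern recognizes just identifiers, integers and single-character symbols.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenPattern is a hypothetical pattern: an identifier, or an integer,
// or any single non-space character. Alternatives are tried left to right.
var tokenPattern = regexp.MustCompile(`[A-Za-z_][A-Za-z0-9_]*|[0-9]+|\S`)

// regexTokens returns every non-overlapping match in order of appearance.
func regexTokens(input string) []string {
	return tokenPattern.FindAllString(input, -1)
}

func main() {
	fmt.Println(regexTokens("select id from t1 where id = 10"))
}
```

The pattern grows quickly once strings, comments and quoted identifiers are added, which is the complexity concern raised above.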
A hand-written state machine is probably a good way to implement lexical analysis yourself. For SQL, I do not think writing the lexer as a state machine is very difficult, so mixer implements its own.
State machine
Generally, the implementation of a state machine uses state + action + switch, which may be as follows:
    switch state {
    case state1:
        state = action1()
    case state2:
        state = action2()
    case state3:
        state = action3()
    }
The switch tells us which action handles each state, and each action tells us which state comes next once it has finished.
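To make the state + action + switch pattern concrete, here is a minimal runnable sketch. The state names, the helper functions and splitTokens are all invented for illustration and are not taken from mixer; the sketch treats only the space character as whitespace.

```go
package main

import "fmt"

// state identifies where the scanner currently is (names are illustrative).
type state int

const (
	stateStart  state = iota // between tokens
	stateWord                // inside an identifier or keyword
	stateNumber              // inside an integer literal
)

func isLetter(c byte) bool {
	return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// splitTokens walks the input once. Each case is the action for one state
// and decides which state comes next.
func splitTokens(input string) []string {
	var tokens []string
	st, start := stateStart, 0
	for i := 0; i <= len(input); i++ {
		var c byte // stays 0 at end of input and acts as a sentinel
		if i < len(input) {
			c = input[i]
		}
		switch st {
		case stateStart:
			switch {
			case isLetter(c):
				st, start = stateWord, i
			case isDigit(c):
				st, start = stateNumber, i
			case c != 0 && c != ' ':
				tokens = append(tokens, string(c)) // single-character symbol
			}
		case stateWord:
			if !isLetter(c) && !isDigit(c) {
				tokens = append(tokens, input[start:i])
				st = stateStart
				i-- // look at c again from the start state
			}
		case stateNumber:
			if !isDigit(c) {
				tokens = append(tokens, input[start:i])
				st = stateStart
				i--
			}
		}
	}
	return tokens
}

func main() {
	fmt.Println(splitTokens("select id from t1 where id = 10"))
}
```

Every new kind of token adds both a state constant and a case branch to the same function.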
With this implementation, a large number of states leads to a large number of case branches. State functions simplify this.
A state function executes the action for the current state and directly returns the next state function.
We can write it like this:
    type stateFn func(*Lexer) stateFn

    for state := startState; state != nil; {
        state = state(lexer)
    }
All that is left is to implement each state function and have it return the state function that should run next.
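A minimal runnable sketch of this idea follows. The Lexer type here is a hypothetical stand-in with only the fields the example needs, and lexSpace, lexWord and run are invented names; mixer's real state functions handle far more cases.

```go
package main

import "fmt"

// Lexer is a hypothetical minimal type, not mixer's actual Lexer.
type Lexer struct {
	input  string
	pos    int
	tokens []string
}

// stateFn runs the action for the current state and returns the next state.
type stateFn func(*Lexer) stateFn

// lexSpace skips spaces, then hands over to lexWord; nil ends the loop.
func lexSpace(l *Lexer) stateFn {
	for l.pos < len(l.input) && l.input[l.pos] == ' ' {
		l.pos++
	}
	if l.pos == len(l.input) {
		return nil
	}
	return lexWord
}

// lexWord consumes characters up to the next space and records them.
func lexWord(l *Lexer) stateFn {
	start := l.pos
	for l.pos < len(l.input) && l.input[l.pos] != ' ' {
		l.pos++
	}
	l.tokens = append(l.tokens, l.input[start:l.pos])
	return lexSpace
}

// run drives the machine without a switch: call the state, get the next one.
func run(input string) []string {
	l := &Lexer{input: input}
	for state := stateFn(lexSpace); state != nil; {
		state = state(l)
	}
	return l.tokens
}

func main() {
	fmt.Println(run("select  *  from t1"))
}
```

Adding a new state means adding one function; the driving loop does not change.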
Mixer lexer
The lexical analysis in mixer is implemented mainly in the parser module of its source code.
A lexer needs to expose a NextToken function so that callers can fetch the next token for later stages such as syntax analysis.
NextToken in the mixer lexer looks like this:
    func (l *Lexer) NextToken() (Token, error) {
        for {
            select {
            case t := <-l.tokens:
                // A token is already queued: hand it to the caller.
                return t, nil
            default:
                // No state left to run: the input is exhausted.
                if l.state == nil {
                    return Token{TK_EOF, ""}, l.err
                }
                // Run the current state; it may emit tokens and
                // returns the next state.
                l.state = l.state(l)
                if l.err != nil {
                    return Token{TK_UNKNOWN, ""}, l.err
                }
            }
        }
    }
tokens is a channel. Each token parsed by a state function is emitted to this channel for NextToken to pick up. If the channel is empty, the current state function is called again to produce more tokens.
As you can see, implementing lexical analysis in Go is straightforward. What remains is writing the state functions that handle SQL.
Todo
The lexical analysis in mixer still has many gaps. For example, parsing of numeric values in scientific notation is incomplete. The lexical analysis module in the official MySQL source is a useful reference for the remaining work.
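As one possible starting point for the scientific-notation case, here is a sketch of a number scanner. It is not mixer's or MySQL's code: the function names are invented, and the grammar it assumes is digits, an optional "." with digits, and an optional exponent "e" or "E" with an optional sign and at least one digit. MySQL's real rules differ in details, for example literals that begin with ".".

```go
package main

import "fmt"

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// skipDigits returns the index of the first non-digit at or after i.
func skipDigits(s string, i int) int {
	for i < len(s) && isDigit(s[i]) {
		i++
	}
	return i
}

// scanNumber returns the length of the numeric literal at the start of s,
// or 0 if s does not start with a digit.
func scanNumber(s string) int {
	i := skipDigits(s, 0)
	if i == 0 {
		return 0
	}
	// Optional fraction: '.' followed by zero or more digits.
	if i < len(s) && s[i] == '.' {
		i = skipDigits(s, i+1)
	}
	// Optional exponent: accepted only if at least one digit follows,
	// so "1e" or "1e+" leaves the 'e' for the next token.
	if i < len(s) && (s[i] == 'e' || s[i] == 'E') {
		j := i + 1
		if j < len(s) && (s[j] == '+' || s[j] == '-') {
			j++
		}
		if k := skipDigits(s, j); k > j {
			i = k
		}
	}
	return i
}

func main() {
	for _, s := range []string{"10", "3.14", "1.5e10 ", "2E-3,", "1e+", "abc"} {
		fmt.Printf("%q -> %q\n", s, s[:scanNumber(s)])
	}
}
```

In a state-function lexer, this logic would live in the number state, which would advance the read position by the returned length before emitting the token.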