Design of Mixer: SQL Lexical Analyzer

Source: Internet
Author: User
Tags: lexer sql injection

Introduction

Mixer aims to provide custom routing and a SQL blacklist that blocks SQL injection attacks at the proxy layer. The cornerstone of both is parsing the SQL statements that users send. This is my first serious attempt at lexical and syntax analysis.

So far I have only implemented a fairly simple lexical analyzer that breaks a SQL statement into tokens. Building a SQL AST from those tokens (syntax analysis) is something I have no real experience with, since my grasp of compiler theory is weak, and I would welcome help from experts.

So this post gives only a brief introduction to Mixer's lexical analysis.

Tokenize

Lexical analysis is needed in many places, and it is usually done in one of several ways:

Using a powerful tool such as Lex, which is the approach mysql-proxy takes

Using regular expressions

Writing a state machine

The drawback of tools, in my view, is the learning cost: to use Lex I first have to learn its syntax. The generated code is also hard to read and large, and, more seriously, it may well be slow. That is why MySQL itself implements its own lexical analysis module.

With regular expressions, performance can be a serious concern, and the complexity is no lower than with a tool like Lex.

A state machine is, I think, a very good way to implement lexical analysis. For SQL in particular, writing the lexer as a state machine did not seem very difficult, so Mixer implements its own.

State machine

In general, a state machine is implemented with the state + action + switch pattern, roughly as follows:

    switch state {
    case state1:
        state = action1()
    case state2:
        state = action2()
    case state3:
        state = action3()
    }

For each state, the switch tells us which action handles it, and each action tells us what the next state is once it finishes.
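To make the pattern concrete, here is a minimal runnable sketch of a switch-driven scanner. It is not Mixer's code: the scanner type, the state names, and the tiny token rules (identifiers, integers, single-character symbols, spaces as separators) are all assumptions made for illustration.

```go
package main

import "fmt"

// state identifies what the scanner is currently reading (hypothetical names).
type state int

const (
	stateStart  state = iota // between tokens
	stateIdent               // inside an identifier or keyword
	stateNumber              // inside an integer literal
	stateDone                // input exhausted
)

// scanner holds the input and the tokens found so far (illustrative type).
type scanner struct {
	input      string
	start, pos int
	tokens     []string
}

func isLetter(c byte) bool {
	return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// scanStart decides which state handles the next character.
func (s *scanner) scanStart() state {
	switch {
	case s.pos >= len(s.input):
		return stateDone
	case s.input[s.pos] == ' ':
		s.pos++
		return stateStart
	case isLetter(s.input[s.pos]):
		s.start = s.pos
		return stateIdent
	case isDigit(s.input[s.pos]):
		s.start = s.pos
		return stateNumber
	}
	// Anything else becomes a single-character token.
	s.tokens = append(s.tokens, s.input[s.pos:s.pos+1])
	s.pos++
	return stateStart
}

// scanIdent consumes letters and digits, then records the identifier.
func (s *scanner) scanIdent() state {
	for s.pos < len(s.input) && (isLetter(s.input[s.pos]) || isDigit(s.input[s.pos])) {
		s.pos++
	}
	s.tokens = append(s.tokens, s.input[s.start:s.pos])
	return stateStart
}

// scanNumber consumes digits, then records the integer literal.
func (s *scanner) scanNumber() state {
	for s.pos < len(s.input) && isDigit(s.input[s.pos]) {
		s.pos++
	}
	s.tokens = append(s.tokens, s.input[s.start:s.pos])
	return stateStart
}

// scan drives the machine: the switch picks the action for the current state,
// and each action returns the next state.
func scan(input string) []string {
	s := &scanner{input: input}
	st := stateStart
	for st != stateDone {
		switch st {
		case stateStart:
			st = s.scanStart()
		case stateIdent:
			st = s.scanIdent()
		case stateNumber:
			st = s.scanNumber()
		}
	}
	return s.tokens
}

func main() {
	fmt.Println(scan("select id from t1 where id = 42"))
}
```

Every new state needs both a constant and a case in the driver loop, which is the growth problem that state functions avoid.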

With this kind of implementation, a large number of states leads to a large number of case statements, which we can simplify by using state functions.

A state function executes the action for the current state and directly returns the next state function.

We can do this:

    type stateFn func(*lexer) stateFn

    for state := startState; state != nil; {
        state = state(lexer)
    }

So all we need to write is each state function, along with the next state function it returns.
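As a rough illustration of the state-function style, the following sketch tokenizes a small subset of SQL-like input. The lexer fields, the lexStart, lexIdent, and lexNumber functions, and the token rules are hypothetical and chosen only to keep the example short; Mixer's real state functions may look quite different.

```go
package main

import "fmt"

// lexer holds the scanning position and collected tokens (illustrative fields).
type lexer struct {
	input  string
	start  int
	pos    int
	tokens []string
}

// stateFn performs the action for the current state and returns the next state.
type stateFn func(*lexer) stateFn

func isLetter(c byte) bool {
	return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// emit records input[start:pos] as a token and moves start forward.
func (l *lexer) emit() {
	l.tokens = append(l.tokens, l.input[l.start:l.pos])
	l.start = l.pos
}

// lexStart skips spaces and picks the state function for the next token.
// Returning nil stops the machine.
func lexStart(l *lexer) stateFn {
	for l.pos < len(l.input) && l.input[l.pos] == ' ' {
		l.pos++
	}
	l.start = l.pos
	if l.pos >= len(l.input) {
		return nil
	}
	switch c := l.input[l.pos]; {
	case isLetter(c):
		return lexIdent
	case isDigit(c):
		return lexNumber
	}
	// Anything else is emitted as a single-character token.
	l.pos++
	l.emit()
	return lexStart
}

// lexIdent consumes an identifier or keyword.
func lexIdent(l *lexer) stateFn {
	for l.pos < len(l.input) && (isLetter(l.input[l.pos]) || isDigit(l.input[l.pos])) {
		l.pos++
	}
	l.emit()
	return lexStart
}

// lexNumber consumes an integer literal.
func lexNumber(l *lexer) stateFn {
	for l.pos < len(l.input) && isDigit(l.input[l.pos]) {
		l.pos++
	}
	l.emit()
	return lexStart
}

// tokenize runs the state functions until one of them returns nil.
func tokenize(input string) []string {
	l := &lexer{input: input}
	for state := stateFn(lexStart); state != nil; {
		state = state(l)
	}
	return l.tokens
}

func main() {
	fmt.Println(tokenize("insert into t values (1, 23)"))
}
```

The driver loop no longer changes when a state is added; a new state is just a new function that some existing state function returns.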

Mixer Lexer

Mixer's lexical analysis is mainly based on this approach and is implemented mainly in the parser module.

The lexer provides a nextToken function that callers use to fetch the next token for subsequent processing (such as parsing).

The lexer's nextToken is as follows:

    func (l *lexer) nextToken() (Token, error) {
        for {
            select {
            case t := <-l.tokens:
                return t, nil
            default:
                if l.state == nil {
                    return Token{TK_EOF, ""}, l.err
                }
                l.state = l.state(l)
                if l.err != nil {
                    return Token{TK_UNKNOWN, ""}, l.err
                }
            }
        }
    }

tokens is a channel. Each time a state function recognizes a token, it emits the token to this channel for nextToken to pick up; if the channel is empty, nextToken calls the state function again.
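The sketch below wraps the same nextToken logic in a complete program so the interaction with the channel can be run and observed. Everything other than nextToken itself is an assumption: the Token fields, the TK_WORD type, newLexer, the lexWord state function (which simply splits on spaces), and the buffer size of 2 are invented for illustration. The channel has to be buffered here because the state function runs in the same goroutine as nextToken, so a send on an unbuffered channel would block forever.

```go
package main

import (
	"errors"
	"fmt"
)

type TokenType int

const (
	TK_UNKNOWN TokenType = iota
	TK_EOF
	TK_WORD // hypothetical token type for this sketch
)

// Token pairs a type with its text (field names are assumptions).
type Token struct {
	Type  TokenType
	Value string
}

type stateFn func(*lexer) stateFn

type lexer struct {
	input  string
	pos    int
	state  stateFn
	err    error
	tokens chan Token
}

// newLexer is a hypothetical constructor. The buffer must hold every token
// a single state function call can emit; lexWord emits at most one.
func newLexer(input string) *lexer {
	return &lexer{input: input, state: lexWord, tokens: make(chan Token, 2)}
}

// lexWord skips spaces and emits one space-delimited word per call.
// A NUL byte is treated as an error purely to exercise the error path.
func lexWord(l *lexer) stateFn {
	for l.pos < len(l.input) && l.input[l.pos] == ' ' {
		l.pos++
	}
	if l.pos >= len(l.input) {
		return nil
	}
	start := l.pos
	for l.pos < len(l.input) && l.input[l.pos] != ' ' {
		if l.input[l.pos] == 0 {
			l.err = errors.New("unexpected NUL byte")
			return nil
		}
		l.pos++
	}
	l.tokens <- Token{TK_WORD, l.input[start:l.pos]}
	return lexWord
}

// nextToken returns a buffered token if one is ready; otherwise it advances
// the state machine by one step and tries again.
func (l *lexer) nextToken() (Token, error) {
	for {
		select {
		case t := <-l.tokens:
			return t, nil
		default:
			if l.state == nil {
				return Token{TK_EOF, ""}, l.err
			}
			l.state = l.state(l)
			if l.err != nil {
				return Token{TK_UNKNOWN, ""}, l.err
			}
		}
	}
}

func main() {
	l := newLexer("select * from t")
	for {
		tok, err := l.nextToken()
		if err != nil || tok.Type == TK_EOF {
			break
		}
		fmt.Println(tok.Value)
	}
}
```

If a state function could emit more tokens in one call than the buffer holds, the send would block, so the buffer size and the state functions have to be designed together.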

As you can see, implementing a lexer in Go is easy; what remains is writing the state functions that parse SQL.

Todo

Mixer's lexical analysis still has many rough edges; for example, parsing of numbers in scientific notation is incomplete. I plan to improve it with reference to MySQL's official lexical analysis module.
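As one possible starting point for the scientific notation item, here is a small sketch of a number scanner. It is neither Mixer's code nor MySQL's rules: scanNumber is a hypothetical helper that accepts digits, an optional fraction, and an optional exponent, and it treats an "e" that is not followed by digits as not belonging to the number.

```go
package main

import "fmt"

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// scanNumber returns the length of the longest numeric literal at the start
// of s, or 0 if s does not start with a number. Accepted shapes (an assumed
// grammar): 42, 1.5, 1., .5, 1e10, 1.5E-3.
func scanNumber(s string) int {
	i, digits := 0, 0
	for i < len(s) && isDigit(s[i]) {
		i++
		digits++
	}
	// Optional fraction: a dot followed by zero or more digits.
	if i < len(s) && s[i] == '.' {
		j := i + 1
		for j < len(s) && isDigit(s[j]) {
			j++
			digits++
		}
		if digits == 0 {
			return 0 // a lone "." is not a number
		}
		i = j
	}
	if digits == 0 {
		return 0
	}
	// Optional exponent: consumed only if at least one digit follows
	// the "e" or "E" and its optional sign.
	if i < len(s) && (s[i] == 'e' || s[i] == 'E') {
		j := i + 1
		if j < len(s) && (s[j] == '+' || s[j] == '-') {
			j++
		}
		k := j
		for k < len(s) && isDigit(s[k]) {
			k++
		}
		if k > j {
			i = k
		}
	}
	return i
}

func main() {
	for _, in := range []string{"42", "1.5e-3 ", "1e", ".5", "3E+10x"} {
		n := scanNumber(in)
		fmt.Printf("%q -> %q\n", in, in[:n])
	}
}
```

In a state-function lexer this logic would live in a number state that emits the matched prefix and returns to the start state.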
