Mixer: SQL lexical analyzer Design

Source: Internet
Author: User
Tags: lexer

Introduction

Mixer aims to provide custom routing and a SQL blacklist at the proxy layer to prevent SQL injection attacks. The cornerstone of these features is parsing the SQL statements that users send, which means lexical analysis and syntax analysis.

So far, I have implemented a simple lexical analyzer that breaks a SQL statement into tokens. I have no experience yet with the next step, syntax analysis that builds a SQL AST from the tokens (my grasp of compiler theory is weak), and I could really use help there.

So here is a brief introduction to the lexical analysis of mixer.

Tokenize

Lexical analysis is needed in many places, and there are usually three ways to do it:

Use a powerful tool such as lex (as mysql-proxy does).

Use regular expressions.

Write a state machine by hand.

The drawback of tools, I think, is the learning cost. To use lex, for example, I would have to learn its syntax, and the generated code is hard to read and, being large, may be slow. For this reason, MySQL itself implements its own lexical analysis module.

With regular expressions, performance can become a serious issue, and the complexity is no lower than with a tool like lex.
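For comparison, the regular-expression approach might look like the sketch below. The pattern and the name regexTokens are assumptions made for illustration, not mixer code, and the pattern covers only identifiers, integers, single-quoted strings, and single-character operators.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenRe is a hypothetical pattern for this sketch. Alternatives, in order:
// identifier, integer, single-quoted string ('' escapes a quote), or any
// other single non-space character.
var tokenRe = regexp.MustCompile(`[A-Za-z_][A-Za-z0-9_]*|[0-9]+|'(?:[^']|'')*'|\S`)

// regexTokens returns the raw text of every token found in sql.
func regexTokens(sql string) []string {
	return tokenRe.FindAllString(sql, -1)
}

func main() {
	for _, tok := range regexTokens("SELECT name FROM t WHERE id = 42") {
		fmt.Printf("%q\n", tok)
	}
}
```

Even this small pattern is already hard to read, and it returns only raw text: classifying each token still needs extra capture groups or a second pass.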

A state machine is a good fit for a hand-written lexer. Lexing SQL with a state machine is not very difficult to write yourself, so mixer implements its own.

State machine

Generally, a state machine is implemented with state + action + switch, roughly as follows:

    switch state {
    case state1:
        state = action1()
    case state2:
        state = action2()
    case state3:
        state = action3()
    }

The switch tells us which action handles a given state, and each action returns the state to move to once it has finished.
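To make the state + action + switch pattern concrete, here is a minimal runnable sketch. It assumes a toy input language of identifiers and integers only; the names scanner, run, and the state constants are hypothetical and do not come from mixer.

```go
package main

import "fmt"

type state int

const (
	stateStart state = iota
	stateIdent
	stateNumber
	stateDone
)

// scanner is a hypothetical type for this sketch.
type scanner struct {
	input  string
	start  int // start of the token being scanned
	pos    int // current read position
	tokens []string
}

func isLetter(c byte) bool { return c == '_' || c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' }
func isDigit(c byte) bool  { return c >= '0' && c <= '9' }

// actionStart skips to the next token and picks the state that scans it.
func (s *scanner) actionStart() state {
	for ; s.pos < len(s.input); s.pos++ {
		s.start = s.pos
		switch c := s.input[s.pos]; {
		case isLetter(c):
			return stateIdent
		case isDigit(c):
			return stateNumber
		}
		// Anything else (spaces, punctuation) is skipped in this sketch.
	}
	return stateDone
}

// actionIdent consumes letters and digits, then records an identifier.
func (s *scanner) actionIdent() state {
	for s.pos < len(s.input) && (isLetter(s.input[s.pos]) || isDigit(s.input[s.pos])) {
		s.pos++
	}
	s.tokens = append(s.tokens, "ident:"+s.input[s.start:s.pos])
	return stateStart
}

// actionNumber consumes digits, then records a number.
func (s *scanner) actionNumber() state {
	for s.pos < len(s.input) && isDigit(s.input[s.pos]) {
		s.pos++
	}
	s.tokens = append(s.tokens, "number:"+s.input[s.start:s.pos])
	return stateStart
}

// run drives the machine: the switch maps each state to its action, and
// each action returns the next state.
func run(input string) []string {
	s := &scanner{input: input}
	for st := stateStart; st != stateDone; {
		switch st {
		case stateStart:
			st = s.actionStart()
		case stateIdent:
			st = s.actionIdent()
		case stateNumber:
			st = s.actionNumber()
		}
	}
	return s.tokens
}

func main() {
	fmt.Println(run("limit 10"))
}
```

Every new state needs both a constant and a case branch in run, which is the growth problem with this style.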

With the implementation above, a large number of states leads to a large number of case branches. State functions simplify this.

A state function executes the action for the current state and directly returns the next state function.

We can write it like this:

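    // A stateFn performs the action for one state and returns the next state.
    type stateFn func(*Lexer) stateFn

    // Run until a state function returns nil.
    for state := startState; state != nil; {
        state = state(lexer)
    }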

All that remains is to implement each state function and have it return the next state function to run.
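As a minimal sketch of the state-function style, the program below lexes a toy language of identifiers and integers. The Lexer fields and the names lexStart, lexIdent, lexNumber, and lexAll are hypothetical and chosen for illustration; mixer's real state functions handle full SQL.

```go
package main

import "fmt"

// Lexer here is a simplified stand-in, not mixer's actual type.
type Lexer struct {
	input  string
	start  int // start of the token being scanned
	pos    int // current read position
	tokens []string
}

// stateFn performs the action for one state and returns the next state.
type stateFn func(*Lexer) stateFn

func isLetter(c byte) bool { return c == '_' || c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' }
func isDigit(c byte) bool  { return c >= '0' && c <= '9' }

// lexStart skips to the next token and returns the function that scans it.
func lexStart(l *Lexer) stateFn {
	for ; l.pos < len(l.input); l.pos++ {
		l.start = l.pos
		switch c := l.input[l.pos]; {
		case isLetter(c):
			return lexIdent
		case isDigit(c):
			return lexNumber
		}
		// Anything else (spaces, punctuation) is skipped in this sketch.
	}
	return nil // end of input stops the loop in lexAll
}

// lexIdent consumes letters and digits, then records an identifier.
func lexIdent(l *Lexer) stateFn {
	for l.pos < len(l.input) && (isLetter(l.input[l.pos]) || isDigit(l.input[l.pos])) {
		l.pos++
	}
	l.tokens = append(l.tokens, "ident:"+l.input[l.start:l.pos])
	return lexStart
}

// lexNumber consumes digits, then records a number.
func lexNumber(l *Lexer) stateFn {
	for l.pos < len(l.input) && isDigit(l.input[l.pos]) {
		l.pos++
	}
	l.tokens = append(l.tokens, "number:"+l.input[l.start:l.pos])
	return lexStart
}

// lexAll runs state functions until one of them returns nil.
func lexAll(input string) []string {
	l := &Lexer{input: input}
	for state := stateFn(lexStart); state != nil; {
		state = state(l)
	}
	return l.tokens
}

func main() {
	fmt.Println(lexAll("limit 10"))
}
```

Adding a state now means adding one function; the driving loop never changes.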

Mixer lexer

For the details of mixer's lexical analysis, see the source; it is mainly implemented in the parser module.

A lexer needs to expose a NextToken function that external callers use to fetch the next token for later stages (such as syntax analysis).

The lexer's NextToken is as follows:

    func (l *Lexer) NextToken() (Token, error) {
        for {
            select {
            case t := <-l.tokens:
                // A token is already queued: hand it to the caller.
                return t, nil
            default:
                // No token queued and no state left: input is exhausted.
                if l.state == nil {
                    return Token{TK_EOF, ""}, l.err
                }
                // Run the current state function to produce more tokens.
                l.state = l.state(l)
                if l.err != nil {
                    return Token{TK_UNKNOWN, ""}, l.err
                }
            }
        }
    }

tokens is a channel. Each token parsed by a state function is emitted to this channel for NextToken to pick up. If the channel is empty, NextToken calls the state function again.

As you can see, implementing lexical analysis in Go is straightforward. What remains is to write the state functions for SQL.
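The sketch below shows how the pieces might fit together end to end. Only NextToken, the tokens channel, TK_EOF, and TK_UNKNOWN come from the text above; the Token and Lexer layouts, NewLexer, emit, TK_IDENT, TK_NUMBER, and the state functions are assumptions made for illustration. One point worth noting: because the state functions and NextToken run on the same goroutine, this sketch uses a buffered channel, and it assumes each state function emits at most one token per call so a send never blocks.

```go
package main

import "fmt"

type TokenType int

const (
	TK_UNKNOWN TokenType = iota
	TK_EOF
	TK_IDENT  // hypothetical token type for this sketch
	TK_NUMBER // hypothetical token type for this sketch
)

type Token struct {
	Type  TokenType
	Value string
}

type stateFn func(*Lexer) stateFn

type Lexer struct {
	input      string
	start, pos int        // bounds of the token being scanned
	state      stateFn    // next state function; nil when finished
	tokens     chan Token // filled by emit, drained by NextToken
	err        error
}

func NewLexer(input string) *Lexer {
	// Buffered so that emit does not block on the caller's own goroutine.
	return &Lexer{input: input, state: lexStart, tokens: make(chan Token, 2)}
}

// emit sends the text scanned so far as a token of type t.
func (l *Lexer) emit(t TokenType) {
	l.tokens <- Token{t, l.input[l.start:l.pos]}
	l.start = l.pos
}

func (l *Lexer) NextToken() (Token, error) {
	for {
		select {
		case t := <-l.tokens:
			return t, nil
		default:
			if l.state == nil {
				return Token{TK_EOF, ""}, l.err
			}
			l.state = l.state(l)
			if l.err != nil {
				return Token{TK_UNKNOWN, ""}, l.err
			}
		}
	}
}

func isLetter(c byte) bool { return c == '_' || c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' }
func isDigit(c byte) bool  { return c >= '0' && c <= '9' }

func lexStart(l *Lexer) stateFn {
	for l.pos < len(l.input) && l.input[l.pos] == ' ' {
		l.pos++
	}
	l.start = l.pos
	switch {
	case l.pos == len(l.input):
		return nil // end of input
	case isLetter(l.input[l.pos]):
		return lexIdent
	case isDigit(l.input[l.pos]):
		return lexNumber
	}
	l.err = fmt.Errorf("unexpected character %q at offset %d", l.input[l.pos], l.pos)
	return nil
}

func lexIdent(l *Lexer) stateFn {
	for l.pos < len(l.input) && (isLetter(l.input[l.pos]) || isDigit(l.input[l.pos])) {
		l.pos++
	}
	l.emit(TK_IDENT)
	return lexStart
}

func lexNumber(l *Lexer) stateFn {
	for l.pos < len(l.input) && isDigit(l.input[l.pos]) {
		l.pos++
	}
	l.emit(TK_NUMBER)
	return lexStart
}

func main() {
	l := NewLexer("select id 42")
	for t, err := l.NextToken(); err == nil && t.Type != TK_EOF; t, err = l.NextToken() {
		fmt.Printf("%d %q\n", t.Type, t.Value)
	}
}
```

A caller such as a parser simply keeps calling NextToken until it sees TK_EOF or a non-nil error.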

Todo

Mixer's lexical analysis still has many rough edges. For example, parsing of numeric values in scientific notation is incomplete. The official MySQL lexical analysis module is the reference to consult for filling these gaps.
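As a rough sketch of what scientific-notation handling could look like, the function below scans the longest prefix matching digits [ "." digits ] [ ("e" | "E") [ "+" | "-" ] digits ]. This grammar is a simplifying assumption, not MySQL's exact rule set (literals such as ".5" or "1." are not handled), and scanNumber and scanDigits are hypothetical names.

```go
package main

import "fmt"

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// scanDigits advances past a run of digits and returns the new position.
func scanDigits(s string, pos int) int {
	for pos < len(s) && isDigit(s[pos]) {
		pos++
	}
	return pos
}

// scanNumber returns the end index of the numeric literal that begins at
// start, or start itself if no literal begins there.
func scanNumber(s string, start int) int {
	pos := scanDigits(s, start)
	if pos == start {
		return start // no leading digit
	}
	// Optional fraction: consumed only if a digit follows the dot.
	if pos < len(s) && s[pos] == '.' {
		if end := scanDigits(s, pos+1); end > pos+1 {
			pos = end
		}
	}
	// Optional exponent: consumed only if a digit follows the optional sign,
	// so "1e+" yields just "1".
	if pos < len(s) && (s[pos] == 'e' || s[pos] == 'E') {
		p := pos + 1
		if p < len(s) && (s[p] == '+' || s[p] == '-') {
			p++
		}
		if end := scanDigits(s, p); end > p {
			pos = end
		}
	}
	return pos
}

func main() {
	for _, in := range []string{"42", "3.14", "1e10", "6.02E+23", "1e+", "7.x"} {
		fmt.Printf("%q -> %q\n", in, in[:scanNumber(in, 0)])
	}
}
```

In a state-function lexer, a number state could call a helper like this and then emit the consumed text as a single token.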
