Read simpledb SQL statement parsing again (1)

Source: Internet
Author: User
Tags createindex lexer

The previous sections basically implement data read and write. For database users, Structured Query Language is the window for accessing the database. This section describes how simpledb parses SQL statements.

First of all, simpledb implements a subset of the SQL-92, the specific supported syntax is as follows:

 
1:

 
2:<Query >:=Select<Selectlist>From<Tablelist> [Where<Predicate>]

 
3:<Selectlist >:= <field> [, <selectlist>]

 
4:<Tablelist >:= tablename [, <tablelist>]

5:<Predicate >:= <term> [And<Predicate>]

 
6:<Term >:= <expression >=< expression>

 
7:<Expression >:= <field >|< constant>

 
8:<Field>: = fieldname

 
9:<Constant >:= string |Integer

 
10:

11:<Modify>:= <Insert >|<Delete>|<Update>

 
12:<Insert >:= insertIntoTablename (<fieldlist>)Values(<Constlist>)

 
13:<Fieldlist >:= <field> [, <fieldlist>]

 
14:<Constlist >:= <constant> [, <constant>]

15:<Delete>:=Delete FromTablename [Where<Predicate>]

 
16:<Update>:=UpdateTablenameSet<Field >=< expression> [Where<Predicate>]

 
17:

18:<Create>:= <Createtable >|< createview >|< createindex>

 
19:<Createtable >:=Create TableTablename (<fielddefs>)

 
20:<Fielddefs >:= <fielddef> [, <fielddefs>]

 
21:<Fielddef >:= fieldname <typedef>

22:<Typedef >:=Int|Varchar(Integer)

 
23:<Createview >:=Create ViewViewnameAs<Query>

 
24:<Createindex >:=Create IndexIndexnameOnTablename (<field>)

The parse module in simpledb completes the parsing function. The class diagram is as follows:

Figure 1 parse class diagram

At the beginning, first think about SQL statement parsing. The main tasks are as follows:

By dividing input SQL statement strings,

First, the input SQL statement string segmentation and other work;

Then, identify the action in the crud statement and the table associated with the database operation and related data;

Finally, the identified tables and data are passed to the related interfaces of the system to complete the query.

Combined with the above class chart, you can divide it into three types: lexer and parser tools to complete SQL parsing; * data, the carrier of SQL parsing results; badsyntaxexception, and syntax error exceptions.

Next, let's start with the tool class:

> Lexer

In terms of compilation principles, this is a lexical analyzer mainly responsible for cutting the input SQL string,

Step 1To split the SQL string using the specified delimiter and save the result in a string array named tokens.

Figure 2 lexer split SQL

As shown in section 2, use a regular expression to separate SQL strings.

 
\ S{1} | (,) {1} | (=) {1} | (\ () {1} | (\) {1} | ('[^']*') {1}

Figure 3 re used in the SQL to tokens Splitting Process

Figure 4 tokens results during actual running

 

Figure 5 malicious input results

(During the interview, I asked a question. If a user just entered N spaces and cut off the field-list of the SELECT statement, what should I do?

For example

 
SelectSid, snameFromStudentsWhereSID ='123'

In fact, in the preceding step, all units are separated. As shown in figure 5, each space occupies a tokens position. However, lexer has a nexttoken method to effectively filter out these disturbances:

 
Private VoidNexttoken ()

 
{

Position ++;

 
While(Position <tokens. Length & tokens [position]. Length = 0)

 
Position ++;

 
}

 

After, Lexer uses the followingAlgorithmTo implement various stringsFragmentProcessing

(Note: lexer has the tokens array pointer position, pointing to the elements in the tokens array currently processed by lexer)

1. Locate the next processing element token and use the nexttoken () method.
2. Match the token and use the match * () method.

3. process the token and use the eat * () method.

In the match phase, the following RE is used to match string fragments:

Private Const StringIntpattern =@ "^ [-+]? (0 | [1-9] \ D *) $";

 
Private Const StringStringpattern =@ "^ '[^'] * '$";

 
Private Const StringIdpattern =@ "^ [A-Za-Z _] \ W * $"

 
Private Const StringKeywordpattern =@ "^ [A-Za-Z] * $";

Simpledb supports the following SQL keywords:

 
String[] KEYWORDS = {"Select","From","Where","And","Insert","","Values",

 
"Delete","Update","Set","Create","Table","Varchar","View","","Index","On"};

In the Eat stage,

There are five eat methods in total: eatdelim (char) eatid () eatintconstant () eatkeyword (string) eatstringconstant, where eatdelim and eatkeyword have parameter input, according to the parameter does not understand, the corresponding separator or keyword is eaten. Others only use position to parse the token pointed by position to the specified Int or string.

It can be said that lexer completes the first step of SQL parsing.

 

You have to get up early tomorrow. Come here first.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.