The previous sections basically implement data read and write. For database users, Structured Query Language is the window for accessing the database. This section describes how simpledb parses SQL statements.
First of all, simpledb implements a subset of the SQL-92, the specific supported syntax is as follows:
1:
2:<Query >:=Select<Selectlist>From<Tablelist> [Where<Predicate>]
3:<Selectlist >:= <field> [, <selectlist>]
4:<Tablelist >:= tablename [, <tablelist>]
5:<Predicate >:= <term> [And<Predicate>]
6:<Term >:= <expression >=< expression>
7:<Expression >:= <field >|< constant>
8:<Field>: = fieldname
9:<Constant >:= string |Integer
10:
11:<Modify>:= <Insert >|<Delete>|<Update>
12:<Insert >:= insertIntoTablename (<fieldlist>)Values(<Constlist>)
13:<Fieldlist >:= <field> [, <fieldlist>]
14:<Constlist >:= <constant> [, <constant>]
15:<Delete>:=Delete FromTablename [Where<Predicate>]
16:<Update>:=UpdateTablenameSet<Field >=< expression> [Where<Predicate>]
17:
18:<Create>:= <Createtable >|< createview >|< createindex>
19:<Createtable >:=Create TableTablename (<fielddefs>)
20:<Fielddefs >:= <fielddef> [, <fielddefs>]
21:<Fielddef >:= fieldname <typedef>
22:<Typedef >:=Int|Varchar(Integer)
23:<Createview >:=Create ViewViewnameAs<Query>
24:<Createindex >:=Create IndexIndexnameOnTablename (<field>)
The parse module in simpledb completes the parsing function. The class diagram is as follows:
Figure 1 parse class diagram
At the beginning, first think about SQL statement parsing. The main tasks are as follows:
By dividing input SQL statement strings,
First, the input SQL statement string segmentation and other work;
Then, identify the action in the crud statement and the table associated with the database operation and related data;
Finally, the identified tables and data are passed to the related interfaces of the system to complete the query.
Combined with the above class chart, you can divide it into three types: lexer and parser tools to complete SQL parsing; * data, the carrier of SQL parsing results; badsyntaxexception, and syntax error exceptions.
Next, let's start with the tool class:
> Lexer
In terms of compilation principles, this is a lexical analyzer mainly responsible for cutting the input SQL string,
Step 1To split the SQL string using the specified delimiter and save the result in a string array named tokens.
Figure 2 lexer split SQL
As shown in section 2, use a regular expression to separate SQL strings.
\ S{1} | (,) {1} | (=) {1} | (\ () {1} | (\) {1} | ('[^']*') {1}
Figure 3 re used in the SQL to tokens Splitting Process
Figure 4 tokens results during actual running
Figure 5 malicious input results
(During the interview, I asked a question. If a user just entered N spaces and cut off the field-list of the SELECT statement, what should I do?
For example
SelectSid, snameFromStudentsWhereSID ='123'
In fact, in the preceding step, all units are separated. As shown in figure 5, each space occupies a tokens position. However, lexer has a nexttoken method to effectively filter out these disturbances:
Private VoidNexttoken ()
{
Position ++;
While(Position <tokens. Length & tokens [position]. Length = 0)
Position ++;
}
After, Lexer uses the followingAlgorithmTo implement various stringsFragmentProcessing
(Note: lexer has the tokens array pointer position, pointing to the elements in the tokens array currently processed by lexer)
1. Locate the next processing element token and use the nexttoken () method.
2. Match the token and use the match * () method.
3. process the token and use the eat * () method.
In the match phase, the following RE is used to match string fragments:
Private Const StringIntpattern =@ "^ [-+]? (0 | [1-9] \ D *) $";
Private Const StringStringpattern =@ "^ '[^'] * '$";
Private Const StringIdpattern =@ "^ [A-Za-Z _] \ W * $"
Private Const StringKeywordpattern =@ "^ [A-Za-Z] * $";
Simpledb supports the following SQL keywords:
String[] KEYWORDS = {"Select","From","Where","And","Insert","","Values",
"Delete","Update","Set","Create","Table","Varchar","View","","Index","On"};
In the Eat stage,
There are five eat methods in total: eatdelim (char) eatid () eatintconstant () eatkeyword (string) eatstringconstant, where eatdelim and eatkeyword have parameter input, according to the parameter does not understand, the corresponding separator or keyword is eaten. Others only use position to parse the token pointed by position to the specified Int or string.
It can be said that lexer completes the first step of SQL parsing.
You have to get up early tomorrow. Come here first.