Python implementation of simple Database (ii) Single table query and top N implementation

Source: Internet
Author: User
Tags min regular expression stmt table name

In the previous article, we introduced our storage and indexing process, which will introduce SQL queries, single table queries, and TOPN implementations.

One, SQL parsing

Formal SQL parsing is using a parser, but I looked for a long time, only know can use YACC, bison and so on, sqlite use of lemon, pounded the whole day did not implement, the use of the regular expression of Python.

1, delete useless space, jump Geff, line break, etc.

We use a semicolon '; ' As a terminator of an SQL statement, before entering a semicolon, we string the SQL statement entered into a string, deleting some useless characters from the entire SQL statement,

1  def rmnousechar (SQL):
2 while Sql.find ("'")!=-1: #将引号删除, no matter what type is treated as a character type
3 sql = sql.replace ("'", "")
4 while sql.find (' ")!=-1:
5 sql = Sql.replace (' "', ')
6 while sql.find (' \ t ')!=-1: #删除制表符
7 sql = Sql.replace ("\ T", "")
8 while sql.find (' \ n ')!=-1: #删除换行符
9 sql = sql.replace ("\ n", "")
statements = Sql.split ("") #分割成列表, remove extra space after stitching into string
One while "" in statements:
statements.remove ("")
sql= ""
stmt in statements:
SQL + + stmt+ ""
Return Sql[0:-1] #最后一个空格删掉

2, Keywords capital;

Scan the keyword in the SQL statement to capitalize the keyword. Here we use a technique that adds a space to each SELECT statement preceded by a space before and after each keyword, so that you can replace the part of the word, and if you don't add a space, like charity will be replaced with charrity, which is not what we want.

Onekeywords = ["Select", "from", "WHERE",
    "DESC", "ASC",
    "DATE", "Day", "INT", "CHAR", "VARCHAR", "DE Cimal ","
    SUM "," AVG "," MAX "," MIN "," COUNT "," as "," Top "," and "," OR "]
twokeywords = [" GROUP by "," order By "]

3, parsing and formatting the Select Child statement;

A common SELECT statement typically contains a select, from, where, group by, order by five part (regardless of nested query), where, group by, and order by can not appear, but if it appears, The order in the SQL statement must satisfy the Select, from, where, and group by, and so we define:

Stmttag = [' select ', ' from ', ' WHERE ', ' GROUP by ', ' Order by ', '; '] #select CHILD Statement Flag Word

To find the glyphs for each of the child statements, and to parse the child statements according to the glyphs, here we define a method to find the next sign word:

def nextstmttag (Sql,currenttag): #根据当前标志词找下一个标志词
    index = sql.find (currenttag,0) for
    tag in Stmttag:
        if Sql.find (Tag,index+len (Currenttag))!=-1: Return
            tag

For example, our test found that there is a where in the SQL statement, it must have a WHERE clause, we get the next keyword through the Nextstmttag () method, if there is a group by in SQL, the next one is group by, if there is no group By and by, the next glyph is Order BY, otherwise the next glyph is a semicolon ";" because there must be a terminator semicolon in one SQL.

4, the combination of metadata table to check syntax errors;

After parsing the SQL's child statements, we can do a simple syntax check, in conjunction with the metadata check if the table in the WHERE clause exists in the database, and if the property in the other child statement is in the table of the WHERE clause, in the process of checking, capitalize the property and add the table name, which is in the same format as: [Table name]. [Property name], while optimizing the Where condition of a multiple-table query, where the single table query condition is placed in front of the list, and the multiple table joins are behind. Please see the following example:

We enter the following SQL statement:

Select L_orderkey,o_orderdate,o_shippriority,
min (l_orderkey) as Min_odkey,
Max (o_shippriority) as Max_ Priority from
Customer,orders,lineitem
where
c_mktsegment = "machinery" and
C_custkey = O_custkey and
L_orderkey = O_orderkey and
o_orderdate < "1995-05-20" and
l_shipdate > "1995-05-18"
Group by l_orderkey,o_orderdate,o_shippriority ORDER by
O_orderdate Desc,o_orderdate;

Results of parsing (checked by syntax):

 {' from ': [' CUSTOMER ', ' ORDERS ', ' LINEITEM '], ' GROUP ': [' LINEITEM '. L_orderkey ', ' ORDERS. O_orderdate ', ' ORDERS. O_shippriority '], ' order ': [[' ORDERS]. O_orderdate ', ' DESC '], [' ORDERS. O_orderdate ', ' ASC ']], ' SELECT ': [[' LINEITEM. L_orderkey ', none, none], [' ORDERS. O_orderdate ', none, none], [' ORDERS. O_shippriority ', none, none], [' LINEITEM. L_orderkey ', ' MIN ', ' Min_odkey '], [ORDERS. O_shippriority ', ' MAX ', ' max_priority '], ' where ': [[' CUSTOMER]. C_mktsegment ', ' = ', ' machinery '], [ORDERS. O_orderdate ', ' < ', ' 1995-05-20 '], [' LINEITEM. L_shipdate ', ' > ', ' 1995-05-18 ', [' CUSTOMER ']. C_custkey ', ' = ', ' ORDERS. O_custkey '], [' LINEITEM. L_orderkey ', ' = ', ' ORDERS. O_orderkey ']]} 

You can see that we parse the entire SQL into a dictionary, the dictionary key is a child statement marker, the value is a formatted child statement, and the parse result is similar to the JSON format. The group by and from child statements are simply table names and property names, so you use a list representation, and other child statements are more complex, and we represent each part of its child statements in list, such as order sub statements, not only attributes but also ascending or descending descriptions While the Select also has aggregate functions and renames; WHERE clause we only consider conditions that are greater than, equal to, or less than, that is, each where condition can be represented by a ternary group. The part that does not appear is padded with none.

This is our general process of parsing Select SQL, the details are no longer introduced, because this parsing method is not clever, not on the table, the regular army are used syntax and parser, we fight guerrilla warfare.

See more highlights of this column: http://www.bianceng.cnhttp://www.bianceng.cn/Programming/extra/

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.