Relational Systems
Purpose of this chapter:
Query processing steps for RDBMS
The concept of query optimization
Basic methods and techniques
Query Optimization Classification:
Algebraic optimization
Physical optimization
RDBMS Query processing phase:
1. 查询分析2. 查询检查3. 4. 查询执行
Scanning, lexical analysis and parsing of query statements
Identify the language symbol from the query statement
Perform grammar checking and parsing
Query check
Semantic checking of legitimate query statements based on data dictionary
Checks access to users based on user rights and integrity constraints defined in the data dictionary
To convert SQL query statements into equivalent relational algebraic expressions after checking through
RDBMS typically uses a query tree (parse tree) to represent extended relational algebraic expressions
Converts the external name of a database object to an internal representation
Query optimization
Query optimization: Select an efficient execution of query processing policy
Query Optimization Classification:
Algebraic optimization: The optimization of relational algebraic expressions
Physical optimization: The selection of access paths and underlying operational algorithms
Query optimization method selection based on:
Rules-based (rule based)
Based on costs (cost based)
Query execution
Generate a query plan based on the execution strategy obtained by the optimizer
Code generator generates code that executes a query plan
Select the implementation of the operation
[例1]Select * from student where <条件表达式> ; 考虑<条件表达式>的几种情况: C1:无条件; C2:Sno=‘200215121‘; C3:Sage>20; C4:Sdept=‘CS‘ AND Sage>20;
The typical implementation method of the selection operation:
1. Simple full-table scanning method
The basic table sequential scan of the query, one by one checks whether each tuple satisfies the selection criteria, the tuple satisfies the condition as the result output
Suitable for small tables, not suitable for large tables
2. Index (or hash) scan method
An index on an attribute in the selection criteria (e.g. B + Tree index or hash index)
The index first finds the tuple master or tuple pointer that satisfies the condition, and then finds the tuple directly in the base table of the query through the tuple pointer.
[example 1 -C2 ] Take C2 as an example, sno= ' 200215121 ', and Sno has an index ( Or Sno is a hash code) use an index (or hash) to get a pointer to Sno for ' 200215121 ' tuple to retrieve the student from the student table by a tuple pointer [example 1 -c3 ] C3 as an example Sage> Span class= "Hljs-number" >20 , and Sage has b+ tree index using B+ The Tree Index finds the index entry for Sage=20 , which is the entry point for sage on the sequence set of B+ Tree > All tuple pointers 20 retrieve all students older than student " through these tuple pointers to the Hljs-number table.
[例1-C4] AND Sage>20,如果Sdept和Sage上都有索引:算法一:分别用上面两种方法分别找到Sdept=‘CS’的一组元组指针和Sage>20的另一组元组指针求这2组指针的交集到student表中检索得到计算机系年龄大于20的学生算法二:找到Sdept=‘CS’的一组元组指针,通过这些元组指针到student表中检索对得到的元组检查另一些选择条件(如Sage>20)是否满足把满足条件的元组作为结果输出。
Implementation of the connection operation
Connection operations are one of the most time-consuming operations in query processing
This section only discusses the most common implementation algorithms for equivalent connections (or natural joins)
[例2] SELECT * FROM Student,SC WHERE Student.Sno=SC.Sno;
Nested loop method (nested loop)
Each tuple (s) of the outer layer Loop (Student), retrieving each tuple (SC) in the Inner Loop (SC)
Check that the two tuples are equal on the connection properties (SNO)
If the join condition is met, the string is then output as a result until the tuple in the Outer loop table is processed
Sort-Merge method (Sort-merge join or Merge Join)
The tables that fit the connection are already in order.
Steps to sort-merge connection methods:
If the connected table is not ordered, first sort the student table and SC table by connection Property Sno
Take the first SNO in the student table, and sequentially scan a tuple with the same SNO in the SC table
Index Join method
Steps:
① the index of the attribute SNO on the SC table, if the index is not originally
② for each tuple in student, the corresponding SC tuple is found by the SNO value through the SC index
③ Connect these SC tuples with the student tuples.
Loop execution ②③ until tuples in the student table are processed
Hash Join method
The connection attribute is used as hash code, and the tuples in R and S are hashed into the same hash file with the same hash function.
Steps:
Partitioning phase (partitioning phase):
One-time processing of tables with fewer tuples (e.g. R)
Scatter its tuple into the bucket of the hash table by hash function.
Heuristic stage (probing phase): Also known as the join phase (join phase)
Take another table (S) once
Hash S's tuple into the appropriate hash bucket
Connect tuples to all tuples in the bucket that come from R and match them.
Steps to sort-merge connection Methods (cont.):
When scanning to the first SC tuple that is not the same SNO, the student table scans its next tuple and then scans the SC table for tuples with the same Sno to connect them.
Repeat the above steps until the student table is scanned
Student and SC tables are scanned once.
If 2 tables are unordered, the execution time is added to the sort time of the two tables
For 2 large tables, the total time will still be significantly reduced after sorting and using the Sort-merge join method to perform a connection
The above hash join algorithm assumes that the smaller tables in the two tables can be completely placed in the memory hash bucket after the first stage
Query processing of database-relational database system