How things work: SQL SELECT statement

Source: Internet
Author: User

The original website cannot be accessed, so Google snapshots are retained.

Introduction:

Ever asked yourself how things work inside the SQL SELECT statement? In this article we won't be talking about how to write a SQL SELECT statement. Rather, we will be talking about the algorithms and the methodology behind the SELECT statement, and how SQL decides which algorithm it will use to filter the data and return our expected results.

Selecting the algorithm:

In fact, you can't select the algorithm yourself. It is up to the SQL optimizer implementation to determine the algorithm that best matches the query you are going to invoke, in order to enhance the query performance, or in other words, optimize it. So you don't have control over selecting the algorithm, although some SQL optimizer implementations have tried to let the DB admin specify which selection algorithm is suitable, based on the admin's knowledge (for example, the admin might know that binary search, which we will mention later, might be the best choice).

The preparation:

We need to get prepared first and get familiar with the terminology that we will be using throughout the article.

Before we go further we need to know the types of indexes:

• Primary index: allows records to be read in an order that corresponds to the physical order in the file.
• Secondary index: any index that is not a primary index.

When we make an index, we create something like a small database for indexing, except that it only includes the key being indexed and a location counter that holds the location of the record in the database itself (the one that contains the data).
Access path: an access path is the algorithm used by the database to satisfy the requirements of SQL statements.
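To make the idea of an index concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory employee table where a row's position in the list stands in for its location on disk; the names build_index and ssn_index are illustrative only, not part of any real database API.

```python
# Hypothetical table: a row's position in the list plays the role of
# its physical location in the database file.
employee = [
    {"ssn": "20170103", "emp_name": "Carol"},
    {"ssn": "20170101", "emp_name": "Alice"},
    {"ssn": "20170102", "emp_name": "Bob"},
]


def build_index(table, key):
    """Map each key value to the location of its row (assumes unique keys)."""
    return {row[key]: location for location, row in enumerate(table)}


ssn_index = build_index(employee, "ssn")

# The index holds only keys and locations; the data itself stays in the table.
location = ssn_index["20170101"]
print(employee[location]["emp_name"])  # prints Alice
```

The lookup touches the index first and then follows the stored location into the table, which is the two-step pattern the rest of the article relies on.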

The selection criteria:

As all of us know, the SQL SELECT operation is used to filter and select results based on criteria.

For a simple selection there are two methods to base the selection on:

- Table scan (aka file scan): the scan reads all the records of the table. For large tables, this can take a long time. But for very small tables, a table scan can actually be faster than an index seek, and we get the records that satisfy the search criteria quickly.

- Index scan: as the name implies, all the rows in the leaf level of the index are scanned. This means that all of the rows of the index are examined instead of the table directly (this involves a search that would use an index).

Optimizer in action:

When a SQL statement is executed for the first time, the optimizer must first determine whether an index exists. You can query any column of any table, and an index is not required to do so. So the optimizer must be able to access non-indexed data; it does this using a scan. In most cases the preference is to use an index, as an index greatly optimizes data retrieval. However, an index cannot be used if one does not exist, and certain types of SQL statements simply are best satisfied with a complete scan of the data, as shown below.
"SELECT * FROM employee"

Below is how the optimizer bases its decision on whether to use an index retrieval or not.

- Search algorithms:

- Linear search: considered the easiest and most straightforward search through the database, as it retrieves every record and tests it against the criteria to see whether it satisfies them or not.

The linear search can either retrieve the data itself (as in the SELECT operation) or the data location (as in the update operation).
"SELECT emp_name FROM employee"

- Binary search: a more efficient search algorithm than the linear search, but it only applies if the selection condition contains an equality comparison on a key attribute (on which the table is ordered).
"SELECT emp_name FROM employee WHERE SSN = '20170101'"
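The two searches can be sketched in Python as follows. This is only an illustration under stated assumptions: the employee rows are hypothetical, the list is kept sorted by ssn so that the binary search applies, and the function names linear_search and binary_search are made up for this example.

```python
# Hypothetical employee file, physically ordered by the key attribute ssn.
employee = [
    {"ssn": "20170101", "emp_name": "Alice", "salary": 9000},
    {"ssn": "20170102", "emp_name": "Bob", "salary": 12000},
    {"ssn": "20170103", "emp_name": "Carol", "salary": 15000},
]


def linear_search(table, predicate, return_locations=False):
    """Test every record against the predicate, one by one.

    Returns the matching rows, or their locations when asked to
    (as an update operation would need).
    """
    matches = []
    for location, row in enumerate(table):
        if predicate(row):
            matches.append(location if return_locations else row)
    return matches


def binary_search(table, key, value):
    """Equality search on a key attribute the table is ordered by."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        current = table[mid][key]
        if current == value:
            return table[mid]
        if current < value:
            low = mid + 1   # the match can only be in the upper half
        else:
            high = mid - 1  # the match can only be in the lower half
    return None


rows = linear_search(employee, lambda r: r["salary"] > 10000)
locations = linear_search(employee, lambda r: r["salary"] > 10000, return_locations=True)
match = binary_search(employee, "ssn", "20170101")
```

The linear search works for any condition but inspects every row, while the binary search halves the remaining range at each step and therefore needs both the ordering and an equality condition on the ordering key.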

- Index search:
- Using primary index: when the condition involves an equality comparison on a key attribute with a primary index (SSN in this case), the primary index is used to retrieve at most one record.

"SELECT emp_name FROM employee WHERE SSN = '20170101'"

- Using primary index (retrieve multiple records): when the condition involves a comparison condition (>, <, <=, >=) on a key field with a primary index (dep_number in this case). This returns the records that satisfy the condition, and the results will be ordered as they are in the file.
"SELECT dep_name FROM department WHERE dep_number > 5"

- Using clustering index: if the selection condition involves an equality comparison on a non-key attribute which happens to have a clustered index, then the index will be used to retrieve all records satisfying the condition.
"SELECT emp_name FROM employee WHERE dep_number = 5"
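A rough Python sketch of these two ordered-file retrievals follows. It assumes a hypothetical department file ordered by the key dep_number and a hypothetical employee file ordered by the non-key dep_number; the helper names greater_than and equal_to are illustrative, and the key lists stand in for the index.

```python
from bisect import bisect_left, bisect_right

# Hypothetical department file, physically ordered by dep_number (a key).
department = [
    {"dep_number": 2, "dep_name": "HR"},
    {"dep_number": 5, "dep_name": "Sales"},
    {"dep_number": 7, "dep_name": "IT"},
    {"dep_number": 9, "dep_name": "Legal"},
]
dep_keys = [row["dep_number"] for row in department]

# Hypothetical employee file, physically ordered by dep_number (a non-key,
# so the same value may repeat).
employee = [
    {"emp_name": "Alice", "dep_number": 3},
    {"emp_name": "Bob", "dep_number": 5},
    {"emp_name": "Carol", "dep_number": 5},
    {"emp_name": "Dave", "dep_number": 8},
]
emp_dep_keys = [row["dep_number"] for row in employee]


def greater_than(keys, table, value):
    """Records with key > value, returned in file order."""
    start = bisect_right(keys, value)  # first position past all keys <= value
    return table[start:]


def equal_to(keys, table, value):
    """All records whose (possibly repeated) ordering value equals value."""
    return table[bisect_left(keys, value):bisect_right(keys, value)]


print([r["dep_name"] for r in greater_than(dep_keys, department, 5)])  # ['IT', 'Legal']
print([r["emp_name"] for r in equal_to(emp_dep_keys, employee, 5)])    # ['Bob', 'Carol']
```

Because the file is ordered on the indexed attribute, both results come back as one contiguous slice, which is why the output keeps the file order.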

- Secondary index (B-tree): a secondary index is automatically used during searching if it improves the efficiency of the search. Secondary indexes are maintained by the system and are invisible to the user. When the condition involves an equality comparison on a key field, it retrieves a single record; it returns multiple records if the indexing field is not a key. It can also be used with (>, >=, <, or <=).

Complex selections:

Complex selections come in 2 types:

Conjunction: when a statement is made up of simple conditions (as shown above) connected with AND.
"SELECT dep_name FROM department WHERE dep_size > 50 AND dep_revenue > 1000"

Disjunction: when a statement is made up of simple conditions (as shown above) connected with OR.

"SELECT dep_name FROM department WHERE dep_size > 50 OR dep_revenue > 1000"

- Conjunctive selection using one (individual) index: check whether there is an access path for an attribute in one of the simple conditions in the conjunctive statement. If yes, one of the above algorithms can be used to retrieve the records. After retrieving them, test whether each record satisfies the remaining simple conditions in the conjunctive condition (all of the above access paths will be tried before opting for the linear search).
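As a sketch of this two-step approach, the Python below assumes a hypothetical department table with an index on dep_size only. The index is a plain dictionary from value to row locations, so the range condition walks all index entries; a real B-tree index would jump straight to the first qualifying key. The function name select_size_and_revenue is made up for the example.

```python
from collections import defaultdict

# Hypothetical department table.
department = [
    {"dep_name": "HR", "dep_size": 20, "dep_revenue": 500},
    {"dep_name": "Sales", "dep_size": 80, "dep_revenue": 5000},
    {"dep_name": "IT", "dep_size": 60, "dep_revenue": 800},
]

# Assumed access path: an index on dep_size only (value -> row locations).
size_index = defaultdict(list)
for location, row in enumerate(department):
    size_index[row["dep_size"]].append(location)


def select_size_and_revenue(min_size, min_revenue):
    # Step 1: use the access path for the indexed condition (dep_size > min_size).
    candidates = [
        loc
        for size, locs in size_index.items()
        if size > min_size
        for loc in locs
    ]
    # Step 2: test the remaining condition only on the records retrieved in step 1.
    return [
        department[loc]["dep_name"]
        for loc in sorted(candidates)
        if department[loc]["dep_revenue"] > min_revenue
    ]


print(select_size_and_revenue(50, 1000))  # ['Sales']
```

Only the rows found through the index are fetched and checked against dep_revenue, so the second condition never forces a full pass over the table.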

- Conjunctive selection using composite index (multiple attributes): if two or more attributes are involved in equality conditions in the conjunctive condition and a composite index exists on the combined fields, we can use the index directly. In other words, we will search the index directly if the selection specifies an equality condition on 2 or more attributes and a composite index exists on them, so in this case the access path that depends on the index will be taken into consideration (linear and binary search won't be used). For example, here we created an index on the composite key (customer_id, product_id):
"SELECT * FROM customerproduct WHERE customer_id = 15 AND product_id = 250"
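A composite index can be pictured as an index whose key is the pair of attribute values. The Python sketch below assumes a hypothetical customerproduct table (the quantity column is invented for illustration) and uses a dictionary keyed by (customer_id, product_id); the name lookup is likewise illustrative.

```python
# Hypothetical customerproduct table; quantity is an invented extra column.
customerproduct = [
    {"customer_id": 15, "product_id": 250, "quantity": 2},
    {"customer_id": 15, "product_id": 300, "quantity": 1},
    {"customer_id": 20, "product_id": 250, "quantity": 5},
]

# Composite index: the key is the combined pair (customer_id, product_id).
composite_index = {}
for location, row in enumerate(customerproduct):
    key = (row["customer_id"], row["product_id"])
    composite_index.setdefault(key, []).append(location)


def lookup(customer_id, product_id):
    """Answer both equality conditions with a single index probe."""
    locations = composite_index.get((customer_id, product_id), [])
    return [customerproduct[loc] for loc in locations]


print(lookup(15, 250))  # [{'customer_id': 15, 'product_id': 250, 'quantity': 2}]
```

Both equality conditions are satisfied by one probe, so no leftover condition has to be re-tested on the retrieved rows.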

- Conjunctive selection by intersection of identifiers: requires indexes with record pointers (a record pointer is an identifier for a record and provides the address of this record on the disk). If secondary indexes or other access paths are available on more than one of the fields involved in simple conditions in the conjunctive condition, and if the indexes include record pointers, scan each index for pointers that satisfy an individual condition, and then take the intersection of all retrieved pointers to get the set of pointers that satisfy the conjunctive condition.
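The pointer intersection can be sketched with Python sets. The example assumes a hypothetical department table in which list positions act as record pointers, and two secondary indexes built by an illustrative helper, build_index; none of these names come from a real database API.

```python
# Hypothetical department table; list positions act as record pointers.
department = [
    {"dep_name": "HR", "dep_size": 20, "dep_revenue": 500},
    {"dep_name": "Sales", "dep_size": 80, "dep_revenue": 5000},
    {"dep_name": "IT", "dep_size": 60, "dep_revenue": 800},
    {"dep_name": "Ops", "dep_size": 30, "dep_revenue": 2000},
]


def build_index(table, attribute):
    """Secondary index: attribute value -> list of record pointers."""
    index = {}
    for pointer, row in enumerate(table):
        index.setdefault(row[attribute], []).append(pointer)
    return index


def pointers_where(index, predicate):
    """Scan one index and collect the pointers whose value satisfies predicate."""
    return {ptr for value, ptrs in index.items() if predicate(value) for ptr in ptrs}


size_index = build_index(department, "dep_size")
revenue_index = build_index(department, "dep_revenue")

# dep_size > 50 AND dep_revenue > 1000
size_ptrs = pointers_where(size_index, lambda v: v > 50)
revenue_ptrs = pointers_where(revenue_index, lambda v: v > 1000)
both = size_ptrs & revenue_ptrs  # intersection of the two pointer sets

print([department[ptr]["dep_name"] for ptr in sorted(both)])  # ['Sales']
```

The table itself is only read at the very end, for the pointers that survived the intersection.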

- Disjunctive selection: compared to a conjunctive selection, a disjunctive selection is harder to process and optimize. In such a situation not much optimization can be done, as the records satisfying the disjunctive condition are the union of the records satisfying the individual conditions.

If any one of the conditions doesn't have an access path, we will be forced to use the linear search. If we have an access path for every single condition, we can optimize by retrieving the records satisfying each condition and then taking the union of the outcomes to eliminate duplicates.
"SELECT * FROM employee WHERE employee_gender = 'M' OR employee_salary > 10000 OR employee_department = 5"
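The rule "union if every condition has an access path, otherwise linear search" can be sketched as below. The employee rows, the dictionary-based indexes, and the function names build_index and disjunctive_select are all assumptions made for this illustration.

```python
# Hypothetical employee table; list positions act as record pointers.
employee = [
    {"emp_name": "Alice", "employee_gender": "F", "employee_salary": 12000, "employee_department": 3},
    {"emp_name": "Bob", "employee_gender": "M", "employee_salary": 8000, "employee_department": 5},
    {"emp_name": "Carol", "employee_gender": "F", "employee_salary": 7000, "employee_department": 5},
    {"emp_name": "Dave", "employee_gender": "M", "employee_salary": 15000, "employee_department": 2},
    {"emp_name": "Eve", "employee_gender": "F", "employee_salary": 6000, "employee_department": 1},
]


def build_index(table, attribute):
    """Index: attribute value -> list of record pointers."""
    index = {}
    for pointer, row in enumerate(table):
        index.setdefault(row[attribute], []).append(pointer)
    return index


def disjunctive_select(table, conditions, indexes):
    """conditions is a list of (attribute, predicate) pairs joined by OR."""
    if any(attribute not in indexes for attribute, _ in conditions):
        # At least one condition has no access path: fall back to linear search.
        return [
            pointer
            for pointer, row in enumerate(table)
            if any(predicate(row[attribute]) for attribute, predicate in conditions)
        ]
    pointers = set()
    for attribute, predicate in conditions:
        for value, ptrs in indexes[attribute].items():
            if predicate(value):
                pointers.update(ptrs)  # set union drops duplicate pointers
    return sorted(pointers)


conditions = [
    ("employee_gender", lambda v: v == "M"),
    ("employee_salary", lambda v: v > 10000),
    ("employee_department", lambda v: v == 5),
]
all_indexes = {attribute: build_index(employee, attribute) for attribute, _ in conditions}

print(disjunctive_select(employee, conditions, all_indexes))  # [0, 1, 2, 3]
```

Bob and Dave each satisfy two of the conditions, yet every pointer appears once in the result because the union is taken on a set.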

The execution:

When a single condition specifies the selection (simple selection), we can check whether an access path exists on the attribute involved in that condition. If yes, this access path will be used for the execution; otherwise the linear search will be used.
The query optimizer won't be able to do much when we have only a single simple condition. On the contrary, it will do a great job when there are conjunctive queries, that is, whenever more than one of the attributes involved in the SELECT condition has an access path. Here the query optimizer takes action and chooses the access path that returns the fewest records in the most efficient manner, by estimating the different costs and choosing the method with the least estimated cost.
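The choice between access paths can be illustrated with a deliberately simplified cost model. In the Python sketch below, the cost of an access path is assumed to be simply the number of record pointers it would return, computed by counting in the index; real optimizers estimate this from stored statistics instead. The table and the names estimate_matches and choose_access_path are hypothetical.

```python
# Hypothetical department table.
department = [
    {"dep_name": "HR", "dep_size": 20, "dep_revenue": 500},
    {"dep_name": "Sales", "dep_size": 80, "dep_revenue": 5000},
    {"dep_name": "IT", "dep_size": 60, "dep_revenue": 800},
    {"dep_name": "Ops", "dep_size": 70, "dep_revenue": 900},
]


def build_index(table, attribute):
    """Index: attribute value -> list of record pointers."""
    index = {}
    for pointer, row in enumerate(table):
        index.setdefault(row[attribute], []).append(pointer)
    return index


def estimate_matches(index, predicate):
    """Assumed cost model: how many pointers the access path would return."""
    return sum(len(ptrs) for value, ptrs in index.items() if predicate(value))


def choose_access_path(conditions, indexes):
    """Pick the indexed attribute with the least estimated cost.

    Returns None when no condition has an access path, meaning the
    linear search has to be used.
    """
    costs = {
        attribute: estimate_matches(indexes[attribute], predicate)
        for attribute, predicate in conditions
        if attribute in indexes
    }
    if not costs:
        return None
    return min(costs, key=costs.get)


conditions = [
    ("dep_size", lambda v: v > 50),
    ("dep_revenue", lambda v: v > 1000),
]
indexes = {
    "dep_size": build_index(department, "dep_size"),
    "dep_revenue": build_index(department, "dep_revenue"),
}

print(choose_access_path(conditions, indexes))  # dep_revenue
```

Here dep_size > 50 would return three records and dep_revenue > 1000 only one, so the dep_revenue path is chosen and the dep_size condition is left to be tested on that single record.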

Conclusion:

In this article we have seen how the SQL optimizer optimizes the SELECT statement in order to achieve the best performance. The optimizer uses an index to satisfy the selection condition if the following criteria are met:

• At least one of the SQL selection conditions must be indexable. Certain conditions are not indexable by their nature, and therefore the optimizer will never be able to use an index to satisfy them.

• One of the columns (in any indexable SELECT condition) must exist as a column in an available index.

The optimizer would use a linear search if the above criteria weren't met.
After that, the optimizer will go one step further and try to determine which access path (if one exists) would return the fewest records in the most efficient manner, by estimating the costs of each access path. Next time we will be talking about the algorithms behind this estimate, so stay tuned.

