650) this.width=650; "Src=" http://images2017.cnblogs.com/blog/139239/201709/13923920170911112105313661504674. JPG "width=" 423 "height=" 235 "style=" border:0px; "/>
This time we learned the last article of how to write a better SQL query series.
Time complexity and large o symbols
Through the first two articles, we have a certain understanding of the query plan. Next, we can also use computational complexity theory to further explore and think about the performance improvement. The field of theoretical computer science focuses on the classification of computational problems based on difficulty. These computational problems can be algorithmic problems or query problems.
For queries, we can classify them not by difficulty, but by the time it takes to run the query and get the results. This approach is also known as the classification of time complexity.
With the large O symbol, the run time can be expressed according to the input growth rate, because the input can be arbitrarily large. The large o symbol does not include coefficients and low order items so that you can focus on the important part of the query run time: growth rate. In this way, the coefficients and the lower order are discarded, and the time complexity is gradually described, which means that the input becomes infinitely large.
In a database language, complexity measures the length of time a query runs.
Note that the size of the database increases not only as the data stored in the table increases, but also the size of the database in the indexes in the database.
Estimating the time complexity of a query plan
The execution plan defines the algorithm used for each operation, which also allows the execution time of each query to be logically represented as a function of the size of the data table in the query plan. In other words, you can use the large O symbol and execution plan to estimate the complexity and performance of the query.
In the following summary, we will look at four types of time complexity concepts.
With these examples, you can see that the time complexity of a query differs depending on what query is running.
For different databases, you need to consider different indexing methods, different execution plans, and different implementations.
The time complexity concepts listed below are therefore very common.
O (1): Constant time
There is a query algorithm that takes the same amount of time to execute regardless of the size of the input, which is a constant time query. These types of queries are not common, and here's an example:
SELECT TOP 1 T.*from t
The time complexity of this algorithm is a constant, because just select any row from the table. Therefore, the length of time is independent of the size of the table.
Linear time: O (n)
If the time execution of an algorithm is proportional to the input size, then the execution time of the algorithm increases as the input size increases. For databases, this means that query execution times are proportional to the size of the table: as the number of rows of data in the table increases, the query time increases accordingly.
An example would be to query with a WHERE clause on a nonindexed column: This requires a full table scan or sequential scan, which results in an O (n) time complexity. This means that each row in the table needs to be read to find the data for the correct ID. Even if the first row finds the correct data, the query will read each row of data.
If there is no index, then the complexity of this query is O (n) i_id:
SELECT I_idfrom Item;
This also means a count query such as COUNT (*) from table, with an O (n) time complexity, and a full table scan unless the total number of rows in the data table is stored. At this point, the complexity will be more like O (1).
Closely related to linear execution time is the sum of time for all linear execution plans. Here are some examples:

Hash connection (hash Join) is an O (M + N) complex. The classic hash join algorithm for two internal data table connections is to prepare a hash table for smaller data tables first. The entry of a hash table consists of the connection properties and rows. Access to the hash table is implemented by applying the hash function to the Join property. Once the hash table is built, the larger table is scanned and the hash table is looked at to find the related rows in the smaller tables. The complexity of

Merge joins (merge join) is O (M + N), but the connection is heavily dependent on the index on the connection column and, without an index, the rows are sorted based on the key used in the connection:

If the two tables are sorted based on the key used in the connection, the query has a complexity of O (M + N).

If all two tables have indexes on the join column, the indexes maintain the columns sequentially, and do not need to be sorted. The complexity is O (M + N) at this time.

If none of the two tables have indexes on the join column, you need to sort the two tables first, so the complexity will be O (M log m + N log n).

If one table has an index on the join column and the other does not, you need to sort the table without the index, so the complexity will be O (M + N log n).

For nested joins, the complexity is usually O (MN). This connection is particularly effective when one or two tables are very small (for example, less than 10 records).
Remember: A nested connection is a way to compare each record in one table with each record in another table.
Logarithmic time: O (log (n))
If the execution time of the algorithm is proportional to the logarithm of the input size, then the algorithm is called the logarithmic time algorithm; For queries, this means that the execution time is proportional to the logarithm of the database size.
The query plan time complexity for performing an index scan or a clustered index scan is a logarithmic time. A clustered index is the index of the leaf level of the index that contains the actual data rows of the table. Aggregation is very similar to other indexes: it is defined on one or more columns. This also forms the index primary key. The clustered primary key is the primary key column of the clustered index. A clustered index scan is a basic operation that an RDBMS reads from one row to the other in a clustered index.
An i_id index exists in the following example, which also results in the complexity of O (log (n)):
SELECT i_stockfrom Itemwhere i_id = N;
If there is no index, the time complexity is O (n).
Secondary time: O (n ^ 2)
If the execution time of the algorithm is proportional to the square of the input size, the algorithm is called the logarithmic time algorithm. For a database, this means that the execution time of the query is proportional to the square of the database size.
An example of a query with two times complexity is as follows:
SELECT *from Item, Authorwhere item.i_a_id=author.a_id
The minimum complexity is O (n log (n)), but the maximum complexity is O (n ^ 2) based on the index information of the connection attribute.
is a graph that estimates query performance based on time complexity, which allows you to see the performance of each algorithm.
650) this.width=650; "Src=" http://images2017.cnblogs.com/blog/139239/201709/13923920170911113848469435165978. PNG "width=" 597 "height=" 429 "style=" border:0px; "/>
SQL tuning
You can measure query planning and time complexity in the following ways, and further tune SQL queries:
Replace the full table scan of unnecessary big data table with index scan;
Ensure that the table is in the optimal order of connections;
Ensure optimal use of the index;
Caches full table scans of small data tables.
"How to write Better SQL query" all the content of the tutorial is introduced here, I hope that through the introduction of this tutorial, can help you to write better, more excellent SQL query.
Original link: https://www.datacamp.com/community/tutorials/sqltutorialquery#importance
Reproduced please specify from: Grape City control
This article is from the "Grape City Control Technology Team Blog" blog, be sure to keep this source http://powertoolsteam.blog.51cto.com/2369428/1964301
How to write a better SQL query: The ultimate guidePart III