In the previous article, we learned how SQL queries are executed and what you need to be aware of when writing SQL statements. Below, we take a closer look at query methods and query optimization.
Set-based versus procedural query methods
Implicit in the anti-patterns discussed earlier is the difference between a set-based and a procedural approach to building queries.
The procedural approach to querying looks very much like ordinary programming: you tell the system what to do and how to do it. For example, as in the previous article, you can query the database by executing one function and then calling another, or use logic that includes loops, conditions, and user-defined functions (UDFs) to arrive at the final query result. You'll find that with this approach you keep requesting a subset of the data, one layer at a time. This approach is also often referred to as step-by-step or row-by-row querying.
The alternative is the set-based approach, in which you specify only the operation to be performed. All you do is specify the conditions and requirements for the result set you want the query to return. When retrieving the data, you don't need to worry about the internal mechanics of the query: the database engine determines the best algorithm and logic for executing it.
Because SQL is set-based, this approach is more efficient than the procedural one, which explains why, in some cases, SQL can work faster than equivalent code.
The set-based query method is also a skill that the data mining and analysis industry expects you to master! You need to be able to switch fluently between the two approaches: if you spot a procedural query in your code, you should consider whether it needs to be rewritten.
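To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module (the table and column names are hypothetical, chosen only for illustration): the procedural version pulls every row out and filters in a loop, while the set-based version states the condition and lets the engine decide how to satisfy it.

```python
import sqlite3

# In-memory database with a small example table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 10.0) for i in range(1, 101)])

# Procedural approach: fetch every row, then filter and sum in a loop.
total = 0.0
for _id, amount in conn.execute("SELECT id, amount FROM orders"):
    if amount > 500:
        total += amount

# Set-based approach: state the condition; the engine picks the plan.
(set_total,) = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE amount > 500").fetchone()

assert total == set_total
```

Both versions produce the same number, but the set-based query gives the optimizer the whole problem at once instead of forcing row-at-a-time processing through the application.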
From query to execution plan
Anti-patterns are not static. Avoiding query anti-patterns and rewriting queries can be a difficult part of becoming a SQL developer, so you will often need tools to optimize your queries in a more structured way.
Thinking about performance requires a more structured and more in-depth approach. This structured, in-depth approach is based primarily on the query plan: the query is first parsed into a "parse tree", and the plan defines exactly what algorithm is used for each operation and how execution is coordinated.
Query Optimization
When you refine a query, you will most likely need to manually examine the plan that the optimizer generates. In that case, you will need to analyze your query again by inspecting the query plan.
To get hold of such a query plan, you need to use the tools your database management system provides.
Note that in PostgreSQL you can distinguish between plain EXPLAIN, which only gives you a description of how the planner intends to execute the query without actually running it, and EXPLAIN ANALYZE, which executes the query and returns both the estimated and the actual query plan. In other words, the actual plan really runs the query, while the estimated plan works the problem out without executing it. Logically, the actual execution plan is more useful, because it contains additional details and statistics about what really happened when the query was executed.
Next you'll learn more about EXPLAIN and ANALYZE, and how to use these two commands to learn more about your query plan and query performance. To do this, we'll run some examples against two tables: one_million and half_million.
You can use EXPLAIN to retrieve the query plan for the one_million table: make sure it is placed in front of the query; when it completes, the query plan is returned:
EXPLAIN SELECT *
FROM one_million;

QUERY PLAN
_________________________________________________
Seq Scan on one_million
(cost=0.00..18584.82 rows=1025082 width=36)
(1 row)
In the example above, we see that the estimated startup cost of the query is 0.00 and the estimated total cost is 18584.82, with an estimated 1025082 rows and an average row width of 36 bytes.
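The same habit of prefixing a query with an EXPLAIN command works in other engines too. As a runnable analog, here is a sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module (SQLite's output format differs from PostgreSQL's, and the table here is a small stand-in for one_million):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one_million (counter INTEGER, payload TEXT)")
conn.executemany("INSERT INTO one_million VALUES (?, ?)",
                 [(i, "x") for i in range(1000)])

# EXPLAIN QUERY PLAN is prefixed to the query; it returns the plan
# without materializing the full result set.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM one_million").fetchall()
detail = plan[0][-1]  # last column holds the human-readable plan step
print(detail)         # a sequential scan of one_million
```

As with PostgreSQL's Seq Scan, an unfiltered SELECT over the whole table shows up as a full scan.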
You can also use ANALYZE to update statistics.
ANALYZE one_million;
EXPLAIN SELECT *
FROM one_million;

QUERY PLAN
_________________________________________________
Seq Scan on one_million
(cost=0.00..18334.00 rows=1000000 width=37)
(1 row)
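SQLite has an ANALYZE command with the same purpose: it gathers statistics that the planner uses for its cost estimates. A minimal sketch, assuming a small stand-in table with an index (in SQLite the gathered statistics land in the sqlite_stat1 table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one_million (counter INTEGER)")
conn.execute("CREATE INDEX idx_counter ON one_million (counter)")
conn.executemany("INSERT INTO one_million VALUES (?)",
                 [(i,) for i in range(1000)])

# ANALYZE collects planner statistics; SQLite stores them in
# the sqlite_stat1 table (one row per index).
conn.execute("ANALYZE")
stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)  # the stat string starts with the table's row count
```

After the statistics are refreshed, the estimated row counts the planner works with track the table's real contents, just as the PostgreSQL example above moved from 1025082 to 1000000 estimated rows.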
In addition to EXPLAIN and ANALYZE, you can also use EXPLAIN ANALYZE to retrieve the actual execution time:
EXPLAIN ANALYZE SELECT *
FROM one_million;

QUERY PLAN
___________________________________________________
Seq Scan on one_million
(cost=0.00..18334.00 rows=1000000 width=37)
(actual time=0.015..1207.019 rows=1000000 loops=1)
Total runtime: 2320.146 ms
(2 rows)
The disadvantage of using EXPLAIN ANALYZE is that it actually executes the query, which is worth keeping in mind!
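That trade-off is the same one you make whenever you time a query yourself: to get real numbers instead of estimates, the work has to be done. A rough sketch of hand-rolled timing, using sqlite3 and time.perf_counter (table name again a stand-in):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one_million (counter INTEGER)")
conn.executemany("INSERT INTO one_million VALUES (?)",
                 [(i,) for i in range(100_000)])

# Like EXPLAIN ANALYZE, this actually runs the query -- the price
# you pay for real timings instead of the planner's estimates.
start = time.perf_counter()
rows = conn.execute("SELECT * FROM one_million").fetchall()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{len(rows)} rows in {elapsed_ms:.1f} ms")
```

For expensive queries on production data, prefer plain EXPLAIN first, and only fall back to actual execution when the estimates are not enough.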
So far, all the algorithms we've seen perform a sequential or full table scan: a way of scanning the database in which every row of the table is read in order (serially) and the condition is checked against each row's columns. In terms of performance, a sequential scan is not the best execution plan, because the entire table has to be read; however, sequential reads are relatively fast even on slow disks.
Here are some examples of other algorithms:
EXPLAIN ANALYZE SELECT *
FROM one_million JOIN half_million
ON (one_million.counter = half_million.counter);

QUERY PLAN
_____________________________________________________________
Hash Join (cost=15417.00..68831.00 rows=500000 width=42)
(actual time=1241.471..5912.553 rows=500000 loops=1)
Hash Cond: (one_million.counter = half_million.counter)
-> Seq Scan on one_million
(cost=0.00..18334.00 rows=1000000 width=37)
(actual time=0.007..1254.027 rows=1000000 loops=1)
-> Hash (cost=7213.00..7213.00 rows=500000 width=5)
(actual time=1241.251..1241.251 rows=500000 loops=1)
Buckets: 4096  Batches: 16  Memory Usage: 770kB
-> Seq Scan on half_million
(cost=0.00..7213.00 rows=500000 width=5)
(actual time=0.008..601.128 rows=500000 loops=1)
Total runtime: 6468.337 ms
We can see that the query optimizer chose a Hash Join! Remember this operation, as we'll need it to estimate the time complexity of the query. Notice that there is no index on half_million.counter in the example above; we add one in the next example.
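As an aside, the hash join the planner chose can be sketched in a few lines: build a hash table on the smaller input, then probe it with one pass over the larger one. This is an illustrative Python sketch of the algorithm, not PostgreSQL's implementation, and the toy rows stand in for the two tables:

```python
# Minimal hash join sketch: build on the smaller relation, probe
# with the larger one.
def hash_join(smaller, larger, key):
    # Build phase: index the smaller relation by the join key.
    buckets = {}
    for row in smaller:
        buckets.setdefault(row[key], []).append(row)
    # Probe phase: a single pass over the larger relation.
    for row in larger:
        for match in buckets.get(row[key], []):
            yield {**match, **row}

half_million = [{"counter": i, "h": i * 2} for i in range(5)]
one_million = [{"counter": i, "o": i * 3} for i in range(10)]
joined = list(hash_join(half_million, one_million, "counter"))
```

Each input is read once, which is why a hash join beats a nested loop when neither side is indexed, at the cost of the memory (and, as the Batches: 16 line shows, spill-to-disk passes) needed for the hash table.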
CREATE INDEX ON half_million (counter);

EXPLAIN ANALYZE SELECT *
FROM one_million JOIN half_million
ON (one_million.counter = half_million.counter);

QUERY PLAN
______________________________________________________________
Merge Join (cost=4.12..37650.65 rows=500000 width=42)
(actual time=0.033..3272.940 rows=500000 loops=1)
Merge Cond: (one_million.counter = half_million.counter)
-> Index Scan using one_million_counter_idx on one_million
(cost=0.00..32129.34 rows=1000000 width=37)
(actual time=0.011..694.466 rows=500001 loops=1)
-> Index Scan using half_million_counter_idx on half_million
(cost=0.00..14120.29 rows=500000 width=5)
(actual time=0.010..683.674 rows=500000 loops=1)
Total runtime: 3833.310 ms
(5 rows)
By creating the index, the query optimizer has determined that it can now use index scans and perform a Merge Join.
Note the difference between an index scan and a full table scan (sequential scan): the latter (also known as a "table scan") reads every data page of the table to find matching rows, while the former walks only the index pages to locate the rows that match.
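You can watch the optimizer make exactly this switch in a runnable form. The following SQLite sketch (via Python's sqlite3; column names are illustrative) shows the plan changing from a scan to an index search the moment an index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE half_million (counter INTEGER, payload TEXT)")
conn.executemany("INSERT INTO half_million VALUES (?, ?)",
                 [(i, "x") for i in range(1000)])

query = ("EXPLAIN QUERY PLAN "
         "SELECT * FROM half_million WHERE counter = 42")

# Without an index, the only option is a scan of the whole table.
before = conn.execute(query).fetchone()[-1]

# After indexing the join/filter column, the planner switches to an
# index search (SQLite, unlike PostgreSQL, requires an index name).
conn.execute(
    "CREATE INDEX half_million_counter_idx ON half_million (counter)")
after = conn.execute(query).fetchone()[-1]

print(before)  # a SCAN of half_million
print(after)   # a SEARCH using the new index
```

The same query, the same data: only the available access paths changed, and the optimizer picked the cheaper plan on its own.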
That concludes the second part of this tutorial. The final article in the how-to-write-better-SQL-queries series is still to come, so stay tuned.
Original link: http://www.kdnuggets.com/2017/08/write-better-sql-queries-definitive-guide-part-2.html
When reprinting, please credit: Grape City Controls
This article is from the "Grape City Control Technology Team Blog"; please keep this source: http://powertoolsteam.blog.51cto.com/2369428/1962767
How to Write Better SQL Queries: The Definitive Guide - Part 2