How to write better SQL queries: Ultimate Guide-Part 2: Ultimate SQL query

Source: Internet
Author: User

How to write better SQL queries: Ultimate Guide-Part 2: Ultimate SQL query

In the previous article, we learned how to execute SQL queries and what to note when writing SQL query statements.

Next, I will learn more about query methods and query optimization.

 

Queries based on collection and Program Methods

The implicit fact in the reverse model is that there is a difference between the set and program methods when the query is established.

  • The query program method is very similar to the programming method: You tell the system what to do and how to do it. For example, in the previous article, execute a function and call another function to query the database, or use a logical method that contains loops, conditions, and user-defined functions (udfs) to obtain the final query result. In this way, you will find that you are always requesting a subset of data at the layer. This method is often called step-by-step or row-by-row query.
  • The other method is set-based. You only need to specify the operation to be executed. To use this method, specify the conditions and requirements for the results you want to obtain through the query. During data retrieval, you do not need to pay attention to the internal mechanism for implementing the query: the database engine determines the best algorithm and logic for executing the query.

Because SQL is set-based, this method is more effective than the program method, which also explains why SQL can work faster than code in some cases.

The collection-based query method is also required by the data mining and analysis industry! Because you need to switch between the two methods skillfully. If you find that your query contains a program query, you should consider whether to rewrite this part.

 

From query to execution plan

The reverse mode is not static. When you become an SQL developer, it may be difficult to avoid querying reverse models and rewrite queries. Therefore, you often need to use tools to optimize your queries in a more structured way.

To think about performance, we need not only a more structured approach, but also a more in-depth approach.

However, this structured and in-depth method is mainly based on the query plan. The query plan is first parsed as a "Resolution Tree" and accurately defines the algorithm used for each operation and how to coordinate the operation process.

 

Query Optimization

When optimizing the query, you may need to manually check the scheduler generated by the optimizer. In this case, you need to view the query plan to analyze your query again.

To master such a query plan, you need some tools provided to you by the database management system. You can use the following tools:

  • Some Software Package function tools can generate a graphical representation of the query plan.
  • Other tools provide you with text descriptions of the query plan.

Note:If you are using PostgreSQL, You can differentiate the explanations. You only need to get the description to EXPLAIN how the planner executes the query without running the plan. At the same time, explain analyze executes the query and returns an analysis report to evaluate the query plan and the actual query plan. In general, the actual execution plan will actually execute this plan, and the evaluation execution plan can solve this problem without executing the query. Logically, the actual execution plan is more useful because it contains other details and statistics that actually occur during query execution.

Next, you will learn more about XPLAIN and ANALYZE, and how to use these two commands to further understand your query plan and query performance. To do this, you need to start using two tables: one_million and half_million for some examples.

You can use EXPLAIN to retrieve the current information of the one_million table: Make sure it is placed at the top of the running query. After running, it is returned to the query plan:

EXPLAINSELECT *FROM one_million;QUERY PLAN_________________________________________________Seq Scan on one_million(cost=0.00..18584.82 rows=1025082 width=36)(1 row)

In the above example, we can see that the Cost queried is 0. 00 .. 18584.82, the number of rows is 1025082, and the column width is 36.

You can also use ANALYZE to update statistics.

ANALYZE one_million;EXPLAINSELECT *FROM one_million;QUERY PLAN
_________________________________________________Seq Scan on one_million(cost=0.00..18334.00 rows=1000000 width=37)(1 row)

In addition to EXPLAIN and ANALYZE, you can also use explain analyze to retrieve the actual execution time:

EXPLAIN ANALYZESELECT *FROM one_million;QUERY PLAN___________________________________________________Seq Scan on one_million(cost=0.00..18334.00 rows=1000000 width=37)(actual time=0.015..1207.019 rows=1000000 loops=1)Total runtime: 2320.146 ms(2 rows)

The disadvantage of using explain analyze is that you need to actually execute the query, which is worth noting!

All algorithms we have seen so far are sequential scans or full table scans: This is a method to scan rows in the database, and each row of the scanned table is sequential) in the order of reading, each column will check whether the conditions are met. In terms of performance, sequential scanning is not the best execution plan because the entire table needs to be scanned. However, if a slow disk is used, sequential reading will also be fast.

Examples of other algorithms:

EXPLAIN ANALYZESELECT *FROM one_million JOIN half_millionON (one_million.counter=half_million.counter);QUERY PLAN_____________________________________________________________Hash Join (cost=15417.00..68831.00 rows=500000 width=42)(actual time=1241.471..5912.553 rows=500000 loops=1)Hash Cond: (one_million.counter = half_million.counter)    -> Seq Scan on one_million    (cost=0.00..18334.00 rows=1000000 width=37)    (actual time=0.007..1254.027 rows=1000000 loops=1)    -> Hash (cost=7213.00..7213.00 rows=500000 width=5)    (actual time=1241.251..1241.251 rows=500000 loops=1)    Buckets: 4096 Batches: 16 Memory Usage: 770kB    -> Seq Scan on half_million    (cost=0.00..7213.00 rows=500000 width=5)(actual time=0.008..601.128 rows=500000 loops=1)Total runtime: 6468.337 ms

We can see that the query optimizer selects Hash Join. Remember this operation because we need to use this operation to evaluate the time complexity of the query. We noticed that the index half_million.counter is not available in the above example. We can add an index in the following example:

CREATE INDEX ON half_million(counter);EXPLAIN ANALYZESELECT *FROM one_million JOIN half_millionON (one_million.counter=half_million.counter);QUERY PLAN______________________________________________________________Merge Join (cost=4.12..37650.65 rows=500000 width=42)(actual time=0.033..3272.940 rows=500000 loops=1)Merge Cond: (one_million.counter = half_million.counter)    -> Index Scan using one_million_counter_idx on one_million    (cost=0.00..32129.34 rows=1000000 width=37)    (actual time=0.011..694.466 rows=500001 loops=1)    -> Index Scan using half_million_counter_idx on half_million    (cost=0.00..14120.29 rows=500000 width=5)(actual time=0.010..683.674 rows=500000 loops=1)Total runtime: 3833.310 ms(5 rows)

By creating indexes, the query optimizer determines how to find Merge join during index scanning.

Note:The difference between index scan and full table scan (Sequential Scan): the latter (also known as "Table scan") finds the appropriate results by scanning all data or indexing all pages, while the former only scans each row in the table.

 

The second part of the tutorial will be introduced here. The last article in the "how to write better SQL queries" series will be coming soon.

Link: http://www.kdnuggets.com/2017/08/write-better-sql-queries-definitive-guide-part-2.html

For more information, see grape city control.

 

Related reading:

[Report benefits] More than 100 Report Templates can be downloaded for free.

How to write better SQL queries: Ultimate Guide-Part 1

A SQL statement for dynamic hierarchical Query

Migrate SQL Server database to Azure SQL

 

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.