How to write better SQL queries: The ultimate guide, Part I

Source: Internet
Author: User


Structured Query Language (SQL) is an indispensable skill in the data mining and analysis industry, and it is generally a fairly easy one to learn. Writing a query is only the first step, though: you also need to make sure the query is efficient and appropriate for your database workload. This tutorial walks through steps you can use to evaluate your queries.

  • First, you should understand why learning SQL matters for data mining and analysis work;

  • Next, you should learn how SQL queries are processed and executed, so that you can better appreciate why writing high-quality queries matters. Specifically, you should understand how queries are parsed, rewritten, optimized, and finally evaluated;

  • Once you have mastered that, you need not only to revisit the query anti-patterns that beginners fall into when writing queries, but also to learn about alternatives and solutions for those mistakes. You should also learn more about set-based, as opposed to procedural, approaches to querying.

  • Anti-patterns also need attention from a performance perspective. Beyond manually improving SQL queries, you need to analyze your queries in a more structured, in-depth way with the help of other tools.

  • Before you execute a query, you need a deeper understanding of the time complexity of the query plan being executed.

  • Finally, you should understand how to tune your queries further.

Why should you learn SQL?

If you are looking for work in the data mining and analytics industry, SQL is one of the most in-demand skills, whether you are applying for a data analyst, data engineer, data mining, or other role. In the 2016 Data Science Salary Survey published by O'Reilly, 70% of respondents confirmed this by saying they use SQL in their professional work. In that survey SQL also ranked well ahead of programming languages such as R (57%) and Python (54%). So in the field of data mining and analysis, SQL is an essential skill.

Let's look at the reasons why SQL, which was developed in the early 1970s, is still going strong today:

First, companies mostly store their data in a relational database management system (RDBMS) or a relational data stream management system (RDSMS), and you need SQL to access that data. SQL is the common language of data: it lets you interact with almost any database, and you can even build your own database locally!

Second, a number of SQL implementations do not fully follow the standard and are incompatible between vendors. Knowing standard SQL is therefore a requirement in the data mining and analysis industry.

Third, and most important, SQL has also been adopted by newer technologies such as Hive and Spark SQL. Hive provides a SQL-like query language interface for querying and managing large datasets, and Spark SQL can be used to execute SQL queries.

In short, here's why you should learn this query language:

    • Even for beginners, SQL is easy to learn. The learning curve is gentle, and you will be writing SQL queries in almost no time.

    • SQL follows the "learn once, use anywhere" principle, so the time you spend learning it is a good investment!

    • SQL is an excellent complement to programming languages, and in some cases writing a query is even preferred over writing code!

    • ...

SQL processing and query execution

To improve the performance of SQL queries, you first need to know what happens internally when you run the query.

The following is the process of query execution:

    • First, the query is parsed into a "parse tree": the parser checks whether the query satisfies the syntactic and semantic requirements. It then creates an internal representation of the input query and passes this output to the rewrite engine.

    • Then, the optimizer's task is to find the best execution plan (query plan) for the given query. The execution plan defines exactly which algorithm is used for each operation and how the execution of the operations is coordinated.

    • Finally, to find the best execution plan, the optimizer enumerates all possible execution plans, determines the quality or cost of each one, takes information about the current database state into account, and then selects the best plan. Because the query optimizer may be imperfect, database users and administrators sometimes need to manually check and tune the optimizer-generated plans for better performance.

You may now wonder what counts as a good execution plan.

As noted above, the cost of a plan plays an important role in judging its quality. More specifically, factors such as the amount of disk I/O required to evaluate the plan, the plan's CPU cost, the overall response time seen by the database client, and the total execution time are critical. This is where the concept of time complexity comes in, which is covered later.

Next, the selected query plan is executed and evaluated by the system's execution engine, and the results of the query are returned.
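As a rough illustration of the optimizer choosing a plan, the sketch below asks SQLite (through Python's built-in sqlite3 module) which plan it picks for the same query before and after an index is added. The in-memory database, the sample rows, and the index name idx_drivers_nr are assumptions made for this example; other database systems expose their plans through different commands and output formats.

```python
import sqlite3

# Hypothetical schema modeled on the drivers table used in this article.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drivers (driverslicensenr INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO drivers VALUES (?, ?)",
    [(123456, "Ann"), (234567, "Bob"), (345678, "Cleo")],
)


def plan(sql):
    """Return the 'detail' text of each step in the plan SQLite selects."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT name FROM drivers WHERE driverslicensenr = 123456"

# Without an index the only available plan is to read the whole table.
plan_without_index = plan(query)

# With an index (name is illustrative) the optimizer has a cheaper option.
conn.execute("CREATE INDEX idx_drivers_nr ON drivers (driverslicensenr)")
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

On SQLite the first plan reports a scan of the table and the second a search through the index; the exact wording of the plan text differs between SQLite versions.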


Writing SQL queries

It is worth stressing that the "garbage in, garbage out" (GIGO) principle applies naturally to query processing and execution: the person who formulates the query also determines the performance of the SQL query.

This means there are things you can do while writing a query. As noted at the start of this article, a query has to meet two requirements: first, it must meet a certain standard, and second, you should address the performance issues that can arise in it.

Overall, there are four clauses and keywords where novices can expect performance issues to occur:

    • WHERE clause

    • INNER JOIN and LEFT JOIN keywords

    • HAVING clause

Although this approach is simple and naive, it is a good guide for a beginner. These are the places to look first when you start writing queries, because they are where mistakes are most easily made.

At the same time, performance only becomes meaningful in context: when thinking about SQL performance, you cannot simply declare the clauses and keywords above to be bad. A query that uses a WHERE or HAVING clause can still be a good query.

Use the following sections to learn more about anti-patterns and their alternatives when writing queries, and treat these tips and tricks as a guide. How you rewrite a query, and whether you need to rewrite it at all, depends on the amount of data, the database, and how often you need to execute the query. It all depends on the goal of your query, so it is important to know something about the data beforehand!

1. Retrieve only the data you need

When writing SQL queries, more data is not always better. It pays to be careful with the SELECT statement, the DISTINCT clause, and the LIKE operator.

SELECT Statement

After writing a query, the first thing to check is whether the SELECT statement is as compact as possible. Your goal should be to remove unnecessary columns from the SELECT list, so that you retrieve only the data that serves the purpose of your query.

If you have correlated subqueries that use EXISTS, use a constant in the SELECT list of the subquery instead of selecting actual column values. This is especially handy when you are only checking for existence.

Keep in mind that a correlated subquery is a subquery that uses values from the outer query. Note also that NULL can be used as the "constant" in this case, which can be confusing!

The following example shows what using a constant means:

SELECT driverslicensenr, name
FROM drivers
WHERE EXISTS (SELECT '1' FROM fines WHERE fines.driverslicensenr = drivers.driverslicensenr);

Tip: correlated subqueries are not always a good idea, so consider avoiding them where you can, for example by rewriting the query with an INNER JOIN:

SELECT drivers.driverslicensenr, drivers.name
FROM drivers
INNER JOIN fines ON fines.driverslicensenr = drivers.driverslicensenr;
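Before swapping one form for the other, it helps to check that both return what you expect. The following is a minimal sketch using Python's built-in sqlite3 module; the sample rows and the fineid column are assumptions made up for illustration. It suggests that the two queries are not always interchangeable: EXISTS yields one row per driver, while the join yields one row per matching fine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drivers (driverslicensenr INTEGER, name TEXT);
    CREATE TABLE fines (fineid INTEGER, driverslicensenr INTEGER);
    INSERT INTO drivers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cleo');
    -- Hypothetical data: Ann has two fines, Bob has one, Cleo has none.
    INSERT INTO fines VALUES (10, 1), (11, 1), (12, 2);
""")

# Correlated subquery: the constant '1' is never read, only row existence.
exists_rows = conn.execute("""
    SELECT driverslicensenr, name
    FROM drivers
    WHERE EXISTS (SELECT '1' FROM fines
                  WHERE fines.driverslicensenr = drivers.driverslicensenr)
    ORDER BY driverslicensenr
""").fetchall()

# Join rewrite: columns are qualified because both tables share a column name.
join_rows = conn.execute("""
    SELECT drivers.driverslicensenr, drivers.name
    FROM drivers
    INNER JOIN fines ON fines.driverslicensenr = drivers.driverslicensenr
    ORDER BY drivers.driverslicensenr
""").fetchall()

print(exists_rows)  # one row per driver that has at least one fine
print(join_rows)    # one row per fine, so a driver with two fines appears twice
```

If a driver can have several fines, the join version needs an extra step (such as DISTINCT or grouping) to match the EXISTS result, which is worth weighing against any performance gain.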

DISTINCT clause

The SELECT DISTINCT statement is used to return only distinct (different) values. DISTINCT is a clause you should avoid when you can, because adding DISTINCT to a query can increase its execution time.
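One way to see where the extra time can come from is to compare query plans. The sketch below uses Python's built-in sqlite3 module on an assumed, empty drivers table; it is specific to SQLite, and other databases may deduplicate differently (for example by sorting or hashing).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drivers (driverslicensenr INTEGER, name TEXT)")


def plan(sql):
    """Return the 'detail' text of each step in the plan SQLite selects."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plain_plan = plan("SELECT name FROM drivers")
distinct_plan = plan("SELECT DISTINCT name FROM drivers")

print(plain_plan)     # a single scan of the table
print(distinct_plan)  # the scan plus a temporary B-tree used to drop duplicates
```

The additional temporary structure is work the database only does because DISTINCT was requested, so it is worth asking whether duplicates can actually occur in the result.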

LIKE operator

When you use the LIKE operator in a query, the index is not used if the pattern starts with % or _, which prevents the database from using the index even if one exists. Seen from another angle, this type of query can also be too loose and retrieve many records that do not necessarily serve the goal of the query.

Here too, knowing the data stored in your database helps you write a pattern that filters correctly through all of it and finds only the rows you actually want.
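The effect of a leading wildcard can be observed in a query plan. This is a sketch for SQLite only, using Python's built-in sqlite3 module; the index name idx_drivers_name and the patterns are assumptions. SQLite's default LIKE is case-insensitive, so the index is declared with COLLATE NOCASE to make it eligible for prefix matching; other databases have their own rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drivers (driverslicensenr INTEGER, name TEXT)")
# NOCASE matches the case-insensitive behavior of SQLite's default LIKE.
conn.execute("CREATE INDEX idx_drivers_name ON drivers (name COLLATE NOCASE)")


def plan(sql):
    """Return the 'detail' text of each step in the plan SQLite selects."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Pattern with a fixed prefix: the index can narrow the range of rows.
prefix_plan = plan("SELECT name FROM drivers WHERE name LIKE 'Sm%'")

# Pattern starting with a wildcard: every entry has to be examined.
leading_wildcard_plan = plan("SELECT name FROM drivers WHERE name LIKE '%ith'")

print(prefix_plan)
print(leading_wildcard_plan)
```

The first plan is reported as a search on the index, the second as a scan, which is the behavior the paragraph above warns about.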

2. Narrowing down the query results

If you cannot avoid filtering in your SELECT statement, consider limiting the query results in other ways, for example with the LIMIT clause and by handling data type conversions carefully.

TOP, LIMIT and ROWNUM clauses

You can add the LIMIT or TOP clause to a query to set the maximum number of rows in the result. Here is an example:

SELECT TOP 3 *
FROM drivers;

Note that you can also specify PERCENT, for example by changing the first line of the query to SELECT TOP n PERCENT *, where n is the percentage of rows to return.

The equivalent query with LIMIT looks like this:

SELECT driverslicensenr, name
FROM drivers
LIMIT 2;

In addition, you can use ROWNUM, which plays the same role as LIMIT in the query:

SELECT *
FROM drivers
WHERE driverslicensenr = 123456 AND ROWNUM <= 3;
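Which of these row-limiting forms is available depends on the database: TOP is SQL Server syntax, ROWNUM belongs to Oracle, and LIMIT is used by systems such as MySQL, PostgreSQL and SQLite. The following sketch runs the LIMIT form on SQLite through Python's built-in sqlite3 module; the sample rows are made up, and the ORDER BY is an addition so that "the first two rows" is well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drivers (driverslicensenr INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO drivers VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Cleo"), (4, "Dev")],
)

# Without ORDER BY the database may return any two rows.
limited_rows = conn.execute("""
    SELECT driverslicensenr, name
    FROM drivers
    ORDER BY driverslicensenr
    LIMIT 2
""").fetchall()

print(limited_rows)  # only two of the four rows are returned
```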

Data type conversions

Use the smallest data type that fits your data, because smaller data types are more efficient.

Data type conversions in a query add to its execution time, so avoid them where possible.

If you cannot avoid them, define the conversions carefully.
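One concrete way a conversion can cost time, sketched here under assumptions: when an indexed column is wrapped in a conversion inside the WHERE clause, the database may no longer be able to use the index. The example uses SQLite through Python's built-in sqlite3 module, and the index name idx_drivers_nr is hypothetical; behavior in other databases can differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drivers (driverslicensenr INTEGER, name TEXT)")
conn.execute("CREATE INDEX idx_drivers_nr ON drivers (driverslicensenr)")


def plan(sql):
    """Return the 'detail' text of each step in the plan SQLite selects."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Comparing the column with a value of its own type lets the index be used.
direct_plan = plan(
    "SELECT name FROM drivers WHERE driverslicensenr = 123456"
)

# Converting the column first hides it from the index on the plain column.
cast_plan = plan(
    "SELECT name FROM drivers WHERE CAST(driverslicensenr AS TEXT) = '123456'"
)

print(direct_plan)  # index search
print(cast_plan)    # full table scan
```

Where a conversion is unavoidable, applying it to the constant rather than to the column usually keeps the index usable.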

This article is the first in this series of tutorials; more "How to write better SQL queries" articles will follow.

Original link: http://www.kdnuggets.com/2017/08/write-better-sql-queries-definitive-guide-part-1.html

If you reproduce this article, please credit: Grape City control

This article is from the "Grape City Control Technology Team Blog"; please be sure to keep this source: http://powertoolsteam.blog.51cto.com/2369428/1960762
