Format SQL to improve efficiency

Last Update:2015-04-27 Source: Internet

Author: User

Tags aliases

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

This article by Bole Online-cucr translation, Huang Li-min school draft. without permission, no reprint!
English Source: Msiman.ga. Welcome to join the translation team.

Background

Formatted SQL does not run faster than unformatted SQL. The database may not really care if you put commas in front of or behind each field name. To help you become more sensible and be an efficient SQL writer, I recommend that you follow some formatting guidelines. In this article, I'll share how to format SQL statements to improve productivity. I define productivity as a way to get accurate results from SQL, while the code is easy to understand, modify, and Debug. I'll just focus on the SELECT statement, which accounts for 99% of the SQL I write. Formatting SQL code is a very personal choice, and I know very well that different people will love their own formatting rules.

Examples of problems

This is a typical SQL usage scenario that requires such a report in business, with data in three tables, customer, sales, and location, respectively. In January 2015, the report needs to show the number of customers and total sales in each ZIP code area. This should be a simple SQL statement that connects three tables.

There may be a problem with the data

While SQL is easy, ensuring that results are accurate is a real challenge, and the following are one of many possible causes, including:

The data may come from different data sources. This means that referential integrity is not guaranteed in different tables. Simply put, you cannot assume that all postal codes on the Customer table are valid, and that the problem also exists on the location table.
Applications that access customer data may not have the appropriate data validation. An incorrect ZIP code may have been deposited.
The postcode table may not have all zip codes. The new postal code code may be introduced, but has not been added to the table since the last update.

First principle

For me, formatted SQL is more likely to get the correct results from SQL because it has clear SQL and is easy to track. The first thing I do write a statement that gets the total number of customers. This is a number, and I'll compare it after I've written the entire statement.

The first statement I wrote was:

1234	`SELECT` `COUNT(DISTINCTcust_id)` `AScount_customersFROM` `customers`

Result:

Count_customers
"10″

This query is important because it follows the first principle (external link). Because there is no SQL connection, so there is no dependency, I know this is the correct number of customers. I always write down the results, because I always need to compare this number in this article is 10.

The next thing I want to do is add the necessary fields and tables to this query. I emphasize adding the word because, based on the formatting rules I follow, I can comment out the elements of the query to get the same results as when I applied the first principle. Here is my final formatted query, using the way you format the query.

Formatting sql

Here is my recommended formatted SQL, which is followed by the justification for the formatting choices I made.

1234567891011121314151617 SELECT 0 ,c.cust_post_code ,p.location ,COUNT(DISTINCT c.cust_id) number_customers ,SUM(s.total_amount) AS total_salesFROM customers c JOIN post_codes p ON c.cust_post_code = p.post_code JOIN sales s ON c.cust_id = s.cust_idWHERE 1=1 AND s.sales_date BETWEEN ‘2015-01-01‘ AND ‘2015-01-31‘ --AND s.order_id = 5GROUP BY c.cust_post_code ,p.location

Always use table aliases

This will be confirmed in your SQL. If you do not use aliases (external links) for each field that participates in the query, sometimes later, fields with the same name are added to a table used in the query. Your query and your report will appear with an error (duplicate field names are found).

Comma in front of field

When I debug/test my query, it makes it easy for me to comment and uncomment the fields without having to modify any other rows in the query to make sure the comma is in the correct place. I've read some articles that bloggers have to change part of another query to punches to make sure the comma is correct, but it's a big problem if you spend most of your time writing and testing SQL statements. You will be more efficient in this way. This is workplace very well in the Select and GROUP by queries section.

I use SELECT 0 in the development environment and tend to remove it before entering the production environment. It allows me to put commas in front of all the fields. If there is no 0, I would like to comment out C.cust_post_code, which is the first field, I must comment out the comma before the second field. I also do the same thing in the GROUP by clause. 0 can eliminate this extra work.

Join in a new row

The advantages of placing a join statement on a new line include:

It is easy to see all the tables involved in the query by simply scrolling down the JOIN statement list.
With JOIN, it is possible to keep all the relationship logic in one place, compared to all tables and relationship expressions in the where statement. A JOIN statement may not always follow a line, but at least in one place.
It is relatively easy to comment out joins. When debugging, it is useful when you want to know which JOIN results in data differences.

Column mode editing

Column mode editing is handy when working with a large number of fields. Here's my first animated GIF show showing you how to comment out all non-aggregated fields. In practice I use

column mode editing (external link), not just the comment field also includes:

Create a large number of indexes
With a long field list when using the UNION statement
Note the list of fields that are long by the GROUP BY clause

Test Query Results

I had to use an outer join to list all the customers, because not all of the customer's ZIP codes could be found in the location table. I was able to do this by repeatedly including and excluding different fields and tables in my query, ensuring that I was able to align with the earliest queries based on the first principle.

123456789101112131415161718 SELECT 0 ,c.cust_post_code --,p.location ,COUNT(DISTINCT c.cust_id) number_customers ,SUM(s.total_amount) AS total_salesFROM customers c --LEFT OUTER JOIN post_codes p ON c.cust_post_code = p.post_code JOIN sales s ON c.cust_id = s.cust_idWHERE 1=1 AND s.sales_date BETWEEN ‘2015-01-01‘ AND ‘2015-01-31‘ --AND c.cust_post_code = 2000 --AND p.post_code = 200GROUP BY c.cust_post_code --,p.location

For me, formatting SQL like this means I don't have to write a separate test to check the data. By commenting out some lines, I can use the first principle to test the accuracy of the data. This can improve my efficiency, as well as the accuracy of the report.

About CUCR

Format SQL to improve efficiency

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More