This article by Bole Online-cucr translation, Huang Li-min school draft. without permission, no reprint!
English Source: Msiman.ga. Welcome to join the translation team.
Background
Formatted SQL does not run faster than unformatted SQL. The database may not really care if you put commas in front of or behind each field name. To help you become more sensible and be an efficient SQL writer, I recommend that you follow some formatting guidelines. In this article, I'll share how to format SQL statements to improve productivity. I define productivity as a way to get accurate results from SQL, while the code is easy to understand, modify, and Debug. I'll just focus on the SELECT statement, which accounts for 99% of the SQL I write. Formatting SQL code is a very personal choice, and I know very well that different people will love their own formatting rules.
Examples of problems
This is a typical SQL usage scenario that requires such a report in business, with data in three tables, customer, sales, and location, respectively. In January 2015, the report needs to show the number of customers and total sales in each ZIP code area. This should be a simple SQL statement that connects three tables.
There may be a problem with the data
While SQL is easy, ensuring that results are accurate is a real challenge, and the following are one of many possible causes, including:
- The data may come from different data sources. This means that referential integrity is not guaranteed in different tables. Simply put, you cannot assume that all postal codes on the Customer table are valid, and that the problem also exists on the location table.
- Applications that access customer data may not have the appropriate data validation. An incorrect ZIP code may have been deposited.
- The postcode table may not have all zip codes. The new postal code code may be introduced, but has not been added to the table since the last update.
First principle
For me, formatted SQL is more likely to get the correct results from SQL because it has clear SQL and is easy to track. The first thing I do write a statement that gets the total number of customers. This is a number, and I'll compare it after I've written the entire statement.
The first statement I wrote was:
1234 |
SELECT COUNT ( DISTINCT cust_id) AS count_customers FROM customers |
Result:
This query is important because it follows the first principle (external link). Because there is no SQL connection, so there is no dependency, I know this is the correct number of customers. I always write down the results, because I always need to compare this number in this article is 10.
The next thing I want to do is add the necessary fields and tables to this query. I emphasize adding the word because, based on the formatting rules I follow, I can comment out the elements of the query to get the same results as when I applied the first principle. Here is my final formatted query, using the way you format the query.
Formatting sql
Here is my recommended formatted SQL, which is followed by the justification for the formatting choices I made.
1234567891011121314151617 |
SELECT
0
,c.cust_post_code
,p.location
,
COUNT
(
DISTINCT c.cust_id) number_customers
,
SUM
(s.total_amount)
AS total_sales
FROM
customers c
JOIN post_codes p
ON c.cust_post_code = p.post_code
JOIN sales s
ON c.cust_id = s.cust_id
WHERE
1=1
AND s.sales_date
BETWEEN ‘2015-01-01‘ AND ‘2015-01-31‘
--AND s.order_id = 5
GROUP BY
c.cust_post_code
,p.location
|
Always use table aliases
This will be confirmed in your SQL. If you do not use aliases (external links) for each field that participates in the query, sometimes later, fields with the same name are added to a table used in the query. Your query and your report will appear with an error (duplicate field names are found).
Comma in front of field
When I debug/test my query, it makes it easy for me to comment and uncomment the fields without having to modify any other rows in the query to make sure the comma is in the correct place. I've read some articles that bloggers have to change part of another query to punches to make sure the comma is correct, but it's a big problem if you spend most of your time writing and testing SQL statements. You will be more efficient in this way. This is workplace very well in the Select and GROUP by queries section.
I use SELECT 0 in the development environment and tend to remove it before entering the production environment. It allows me to put commas in front of all the fields. If there is no 0, I would like to comment out C.cust_post_code, which is the first field, I must comment out the comma before the second field. I also do the same thing in the GROUP by clause. 0 can eliminate this extra work.
Join in a new row
The advantages of placing a join statement on a new line include:
- It is easy to see all the tables involved in the query by simply scrolling down the JOIN statement list.
- With JOIN, it is possible to keep all the relationship logic in one place, compared to all tables and relationship expressions in the where statement. A JOIN statement may not always follow a line, but at least in one place.
- It is relatively easy to comment out joins. When debugging, it is useful when you want to know which JOIN results in data differences.
Column mode editing
Column mode editing is handy when working with a large number of fields. Here's my first animated GIF show showing you how to comment out all non-aggregated fields. In practice I use
column mode editing (external link), not just the comment field also includes:
- Create a large number of indexes
- With a long field list when using the UNION statement
- Note the list of fields that are long by the GROUP BY clause
Test Query Results
I had to use an outer join to list all the customers, because not all of the customer's ZIP codes could be found in the location table. I was able to do this by repeatedly including and excluding different fields and tables in my query, ensuring that I was able to align with the earliest queries based on the first principle.
123456789101112131415161718 |
SELECT
0
,c.cust_post_code
--,p.location
,
COUNT
(
DISTINCT c.cust_id) number_customers
,
SUM
(s.total_amount)
AS total_sales
FROM
customers c
--LEFT OUTER JOIN post_codes p ON c.cust_post_code = p.post_code
JOIN sales s
ON c.cust_id = s.cust_id
WHERE
1=1
AND s.sales_date
BETWEEN ‘2015-01-01‘ AND ‘2015-01-31‘
--AND c.cust_post_code = 2000
--AND p.post_code = 200
GROUP BY
c.cust_post_code
--,p.location
|
For me, formatting SQL like this means I don't have to write a separate test to check the data. By commenting out some lines, I can use the first principle to test the accuracy of the data. This can improve my efficiency, as well as the accuracy of the report.
About CUCR
Format SQL to improve efficiency